Python Yahoo Finance S&P 500


Python utility that scrapes data of stocks belonging to the S&P 500 index from Yahoo Finance


massiccio/yahoo-finance-scraper


Python 2.7 utility that scrapes data of stocks belonging to the S&P 500 index from Yahoo Finance.

Tools such as Finviz let you create screeners to identify certain stocks, e.g., potentially undervalued ones. Unfortunately, those tools do not let you download the data. With that in mind, I wrote this program, which downloads the following data from Yahoo Finance:

  1. Ticker
  2. Current Price
  3. 5-year PEG ratio
  4. Trailing P/E ratio
  5. Forward P/E ratio
  6. Beta
  7. Enterprise Value/EBITDA
  8. 52-week Change
  9. 52-week Change Compared to S&P 500
  10. Short % of Float
  11. Profit Margin %
  12. Quarterly Earnings Growth (yoy)
  13. Quarterly Revenue Growth (yoy)
  14. Return on Assets (ttm)
  15. Return on Equity (ttm)
  16. Total Cash
  17. Total Debt
  18. Target price estimate (low)
  19. Target price estimate (average)
  20. Target price estimate (median)
  21. Target price estimate (high)
python yahoo-finance-scraper.py --help
usage: yahoo-finance-scraper.py [options]

Get data about the S&P 500 companies.

optional arguments:
  -h, --help            show this help message and exit
  --all_tickers         Download data for all companies whose ticker is
                        specified in the -i option [default: true].
  --ticker TICKER       Download data for "ticker" only.
  -d, --download_index  Download list of all S&P 500 components from
                        Wikipedia [default: false].
  -i INPUT, --input INPUT
                        File containing the tickers, in CSV format. Tickers
                        are in the 1st column [default: ./tickers.csv]. You
                        can have any ticker available in Yahoo Finance here,
                        not only those of companies belonging to the S&P 500
                        index.
  -o OUTPUT, --output OUTPUT
                        File where results are stored, in CSV format
                        [default: ./result.csv].
  -a, --append          Do not create header [default: false].
  -p PAUSE, --pause PAUSE
                        Interval between requests to Yahoo Finance [default:
                        4 seconds]. Without a pause between requests, Yahoo
                        Finance will bounce queries.
  -t REQUEST_TIMEOUT, --request_timeout REQUEST_TIMEOUT
                        Timeout for Yahoo Finance requests [default: 1
                        second]. The timeout value will be applied to both
                        the connect and the read timeouts. Please refer to
                        http://docs.python-requests.org/en/master/user/advanced/#timeouts
                        for more details.
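A typical invocation, using only the flags documented above (the input and output paths shown are just the defaults, so this is purely an illustration), might look like this:

python yahoo-finance-scraper.py -d -i ./tickers.csv -o ./result.csv -p 4

This first refreshes the list of index constituents from Wikipedia, then scrapes each ticker with a 4-second pause between requests.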

The utility relies on the requests library for its HTTP calls (see the timeout documentation linked in the help text above).


This Python script takes the list of S&P 500 companies from Wikipedia and writes the stock data from the Yahoo Finance API into CSV files, saving them in the stock_dfs directory.

fc410/SP500Script


For this project I created a script that web-scrapes the S&P 500 list from the Wikipedia website and uses pandas_datareader to access the Yahoo Finance API, getting the open, close, high, low, volume, date, and adjusted close data for each S&P 500 company. Each company's data is stored as a CSV in the stock_dfs directory so it can be accessed later for analysis. A minimal sketch of this workflow follows the tool list below.

  • Python
  • Pandas_datareader for accessing the Yahoo Finance API
  • bs4 for web scraping
  • Postman
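As a rough illustration of that workflow (not the repository's actual code): scrape the constituent table with bs4, then pull each company's history with pandas_datareader. The table's CSS class and the 'yahoo' data source are assumptions on my part; pandas_datareader's Yahoo backend has broken in the past as Yahoo changed its endpoints, so a maintained alternative may be needed.

import datetime as dt
import os

import bs4 as bs
import pandas_datareader.data as web
import requests

# Scrape the ticker symbols from the first column of the constituents
# table on the Wikipedia page.
resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = bs.BeautifulSoup(resp.text, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})
tickers = [row.findAll('td')[0].text.strip() for row in table.findAll('tr')[1:]]

os.makedirs('stock_dfs', exist_ok=True)
start, end = dt.datetime(2010, 1, 1), dt.datetime.now()

for ticker in tickers:
    # pandas_datareader returns open/high/low/close/adj close/volume
    # indexed by date; one CSV per company goes into stock_dfs/.
    df = web.DataReader(ticker, 'yahoo', start, end)
    df.to_csv('stock_dfs/{}.csv'.format(ticker))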

Python Yahoo Finance S&P 500

We use datetime to specify dates for the Pandas datareader, and os to check for, and create, directories. You already know what pandas is for!

To start our new function:

# save_sp500_tickers()
def get_data_from_yahoo(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)

Here’s where I’ll just show a quick example of one way you could handle whether or not to reload the S&P 500 list. If we ask it to, the program will re-pull the S&P 500 list; otherwise it will just use our pickle. Now we want to prepare to grab the data.

Now we need to decide what we’re going to do with the data. What I tend to do is parse websites ONCE and store the data locally. I don’t try to know in advance all of the things I might do with the data, but I know that if I am going to pull it more than once, I might as well just save it (unless it’s a huge dataset, which this is not). Thus, we’re going to pull everything we can from what Yahoo returns to us for every stock, and just save it. To do this, we’ll create a new directory and, in there, store the stock data per company. To begin, we need that initial directory:

if not os.path.exists('stock_dfs'):
    os.makedirs('stock_dfs')

You could just store these datasets in the same directory as your script, but this would get pretty messy in my opinion. Now we’re ready to pull the data. You already know how to do this; we did it in the very first tutorial!

start = dt.datetime(2010, 1, 1)
end = dt.datetime.now()
for ticker in tickers:
    # just in case your connection breaks, we'd like to save our progress!
    if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
        df = web.DataReader(ticker, 'morningstar', start, end)
        df.reset_index(inplace=True)
        df.set_index("Date", inplace=True)
        df = df.drop("Symbol", axis=1)
        df.to_csv('stock_dfs/{}.csv'.format(ticker))
    else:
        print('Already have {}'.format(ticker))

You will likely want to add some sort of force_data_update parameter to this function in time, since, right now, it will not re-pull data it sees it already has. Since we’re pulling daily data, you’d want it to re-pull at least the latest data. That said, in that case you might be better off using a database instead, with a table per company, and then just pulling the most recent values from the Yahoo database. We’ll keep things simple for now, though! A sketch of that parameter follows.
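Purely as an illustration of that idea — force_data_update is a hypothetical parameter, not something this tutorial’s code has — here is a drop-in variant of the function above, using the same imports, where the existence check in the loop can be bypassed:

# Hypothetical variant: force_data_update is NOT part of the tutorial's code.
def get_data_from_yahoo(reload_sp500=False, force_data_update=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')
    start = dt.datetime(2010, 1, 1)
    end = dt.datetime.now()
    for ticker in tickers:
        csv_path = 'stock_dfs/{}.csv'.format(ticker)
        # Re-download when forced, or when we have no local copy yet.
        if force_data_update or not os.path.exists(csv_path):
            df = web.DataReader(ticker, 'morningstar', start, end)
            df.reset_index(inplace=True)
            df.set_index("Date", inplace=True)
            df = df.drop("Symbol", axis=1)
            df.to_csv(csv_path)
        else:
            print('Already have {}'.format(ticker))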


Full code up to this point:

import bs4 as bs
import datetime as dt
import os
import pandas_datareader.data as web
import pickle
import requests


def save_sp500_tickers():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)
    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)
    return tickers


# save_sp500_tickers()


def get_data_from_yahoo(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')
    start = dt.datetime(2010, 1, 1)
    end = dt.datetime.now()
    for ticker in tickers:
        # just in case your connection breaks, we'd like to save our progress!
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, 'morningstar', start, end)
            df.reset_index(inplace=True)
            df.set_index("Date", inplace=True)
            df = df.drop("Symbol", axis=1)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))


get_data_from_yahoo()

Go ahead and run this. You might want to import time and add a time.sleep(0.5) or something if Yahoo throttles you. At the time of my writing this, Yahoo did not throttle me at all, and I was able to run this all the way through without any issues. It might still take you a while, however, especially depending on your machine. The good news is that we won’t need to do it again! In practice, again, since this is daily data, you might do this once a day.
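If Yahoo does throttle you, the sleep would go at the end of the download branch inside the ticker loop, something like this (0.5 seconds is just the suggestion above; tune it as needed):

import time

for ticker in tickers:
    if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
        df = web.DataReader(ticker, 'morningstar', start, end)
        df.to_csv('stock_dfs/{}.csv'.format(ticker))
        time.sleep(0.5)  # pause so we don't hammer the server with requests
    else:
        print('Already have {}'.format(ticker))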

Also, if you have a slow internet connection, you don’t need to do all the tickers; even just 10 would be enough. You can do for ticker in tickers[:10]:, or something like that, to speed things up.

In the next tutorial, once you have the data downloaded, we’re going to compile the data we’re interested in into one large Pandas DataFrame.
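As a preview, here is a hedged sketch of what that compilation step could look like — this assumes each CSV in stock_dfs has the Date index and Close column we saved above, and it is not the next tutorial’s actual code:

import os
import pandas as pd

def compile_data():
    main_df = pd.DataFrame()
    for fname in os.listdir('stock_dfs'):
        ticker = fname[:-4]  # strip the ".csv" extension
        df = pd.read_csv(os.path.join('stock_dfs', fname), index_col='Date')
        # Keep just the close price, renamed to the ticker, and outer-join
        # so companies with different date ranges still line up.
        df = df[['Close']].rename(columns={'Close': ticker})
        main_df = df if main_df.empty else main_df.join(df, how='outer')
    return main_df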
