成功的量化交易——量化交易環境(8)

Halls-Moore Alpha 自營交易室


成功的量化交易——量化交易環境(8)


連接到數據庫


在配合Python使用MySQL之前,我們需要安裝mysqlclient庫。mysqlclient實際上是另一個庫的分支,稱為Python-MySQL。不幸的是,Python3不支持後一個庫,因此我們必須使用mysqlclient。在Mac OSX/UNIX類型的機器上,我們需要運行以下命令:


sudo apt-get install libmysqlclient-dev

pip install mysqlclient


現在,我們已經準備好通過Python和pandas與MySQL數據庫進行交互。


使用對象關係映射器


對於那些具有數據庫管理和開發背景的人,您可能會問,使用對象關係映射器(ORM)是否更明智。ORM允許將編程語言中的對象直接映射到數據庫中的表,這樣程序代碼就完全不知道底層存儲引擎。他們不是沒有他們的問題,但他們可以節省大量的時間。然而,節省時間通常是以犧牲性能為代價的。


一個流行的Python ORM是SQLAlchemy。它允許您在Python本身中指定數據庫模式,從而自動生成CREATE TABLE代碼。由於我們特別選擇了MySQL,並且關注性能,所以本章選擇不使用ORM。


品種檢索


讓我們從獲取與標準普爾500只大盤股相關的所有股票代碼開始,即標準普爾500指數。當然,這只是一個例子。如果你在英國進行交易,並希望使用英國國內指數,你同樣可以獲得在倫敦證交所(LSE)交易的FTSE100家公司的名單。維基百科方便地列出了標準普爾500指數的成分股。請注意,在標準普爾500指數中實際上有502個成分!我們將使用Python的request和BeautifulSoup庫來抓取網站,然後直接將內容添加到MySQL中。首先,確保安裝了庫:


pip install requests

pip install beautifulsoup4


下面的代碼將使用requests和BeautifulSoup庫將這些品種直接添加到前面創建的MySQL數據庫中。請記得將“密碼”替換為您所選擇的密碼,如下所示:


#!/usr/bin/python

# -*- coding: utf-8 -*-

# insert_symbols.py

from __future__

import print_function

import datetime

from math import ceil

import bs4

import MySQLdb as mdb

import requests

def obtain_parse_wiki_snp500():

"""
Download and parse the Wikipedia list of S&P500
constituents using requests and BeautifulSoup.
Returns a list of tuples for to add to MySQL.
"""

# Stores the current time, for the created_at record

now = datetime.datetime.utcnow()

# Use requests and BeautifulSoup to download the

# list of S&P500 companies and obtain the symbol table

response = requests.get( "http://en.wikipedia.org/wiki/List_of_S%26P_500_companies" )

soup = bs4.BeautifulSoup(response.text)

# This selects the first table, using CSS Selector syntax

# and then ignores the header row ([1:])

symbolslist = soup.select(’table’)[0].select(’tr’)[1:]

# Obtain the symbol information for each

# row in the S&P500 constituent table

symbols = []

for i, symbol in enumerate(symbolslist):

tds = symbol.select(’td’)

symbols.append(

(

tds[0].select(’a’)[0].text, # Ticker

’stock’,

tds[1].select(’a’)[0].text, # Name

tds[3].text, # Sector

’USD’, now, now

)

)

return symbols

def insert_snp500_symbols(symbols):

"""
Insert the S&P500 symbols into the MySQL database.
"""

# Connect to the MySQL instance

db_host = ’localhost’

db_user = ’sec_user’

db_pass = ’password’

db_name = ’securities_master’

con = mdb.connect(

host=db_host, user=db_user, passwd=db_pass, db=db_name

)

# Create the insert strings column_str = """ticker, instrument, name, sector,
currency, created_date, last_updated_date
"""

insert_str = ("%s, " * 7)[:-2]

final_str = "INSERT INTO symbol (%s) VALUES (%s)" % \\
(column_str, insert_str)

# Using the MySQL connection, carry out

# an INSERT INTO for every symbol

with con:

cur = con.cursor()

cur.executemany(final_str, symbols)

if __name__ == "__main__":

symbols = obtain_parse_wiki_snp500()

insert_snp500_symbols(symbols)

print("%s symbols were successfully added." % len(symbols))


在此階段,我們將在數據庫中包含標準普爾500指數的所有502個當前符號成分。我們的下一個任務是實際地從不同的來源獲得歷史定價數據,並將其與品種進行匹配。


價格檢索


為了獲得當前標準普爾500指數成分股的歷史數據,我們必須首先查詢數據庫中的所有符號列表。一旦返回了符號列表和符號id,就可以調用Yahoo Finance API並下載每個符號的歷史定價數據。一旦我們有了每個品種,我們就可以依次將數據插入數據庫。下面是實現這一點的Python代碼:


#!/usr/bin/python
# -*- coding: utf-8 -*-
# price_retrieval.py5
from __future__ import print_function
import datetime
import warnings
import MySQLdb as mdb
import requests
# Obtain a database connection to the MySQL instance
db_host = ’localhost’
db_user = ’sec_user’
db_pass = ’password’
db_name = ’securities_master’
con = mdb.connect(db_host, db_user, db_pass, db_name)
def obtain_list_of_db_tickers():
"""
Obtains a list of the ticker symbols in the database.
"""
with con:
cur = con.cursor()
cur.execute("SELECT id, ticker FROM symbol")
data = cur.fetchall()
return [(d[0], d[1]) for d in data]
def get_daily_historic_data_yahoo(
ticker, start_date=(2000,1,1),
end_date=datetime.date.today().timetuple()[0:3]


):
"""
Obtains data from Yahoo Finance returns and a list of tuples.
ticker: Yahoo Finance ticker symbol, e.g. "GOOG" for Google, Inc.
start_date: Start date in (YYYY, M, D) format
end_date: End date in (YYYY, M, D) format
"""
# Construct the Yahoo URL with the correct integer query parameters
# for start and end dates. Note that some parameters are zero-based!
ticker_tup = (
ticker, start_date[1]-1, start_date[2],
start_date[0], end_date[1]-1, end_date[2],
end_date[0]
)
yahoo_url = "http://ichart.finance.yahoo.com/table.csv"
yahoo_url += "?s=%s&a=%s&b=%s&c=%s&d=%s&e=%s&f=%s"
yahoo_url = yahoo_url % ticker_tup
# Try connecting to Yahoo Finance and obtaining the data
# On failure, print an error message.
try:
yf_data = requests.get(yahoo_url).text.split("\\n")[1:-1]
prices = []
for y in yf_data:
p = y.strip().split(’,’)
prices.append(
(datetime.datetime.strptime(p[0], ’%Y-%m-%d’),
p[1], p[2], p[3], p[4], p[5], p[6])
)
except Exception as e:
print("Could not download Yahoo data: %s" % e)
return prices
def insert_daily_data_into_db(
data_vendor_id, symbol_id, daily_data
):
"""
Takes a list of tuples of daily data and adds it to the
MySQL database. Appends the vendor ID and symbol ID to the data.
daily_data: List of tuples of the OHLC data (with
adj_close and volume)
"""
# Create the time now
now = datetime.datetime.utcnow()
# Amend the data to include the vendor ID and symbol ID
daily_data = [
(data_vendor_id, symbol_id, d[0], now, now,
d[1], d[2], d[3], d[4], d[5], d[6])
for d in daily_data
]
# Create the insert strings
column_str = """data_vendor_id, symbol_id, price_date, created_date,

last_updated_date, open_price, high_price, low_price,
close_price, volume, adj_close_price"""
insert_str = ("%s, " * 11)[:-2]
final_str = "INSERT INTO daily_price (%s) VALUES (%s)" % \\
(column_str, insert_str)
# Using the MySQL connection, carry out an INSERT INTO for every symbol
with con:
cur = con.cursor()
cur.executemany(final_str, daily_data)
if __name__ == "__main__":
# This ignores the warnings regarding Data Truncation
# from the Yahoo precision to Decimal(19,4) datatypes
warnings.filterwarnings(’ignore’)
# Loop over the tickers and insert the daily historical
# data into the database
tickers = obtain_list_of_db_tickers()
lentickers = len(tickers)
for i, t in enumerate(tickers):
print(
"Adding data for %s: %s out of %s" %
(t[1], i+1, lentickers)
)
yf_data = get_daily_historic_data_yahoo(t[1])
insert_daily_data_into_db(’1’, t[0], yf_data)
print("Successfully added Yahoo Finance pricing data to DB.")


請注意,當然有一些方法可以優化這個過程。例如,如果我們使用Python ScraPy庫,我們將從下載中獲得高併發性,因為ScraPy構建在事件驅動的Twisted框架上。目前,每個下載將按順序執行。



分享到:


相關文章: