0%

利用Python进行数据分析(11):金融和经济数据应用

利用Python进行数据分析》读书笔记。
第 11 章:金融和经济数据应用

自2005年开始,python在金融行业中的应用越来越多。

这主要得益于越来越成熟的函数库(NumPy和pandas)以及大量经验丰富的程序员。

许多机构发现python不仅非常适合成为交互式的分析环境,也非常适合开发文件的系统,所需的时间也比Java或C++少得多。

Python还是一种非常好的粘合层,可以非常轻松为C或C++编写的库构建Python接口。

金融分析领域的内容博大精深。在数据规整化方面所花费的精力常常会比解决核心建模和研究问题所花费的时间多得多。

在本章中,术语截面(cross-section)来表示某个时间点的数据。

例如标普500指数中所有成份股在特定日期的收盘价就形成了一个截面。

多个数据在多个时间点的截面数据就构成了一个面板(panel)。

面板数据既可以表示为层次化索引的DataFrame,也可以表示为三维的Panel pandas对象。

数据规整化方面的话题

利用Python进行数据分析》读书笔记。

第11章 第1节:数据规整化方面的话题

所有用到的数据可以从作者的 github下载。

%pylab inline
import pandas as pd
from pandas import Series, DataFrame
Populating the interactive namespace from numpy and matplotlib

数据规整:Data munging

时间序列以及截面对齐

处理金融数据时,最费神的一个问题就是所谓的数据对齐(data alignment)。

两个时间序列的索引可能没有很好的对齐,或者两个DataFrame对象可能含有不匹配的行或者列。

MATLAB、R用户通常会耗费大量的时间来进行数据对对齐工作。

pandas可以在运算中自动对齐数据。这是极好的,会提高效率。

close_px = pd.read_csv('data/ch11/stock_px.csv', parse_dates=True, index_col=0)
volume = pd.read_csv('data/ch11/volume.csv', parse_dates=True, index_col=0)
prices = close_px.ix['2011-09-05':'2011-09-14', ['AAPL', 'JNJ', 'SPX', 'XOM']]
volume = volume.ix['2011-09-05':'2011-09-12', ['AAPL', 'JNJ', 'XOM']]
prices
Out[6]:
AAPL JNJ SPX XOM
2011-09-06 379.74 64.64 1165.24 71.15
2011-09-07 383.93 65.43 1198.62 73.65
2011-09-08 384.14 64.95 1185.90 72.82
2011-09-09 377.48 63.64 1154.23 71.01
2011-09-12 379.94 63.59 1162.27 71.84
2011-09-13 384.62 63.61 1172.87 71.65
2011-09-14 389.30 63.73 1188.68 72.64
volume
Out[7]:
AAPL JNJ XOM
2011-09-06 18173500.0 15848300.0 25416300.0
2011-09-07 12492000.0 10759700.0 23108400.0
2011-09-08 14839800.0 15551500.0 22434800.0
2011-09-09 20171900.0 17008200.0 27969100.0
2011-09-12 16697300.0 13448200.0 26205800.0
# 如果想计算一个基于成交量的加权平均价
# pandas在算术运算时会自动对齐数据
# sum函数自动忽略NaN值
prices * volume
Out[10]:
AAPL JNJ SPX XOM
2011-09-06 6.901205e+09 1.024434e+09 NaN 1.808370e+09
2011-09-07 4.796054e+09 7.040072e+08 NaN 1.701934e+09
2011-09-08 5.700561e+09 1.010070e+09 NaN 1.633702e+09
2011-09-09 7.614489e+09 1.082402e+09 NaN 1.986086e+09
2011-09-12 6.343972e+09 8.551710e+08 NaN 1.882625e+09
2011-09-13 NaN NaN NaN NaN
2011-09-14 NaN NaN NaN NaN
# 计算基于成交量的加权平均价
vwap = (prices * volume).sum() / volume.sum()
vwap
Out[9]:
AAPL    380.655181
JNJ      64.394769
SPX            NaN
XOM      72.024288
dtype: float64
vwap.dropna()
Out[12]:
AAPL    380.655181
JNJ      64.394769
XOM      72.024288
dtype: float64
# 如果需要手工对齐,可以使用DataFrame的align方法
prices.align(volume, join='inner')
Out[13]:
(              AAPL    JNJ    XOM
 2011-09-06  379.74  64.64  71.15
 2011-09-07  383.93  65.43  73.65
 2011-09-08  384.14  64.95  72.82
 2011-09-09  377.48  63.64  71.01
 2011-09-12  379.94  63.59  71.84,
                   AAPL         JNJ         XOM
 2011-09-06  18173500.0  15848300.0  25416300.0
 2011-09-07  12492000.0  10759700.0  23108400.0
 2011-09-08  14839800.0  15551500.0  22434800.0
 2011-09-09  20171900.0  17008200.0  27969100.0
 2011-09-12  16697300.0  13448200.0  26205800.0)
# 通过一组索引可能不同的Series构建DataFrame
s1 = Series(range(3), index=['a', 'b', 'c'])
s2 = Series(range(4), index=['d', 'b', 'c', 'e'])
s3 = Series(range(3), index=['f', 'a', 'c'])
DataFrame({'one': s1, 'two': s2, 'three': s3})
Out[14]:
one three two
a 0.0 1.0 NaN
b 1.0 NaN 1.0
c 2.0 2.0 2.0
d NaN NaN 0.0
e NaN NaN 3.0
f NaN 0.0 NaN
# 可以指定结果的索引(丢弃其余的数据)
DataFrame({'one': s1, 'two': s2, 'three': s3}, index=list('face'))
Out[15]:
one three two
f NaN 0.0 NaN
a 0.0 1.0 NaN
c 2.0 2.0 2.0
e NaN NaN 3.0

频率不同的时间按序列的运算

经济学时间序列常常按年月日等频率进行数据统计。但有些是无规律的。

频率转换和重对齐的主要工具是resample 和 reindex 方法:

  • resample 用于将数据转换到固定频率
  • reindex 用于使数据符合一个新索引

二者都支持插值逻辑。

# 一个简单的周时间序列
ts1 = Series(np.random.randn(3),
             index=pd.date_range('2012-6-13', periods=3, freq='W-WED'))
ts1
Out[18]:
2012-06-13   -0.928173
2012-06-20    0.506413
2012-06-27    1.052517
Freq: W-WED, dtype: float64
# 重采样到工作日,就会有缺省值出现
ts1.resample('B').mean()
Out[19]:
2012-06-13   -0.928173
2012-06-14         NaN
2012-06-15         NaN
2012-06-18         NaN
2012-06-19         NaN
2012-06-20    0.506413
2012-06-21         NaN
2012-06-22         NaN
2012-06-25         NaN
2012-06-26         NaN
2012-06-27    1.052517
Freq: B, dtype: float64
# 用前面的值填充 NaN
ts1.resample('B').ffill()
Out[23]:
2012-06-13   -0.928173
2012-06-14   -0.928173
2012-06-15   -0.928173
2012-06-18   -0.928173
2012-06-19   -0.928173
2012-06-20    0.506413
2012-06-21    0.506413
2012-06-22    0.506413
2012-06-25    0.506413
2012-06-26    0.506413
2012-06-27    1.052517
Freq: B, dtype: float64
# 更一般的不规则时间序列
dates = pd.DatetimeIndex(['2012-6-12', '2012-6-17', '2012-6-18',
                          '2012-6-21', '2012-6-22', '2012-6-29'])
ts2 = Series(np.random.randn(6), index=dates)
ts2
Out[24]:
2012-06-12    0.131619
2012-06-17    1.440314
2012-06-18    0.780129
2012-06-21    1.024207
2012-06-22   -0.660424
2012-06-29   -0.218203
dtype: float64
# 如果想将处理过后的ts1加到ts2上,可以先将两个频率弄相同再相加
# 也可以用reindex方法,维持 ts2 的日期索引
ts1.reindex(ts2.index, method='ffill')
Out[26]:
2012-06-12         NaN
2012-06-17   -0.928173
2012-06-18   -0.928173
2012-06-21    0.506413
2012-06-22    0.506413
2012-06-29    1.052517
dtype: float64
ts2 + ts1.reindex(ts2.index, method='ffill')
Out[27]:
2012-06-12         NaN
2012-06-17    0.512141
2012-06-18   -0.148044
2012-06-21    1.530620
2012-06-22   -0.154011
2012-06-29    0.834314
dtype: float64

使用 Period

Period 提供了另一种处理不同频率时间序列的方法。

# 比如,一个公司可能会发布其以6月结尾的财年的每季度盈利报告,即频率为Q-JUN
gdp = Series([1.78, 1.94, 2.08, 2.01, 2.15, 2.31, 2.46],
             index=pd.period_range('1984Q2', periods=7, freq='Q-SEP'))
infl = Series([0.025, 0.045, 0.037, 0.04],
              index=pd.period_range('1982', periods=4, freq='A-DEC'))
gdp
Out[28]:
1984Q2    1.78
1984Q3    1.94
1984Q4    2.08
1985Q1    2.01
1985Q2    2.15
1985Q3    2.31
1985Q4    2.46
Freq: Q-SEP, dtype: float64
infl
Out[29]:
1982    0.025
1983    0.045
1984    0.037
1985    0.040
Freq: A-DEC, dtype: float64
# 与Timestamp的时间序列不同,由period索引的不同频率的时间序列之间的运算必须进行显示转换
# 假设已知 infl 值是每年年末观测的,于是可以将其转换为 Q-SEP ,以得到改频率下的正确时期
infl_q = infl.asfreq('Q-SEP', how='end')
infl_q
Out[31]:
1983Q1    0.025
1984Q1    0.045
1985Q1    0.037
1986Q1    0.040
Freq: Q-SEP, dtype: float64
# 显示转换以后,就可以被重新索引了(使用向前填充ffill ,以匹配gdp)
infl_q.reindex(gdp.index, method='ffill')
Out[32]:
1984Q2    0.045
1984Q3    0.045
1984Q4    0.045
1985Q1    0.037
1985Q2    0.037
1985Q3    0.037
1985Q4    0.037
Freq: Q-SEP, dtype: float64

时间和“最当前”数据选取

假设有一个很长的盘中数据,希望抽取其中每天特定时间的价格数据。如果数据不规整该怎么办?

# Make an intraday date range and time series
rng = pd.date_range('2012-06-01 09:30', '2012-06-01 15:59', freq='T')
# Make a 5-day series of 9:30-15:59 values
rng = rng.append([rng + pd.offsets.BDay(i) for i in range(1, 4)])
ts = Series(np.arange(len(rng), dtype=float), index=rng)
ts.head()
Out[34]:
2012-06-01 09:30:00    0.0
2012-06-01 09:31:00    1.0
2012-06-01 09:32:00    2.0
2012-06-01 09:33:00    3.0
2012-06-01 09:34:00    4.0
dtype: float64
# 只取10点钟的数据
from datetime import time
ts[time(10, 0)]
Out[37]:
2012-06-01 10:00:00      30.0
2012-06-04 10:00:00     420.0
2012-06-05 10:00:00     810.0
2012-06-06 10:00:00    1200.0
dtype: float64
# 该操作实际上用了实例方法at_time(各时间序列以及类似的DataFrame对象都有)
ts.at_time(time(10, 0))
Out[38]:
2012-06-01 10:00:00      30.0
2012-06-04 10:00:00     420.0
2012-06-05 10:00:00     810.0
2012-06-06 10:00:00    1200.0
dtype: float64
# 选取两个Time对象之间的值
ts.between_time(time(10, 0), time(10, 1))
Out[39]:
2012-06-01 10:00:00      30.0
2012-06-01 10:01:00      31.0
2012-06-04 10:00:00     420.0
2012-06-04 10:01:00     421.0
2012-06-05 10:00:00     810.0
2012-06-05 10:01:00     811.0
2012-06-06 10:00:00    1200.0
2012-06-06 10:01:00    1201.0
dtype: float64
# 可能刚好就没有任何数据落在某个具体的时间上(比如上午10点)。这时,可能会希望得到上午10点之前最后出现的值
#下面将该时间序列的大部分内容随机设置为NA
np.random.seed(12346)
indexer = np.sort(np.random.permutation(len(ts))[700:])
irr_ts = ts.copy()
irr_ts[indexer] = np.nan
irr_ts['2012-06-01 09:50':'2012-06-01 10:00']
Out[40]:
2012-06-01 09:50:00    20.0
2012-06-01 09:51:00     NaN
2012-06-01 09:52:00    22.0
2012-06-01 09:53:00    23.0
2012-06-01 09:54:00     NaN
2012-06-01 09:55:00    25.0
2012-06-01 09:56:00     NaN
2012-06-01 09:57:00     NaN
2012-06-01 09:58:00     NaN
2012-06-01 09:59:00     NaN
2012-06-01 10:00:00     NaN
dtype: float64
#如果将一组Timestamp传入asof方法,就能得到这些时间点处(或其之前最近)的有效值(非NA)。
# 例如,构造一个日期范围(每天上午10点),然后将其传入asof
selection = pd.date_range('2012-06-01 10:00', periods=4, freq='B')
irr_ts.asof(selection)
Out[41]:
2012-06-01 10:00:00      25.0
2012-06-04 10:00:00     420.0
2012-06-05 10:00:00     810.0
2012-06-06 10:00:00    1197.0
Freq: B, dtype: float64

拼接多个数据源

在第七章中曾经介绍了数据拼接的知识,在金融或经济中,还有另外几个经常出现的情况:

  • 在一个特定的时间点上,从一个数据源切换到另一个数据源
  • 用另一个时间序列对当前时间序列中的缺失值“打补丁”
  • 将数据中的符号(国家、资产代码等)替换为实际数据
# 关于特定时间的数据源切换,就是用concat函数进行连接
data1 = DataFrame(np.ones((6, 3), dtype=float),
                  columns=['a', 'b', 'c'],
                  index=pd.date_range('6/12/2012', periods=6))
data2 = DataFrame(np.ones((6, 3), dtype=float) * 2,
                  columns=['a', 'b', 'c'],
                  index=pd.date_range('6/13/2012', periods=6))
spliced = pd.concat([data1.ix[:'2012-06-14'], data2.ix['2012-06-15':]])
spliced
Out[42]:
a b c
2012-06-12 1.0 1.0 1.0
2012-06-13 1.0 1.0 1.0
2012-06-14 1.0 1.0 1.0
2012-06-15 2.0 2.0 2.0
2012-06-16 2.0 2.0 2.0
2012-06-17 2.0 2.0 2.0
2012-06-18 2.0 2.0 2.0
# 假设data1缺失了data2中存在的某个时间序列
data2 = DataFrame(np.ones((6, 4), dtype=float) * 2,
                  columns=['a', 'b', 'c', 'd'],
                  index=pd.date_range('6/13/2012', periods=6))
spliced = pd.concat([data1.ix[:'2012-06-14'], data2.ix['2012-06-15':]])
spliced
Out[43]:
a b c d
2012-06-12 1.0 1.0 1.0 NaN
2012-06-13 1.0 1.0 1.0 NaN
2012-06-14 1.0 1.0 1.0 NaN
2012-06-15 2.0 2.0 2.0 2.0
2012-06-16 2.0 2.0 2.0 2.0
2012-06-17 2.0 2.0 2.0 2.0
2012-06-18 2.0 2.0 2.0 2.0
# combine_first可以引入合并点之前的数据,这样也就扩展了'd'项的历史
spliced_filled = spliced.combine_first(data2)
spliced_filled
Out[44]:
a b c d
2012-06-12 1.0 1.0 1.0 NaN
2012-06-13 1.0 1.0 1.0 2.0
2012-06-14 1.0 1.0 1.0 2.0
2012-06-15 2.0 2.0 2.0 2.0
2012-06-16 2.0 2.0 2.0 2.0
2012-06-17 2.0 2.0 2.0 2.0
2012-06-18 2.0 2.0 2.0 2.0
# DataFrame也有一个类似的方法update,它可以实现就地更新
# 如果只想填充空洞,则必须差U纳入overwrite = False才行
# 不传入overwrite会把整条数据都覆盖

spliced.update(data2, overwrite=False)
spliced
Out[46]:
a b c d
2012-06-12 1.0 1.0 1.0 NaN
2012-06-13 1.0 1.0 1.0 2.0
2012-06-14 1.0 1.0 1.0 2.0
2012-06-15 2.0 2.0 2.0 2.0
2012-06-16 2.0 2.0 2.0 2.0
2012-06-17 2.0 2.0 2.0 2.0
2012-06-18 2.0 2.0 2.0 2.0
# 上面所讲的技术可以将数据中的符号替换为实际数据
# 但有时利用 DataFrame的索引机制直接进行设置会更简单一些
cp_spliced = spliced.copy()
cp_spliced[['a', 'c']] = data1[['a', 'c']]
cp_spliced
Out[47]:
a b c d
2012-06-12 1.0 1.0 1.0 NaN
2012-06-13 1.0 1.0 1.0 2.0
2012-06-14 1.0 1.0 1.0 2.0
2012-06-15 1.0 2.0 1.0 2.0
2012-06-16 1.0 2.0 1.0 2.0
2012-06-17 1.0 2.0 1.0 2.0
2012-06-18 NaN 2.0 NaN 2.0

收益指数和累计收益

金融领域中,收益(return)通常指的是某资产价格的百分比变化。

# 2011到2012年苹果公司的股票价格数据
from pandas_datareader import data, yahoo
price =yahoo.daily.YahooDailyReader.read('AAPL')['Adj Close']
price[-5:]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-69-1c64e0491f72> in <module>()
      1 # 2011到2012年苹果公司的股票价格数据
      2 from pandas_datareader import data, yahoo
----> 3 price =yahoo.daily.YahooDailyReader.read('AAPL')['Adj Close']
      4 price[-5:]

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\yahoo\daily.py in read(self)
     75     def read(self):
     76         """ read one data from specified URL """
---> 77         df = super(YahooDailyReader, self).read()
     78         if self.ret_index:
     79             df['Ret_Index'] = _calc_return_index(df['Adj Close'])

TypeError: super(type, obj): obj must be an instance or subtype of type
# 计算两个时间点之间的累计百分比回报只需计算价格的百分比变化即可
price['2011-10-03'] / price['2011-3-01'] - 1
# 通常会先算出一个收益指数,它表示单位投资(比如1美元)收益的时间序列
# 从收益指数中可以得出许多假设。例如,人们可以决定是否进行利润再投资
# 可以用cumprod计算出一个简单的收益指数

returns = price.pct_change()
ret_index = (1 + returns).cumprod()
ret_index[0] = 1  # Set first value to 1
ret_index
# 得到收益指数之后,计算指定时期内的累计收益就很简单了
m_returns = ret_index.resample('BM', how='last').pct_change()
m_returns['2012']
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-70-b1657b385e76> in <module>()
      1 # 得到收益指数之后,计算指定时期内的累计收益就很简单了
----> 2 m_returns = ret_index.resample('BM', how='last').pct_change()
      3 m_returns['2012']

NameError: name 'ret_index' is not defined
#如果知道了股息的派发日和支付率,就可以将它们计入到每日总收益中
m_rets = (1 + returns).resample('M', how='prod', kind='period') - 1
m_rets['2012']
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-71-3ca97f7f2ac8> in <module>()
      1 #如果知道了股息的派发日和支付率,就可以将它们计入到每日总收益中
----> 2 m_rets = (1 + returns).resample('M', how='prod', kind='period') - 1
      3 m_rets['2012']

NameError: name 'returns' is not defined
returns[dividend_dates] += dividend_pcts
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-72-2136d75d50a0> in <module>()
----> 1 returns[dividend_dates] += dividend_pcts

NameError: name 'returns' is not defined
 

分组变换和分析

利用Python进行数据分析》读书笔记。

第11章 第2节:分组变换和分析

所有用到的数据可以从作者的 github下载。

%pylab inline
import pandas as pd
from pandas import Series, DataFrame
Populating the interactive namespace from numpy and matplotlib

在第九章中,已经学习了分组统计的基础,还学习了如何对数据集的分组应用自定义的变换函数。

下面以一组假想的投资组合为例。

pd.options.display.max_rows = 100
pd.options.display.max_columns = 10
np.random.seed(12345)

import pytz
import random; random.seed(0)
import string

#首先生成1000个股票代码

N = 1000
def rands(n):
    choices = string.ascii_uppercase
    return ''.join([random.choice(choices) for _ in range(n)])
tickers = np.array([rands(5) for _ in range(N)])
# 创建一个含有3列的DataFrame来承载这些假想数据,不过只选择部分股票组成该投资组合
M = 500
df = DataFrame({'Momentum' : np.random.randn(M) / 200 + 0.03,
                'Value' : np.random.randn(M) / 200 + 0.08,
                'ShortInterest' : np.random.randn(M) / 200 - 0.02},
                index=tickers[:M])
# 随机创建行业分类
ind_names = np.array(['FINANCIAL', 'TECH'])
sampler = np.random.randint(0, len(ind_names), N)
industries = Series(ind_names[sampler], index=tickers,
                    name='industry')
# 根据行业分类进行分组并执行分组聚合和变换
by_industry = df.groupby(industries)
by_industry.mean()
Out[9]:
Momentum ShortInterest Value
industry
FINANCIAL 0.029485 -0.020739 0.079929
TECH 0.030407 -0.019609 0.080113
by_industry.describe()
Out[10]:
Momentum ShortInterest Value
industry
FINANCIAL count 246.000000 246.000000 246.000000
mean 0.029485 -0.020739 0.079929
std 0.004802 0.004986 0.004548
min 0.017210 -0.036997 0.067025
25% 0.026263 -0.024138 0.076638
50% 0.029261 -0.020833 0.079804
75% 0.032806 -0.017345 0.082718
max 0.045884 -0.006322 0.093334
TECH count 254.000000 254.000000 254.000000
mean 0.030407 -0.019609 0.080113
std 0.005303 0.005074 0.004886
min 0.016778 -0.032682 0.065253
25% 0.026456 -0.022779 0.076737
50% 0.030650 -0.019829 0.080296
75% 0.033602 -0.016923 0.083353
max 0.049638 -0.003698 0.093081
# 自定义变换:行业内标准化处理(平均值为 0 ,标准差为 1 )
def zscore(group):
    return (group - group.mean()) / group.std()

df_stand = by_industry.apply(zscore)
df_stand.groupby(industries).agg(['mean', 'std'])
Out[12]:
Momentum ShortInterest Value
mean std mean std mean std
industry
FINANCIAL 1.114736e-15 1.0 3.081772e-15 1.0 8.001278e-15 1.0
TECH -2.779929e-16 1.0 -1.910982e-15 1.0 -7.139521e-15 1.0
# 内置变换函数(比如rank)的用法会更简洁一些
ind_rank = by_industry.rank(ascending=False)
ind_rank.groupby(industries).agg(['min', 'max'])
Out[14]:
Momentum ShortInterest Value
min max min max min max
industry
FINANCIAL 1.0 246.0 1.0 246.0 1.0 246.0
TECH 1.0 254.0 1.0 254.0 1.0 254.0
# 在股票投资组合的定量分析中,排名和标准化是一种常见的变换运算组合。
# 通过rank和zscore链接在一起即可完成整个过程
# 行业内排名和标准化,这是把排名进行了标准化
# Industry rank and standardize
by_industry.apply(lambda x: zscore(x.rank())).head()
Out[16]:
Momentum ShortInterest Value
MYNBI -0.091346 -0.976696 -1.004802
QPMZJ 0.794005 1.299919 -0.358356
PLSGQ -0.541047 -0.836164 -1.679355
EJEYD -0.583207 -1.623142 0.990749
TZIRW 1.572120 -0.265423 0.374314

分组因子暴露

因子分析(factor analysis)是投资组合定量管理中的一种技术。

投资组合的持有量和性能(收益与损失)可以被分解为一个或多个表示投资组合权重的因子(风险因子就是其中之一)。

例如,某只股票与某个基准(比如标普500指数)的协动性被称为其beta风险系数。

下面以一个人为构成的投资的投资组合为例进行讲解,它由三个随机生成的因子(通常称为因子载荷)和一些权重构成。

from numpy.random import rand
fac1, fac2, fac3 = np.random.rand(3, 1000)

ticker_subset = tickers.take(np.random.permutation(N)[:1000])

# 因子加权和,噪声
port = Series(0.7 * fac1 - 1.2 * fac2 + 0.3 * fac3 + rand(1000),
              index=ticker_subset)
factors = DataFrame({'f1': fac1, 'f2': fac2, 'f3': fac3},
                    index=ticker_subset)
# 各因子与投资组合之间的矢量相关性可能说明不了什么问题
factors.corrwith(port)
Out[18]:
f1    0.402377
f2   -0.680980
f3    0.168083
dtype: float64
#计算因子暴露的标准方式是最小二乘回归, 可以使用pandas.ols
pd.ols(y=port, x=factors).beta
C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2881: FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://www.statsmodels.org/stable/regression.html
  exec(code_obj, self.user_global_ns, self.user_ns)
Out[19]:
f1           0.761789
f2          -1.208760
f3           0.289865
intercept    0.484477
dtype: float64
#可以看出,由于没有给投资组合添加过多的随机噪声,所以原始因子基本恢复了。
# 还可以通过groupby计算各行业的暴露量
def beta_exposure(chunk, factors=None):
    return pd.ols(y=chunk, x=factors).beta
# 根据行业进行分组,并应用该函数
by_ind = port.groupby(industries)
exposures = by_ind.apply(beta_exposure, factors=factors)
exposures.unstack()
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby.py:685: FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://www.statsmodels.org/stable/regression.html
  return func(g, *args, **kwargs)
Out[22]:
f1 f2 f3 intercept
industry
FINANCIAL 0.790329 -1.182970 0.275624 0.455569
TECH 0.740857 -1.232882 0.303811 0.508188

十分位和四分位分析

基于样本分位数的分析是金融分析师们的另一个重要工具,

例如,股票投资组合的性能可以根据各股的市盈率被划分入四分位。

通过pandas.qcut和groupby可以轻松实现分位数分析。

import pandas.io.data as web
data = web.get_data_yahoo('SPY', '2006-01-01')
data.info()
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-24-26e850125bda> in <module>()
----> 1 import pandas.io.data as web
      2 data = web.get_data_yahoo('SPY', '2006-01-01')
      3 data.info()

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\data.py in <module>()
      1 raise ImportError(
----> 2     "The pandas.io.data module is moved to a separate package "
      3     "(pandas-datareader). After installing the pandas-datareader package "
      4     "(https://github.com/pandas-dev/pandas-datareader), you can change "
      5     "the import ``from pandas.io import data, wb`` to "

ImportError: The pandas.io.data module is moved to a separate package (pandas-datareader). After installing the pandas-datareader package (https://github.com/pandas-dev/pandas-datareader), you can change the import ``from pandas.io import data, wb`` to ``from pandas_datareader import data, wb``.
# 计算日收益率,并编写一个用于将收益率转换为趋势信号的函数

px = data['Adj Close']
returns = px.pct_change()

def to_index(rets):
    index = (1 + rets).cumprod()
    first_loc = max(index.index.get_loc(index.idxmax()) - 1, 0)
    index.values[first_loc] = 1
    return index

def trend_signal(rets, lookback, lag):
    signal = pd.rolling_sum(rets, lookback, min_periods=lookback - 5)
    return signal.shift(lag)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-25-4b81d1bad1a6> in <module>()
      1 # 计算日收益率,并编写一个用于将收益率转换为趋势信号的函数
      2 
----> 3 px = data['Adj Close']
      4 returns = px.pct_change()
      5 

NameError: name 'data' is not defined
# 通过该函数,我们可以单纯地创建和测试一种根据每周五动量信号进行交易的交易策略
signal = trend_signal(returns, 100, 3)
trade_friday = signal.resample('W-FRI').resample('B', fill_method='ffill')
trade_rets = trade_friday.shift(1) * returns
trade_rets = trade_rets[:len(returns)]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-26-fb0073b1d37d> in <module>()
      1 # 通过该函数,我们可以单纯地创建和测试一种根据每周五动量信号进行交易的交易策略
----> 2 signal = trend_signal(returns, 100, 3)
      3 trade_friday = signal.resample('W-FRI').resample('B', fill_method='ffill')
      4 trade_rets = trade_friday.shift(1) * returns
      5 trade_rets = trade_rets[:len(returns)]

NameError: name 'trend_signal' is not defined
# 将该策略的收益率转换为一个收益指数,并绘制一张图表
to_index(trade_rets).plot()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-27-adbb6b68a3f3> in <module>()
      1 # 将该策略的收益率转换为一个收益指数,并绘制一张图表
----> 2 to_index(trade_rets).plot()

NameError: name 'to_index' is not defined
# 假如希望将该策略的性能按不同大小的交易期波幅进行划分。
# 年度标准差是计算波幅的一种简单办法,可以通过计算夏普比率来观察不同波动机制下的风险收益率:

vol = pd.rolling_std(returns, 250, min_periods=200) * np.sqrt(250)

def sharpe(rets, ann=250):
    return rets.mean() / rets.std()  * np.sqrt(ann)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-28-7bfb63db4a01> in <module>()
      2 # 年度标准差是计算波幅的一种简单办法,可以通过计算夏普比率来观察不同波动机制下的风险收益率:
      3 
----> 4 vol = pd.rolling_std(returns, 250, min_periods=200) * np.sqrt(250)
      5 
      6 def sharpe(rets, ann=250):

NameError: name 'returns' is not defined
# 现在利用qcut将vol划分为4等份,并用sharpe进行聚合
cats = pd.qcut(vol, 4)
print('cats: %d, trade_rets: %d, vol: %d' % (len(cats), len(trade_rets), len(vol)))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-29-daca98ef2bb1> in <module>()
      1 # 现在利用qcut将vol划分为4等份,并用sharpe进行聚合
----> 2 cats = pd.qcut(vol, 4)
      3 print('cats: %d, trade_rets: %d, vol: %d' % (len(cats), len(trade_rets), len(vol)))

NameError: name 'vol' is not defined
trade_rets.groupby(cats).agg(sharpe)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-30-27c438e4e892> in <module>()
----> 1 trade_rets.groupby(cats).agg(sharpe)

NameError: name 'trade_rets' is not defined
 

更多示例应用

利用Python进行数据分析》读书笔记。

第11章 第3节:更多示例应用

所有用到的数据可以从作者的 github下载。

%pylab inline
import pandas as pd
from pandas import Series, DataFrame
Populating the interactive namespace from numpy and matplotlib

信号前沿分析

本小节将介绍一种简化的截面动量投资组合,并得出如何得到模型参数化网格。

# 将几只股票做成一个投资组合,并假装历史价格数据
import pandas_datareader.data as web

names = ['AAPL', 'GOOG', 'MSFT', 'DELL', 'GS', 'MS', 'BAC', 'C']
def get_px(stock, start, end):
    return web.DataReader(stock, 'yahoo',start, end)['Adj Close']
px = DataFrame({n: get_px(n, None, None) for n in names})

#px = pd.read_csv('data/ch11/stock_px.csv')
px.head()
---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connection.py in _new_conn(self)
    137             conn = connection.create_connection(
--> 138                 (self.host, self.port), self.timeout, **extra_kw)
    139 

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
     74 
---> 75     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
     76         af, socktype, proto, canonname, sa = res

C:\ProgramData\Anaconda3\lib\socket.py in getaddrinfo(host, port, family, type, proto, flags)
    742     addrlist = []
--> 743     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    744         af, socktype, proto, canonname, sa = res

gaierror: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, **response_kw)
    593                                                   body=body, headers=headers,
--> 594                                                   chunked=chunked)
    595 

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    360         else:
--> 361             conn.request(method, url, **httplib_request_kw)
    362 

C:\ProgramData\Anaconda3\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
   1238         """Send a complete request to the server."""
-> 1239         self._send_request(method, url, body, headers, encode_chunked)
   1240 

C:\ProgramData\Anaconda3\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1284             body = _encode(body, 'body')
-> 1285         self.endheaders(body, encode_chunked=encode_chunked)
   1286 

C:\ProgramData\Anaconda3\lib\http\client.py in endheaders(self, message_body, encode_chunked)
   1233             raise CannotSendHeader()
-> 1234         self._send_output(message_body, encode_chunked=encode_chunked)
   1235 

C:\ProgramData\Anaconda3\lib\http\client.py in _send_output(self, message_body, encode_chunked)
   1025         del self._buffer[:]
-> 1026         self.send(msg)
   1027 

C:\ProgramData\Anaconda3\lib\http\client.py in send(self, data)
    963             if self.auto_open:
--> 964                 self.connect()
    965             else:

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connection.py in connect(self)
    162     def connect(self):
--> 163         conn = self._new_conn()
    164         self._prepare_conn(conn)

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connection.py in _new_conn(self)
    146             raise NewConnectionError(
--> 147                 self, "Failed to establish a new connection: %s" % e)
    148 

NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x000000000B5C59B0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    422                     retries=self.max_retries,
--> 423                     timeout=timeout
    424                 )

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, **response_kw)
    642             retries = retries.increment(method, url, error=e, _pool=self,
--> 643                                         _stacktrace=sys.exc_info()[2])
    644             retries.sleep()

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    362         if new_retry.is_exhausted():
--> 363             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    364 

MaxRetryError: HTTPConnectionPool(host='ichart.finance.yahoo.com', port=80): Max retries exceeded with url: /table.csv?s=AAPL&a=0&b=1&c=2010&d=6&e=24&f=2017&g=d&ignore=.csv (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x000000000B5C59B0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input-16-74effd808e23> in <module>()
      5 def get_px(stock, start, end):
      6     return web.DataReader(stock, 'yahoo',start, end)['Adj Close']
----> 7 px = DataFrame({n: get_px(n, None, None) for n in names})
      8 
      9 #px = pd.read_csv('data/ch11/stock_px.csv')

<ipython-input-16-74effd808e23> in <dictcomp>(.0)
      5 def get_px(stock, start, end):
      6     return web.DataReader(stock, 'yahoo',start, end)['Adj Close']
----> 7 px = DataFrame({n: get_px(n, None, None) for n in names})
      8 
      9 #px = pd.read_csv('data/ch11/stock_px.csv')

<ipython-input-16-74effd808e23> in get_px(stock, start, end)
      4 names = ['AAPL', 'GOOG', 'MSFT', 'DELL', 'GS', 'MS', 'BAC', 'C']
      5 def get_px(stock, start, end):
----> 6     return web.DataReader(stock, 'yahoo',start, end)['Adj Close']
      7 px = DataFrame({n: get_px(n, None, None) for n in names})
      8 

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\data.py in DataReader(name, data_source, start, end, retry_count, pause, session, access_key)
    115                                 adjust_price=False, chunksize=25,
    116                                 retry_count=retry_count, pause=pause,
--> 117                                 session=session).read()
    118 
    119     elif data_source == "yahoo-actions":

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\yahoo\daily.py in read(self)
     75     def read(self):
     76         """ read one data from specified URL """
---> 77         df = super(YahooDailyReader, self).read()
     78         if self.ret_index:
     79             df['Ret_Index'] = _calc_return_index(df['Adj Close'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in read(self)
    155         if isinstance(self.symbols, (compat.string_types, int)):
    156             df = self._read_one_data(self.url,
--> 157                                      params=self._get_params(self.symbols))
    158         # Or multiple symbols, (e.g., ['GOOG', 'AAPL', 'MSFT'])
    159         elif isinstance(self.symbols, DataFrame):

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in _read_one_data(self, url, params)
     72         """ read one data from specified URL """
     73         if self._format == 'string':
---> 74             out = self._read_url_as_StringIO(url, params=params)
     75         elif self._format == 'json':
     76             out = self._get_response(url, params=params).json()

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in _read_url_as_StringIO(self, url, params)
     83         Open url (and retry)
     84         """
---> 85         response = self._get_response(url, params=params)
     86         text = self._sanitize_response(response)
     87         out = StringIO()

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in _get_response(self, url, params)
    112         # initial attempt + retry
    113         for i in range(self.retry_count + 1):
--> 114             response = self.session.get(url, params=params)
    115             if response.status_code == requests.codes.ok:
    116                 return response

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in get(self, url, **kwargs)
    499 
    500         kwargs.setdefault('allow_redirects', True)
--> 501         return self.request('GET', url, **kwargs)
    502 
    503     def options(self, url, **kwargs):

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    486         }
    487         send_kwargs.update(settings)
--> 488         resp = self.send(prep, **send_kwargs)
    489 
    490         return resp

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
    607 
    608         # Send the request
--> 609         r = adapter.send(request, **kwargs)
    610 
    611         # Total elapsed time of the request (approximately)

C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    485                 raise ProxyError(e, request=request)
    486 
--> 487             raise ConnectionError(e, request=request)
    488 
    489         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='ichart.finance.yahoo.com', port=80): Max retries exceeded with url: /table.csv?s=AAPL&a=0&b=1&c=2010&d=6&e=24&f=2017&g=d&ignore=.csv (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x000000000B5C59B0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
# 绘制每只股票的累计收益
px = px.asfreq('B').fillna(method='pad')
rets = px.pct_change()
((1 + rets).cumprod() - 1).plot()
C:\ProgramData\Anaconda3\lib\site-packages\pandas\tseries\index.py:506: UserWarning: Discarding nonzero nanoseconds in conversion
  index = _generate_regular_range(start, end, periods, offset)
C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_base.py:2903: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=0.0, right=0.0
  'left=%s, right=%s') % (left, right))
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0xa12feb8>
# 对于投资组合的构建,计算特定回顾期的动量,然后按降序排列,并标准化
def calc_mom(price, lookback, lag):
    mom_ret = price.shift(lag).pct_change(lookback)
    ranks = mom_ret.rank(axis=1, ascending=False)
    demeaned = ranks.subtract(ranks.mean(axis=1), axis=0)
    return demeaned.divide(demeaned.std(axis=1), axis=0)
# 编写检验函数,计算夏普比率
compound = lambda x : (1 + x).prod() - 1
daily_sr = lambda x: x.mean() / x.std()

def strat_sr(prices, lb, hold):
    # Compute portfolio weights
    freq = '%dB' % hold
    port = calc_mom(prices, lb, lag=1)

    daily_rets = prices.pct_change()

    # 计算投资组合收益
    port = port.shift(1).resample(freq, how='first')
    returns = daily_rets.resample(freq, how=compound)
    port_rets = (port * returns).sum(axis=1)

    return daily_sr(port_rets) * np.sqrt(252 / hold)
# 得到一个标量值
strat_sr(px, 70, 30)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
   1169             result = expressions.evaluate(op, str_rep, x, y,
-> 1170                                           raise_on_error=True, **eval_kwargs)
   1171         except TypeError:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\computation\expressions.py in evaluate(op, op_str, a, b, raise_on_error, use_numexpr, **eval_kwargs)
    209         return _evaluate(op, op_str, a, b, raise_on_error=raise_on_error,
--> 210                          **eval_kwargs)
    211     return _evaluate_standard(op, op_str, a, b, raise_on_error=raise_on_error)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b, raise_on_error, truediv, reversed, **eval_kwargs)
    120     if result is None:
--> 121         result = _evaluate_standard(op, op_str, a, b, raise_on_error)
    122 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _evaluate_standard(op, op_str, a, b, raise_on_error, **eval_kwargs)
     62     with np.errstate(all='ignore'):
---> 63         return op(a, b)
     64 

TypeError: unsupported operand type(s) for /: 'str' and 'float'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-11-f56a8482ef9f> in <module>()
      1 # 得到一个标量值
----> 2 strat_sr(px, 70, 30)

<ipython-input-8-46ba12b4e341> in strat_sr(prices, lb, hold)
      6     # Compute portfolio weights
      7     freq = '%dB' % hold
----> 8     port = calc_mom(prices, lb, lag=1)
      9 
     10     daily_rets = prices.pct_change()

<ipython-input-7-a52d117a5e26> in calc_mom(price, lookback, lag)
      1 # 对于投资组合的构建,计算特定回顾期的动量,然后按降序排列,并标准化
      2 def calc_mom(price, lookback, lag):
----> 3     mom_ret = price.shift(lag).pct_change(lookback)
      4     ranks = mom_ret.rank(axis=1, ascending=False)
      5     demeaned = ranks.subtract(ranks.mean(axis=1), axis=0)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in pct_change(self, periods, fill_method, limit, freq, **kwargs)
   5330 
   5331         rs = (data.div(data.shift(periods=periods, freq=freq, axis=axis,
-> 5332                                   **kwargs)) - 1)
   5333         if freq is None:
   5334             mask = isnull(_values_from_object(self))

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in f(self, other, axis, level, fill_value)
   1226 
   1227         if isinstance(other, pd.DataFrame):  # Another DataFrame
-> 1228             return self._combine_frame(other, na_op, fill_value, level)
   1229         elif isinstance(other, ABCSeries):
   1230             return self._combine_series(other, na_op, fill_value, axis, level)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _combine_frame(self, other, func, fill_value, level)
   3548                                                     dtype=r.dtype)
   3549 
-> 3550                 result = dict([(col, f(col)) for col in this])
   3551 
   3552             # non-unique

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in <listcomp>(.0)
   3548                                                     dtype=r.dtype)
   3549 
-> 3550                 result = dict([(col, f(col)) for col in this])
   3551 
   3552             # non-unique

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in f(col)
   3544 
   3545                 def f(col):
-> 3546                     r = _arith_op(this[col].values, other[col].values)
   3547                     return self._constructor_sliced(r, index=new_index,
   3548                                                     dtype=r.dtype)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _arith_op(left, right)
   3536                 right[right_mask & mask] = fill_value
   3537 
-> 3538             return func(left, right)
   3539 
   3540         if this._is_mixed_type or other._is_mixed_type:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
   1187                 if np.prod(xrav.shape) and np.prod(yrav.shape):
   1188                     with np.errstate(all='ignore'):
-> 1189                         result[mask] = op(xrav, yrav)
   1190             elif hasattr(x, 'size'):
   1191                 result = np.empty(x.size, dtype=x.dtype)

TypeError: unsupported operand type(s) for /: 'str' and 'str'
# 对参数网格(多对参数组合)应用strat_sr函数,结果报出到defaultdict中
# 最后将全部结果放到DataFrame
from collections import defaultdict

lookbacks = range(20, 90, 5)
holdings = range(20, 90, 5)
dd = defaultdict(dict)
for lb in lookbacks:
    for hold in holdings:
        dd[lb][hold] = strat_sr(px, lb, hold)

ddf = DataFrame(dd)
ddf.index.name = 'Holding Period'
ddf.columns.name = 'Lookback Period'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
   1169             result = expressions.evaluate(op, str_rep, x, y,
-> 1170                                           raise_on_error=True, **eval_kwargs)
   1171         except TypeError:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\computation\expressions.py in evaluate(op, op_str, a, b, raise_on_error, use_numexpr, **eval_kwargs)
    209         return _evaluate(op, op_str, a, b, raise_on_error=raise_on_error,
--> 210                          **eval_kwargs)
    211     return _evaluate_standard(op, op_str, a, b, raise_on_error=raise_on_error)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b, raise_on_error, truediv, reversed, **eval_kwargs)
    120     if result is None:
--> 121         result = _evaluate_standard(op, op_str, a, b, raise_on_error)
    122 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _evaluate_standard(op, op_str, a, b, raise_on_error, **eval_kwargs)
     62     with np.errstate(all='ignore'):
---> 63         return op(a, b)
     64 

TypeError: unsupported operand type(s) for /: 'str' and 'float'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-12-ad29c52e2e4b> in <module>()
      8 for lb in lookbacks:
      9     for hold in holdings:
---> 10         dd[lb][hold] = strat_sr(px, lb, hold)
     11 
     12 ddf = DataFrame(dd)

<ipython-input-8-46ba12b4e341> in strat_sr(prices, lb, hold)
      6     # Compute portfolio weights
      7     freq = '%dB' % hold
----> 8     port = calc_mom(prices, lb, lag=1)
      9 
     10     daily_rets = prices.pct_change()

<ipython-input-7-a52d117a5e26> in calc_mom(price, lookback, lag)
      1 # 对于投资组合的构建,计算特定回顾期的动量,然后按降序排列,并标准化
      2 def calc_mom(price, lookback, lag):
----> 3     mom_ret = price.shift(lag).pct_change(lookback)
      4     ranks = mom_ret.rank(axis=1, ascending=False)
      5     demeaned = ranks.subtract(ranks.mean(axis=1), axis=0)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in pct_change(self, periods, fill_method, limit, freq, **kwargs)
   5330 
   5331         rs = (data.div(data.shift(periods=periods, freq=freq, axis=axis,
-> 5332                                   **kwargs)) - 1)
   5333         if freq is None:
   5334             mask = isnull(_values_from_object(self))

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in f(self, other, axis, level, fill_value)
   1226 
   1227         if isinstance(other, pd.DataFrame):  # Another DataFrame
-> 1228             return self._combine_frame(other, na_op, fill_value, level)
   1229         elif isinstance(other, ABCSeries):
   1230             return self._combine_series(other, na_op, fill_value, axis, level)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _combine_frame(self, other, func, fill_value, level)
   3548                                                     dtype=r.dtype)
   3549 
-> 3550                 result = dict([(col, f(col)) for col in this])
   3551 
   3552             # non-unique

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in <listcomp>(.0)
   3548                                                     dtype=r.dtype)
   3549 
-> 3550                 result = dict([(col, f(col)) for col in this])
   3551 
   3552             # non-unique

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in f(col)
   3544 
   3545                 def f(col):
-> 3546                     r = _arith_op(this[col].values, other[col].values)
   3547                     return self._constructor_sliced(r, index=new_index,
   3548                                                     dtype=r.dtype)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _arith_op(left, right)
   3536                 right[right_mask & mask] = fill_value
   3537 
-> 3538             return func(left, right)
   3539 
   3540         if this._is_mixed_type or other._is_mixed_type:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
   1187                 if np.prod(xrav.shape) and np.prod(yrav.shape):
   1188                     with np.errstate(all='ignore'):
-> 1189                         result[mask] = op(xrav, yrav)
   1190             elif hasattr(x, 'size'):
   1191                 result = np.empty(x.size, dtype=x.dtype)

TypeError: unsupported operand type(s) for /: 'str' and 'str'
# 生成热力图

import matplotlib.pyplot as plt

def heatmap(df, cmap=plt.cm.gray_r):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    axim = ax.imshow(df.values, cmap=cmap, interpolation='nearest')
    ax.set_xlabel(df.columns.name)
    ax.set_xticks(np.arange(len(df.columns)))
    ax.set_xticklabels(list(df.columns))
    ax.set_ylabel(df.index.name)
    ax.set_yticks(np.arange(len(df.index)))
    ax.set_yticklabels(list(df.index))
    plt.colorbar(axim)
heatmap(ddf)    
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-13-9a253bc5a17b> in <module>()
     14     ax.set_yticklabels(list(df.index))
     15     plt.colorbar(axim)
---> 16 heatmap(ddf)

NameError: name 'ddf' is not defined

期货合约转仓

pd.options.display.max_rows = 10
import pandas_datareader.data as web
px = web.DataReader('SPY', 'yahoo')['Adj Close']
#px = web.get_data_yahoo('')['Adj Close'] * 10
px
---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connection.py in _new_conn(self)
    137             conn = connection.create_connection(
--> 138                 (self.host, self.port), self.timeout, **extra_kw)
    139 

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
     74 
---> 75     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
     76         af, socktype, proto, canonname, sa = res

C:\ProgramData\Anaconda3\lib\socket.py in getaddrinfo(host, port, family, type, proto, flags)
    742     addrlist = []
--> 743     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    744         af, socktype, proto, canonname, sa = res

gaierror: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, **response_kw)
    593                                                   body=body, headers=headers,
--> 594                                                   chunked=chunked)
    595 

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    360         else:
--> 361             conn.request(method, url, **httplib_request_kw)
    362 

C:\ProgramData\Anaconda3\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
   1238         """Send a complete request to the server."""
-> 1239         self._send_request(method, url, body, headers, encode_chunked)
   1240 

C:\ProgramData\Anaconda3\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1284             body = _encode(body, 'body')
-> 1285         self.endheaders(body, encode_chunked=encode_chunked)
   1286 

C:\ProgramData\Anaconda3\lib\http\client.py in endheaders(self, message_body, encode_chunked)
   1233             raise CannotSendHeader()
-> 1234         self._send_output(message_body, encode_chunked=encode_chunked)
   1235 

C:\ProgramData\Anaconda3\lib\http\client.py in _send_output(self, message_body, encode_chunked)
   1025         del self._buffer[:]
-> 1026         self.send(msg)
   1027 

C:\ProgramData\Anaconda3\lib\http\client.py in send(self, data)
    963             if self.auto_open:
--> 964                 self.connect()
    965             else:

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connection.py in connect(self)
    162     def connect(self):
--> 163         conn = self._new_conn()
    164         self._prepare_conn(conn)

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connection.py in _new_conn(self)
    146             raise NewConnectionError(
--> 147                 self, "Failed to establish a new connection: %s" % e)
    148 

NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x000000000B5E5550>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    422                     retries=self.max_retries,
--> 423                     timeout=timeout
    424                 )

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, **response_kw)
    642             retries = retries.increment(method, url, error=e, _pool=self,
--> 643                                         _stacktrace=sys.exc_info()[2])
    644             retries.sleep()

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    362         if new_retry.is_exhausted():
--> 363             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    364 

MaxRetryError: HTTPConnectionPool(host='ichart.finance.yahoo.com', port=80): Max retries exceeded with url: /table.csv?s=SPY&a=0&b=1&c=2010&d=6&e=24&f=2017&g=d&ignore=.csv (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x000000000B5E5550>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input-21-4d516e3238de> in <module>()
      1 import pandas_datareader.data as web
----> 2 px = web.DataReader('SPY', 'yahoo')['Adj Close']
      3 #px = web.get_data_yahoo('')['Adj Close'] * 10
      4 px

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\data.py in DataReader(name, data_source, start, end, retry_count, pause, session, access_key)
    115                                 adjust_price=False, chunksize=25,
    116                                 retry_count=retry_count, pause=pause,
--> 117                                 session=session).read()
    118 
    119     elif data_source == "yahoo-actions":

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\yahoo\daily.py in read(self)
     75     def read(self):
     76         """ read one data from specified URL """
---> 77         df = super(YahooDailyReader, self).read()
     78         if self.ret_index:
     79             df['Ret_Index'] = _calc_return_index(df['Adj Close'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in read(self)
    155         if isinstance(self.symbols, (compat.string_types, int)):
    156             df = self._read_one_data(self.url,
--> 157                                      params=self._get_params(self.symbols))
    158         # Or multiple symbols, (e.g., ['GOOG', 'AAPL', 'MSFT'])
    159         elif isinstance(self.symbols, DataFrame):

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in _read_one_data(self, url, params)
     72         """ read one data from specified URL """
     73         if self._format == 'string':
---> 74             out = self._read_url_as_StringIO(url, params=params)
     75         elif self._format == 'json':
     76             out = self._get_response(url, params=params).json()

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in _read_url_as_StringIO(self, url, params)
     83         Open url (and retry)
     84         """
---> 85         response = self._get_response(url, params=params)
     86         text = self._sanitize_response(response)
     87         out = StringIO()

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in _get_response(self, url, params)
    112         # initial attempt + retry
    113         for i in range(self.retry_count + 1):
--> 114             response = self.session.get(url, params=params)
    115             if response.status_code == requests.codes.ok:
    116                 return response

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in get(self, url, **kwargs)
    499 
    500         kwargs.setdefault('allow_redirects', True)
--> 501         return self.request('GET', url, **kwargs)
    502 
    503     def options(self, url, **kwargs):

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    486         }
    487         send_kwargs.update(settings)
--> 488         resp = self.send(prep, **send_kwargs)
    489 
    490         return resp

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
    607 
    608         # Send the request
--> 609         r = adapter.send(request, **kwargs)
    610 
    611         # Total elapsed time of the request (approximately)

C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    485                 raise ProxyError(e, request=request)
    486 
--> 487             raise ConnectionError(e, request=request)
    488 
    489         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='ichart.finance.yahoo.com', port=80): Max retries exceeded with url: /table.csv?s=SPY&a=0&b=1&c=2010&d=6&e=24&f=2017&g=d&ignore=.csv (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x000000000B5E5550>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
# 放入两份标普500指数合约及其到期日
from datetime import datetime
expiry = {'ESU2': datetime(2012, 9, 21),
          'ESZ2': datetime(2012, 12, 21)}
expiry = Series(expiry).order()
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\__main__.py:5: FutureWarning: order is deprecated, use sort_values(...)
expiry
Out[23]:
ESU2   2012-09-21
ESZ2   2012-12-21
dtype: datetime64[ns]
# 用 雅虎的价格及随机漫步噪声模拟未来走势
np.random.seed(12347)
N = 200
walk = (np.random.randint(0, 200, size=N) - 100) * 0.25
perturb = (np.random.randint(0, 20, size=N) - 10) * 0.25
walk = walk.cumsum()

rng = pd.date_range(px.index[0], periods=len(px) + N, freq='B')
near = np.concatenate([px.values, px.values[-1] + walk])
far = np.concatenate([px.values, px.values[-1] + walk + perturb])
prices = DataFrame({'ESU2': near, 'ESZ2': far}, index=rng)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-0a4a98036b40> in <module>()
      7 
      8 rng = pd.date_range(px.index[0], periods=len(px) + N, freq='B')
----> 9 near = np.concatenate([px.values, px.values[-1] + walk])
     10 far = np.concatenate([px.values, px.values[-1] + walk + perturb])
     11 prices = DataFrame({'ESU2': near, 'ESZ2': far}, index=rng)

ValueError: operands could not be broadcast together with shapes (10,) (200,) 
# prices: 关于这两个合约的时间序列
prices.tail()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-26-84d92c9aa1f0> in <module>()
      1 # prices: 关于这两个合约的时间序列
----> 2 prices.tail()

NameError: name 'prices' is not defined
# 将多个时间序列合并为单个连续序列。方法:构造一个加权矩阵。
# 其中活动合约的权重为1, 直到期满为止。
# 下面的函数计算加权矩阵,权重根据到期前的期数减少而线性衰减

def get_roll_weights(start, expiry, items, roll_periods=5):
    # start : first date to compute weighting DataFrame
    # expiry : Series of ticker -> expiration dates
    # items : sequence of contract names

    dates = pd.date_range(start, expiry[-1], freq='B')
    weights = DataFrame(np.zeros((len(dates), len(items))),
                        index=dates, columns=items)

    prev_date = weights.index[0]
    for i, (item, ex_date) in enumerate(expiry.iteritems()):
        if i < len(expiry) - 1:
            weights.ix[prev_date:ex_date - pd.offsets.BDay(), item] = 1
            roll_rng = pd.date_range(end=ex_date - pd.offsets.BDay(),
                                     periods=roll_periods + 1, freq='B')

            decay_weights = np.linspace(0, 1, roll_periods + 1)
            weights.ix[roll_rng, item] = 1 - decay_weights
            weights.ix[roll_rng, expiry.index[i + 1]] = decay_weights
        else:
            weights.ix[prev_date:, item] = 1

        prev_date = ex_date

    return weights
# 权重计算结果
weights = get_roll_weights('6/1/2012', expiry, prices.columns)
weights.ix['2012-09-12':'2012-09-21']
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-28-2841b6423261> in <module>()
      1 # 权重计算结果
----> 2 weights = get_roll_weights('6/1/2012', expiry, prices.columns)
      3 weights.ix['2012-09-12':'2012-09-21']

NameError: name 'prices' is not defined
# 转仓收益就是合约收益的加权和
rolled_returns = (prices.pct_change() * weights).sum(1)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-29-f255425d54ca> in <module>()
      1 # 转仓收益就是合约收益的加权和
----> 2 rolled_returns = (prices.pct_change() * weights).sum(1)

NameError: name 'prices' is not defined

移动相关系数与线性回归

动态模型,可用于模拟历史时期中的交易决策。

移动窗口和指数加权时间序列函数是用于处理动态模型的工具。

# 相关系数是观察两个资产时间序列变化的协动性的一种手段。
# pandas 的 rolling_corr函数可以根据两个收益序列,计算出移动窗口的相关系数

# 加载价格序列并计算每日收益率
aapl = web.get_data_yahoo('AAPL', '2000-01-01')['Adj Close']
msft = web.get_data_yahoo('MSFT', '2000-01-01')['Adj Close']

aapl_rets = aapl.pct_change()
msft_rets = msft.pct_change()
---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connection.py in _new_conn(self)
    137             conn = connection.create_connection(
--> 138                 (self.host, self.port), self.timeout, **extra_kw)
    139 

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
     74 
---> 75     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
     76         af, socktype, proto, canonname, sa = res

C:\ProgramData\Anaconda3\lib\socket.py in getaddrinfo(host, port, family, type, proto, flags)
    742     addrlist = []
--> 743     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    744         af, socktype, proto, canonname, sa = res

gaierror: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, **response_kw)
    593                                                   body=body, headers=headers,
--> 594                                                   chunked=chunked)
    595 

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    360         else:
--> 361             conn.request(method, url, **httplib_request_kw)
    362 

C:\ProgramData\Anaconda3\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
   1238         """Send a complete request to the server."""
-> 1239         self._send_request(method, url, body, headers, encode_chunked)
   1240 

C:\ProgramData\Anaconda3\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1284             body = _encode(body, 'body')
-> 1285         self.endheaders(body, encode_chunked=encode_chunked)
   1286 

C:\ProgramData\Anaconda3\lib\http\client.py in endheaders(self, message_body, encode_chunked)
   1233             raise CannotSendHeader()
-> 1234         self._send_output(message_body, encode_chunked=encode_chunked)
   1235 

C:\ProgramData\Anaconda3\lib\http\client.py in _send_output(self, message_body, encode_chunked)
   1025         del self._buffer[:]
-> 1026         self.send(msg)
   1027 

C:\ProgramData\Anaconda3\lib\http\client.py in send(self, data)
    963             if self.auto_open:
--> 964                 self.connect()
    965             else:

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connection.py in connect(self)
    162     def connect(self):
--> 163         conn = self._new_conn()
    164         self._prepare_conn(conn)

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connection.py in _new_conn(self)
    146             raise NewConnectionError(
--> 147                 self, "Failed to establish a new connection: %s" % e)
    148 

NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x000000000B5A6E80>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    422                     retries=self.max_retries,
--> 423                     timeout=timeout
    424                 )

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, **response_kw)
    642             retries = retries.increment(method, url, error=e, _pool=self,
--> 643                                         _stacktrace=sys.exc_info()[2])
    644             retries.sleep()

C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    362         if new_retry.is_exhausted():
--> 363             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    364 

MaxRetryError: HTTPConnectionPool(host='ichart.finance.yahoo.com', port=80): Max retries exceeded with url: /table.csv?s=AAPL&a=0&b=1&c=2000&d=6&e=24&f=2017&g=d&ignore=.csv (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x000000000B5A6E80>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input-30-4db3157b9e89> in <module>()
      3 
      4 # 加载价格序列并计算每日收益率
----> 5 aapl = web.get_data_yahoo('AAPL', '2000-01-01')['Adj Close']
      6 msft = web.get_data_yahoo('MSFT', '2000-01-01')['Adj Close']
      7 

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\data.py in get_data_yahoo(*args, **kwargs)
     38 
     39 def get_data_yahoo(*args, **kwargs):
---> 40     return YahooDailyReader(*args, **kwargs).read()
     41 
     42 

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\yahoo\daily.py in read(self)
     75     def read(self):
     76         """ read one data from specified URL """
---> 77         df = super(YahooDailyReader, self).read()
     78         if self.ret_index:
     79             df['Ret_Index'] = _calc_return_index(df['Adj Close'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in read(self)
    155         if isinstance(self.symbols, (compat.string_types, int)):
    156             df = self._read_one_data(self.url,
--> 157                                      params=self._get_params(self.symbols))
    158         # Or multiple symbols, (e.g., ['GOOG', 'AAPL', 'MSFT'])
    159         elif isinstance(self.symbols, DataFrame):

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in _read_one_data(self, url, params)
     72         """ read one data from specified URL """
     73         if self._format == 'string':
---> 74             out = self._read_url_as_StringIO(url, params=params)
     75         elif self._format == 'json':
     76             out = self._get_response(url, params=params).json()

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in _read_url_as_StringIO(self, url, params)
     83         Open url (and retry)
     84         """
---> 85         response = self._get_response(url, params=params)
     86         text = self._sanitize_response(response)
     87         out = StringIO()

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py in _get_response(self, url, params)
    112         # initial attempt + retry
    113         for i in range(self.retry_count + 1):
--> 114             response = self.session.get(url, params=params)
    115             if response.status_code == requests.codes.ok:
    116                 return response

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in get(self, url, **kwargs)
    499 
    500         kwargs.setdefault('allow_redirects', True)
--> 501         return self.request('GET', url, **kwargs)
    502 
    503     def options(self, url, **kwargs):

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    486         }
    487         send_kwargs.update(settings)
--> 488         resp = self.send(prep, **send_kwargs)
    489 
    490         return resp

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
    607 
    608         # Send the request
--> 609         r = adapter.send(request, **kwargs)
    610 
    611         # Total elapsed time of the request (approximately)

C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    485                 raise ProxyError(e, request=request)
    486 
--> 487             raise ConnectionError(e, request=request)
    488 
    489         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='ichart.finance.yahoo.com', port=80): Max retries exceeded with url: /table.csv?s=AAPL&a=0&b=1&c=2000&d=6&e=24&f=2017&g=d&ignore=.csv (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x000000000B5A6E80>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
# 计算一年期移动相关系数并绘制图表
pd.rolling_corr(aapl_rets, msft_rets, 250).plot()
plt.figure()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-31-0853087e59d3> in <module>()
      1 # 计算一年期移动相关系数并绘制图表
----> 2 pd.rolling_corr(aapl_rets, msft_rets, 250).plot()
      3 plt.figure()

NameError: name 'aapl_rets' is not defined
# 两个资产直接的相关系数不能捕获波动性差异。
# 最小二乘回归,提供了一个变量与一个或多个其他预测变量之间动态关系的建模办法
model = pd.ols(y=aapl_rets, x={'MSFT': msft_rets}, window=250)
model.beta
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-32-d9ef4ad8be80> in <module>()
      1 # 两个资产直接的相关系数不能捕获波动性差异。
      2 # 最小二乘回归,提供了一个变量与一个或多个其他预测变量之间动态关系的建模办法
----> 3 model = pd.ols(y=aapl_rets, x={'MSFT': msft_rets}, window=250)
      4 model.beta

NameError: name 'aapl_rets' is not defined
model.beta['MSFT'].plot()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-33-3518a6dfbdb1> in <module>()
----> 1 model.beta['MSFT'].plot()

NameError: name 'model' is not defined