Time Series / Date functionality

Original: http://pandas.pydata.org/pandas-docs/stable/timeseries.html

Translators: 飞龙 UsyiyiCN

Proofreader: (position open)

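pandas has proven very successful as a tool for working with time series data, especially in the financial data analysis space. Using the NumPy datetime64 and timedelta64 dtypes, we have consolidated a large number of features from other Python libraries like scikits.timeseries and created a tremendous amount of new functionality for manipulating time series data.

In working with time series data, we will frequently seek to:

  • generate sequences of fixed-frequency dates and time spans
  • conform or convert time series to a particular frequency
  • compute "relative" dates based on various non-standard time increments (e.g. 5 business days before the last business day of the year), or "roll" dates forward or backward

pandas provides a relatively compact and self-contained set of tools for performing the above tasks.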

Create a range of dates:

# 72 hours starting with midnight Jan 1st, 2011
In [1]: rng = pd.date_range('1/1/2011', periods=72, freq='H')

In [2]: rng[:5]
Out[2]: 
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00'],
              dtype='datetime64[ns]', freq='H')

Index pandas objects with dates:

In [3]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [4]: ts.head()
Out[4]: 
2011-01-01 00:00:00    0.469112
2011-01-01 01:00:00   -0.282863
2011-01-01 02:00:00   -1.509059
2011-01-01 03:00:00   -1.135632
2011-01-01 04:00:00    1.212112
Freq: H, dtype: float64

Change frequency and fill gaps:

# to 45 minute frequency and forward fill
In [5]: converted = ts.asfreq('45Min', method='pad')

In [6]: converted.head()
Out[6]: 
2011-01-01 00:00:00    0.469112
2011-01-01 00:45:00    0.469112
2011-01-01 01:30:00   -0.282863
2011-01-01 02:15:00   -1.509059
2011-01-01 03:00:00   -1.135632
Freq: 45T, dtype: float64

Resample:

# Daily means
In [7]: ts.resample('D').mean()
Out[7]: 
2011-01-01   -0.319569
2011-01-02   -0.337703
2011-01-03    0.117258
Freq: D, dtype: float64
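The steps above can be collected into one self-contained script. This is a minimal sketch, not the original session: it swaps the random values for a deterministic ramp (`np.arange`) so the results are reproducible, and it assumes the lowercase frequency aliases `'h'` and `'45min'` and the `'ffill'` spelling of `'pad'`, which should be accepted by both older and newer pandas releases.

```python
import numpy as np
import pandas as pd

# 72 hourly timestamps starting at midnight on 2011-01-01
rng = pd.date_range("2011-01-01", periods=72, freq="h")

# Deterministic values (0.0, 1.0, ...) instead of random noise
ts = pd.Series(np.arange(72, dtype=float), index=rng)

# Re-sample onto a 45-minute grid, forward-filling from the last known hour
converted = ts.asfreq("45min", method="ffill")

# Aggregate to daily means
daily = ts.resample("D").mean()

print(converted.head())
print(daily)
```

With the ramp input, each daily mean is simply the average of that day's 24 hourly values, which makes the effect of `resample` easy to check by hand.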

Overview

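The following table shows the types of time-related classes pandas can handle and how to create them.

Class          Remarks                  How to create
Timestamp      A single point in time   to_datetime, Timestamp
DatetimeIndex  Index of Timestamp       to_datetime, date_range, DatetimeIndex
Period         A single time span       Period
PeriodIndex    Index of Period          period_range, PeriodIndex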

Time Stamps vs. Time Spans

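Time-stamped data is the most basic type of time series data: it associates values with points in time. For pandas objects, this means using points in time.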

In [8]: pd.Timestamp(datetime(2012, 5, 1))
Out[8]: Timestamp('2012-05-01 00:00:00')

In [9]: pd.Timestamp('2012-05-01')
Out[9]: Timestamp('2012-05-01 00:00:00')

In [10]: pd.Timestamp(2012, 5, 1)
Out[10]: Timestamp('2012-05-01 00:00:00')

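However, in many cases it is more natural to associate things like change variables with a time span instead. The span represented by Period can be specified explicitly, or inferred from the datetime string format.

For example: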

In [11]: pd.Period('2011-01')
Out[11]: Period('2011-01', 'M')

In [12]: pd.Period('2012-05', freq='D')
Out[12]: Period('2012-05-01', 'D')

Timestamp and Period can be used as an index. Lists of Timestamp and Period objects are automatically coerced to DatetimeIndex and PeriodIndex respectively.

In [13]: dates = [pd.Timestamp('2012-05-01'), pd.Timestamp('2012-05-02'), pd.Timestamp('2012-05-03')]

In [14]: ts = pd.Series(np.random.randn(3), dates)

In [15]: type(ts.index)
Out[15]: pandas.tseries.index.DatetimeIndex

In [16]: ts.index
Out[16]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

In [17]: ts
Out[17]: 
2012-05-01   -0.410001
2012-05-02   -0.078638
2012-05-03    0.545952
dtype: float64

In [18]: periods = [pd.Period('2012-01'), pd.Period('2012-02'), pd.Period('2012-03')]

In [19]: ts = pd.Series(np.random.randn(3), periods)

In [20]: type(ts.index)
Out[20]: pandas.tseries.period.PeriodIndex

In [21]: ts.index
Out[21]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')

In [22]: ts
Out[22]: 
2012-01   -1.219217
2012-02   -1.226825
2012-03    0.769804
Freq: M, dtype: float64

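pandas allows you to capture both representations and convert between them. Under the hood, pandas represents timestamps using instances of Timestamp and sequences of timestamps using instances of DatetimeIndex. For regular time spans, pandas uses Period objects for scalar values and PeriodIndex for sequences of spans. Better support for irregular intervals with arbitrary start and end points is planned for future releases.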
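Converting between the two representations can be sketched as follows. The example assumes the `to_period` and `to_timestamp` methods on Series; the dates and values are made up for illustration.

```python
import pandas as pd

# Values stamped at specific points in time (mid-month)
stamps = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2012-01-15", "2012-02-15", "2012-03-15"]),
)

# Reinterpret each timestamp as the monthly span that contains it
spans = stamps.to_period("M")

# Convert back; each span maps to its start timestamp by default
back = spans.to_timestamp()

print(type(spans.index).__name__)  # PeriodIndex
print(back.index[0])
```

Note that the round trip is lossy: the day of the month is discarded when converting to monthly periods, so `back` is stamped at the first of each month.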

Converting to Timestamps

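To convert a Series or list-like object of date-like objects (e.g. strings, epochs, or a mixture), you can use the to_datetime function. When passed a Series, it returns a Series (with the same index), while a list-like is converted to a DatetimeIndex: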

In [23]: pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]))
Out[23]: 
0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]

In [24]: pd.to_datetime(['2005/11/23', '2010.12.31'])
Out[24]: DatetimeIndex(['2005-11-23', '2010-12-31'], dtype='datetime64[ns]', freq=None)

If you use dates that start with the day first (i.e. European style), you can pass the dayfirst flag:

In [25]: pd.to_datetime(['04-01-2012 10:00'], dayfirst=True)
Out[25]: DatetimeIndex(['2012-01-04 10:00:00'], dtype='datetime64[ns]', freq=None)

In [26]: pd.to_datetime(['14-01-2012', '01-14-2012'], dayfirst=True)
Out[26]: DatetimeIndex(['2012-01-14', '2012-01-14'], dtype='datetime64[ns]', freq=None)

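Warning

As the example above shows, dayfirst is not strict: if a date cannot be parsed with the day first, it is parsed as if dayfirst were False.

Note

Specifying a format argument can speed up the conversion considerably, and on versions later than 0.13.0 explicitly specifying a format string of '%Y%m%d' takes a faster path still.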
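As a small illustration of the two options just described, the sketch below parses compact `YYYYMMDD` strings with an explicit `format` and a single European-style string with `dayfirst=True`. The input strings are invented for the example.

```python
import pandas as pd

# Explicit format: no guessing is needed for compact YYYYMMDD strings
compact = pd.to_datetime(["20120104", "20120205"], format="%Y%m%d")

# Day-first parsing: "04-01-2012" is read as 4 January, not 1 April
european = pd.to_datetime(["04-01-2012 10:00"], dayfirst=True)

print(compact)
print(european)
```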

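If you pass a single string to to_datetime, it returns a single Timestamp. Timestamp can also accept string input. Note that Timestamp does not accept string parsing options such as dayfirst or format; use to_datetime if these are required.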

In [27]: pd.to_datetime('2010/11/12')
Out[27]: Timestamp('2010-11-12 00:00:00')

In [28]: pd.Timestamp('2010/11/12')
Out[28]: Timestamp('2010-11-12 00:00:00')

New in version 0.18.1.

You can also pass a DataFrame of integer or string columns to assemble into a Series of Timestamps.

In [29]: df = pd.DataFrame({'year': [2015, 2016],
   ....:                    'month': [2, 3],
   ....:                    'day': [4, 5],
   ....:                    'hour': [2, 3]})
   ....: 

In [30]: pd.to_datetime(df)
Out[30]: 
0   2015-02-04 02:00:00
1   2016-03-05 03:00:00
dtype: datetime64[ns]

You can pass only the columns that you need to assemble.

In [31]: pd.to_datetime(df[['year', 'month', 'day']])
Out[31]: 
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

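pd.to_datetime looks for standard designations of the datetime components in the column names, including:

  • required: year, month, day
  • optional: hour, minute, second, millisecond, microsecond, nanosecond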

Invalid Data

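Note

In version 0.17.0, the default for to_datetime is now errors='raise', rather than errors='ignore'. This means that invalid parsing will raise rather than return the original input as in previous versions.

Pass errors='coerce' to convert invalid data to NaT (not a time):

Raise when unparseable; this is the default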

In [2]: pd.to_datetime(['2009/07/31', 'asd'], errors='raise')
ValueError: Unknown string format

Return the original input when unparseable

In [4]: pd.to_datetime(['2009/07/31', 'asd'], errors='ignore')
Out[4]: array(['2009/07/31', 'asd'], dtype=object)

Return NaT for input when unparseable

In [6]: pd.to_datetime(['2009/07/31', 'asd'], errors='coerce')
Out[6]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)
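A common follow-up to errors='coerce' is to find out which inputs failed. The sketch below is one way to do that, assuming a small made-up Series of strings; the `isna` mask is the only extra step.

```python
import pandas as pd

raw = pd.Series(["2009/07/31", "asd", "2009/08/01"])

# Unparseable entries become NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")

# Use the NaT mask to recover the offending raw inputs
bad = raw[parsed.isna()]

print(parsed)
print(bad.tolist())
```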

Epoch Timestamps

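It is also possible to convert integer or float epoch times. The default unit for these is nanoseconds (since that is how Timestamps are stored). However, epochs are often stored in another unit, which can be specified:

Typical epoch stored units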

In [32]: pd.to_datetime([1349720105, 1349806505, 1349892905,
   ....:                 1349979305, 1350065705], unit='s')
   ....: 
Out[32]: 
DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
               '2012-10-10 18:15:05', '2012-10-11 18:15:05',
               '2012-10-12 18:15:05'],
              dtype='datetime64[ns]', freq=None)

In [33]: pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
   ....:                 1349720105400, 1349720105500 ], unit='ms')
   ....: 
Out[33]: 
DatetimeIndex(['2012-10-08 18:15:05.100000', '2012-10-08 18:15:05.200000',
               '2012-10-08 18:15:05.300000', '2012-10-08 18:15:05.400000',
               '2012-10-08 18:15:05.500000'],
              dtype='datetime64[ns]', freq=None)

These work, but the results may be unexpected.

In [34]: pd.to_datetime([1])
Out[34]: DatetimeIndex(['1970-01-01 00:00:00.000000001'], dtype='datetime64[ns]', freq=None)

In [35]: pd.to_datetime([1, 3.14], unit='s')
Out[35]: DatetimeIndex(['1970-01-01 00:00:01', '1970-01-01 00:00:03.140000'], dtype='datetime64[ns]', freq=None)

Note

Epoch times will be rounded to the nearest nanosecond.
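Going the other way, from timestamps back to epoch seconds, is not covered above. One possible approach, shown here as a sketch rather than the documented method, is to subtract the Unix epoch and floor-divide by a one-second Timedelta.

```python
import pandas as pd

epochs = [1349720105, 1349806505, 1349892905]

# Integers interpreted as seconds since 1970-01-01
stamps = pd.to_datetime(epochs, unit="s")

# Reverse: elapsed time since the epoch, expressed in whole seconds
seconds = (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

print(stamps)
print(list(seconds))
```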

Generating Ranges of Timestamps

To generate an index with time stamps, you can use either the DatetimeIndex or Index constructor and pass in a list of datetime objects:

In [36]: dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]

# Note the frequency information
In [37]: index = pd.DatetimeIndex(dates)

In [38]: index
Out[38]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

# Automatically converted to DatetimeIndex
In [39]: index = pd.Index(dates)

In [40]: index
Out[40]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

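In practice this becomes very cumbersome, because we often need a very long index with a large number of timestamps. If we need timestamps on a regular frequency, we can use the pandas functions date_range and bdate_range to create timestamp indexes.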

In [41]: index = pd.date_range('2000-1-1', periods=1000, freq='M')

In [42]: index
Out[42]: 
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-30', '2000-10-31',
               ...
               '2082-07-31', '2082-08-31', '2082-09-30', '2082-10-31',
               '2082-11-30', '2082-12-31', '2083-01-31', '2083-02-28',
               '2083-03-31', '2083-04-30'],
              dtype='datetime64[ns]', length=1000, freq='M')

In [43]: index = pd.bdate_range('2012-1-1', periods=250)

In [44]: index
Out[44]: 
DatetimeIndex(['2012-01-02', '2012-01-03', '2012-01-04', '2012-01-05',
               '2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11',
               '2012-01-12', '2012-01-13',
               ...
               '2012-12-03', '2012-12-04', '2012-12-05', '2012-12-06',
               '2012-12-07', '2012-12-10', '2012-12-11', '2012-12-12',
               '2012-12-13', '2012-12-14'],
              dtype='datetime64[ns]', length=250, freq='B')

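Convenience functions like date_range and bdate_range use a variety of frequency aliases. The default frequency for date_range is a calendar day, while the default for bdate_range is a business day.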

In [45]: start = datetime(2011, 1, 1)

In [46]: end = datetime(2012, 1, 1)

In [47]: rng = pd.date_range(start, end)

In [48]: rng
Out[48]: 
DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
               '2011-01-09', '2011-01-10',
               ...
               '2011-12-23', '2011-12-24', '2011-12-25', '2011-12-26',
               '2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30',
               '2011-12-31', '2012-01-01'],
              dtype='datetime64[ns]', length=366, freq='D')

In [49]: rng = pd.bdate_range(start, end)

In [50]: rng
Out[50]: 
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14',
               ...
               '2011-12-19', '2011-12-20', '2011-12-21', '2011-12-22',
               '2011-12-23', '2011-12-26', '2011-12-27', '2011-12-28',
               '2011-12-29', '2011-12-30'],
              dtype='datetime64[ns]', length=260, freq='B')

date_range and bdate_range make it easy to generate a range of dates using various combinations of parameters like start, end, periods, and freq:

In [51]: pd.date_range(start, end, freq='BM')
Out[51]: 
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
               '2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',
               '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],
              dtype='datetime64[ns]', freq='BM')

In [52]: pd.date_range(start, end, freq='W')
Out[52]: 
DatetimeIndex(['2011-01-02', '2011-01-09', '2011-01-16', '2011-01-23',
               '2011-01-30', '2011-02-06', '2011-02-13', '2011-02-20',
               '2011-02-27', '2011-03-06', '2011-03-13', '2011-03-20',
               '2011-03-27', '2011-04-03', '2011-04-10', '2011-04-17',
               '2011-04-24', '2011-05-01', '2011-05-08', '2011-05-15',
               '2011-05-22', '2011-05-29', '2011-06-05', '2011-06-12',
               '2011-06-19', '2011-06-26', '2011-07-03', '2011-07-10',
               '2011-07-17', '2011-07-24', '2011-07-31', '2011-08-07',
               '2011-08-14', '2011-08-21', '2011-08-28', '2011-09-04',
               '2011-09-11', '2011-09-18', '2011-09-25', '2011-10-02',
               '2011-10-09', '2011-10-16', '2011-10-23', '2011-10-30',
               '2011-11-06', '2011-11-13', '2011-11-20', '2011-11-27',
               '2011-12-04', '2011-12-11', '2011-12-18', '2011-12-25',
               '2012-01-01'],
              dtype='datetime64[ns]', freq='W-SUN')

In [53]: pd.bdate_range(end=end, periods=20)
Out[53]: 
DatetimeIndex(['2011-12-05', '2011-12-06', '2011-12-07', '2011-12-08',
               '2011-12-09', '2011-12-12', '2011-12-13', '2011-12-14',
               '2011-12-15', '2011-12-16', '2011-12-19', '2011-12-20',
               '2011-12-21', '2011-12-22', '2011-12-23', '2011-12-26',
               '2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30'],
              dtype='datetime64[ns]', freq='B')

In [54]: pd.bdate_range(start=start, periods=20)
Out[54]: 
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14', '2011-01-17', '2011-01-18',
               '2011-01-19', '2011-01-20', '2011-01-21', '2011-01-24',
               '2011-01-25', '2011-01-26', '2011-01-27', '2011-01-28'],
              dtype='datetime64[ns]', freq='B')

The start and end dates are strictly inclusive, so no dates outside of them will be generated when they are specified.
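The inclusive behaviour can be checked directly. The following sketch reuses the same start and end dates as the examples above and only relies on the default frequencies of `date_range` and `bdate_range`.

```python
from datetime import datetime

import pandas as pd

start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)

# Calendar days: both endpoints are part of the result
daily = pd.date_range(start, end)

# Business days: weekend endpoints are skipped, nothing falls outside the bounds
business = pd.bdate_range(start, end)

# Anchor on one endpoint and count periods from it
last20 = pd.bdate_range(end=end, periods=20)
first20 = pd.bdate_range(start=start, periods=20)

print(len(daily), len(business))
print(first20[0], last20[-1])
```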

Timestamp limitations

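Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years: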

In [55]: pd.Timestamp.min
Out[55]: Timestamp('1677-09-21 00:12:43.145225')

In [56]: pd.Timestamp.max
Out[56]: Timestamp('2262-04-11 23:47:16.854775807')

See the documentation on representing out-of-bounds spans for ways to represent data outside these bounds.
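One way to work with dates outside the nanosecond Timestamp range is to use Period objects instead. The sketch below assumes daily periods are an acceptable substitute for the data at hand; the year 1215 is just an illustrative out-of-range date.

```python
import pandas as pd

# Rough width of the representable Timestamp range, in calendar years
span_years = pd.Timestamp.max.year - pd.Timestamp.min.year

# A Period is not tied to nanosecond resolution, so earlier dates are fine
old = pd.Period("1215-06-15", freq="D")
medieval = pd.period_range("1215-01-01", "1215-01-05", freq="D")

print(span_years)
print(old, len(medieval))
```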

DatetimeIndex

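One of the main uses for DatetimeIndex is as an index for pandas objects. The DatetimeIndex class contains many time series related optimizations:

  • A large range of dates for various offsets is pre-computed and cached under the hood, which makes generating subsequent date ranges very fast (it only has to grab a slice)
  • Fast shifting using the shift and tshift methods on pandas objects
  • Unioning of overlapping DatetimeIndex objects with the same frequency is very fast (important for fast data alignment)
  • Quick access to date fields via properties such as year, month, etc.
  • Regularization functions like snap and very fast asof logic

DatetimeIndex objects have all the basic functionality of regular Index objects and a smorgasbord of advanced time series specific methods for easy frequency processing.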
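A few of the features listed above can be sketched on a small index. The example uses `shift(..., freq=...)` to move the index itself; this is an assumption made for portability, since `tshift` may not be available in newer pandas releases.

```python
import pandas as pd

a = pd.date_range("2011-01-01", periods=5, freq="D")
b = pd.date_range("2011-01-04", periods=5, freq="D")

# Field access is vectorized over the whole index
years = a.year
months = a.month

# Union of two overlapping, same-frequency indexes
combined = a.union(b)

# Shift the index (not the values) forward by one day
s = pd.Series(range(5), index=a)
shifted = s.shift(1, freq="D")

print(list(months))
print(combined)
print(shifted.index[0])
```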

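See also

Reindexing methods

Note

While pandas does not force you to have a sorted date index, some of these methods may have unexpected or incorrect behavior if the dates are unsorted. So please be careful.

DatetimeIndex can be used like a regular index and offers all of its intelligent functionality like selection, slicing, etc.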

In [57]: rng = pd.date_range(start, end, freq='BM')

In [58]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [59]: ts.index
Out[59]: 
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
               '2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',
               '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],
              dtype='datetime64[ns]', freq='BM')

In [60]: ts[:5].index
Out[60]: 
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
               '2011-05-31'],
              dtype='datetime64[ns]', freq='BM')

In [61]: ts[::2].index
Out[61]: 
DatetimeIndex(['2011-01-31', '2011-03-31', '2011-05-31', '2011-07-29',
               '2011-09-30', '2011-11-30'],
              dtype='datetime64[ns]', freq='2BM')

DatetimeIndex Partial String Indexing

You can pass in dates and strings that parse to dates as indexing parameters:

In [62]: ts['1/31/2011']
Out[62]: -1.2812473076599531

In [63]: ts[datetime(2011, 12, 25):]
Out[63]: 
2011-12-30    0.687738
Freq: BM, dtype: float64

In [64]: ts['10/31/2011':'12/31/2011']
Out[64]: 
2011-10-31    0.149748
2011-11-30   -0.732339
2011-12-30    0.687738
Freq: BM, dtype: float64

To make it convenient to access longer time series, you can also pass in the year, or the year and month, as strings:

In [65]: ts['2011']
Out[65]: 
2011-01-31   -1.281247
2011-02-28   -0.727707
2011-03-31   -0.121306
2011-04-29   -0.097883
2011-05-31    0.695775
2011-06-30    0.341734
2011-07-29    0.959726
2011-08-31   -1.110336
2011-09-30   -0.619976
2011-10-31    0.149748
2011-11-30   -0.732339
2011-12-30    0.687738
Freq: BM, dtype: float64

In [66]: ts['2011-6']
Out[66]: 
2011-06-30    0.341734
Freq: BM, dtype: float64

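This type of slicing also works on a DataFrame with a DateTimeIndex. Since partial string selection is a form of label slicing, the endpoints are included. This also includes all matching times on an included date. Here is an example: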

In [67]: dft = pd.DataFrame(np.random.randn(100000,1),
   ....:                    columns=['A'],
   ....:                    index=pd.date_range('20130101',periods=100000,freq='T'))
   ....: 

In [68]: dft
Out[68]: 
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-03-11 10:33:00 -0.293083
2013-03-11 10:34:00 -0.059881
2013-03-11 10:35:00  1.252450
2013-03-11 10:36:00  0.046611
2013-03-11 10:37:00  0.059478
2013-03-11 10:38:00 -0.286539
2013-03-11 10:39:00  0.841669

[100000 rows x 1 columns]

In [69]: dft['2013']
Out[69]: 
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-03-11 10:33:00 -0.293083
2013-03-11 10:34:00 -0.059881
2013-03-11 10:35:00  1.252450
2013-03-11 10:36:00  0.046611
2013-03-11 10:37:00  0.059478
2013-03-11 10:38:00 -0.286539
2013-03-11 10:39:00  0.841669

[100000 rows x 1 columns]

This starts on the very first time in the month, and includes the last date and time for the month

In [70]: dft['2013-1':'2013-2']
Out[70]: 
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-02-28 23:53:00  0.103114
2013-02-28 23:54:00 -1.303422
2013-02-28 23:55:00  0.451943
2013-02-28 23:56:00  0.220534
2013-02-28 23:57:00 -1.624220
2013-02-28 23:58:00  0.093915
2013-02-28 23:59:00 -1.087454

[84960 rows x 1 columns]

This specifies a stop time that includes all of the times on the last day

In [71]: dft['2013-1':'2013-2-28']
Out[71]: 
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-02-28 23:53:00  0.103114
2013-02-28 23:54:00 -1.303422
2013-02-28 23:55:00  0.451943
2013-02-28 23:56:00  0.220534
2013-02-28 23:57:00 -1.624220
2013-02-28 23:58:00  0.093915
2013-02-28 23:59:00 -1.087454

[84960 rows x 1 columns]

This specifies an exact stop time (and is not the same as the above)

In [72]: dft['2013-1':'2013-2-28 00:00:00']
Out[72]: 
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-02-27 23:54:00  0.897051
2013-02-27 23:55:00 -0.309230
2013-02-27 23:56:00  1.944713
2013-02-27 23:57:00  0.369265
2013-02-27 23:58:00  0.053071
2013-02-27 23:59:00 -0.019734
2013-02-28 00:00:00  1.388189

[83521 rows x 1 columns]

We stop on the included end-point, as it is part of the index

In [73]: dft['2013-1-15':'2013-1-15 12:30:00']
Out[73]: 
                            A
2013-01-15 00:00:00  0.501288
2013-01-15 00:01:00 -0.605198
2013-01-15 00:02:00  0.215146
2013-01-15 00:03:00  0.924732
2013-01-15 00:04:00 -2.228519
2013-01-15 00:05:00  1.517331
2013-01-15 00:06:00 -1.188774
...                       ...
2013-01-15 12:24:00  1.358314
2013-01-15 12:25:00 -0.737727
2013-01-15 12:26:00  1.838323
2013-01-15 12:27:00 -0.774090
2013-01-15 12:28:00  0.622261
2013-01-15 12:29:00 -0.631649
2013-01-15 12:30:00  0.193284

[751 rows x 1 columns]

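Warning

The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)

dft['2013-1-15 12:30:00']

To select a single row, use .loc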

In [74]: dft.loc['2013-1-15 12:30:00']
Out[74]: 
A    0.193284
Name: 2013-01-15 12:30:00, dtype: float64
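The partial string selections above can be reproduced with `.loc`, which is assumed here to be the spelling least sensitive to pandas version (plain `dft[...]` row selection by string may be restricted in some releases). The frame uses a deterministic counter instead of random values so that row counts and values are easy to verify.

```python
import numpy as np
import pandas as pd

# One row per minute, 100000 rows, starting 2013-01-01 00:00
idx = pd.date_range("2013-01-01", periods=100000, freq="min")
dft = pd.DataFrame({"A": np.arange(len(idx), dtype=float)}, index=idx)

# Month-resolution bounds: covers all of January and February
jan_feb = dft.loc["2013-01":"2013-02"]

# Second-resolution stop: ends exactly at midnight on 28 February
to_midnight = dft.loc["2013-01":"2013-02-28 00:00:00"]

# A fully specified label selects a single row
one_row = dft.loc["2013-01-15 12:30:00"]

print(len(jan_feb), len(to_midnight))
print(one_row)
```

The two slice lengths differ by 1439 rows: the remaining minutes of 28 February that the exact stop time excludes.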

New in version 0.18.0.

DatetimeIndex partial string indexing also works on DataFrames with a MultiIndex. For example:

In [75]: dft2 = pd.DataFrame(np.random.randn(20, 1),
   ....:                     columns=['A'],
   ....:                     index=pd.MultiIndex.from_product([pd.date_range('20130101',
   ....:                                                                     periods=10,
   ....:                                                                     freq='12H'),
   ....:                                                      ['a', 'b']]))
   ....: 

In [76]: dft2
Out[76]: 
                              A
2013-01-01 00:00:00 a -0.659574
                    b  1.494522
2013-01-01 12:00:00 a -0.778425
                    b -0.253355
2013-01-02 00:00:00 a -2.816159
                    b -1.210929
2013-01-02 12:00:00 a  0.144669
...                         ...
2013-01-04 00:00:00 b -1.624463
2013-01-04 12:00:00 a  0.056912
                    b  0.149867
2013-01-05 00:00:00 a -1.256173
                    b  2.324544
2013-01-05 12:00:00 a -1.067396
                    b -0.660996

[20 rows x 1 columns]

In [77]: dft2.loc['2013-01-05']
Out[77]: 
                              A
2013-01-05 00:00:00 a -1.256173
                    b  2.324544
2013-01-05 12:00:00 a -1.067396
                    b -0.660996

In [78]: idx = pd.IndexSlice

In [79]: dft2 = dft2.swaplevel(0, 1).sort_index()

In [80]: dft2.loc[idx[:, '2013-01-05'], :]
Out[80]: 
                              A
a 2013-01-05 00:00:00 -1.256173
  2013-01-05 12:00:00 -1.067396
b 2013-01-05 00:00:00  2.324544
  2013-01-05 12:00:00 -0.660996

Datetime Indexing

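Indexing a DateTimeIndex with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the frequency of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the semantics of including both endpoints.

These datetime objects have specific hours, minutes, and seconds even though they were not explicitly specified (they are 0).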

In [81]: dft[datetime(2013, 1, 1):datetime(2013,2,28)]
Out[81]: 
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-02-27 23:54:00  0.897051
2013-02-27 23:55:00 -0.309230
2013-02-27 23:56:00  1.944713
2013-02-27 23:57:00  0.369265
2013-02-27 23:58:00  0.053071
2013-02-27 23:59:00 -0.019734
2013-02-28 00:00:00  1.388189

[83521 rows x 1 columns]

With no defaults.

In [82]: dft[datetime(2013, 1, 1, 10, 12, 0):datetime(2013, 2, 28, 10, 12, 0)]
Out[82]: 
                            A
2013-01-01 10:12:00 -0.246733
2013-01-01 10:13:00 -1.429225
2013-01-01 10:14:00 -1.265339
2013-01-01 10:15:00  0.710986
2013-01-01 10:16:00 -0.818200
2013-01-01 10:17:00  0.543542
2013-01-01 10:18:00  1.577713
...                       ...
2013-02-28 10:06:00  0.311249
2013-02-28 10:07:00  2.366080
2013-02-28 10:08:00 -0.490372
2013-02-28 10:09:00  0.373340
2013-02-28 10:10:00  0.638442
2013-02-28 10:11:00  1.330135
2013-02-28 10:12:00 -0.945450

[83521 rows x 1 columns]

Truncating & Fancy Indexing

A truncate convenience function is provided that is equivalent to slicing:

In [83]: ts.truncate(before='10/31/2011', after='12/31/2011')
Out[83]: 
2011-10-31    0.149748
2011-11-30   -0.732339
2011-12-30    0.687738
Freq: BM, dtype: float64

Even complicated fancy indexing that breaks the DatetimeIndex's frequency regularity will result in a DatetimeIndex (but the frequency is lost):

In [84]: ts[[0, 2, 6]].index
Out[84]: DatetimeIndex(['2011-01-31', '2011-03-31', '2011-07-29'], dtype='datetime64[ns]', freq=None)
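Both behaviours can be sketched in a standalone form. The example below makes two assumptions for portability: the business-month-end frequency is passed as an offset object (`pd.offsets.BMonthEnd()`) rather than a string alias, and positional selection is written with `.iloc`.

```python
import numpy as np
import pandas as pd

# Last business day of each month in 2011
rng = pd.date_range("2011-01-01", "2012-01-01", freq=pd.offsets.BMonthEnd())
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

# truncate keeps the labels between the two bounds, inclusive
tail = ts.truncate(before="2011-10-31", after="2011-12-31")

# Irregular positional picks still yield a DatetimeIndex, without a frequency
picked = ts.iloc[[0, 2, 6]]

print(tail)
print(picked.index)
```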

Time/Date Components

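There are several time/date properties that can be accessed from a Timestamp or from a collection of timestamps such as a DateTimeIndex.

Property          Description
year              The year of the datetime
month             The month of the datetime
day               The day of the datetime
hour              The hour of the datetime
minute            The minutes of the datetime
second            The seconds of the datetime
microsecond       The microseconds of the datetime
nanosecond        The nanoseconds of the datetime
date              Returns datetime.date (does not contain timezone information)
time              Returns datetime.time (does not contain timezone information)
dayofyear         The ordinal day of the year
weekofyear        The week ordinal of the year
week              The week ordinal of the year
dayofweek         The number of the day of the week, with Monday=0, Sunday=6
weekday           The number of the day of the week, with Monday=0, Sunday=6
weekday_name      The name of the day in a week (e.g. Friday)
quarter           Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, etc.
days_in_month     The number of days in the month of the datetime
is_month_start    Logical indicating if first day of month (defined by frequency)
is_month_end      Logical indicating if last day of month (defined by frequency)
is_quarter_start  Logical indicating if first day of quarter (defined by frequency)
is_quarter_end    Logical indicating if last day of quarter (defined by frequency)
is_year_start     Logical indicating if first day of year (defined by frequency)
is_year_end       Logical indicating if last day of year (defined by frequency)
is_leap_year      Logical indicating if the date belongs to a leap year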

Furthermore, if you have a Series with datetimelike values, you can access these properties via the .dt accessor; see the .dt accessor documentation.
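A brief sketch of the .dt accessor on a Series of timestamps follows. It uses `day_name()` for weekday names, which is an assumption about the installed pandas version (the `weekday_name` property from the table may not exist in newer releases).

```python
import pandas as pd

# Four consecutive days, each at 09:10:12, stored as Series values
s = pd.Series(pd.date_range("2013-01-01 09:10:12", periods=4, freq="D"))

hours = s.dt.hour            # hour component of every element
weekdays = s.dt.dayofweek    # Monday=0 ... Sunday=6
names = s.dt.day_name()      # weekday names as strings
leap = s.dt.is_leap_year     # boolean flag per element

print(weekdays.tolist())
print(names.tolist())
```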

DateOffset objects

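In the preceding examples, we created DatetimeIndex objects at various frequencies by passing frequency strings like 'M', 'W', and 'BM' to the freq keyword. Under the hood, these frequency strings are translated into an instance of the pandas DateOffset class, which represents a regular frequency increment. Specific offset logic like "month", "business day", or "one hour" is represented in its various subclasses.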

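Class name          Description
DateOffset          Generic offset class, defaults to 1 calendar day
BDay                business day (weekday)
CDay                custom business day (experimental)
Week                one week, optionally anchored on a day of the week
WeekOfMonth         the x-th day of the y-th week of each month
LastWeekOfMonth     the x-th day of the last week of each month
MonthEnd            calendar month end
MonthBegin          calendar month begin
BMonthEnd           business month end
BMonthBegin         business month begin
CBMonthEnd          custom business month end
CBMonthBegin        custom business month begin
SemiMonthEnd        15th (or other day_of_month) and calendar month end
SemiMonthBegin      15th (or other day_of_month) and calendar month begin
QuarterEnd          calendar quarter end
QuarterBegin        calendar quarter begin
BQuarterEnd         business quarter end
BQuarterBegin       business quarter begin
FY5253Quarter       retail (aka 52-53 week) quarter
YearEnd             calendar year end
YearBegin           calendar year begin
BYearEnd            business year end
BYearBegin          business year begin
FY5253              retail (aka 52-53 week) year
BusinessHour        business hour
CustomBusinessHour  custom business hour
Hour                one hour
Minute              one minute
Second              one second
Milli               one millisecond
Micro               one microsecond
Nano                one nanosecond

The basic DateOffset takes the same arguments as dateutil.relativedelta, which works like: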

In [85]: d = datetime(2008, 8, 18, 9, 0)

In [86]: d + relativedelta(months=4, days=5)
Out[86]: datetime.datetime(2008, 12, 23, 9, 0)

We could have done the same thing with DateOffset:

In [87]: from pandas.tseries.offsets import *

In [88]: d + DateOffset(months=4, days=5)
Out[88]: Timestamp('2008-12-23 09:00:00')

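The key features of a DateOffset object are:

  • it can be added to or subtracted from a datetime object to obtain a shifted date
  • it can be multiplied by an integer (positive or negative) so that the increment will be applied multiple times
  • it has rollforward and rollback methods for moving a date forward or backward to the next or previous "offset date"

Subclasses of DateOffset define the apply function, which dictates custom date increment logic, such as adding business days: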

class BDay(DateOffset):
    """DateOffset increments between business days"""
    def apply(self, other):
        ...

In [89]: d - 5 * BDay()
Out[89]: Timestamp('2008-08-11 09:00:00')

In [90]: d + BMonthEnd()
Out[90]: Timestamp('2008-08-29 09:00:00')

The rollforward and rollback methods do exactly what you would expect:

In [91]: d
Out[91]: datetime.datetime(2008, 8, 18, 9, 0)

In [92]: offset = BMonthEnd()

In [93]: offset.rollforward(d)
Out[93]: Timestamp('2008-08-29 09:00:00')

In [94]: offset.rollback(d)
Out[94]: Timestamp('2008-07-31 09:00:00')

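It is definitely worth exploring the pandas.tseries.offsets module and the various docstrings for the classes.

By default, these operations (apply, rollforward and rollback) preserve time (hour, minute, etc.) information. To reset the time, use the normalize=True keyword when creating the offset instance. If normalize=True, the result is normalized after the function is applied.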

In [95]: day = Day()

In [96]: day.apply(pd.Timestamp('2014-01-01 09:00'))
Out[96]: Timestamp('2014-01-02 09:00:00')

In [97]: day = Day(normalize=True)

In [98]: day.apply(pd.Timestamp('2014-01-01 09:00'))
Out[98]: Timestamp('2014-01-02 00:00:00')

In [99]: hour = Hour()

In [100]: hour.apply(pd.Timestamp('2014-01-01 22:00'))
Out[100]: Timestamp('2014-01-01 23:00:00')

In [101]: hour = Hour(normalize=True)

In [102]: hour.apply(pd.Timestamp('2014-01-01 22:00'))
Out[102]: Timestamp('2014-01-01 00:00:00')

In [103]: hour.apply(pd.Timestamp('2014-01-01 23:00'))
Out[103]: Timestamp('2014-01-02 00:00:00')
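The key DateOffset operations can also be written with plain arithmetic, which is how the sketch below expresses them. It assumes that adding an offset with `+` is equivalent to calling `apply`, and it demonstrates `normalize=True` on BMonthEnd rather than on Day or Hour, since some pandas versions may reject normalize on fixed-duration offsets.

```python
from datetime import datetime

import pandas as pd
from pandas.tseries.offsets import BDay, BMonthEnd, DateOffset

d = datetime(2008, 8, 18, 9, 0)  # a Monday, 09:00

# Add a relativedelta-style offset
shifted = d + DateOffset(months=4, days=5)

# Multiply an offset, then subtract it: five business days earlier
back5 = d - 5 * BDay()

# Roll to the next / previous business month end; the time is preserved
offset = BMonthEnd()
fwd = offset.rollforward(d)
prev = offset.rollback(d)

# normalize=True resets the time of the result to midnight
midnight = d + BMonthEnd(normalize=True)

print(shifted, back5)
print(fwd, prev, midnight)
```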

Parametric offsets

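Some of the offsets can be "parameterized" when created to produce different behaviors. For example, the Week offset for generating weekly data accepts a weekday parameter, which results in the generated dates always lying on a particular day of the week: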

In [104]: d
Out[104]: datetime.datetime(2008, 8, 18, 9, 0)

In [105]: d + Week()
Out[105]: Timestamp('2008-08-25 09:00:00')

In [106]: d + Week(weekday=4)
Out[106]: Timestamp('2008-08-22 09:00:00')

In [107]: (d + Week(weekday=4)).weekday()
Out[107]: 4

In [108]: d - Week()
Out[108]: Timestamp('2008-08-11 09:00:00')

The normalize option is effective for both addition and subtraction.

In [109]: d + Week(normalize=True)
Out[109]: Timestamp('2008-08-25 00:00:00')

In [110]: d - Week(normalize=True)
Out[110]: Timestamp('2008-08-11 00:00:00')

Another example is parameterizing YearEnd with a specific ending month:

In [111]: d + YearEnd()
Out[111]: Timestamp('2008-12-31 09:00:00')

In [112]: d + YearEnd(month=6)
Out[112]: Timestamp('2009-06-30 09:00:00')

Using offsets with Series / DatetimeIndex

Offsets can be used with either a Series or a DatetimeIndex to apply the offset to each element.

In [113]: rng = pd.date_range('2012-01-01', '2012-01-03')

In [114]: s = pd.Series(rng)

In [115]: rng
Out[115]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='D')

In [116]: rng + DateOffset(months=2)
Out[116]: DatetimeIndex(['2012-03-01', '2012-03-02', '2012-03-03'], dtype='datetime64[ns]', freq='D')

In [117]: s + DateOffset(months=2)
Out[117]: 
0   2012-03-01
1   2012-03-02
2   2012-03-03
dtype: datetime64[ns]

In [118]: s - DateOffset(months=2)
Out[118]: 
0   2011-11-01
1   2011-11-02
2   2011-11-03
dtype: datetime64[ns]

If the offset class maps directly to a Timedelta (Day, Hour, Minute, Second, Micro, Milli, Nano) it can be used exactly like a Timedelta - see the Timedelta section for more examples.

In [119]: s - Day(2)
Out[119]: 
0   2011-12-30
1   2011-12-31
2   2012-01-01
dtype: datetime64[ns]

In [120]: td = s - pd.Series(pd.date_range('2011-12-29', '2011-12-31'))

In [121]: td
Out[121]: 
0   3 days
1   3 days
2   3 days
dtype: timedelta64[ns]

In [122]: td + Minute(15)
Out[122]: 
0   3 days 00:15:00
1   3 days 00:15:00
2   3 days 00:15:00
dtype: timedelta64[ns]

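Note that some offsets (such as BQuarterEnd) do not have a vectorized implementation. They can still be used, but may be significantly slower and will raise a PerformanceWarning.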

In [123]: rng + BQuarterEnd()
Out[123]: DatetimeIndex(['2012-03-30', '2012-03-30', '2012-03-30'], dtype='datetime64[ns]', freq=None)

Custom Business Days (Experimental)

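The CDay or CustomBusinessDay class provides a parametric BusinessDay class, which can be used to create customized business day calendars that account for local holidays and local weekend conventions.

As an interesting example, let's look at Egypt, where a Friday-Saturday weekend is observed.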

In [124]: from pandas.tseries.offsets import CustomBusinessDay

In [125]: weekmask_egypt = 'Sun Mon Tue Wed Thu'

# They also observe International Workers' Day so let's
# add that for a couple of years
In [126]: holidays = ['2012-05-01', datetime(2013, 5, 1), np.datetime64('2014-05-01')]

In [127]: bday_egypt = CustomBusinessDay(holidays=holidays, weekmask=weekmask_egypt)

In [128]: dt = datetime(2013, 4, 30)

In [129]: dt + 2 * bday_egypt
Out[129]: Timestamp('2013-05-05 00:00:00')

Let's map to the weekday names

In [130]: dts = pd.date_range(dt, periods=5, freq=bday_egypt)

In [131]: pd.Series(dts.weekday, dts).map(pd.Series('Mon Tue Wed Thu Fri Sat Sun'.split()))
Out[131]: 
2013-04-30    Tue
2013-05-02    Thu
2013-05-05    Sun
2013-05-06    Mon
2013-05-07    Tue
Freq: C, dtype: object

As of v0.14, holiday calendars can be used to provide the list of holidays. See the holiday calendar section for more information.

In [132]: from pandas.tseries.holiday import USFederalHolidayCalendar

In [133]: bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Friday before MLK Day
In [134]: dt = datetime(2014, 1, 17)

# Tuesday after MLK Day (Monday is skipped because it's a holiday)
In [135]: dt + bday_us
Out[135]: Timestamp('2014-01-21 00:00:00')

Monthly offsets that respect a certain holiday calendar can be defined in the usual way.

In [136]: from pandas.tseries.offsets import CustomBusinessMonthBegin

In [137]: bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())

# Skip new years
In [138]: dt = datetime(2013, 12, 17)

In [139]: dt + bmth_us
Out[139]: Timestamp('2014-01-02 00:00:00')

# Define date index with custom offset
In [140]: pd.DatetimeIndex(start='20100101',end='20120101',freq=bmth_us)
Out[140]: 
DatetimeIndex(['2010-01-04', '2010-02-01', '2010-03-01', '2010-04-01',
               '2010-05-03', '2010-06-01', '2010-07-01', '2010-08-02',
               '2010-09-01', '2010-10-01', '2010-11-01', '2010-12-01',
               '2011-01-03', '2011-02-01', '2011-03-01', '2011-04-01',
               '2011-05-02', '2011-06-01', '2011-07-01', '2011-08-01',
               '2011-09-01', '2011-10-03', '2011-11-01', '2011-12-01'],
              dtype='datetime64[ns]', freq='CBMS')
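The custom offsets above can be combined with `pd.date_range` to build an index. This sketch is an assumption-laden variant of the session: it uses `date_range` instead of the `DatetimeIndex(start=..., end=...)` constructor form, which may not be accepted by newer pandas releases, and it passes all holidays as strings.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay, CustomBusinessMonthBegin

# Egyptian working week (Sunday to Thursday) with 1 May as a holiday
bday_egypt = CustomBusinessDay(
    holidays=["2012-05-01", "2013-05-01", "2014-05-01"],
    weekmask="Sun Mon Tue Wed Thu",
)
egypt_days = pd.date_range("2013-04-30", periods=5, freq=bday_egypt)

# First business day of each month under the US federal holiday calendar
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
month_starts = pd.date_range(start="2010-01-01", end="2012-01-01", freq=bmth_us)

print(egypt_days)
print(month_starts[:3])
```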

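Note

The frequency string 'C' is used to indicate that a CustomBusinessDay DateOffset is used. It is important to note that since CustomBusinessDay is a parameterized type, instances of CustomBusinessDay may differ, and this is not detectable from the 'C' frequency string. The user therefore needs to ensure that the 'C' frequency string is used consistently within the user's application.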

Business Hour

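The BusinessHour class provides a business hour representation on top of BusinessDay, allowing specific start and end times to be used.

By default, BusinessHour uses 9:00 - 17:00 as business hours. Adding a BusinessHour increments a Timestamp hour by hour. If the target Timestamp is outside business hours, it is first moved to the next business hour and then incremented. If the result exceeds the end of business hours, the remainder is added to the next business day.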

In [141]: bh = BusinessHour()

In [142]: bh
Out[142]: <BusinessHour: BH=09:00-17:00>

# 2014-08-01 is Friday
In [143]: pd.Timestamp('2014-08-01 10:00').weekday()
Out[143]: 4

In [144]: pd.Timestamp('2014-08-01 10:00') + bh
Out[144]: Timestamp('2014-08-01 11:00:00')

# Below example is the same as: pd.Timestamp('2014-08-01 09:00') + bh
In [145]: pd.Timestamp('2014-08-01 08:00') + bh
Out[145]: Timestamp('2014-08-01 10:00:00')

# If the result is on the end time, move to the next business day
In [146]: pd.Timestamp('2014-08-01 16:00') + bh
Out[146]: Timestamp('2014-08-04 09:00:00')

# The remainder is added to the next day
In [147]: pd.Timestamp('2014-08-01 16:30') + bh
Out[147]: Timestamp('2014-08-04 09:30:00')

# Adding 2 business hours
In [148]: pd.Timestamp('2014-08-01 10:00') + BusinessHour(2)
Out[148]: Timestamp('2014-08-01 12:00:00')

# Subtracting 3 business hours
In [149]: pd.Timestamp('2014-08-01 10:00') + BusinessHour(-3)
Out[149]: Timestamp('2014-07-31 15:00:00')

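You can also specify the start and end times by keyword. The argument must be a str in hour:minute form or a datetime.time instance. Specifying seconds, microseconds, or nanoseconds as business hours results in a ValueError.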

In [150]: bh = BusinessHour(start='11:00', end=time(20, 0))

In [151]: bh
Out[151]: <BusinessHour: BH=11:00-20:00>

In [152]: pd.Timestamp('2014-08-01 13:00') + bh
Out[152]: Timestamp('2014-08-01 14:00:00')

In [153]: pd.Timestamp('2014-08-01 09:00') + bh
Out[153]: Timestamp('2014-08-01 12:00:00')

In [154]: pd.Timestamp('2014-08-01 18:00') + bh
Out[154]: Timestamp('2014-08-01 19:00:00')

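Passing a start time later than end represents a business hour that spans midnight. In this case, business hours extend past midnight and overlap into the next day. Whether a business hour is valid is determined by whether it started on a valid BusinessDay.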

In [155]: bh = BusinessHour(start='17:00', end='09:00')

In [156]: bh
Out[156]: <BusinessHour: BH=17:00-09:00>

In [157]: pd.Timestamp('2014-08-01 17:00') + bh
Out[157]: Timestamp('2014-08-01 18:00:00')

In [158]: pd.Timestamp('2014-08-01 23:00') + bh
Out[158]: Timestamp('2014-08-02 00:00:00')

# Although 2014-08-02 is Saturday,
# it is valid because it starts from 08-01 (Friday).
In [159]: pd.Timestamp('2014-08-02 04:00') + bh
Out[159]: Timestamp('2014-08-02 05:00:00')

# Although 2014-08-04 is Monday,
# it is out of business hours because it starts from 08-03 (Sunday).
In [160]: pd.Timestamp('2014-08-04 04:00') + bh
Out[160]: Timestamp('2014-08-04 18:00:00')

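Applying BusinessHour.rollforward and rollback to a timestamp outside business hours gives the start of the next business hour or the end of the previous business day. Unlike other offsets, BusinessHour.rollforward may, by definition, return a result different from apply.

This is because one day's business hour end is equal to the next day's business hour start. For example, under the default business hours (9:00 - 17:00), there is no gap (0 minutes) between 2014-08-01 17:00 and 2014-08-04 09:00.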

# This adjusts a Timestamp to business hour edge
In [161]: BusinessHour().rollback(pd.Timestamp('2014-08-02 15:00'))
Out[161]: Timestamp('2014-08-01 17:00:00')

In [162]: BusinessHour().rollforward(pd.Timestamp('2014-08-02 15:00'))
Out[162]: Timestamp('2014-08-04 09:00:00')

# It is the same as BusinessHour().apply(pd.Timestamp('2014-08-01 17:00')).
# And it is the same as BusinessHour().apply(pd.Timestamp('2014-08-04 09:00'))
In [163]: BusinessHour().apply(pd.Timestamp('2014-08-02 15:00'))
Out[163]: Timestamp('2014-08-04 10:00:00')

# BusinessDay results (for reference)
In [164]: BusinessHour().rollforward(pd.Timestamp('2014-08-02'))
Out[164]: Timestamp('2014-08-04 09:00:00')

# It is the same as BusinessDay().apply(pd.Timestamp('2014-08-01'))
# The result is the same as rollforward because BusinessDay never overlaps.
In [165]: BusinessHour().apply(pd.Timestamp('2014-08-02'))
Out[165]: Timestamp('2014-08-04 10:00:00')
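
The relationship between rolling and adding can be sketched as follows. This is a minimal illustration that uses the + operator in place of .apply, on the assumption that the two are equivalent for a single timestamp (newer pandas releases may not provide apply at all).

```python
import pandas as pd

bh = pd.offsets.BusinessHour()           # default 09:00-17:00, Monday to Friday
ts = pd.Timestamp("2014-08-02 15:00")    # a Saturday, outside business hours

prev_close = bh.rollback(ts)     # end of the previous business day
next_open = bh.rollforward(ts)   # start of the next business day
plus_one = ts + bh               # one business hour past the next opening

# Friday's close and Monday's open are the same point on the business clock,
# so adding one business hour to either gives the same result.
same_edge = (prev_close + bh) == (next_open + bh)

print(prev_close, next_open, plus_one, same_edge)
```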

BusinessHour regards Saturday and Sunday as holidays. To use arbitrary holidays, you can use the CustomBusinessHour offset; see Custom Business Hour.

Custom Business Hour

New in version 0.18.1.

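CustomBusinessHour is a mixture of BusinessHour and CustomBusinessDay that lets you specify arbitrary holidays. CustomBusinessHour works the same way as BusinessHour, except that it skips the specified custom holidays.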

In [166]: from pandas.tseries.holiday import USFederalHolidayCalendar

In [167]: bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())

# Friday before MLK Day
In [168]: dt = datetime(2014, 1, 17, 15)

In [169]: dt + bhour_us
Out[169]: Timestamp('2014-01-17 16:00:00')

# Tuesday after MLK Day (Monday is skipped because it's a holiday)
In [170]: dt + bhour_us * 2
Out[170]: Timestamp('2014-01-21 09:00:00')

You can use the keyword arguments supported by both BusinessHour and CustomBusinessDay.

In [171]: bhour_mon = CustomBusinessHour(start='10:00', weekmask='Tue Wed Thu Fri')

# Monday is skipped because it's a holiday, business hour starts from 10:00
In [172]: dt + bhour_mon * 2
Out[172]: Timestamp('2014-01-21 10:00:00')
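
When only a few dates need to be skipped, a full calendar class may be more than is needed. The sketch below passes an explicit holidays list instead of a calendar; the single date in the list is an illustrative assumption that stands in for the MLK Day entry of the federal calendar used above.

```python
from datetime import datetime

import pandas as pd

# Hypothetical one-entry holiday list: Monday 2014-01-20 is treated as closed.
bhour = pd.offsets.CustomBusinessHour(holidays=["2014-01-20"])

dt = datetime(2014, 1, 17, 15)   # Friday 15:00

one = dt + bhour        # still Friday
two = dt + bhour * 2    # reaches Friday's close, so it rolls to the next opening

print(one)
print(two)
```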

Offset Aliases

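A number of string aliases are given to useful common time series frequencies. We will refer to these aliases as offset aliases (referred to as time rules prior to v0.8.0).

Alias Description
B business day frequency
C custom business day frequency (experimental)
D calendar day frequency
W weekly frequency
M month end frequency
SM semi-month end frequency (15th and end of month)
BM business month end frequency
CBM custom business month end frequency
MS month start frequency
SMS semi-month start frequency (1st and 15th)
BMS business month start frequency
CBMS custom business month start frequency
Q quarter end frequency
BQ business quarter end frequency
QS quarter start frequency
BQS business quarter start frequency
A year end frequency
BA business year end frequency
AS year start frequency
BAS business year start frequency
BH business hour frequency
H hourly frequency
T, min minutely frequency
S secondly frequency
L, ms milliseconds
U, us microseconds
N nanoseconds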

Combining Aliases

As we have seen previously, the alias and the offset instance are interchangeable in most functions:

In [173]: pd.date_range(start, periods=5, freq='B')
Out[173]: 
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07'],
              dtype='datetime64[ns]', freq='B')

In [174]: pd.date_range(start, periods=5, freq=BDay())
Out[174]: 
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07'],
              dtype='datetime64[ns]', freq='B')

You can combine day and intraday offsets:

In [175]: pd.date_range(start, periods=10, freq='2h20min')
Out[175]: 
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 02:20:00',
               '2011-01-01 04:40:00', '2011-01-01 07:00:00',
               '2011-01-01 09:20:00', '2011-01-01 11:40:00',
               '2011-01-01 14:00:00', '2011-01-01 16:20:00',
               '2011-01-01 18:40:00', '2011-01-01 21:00:00'],
              dtype='datetime64[ns]', freq='140T')

In [176]: pd.date_range(start, periods=10, freq='1D10U')
Out[176]: 
DatetimeIndex([       '2011-01-01 00:00:00', '2011-01-02 00:00:00.000010',
               '2011-01-03 00:00:00.000020', '2011-01-04 00:00:00.000030',
               '2011-01-05 00:00:00.000040', '2011-01-06 00:00:00.000050',
               '2011-01-07 00:00:00.000060', '2011-01-08 00:00:00.000070',
               '2011-01-09 00:00:00.000080', '2011-01-10 00:00:00.000090'],
              dtype='datetime64[ns]', freq='86400000010U')
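
To make the equivalence concrete, the sketch below builds the same range once from a combined alias string and once from an offset object. The 140-minute offset is simply 2 hours 20 minutes written out by hand.

```python
import pandas as pd

# Combined alias: 2 hours + 20 minutes between consecutive points.
by_alias = pd.date_range("2011-01-01", periods=4, freq="2h20min")

# The same step expressed as an offset instance.
by_offset = pd.date_range("2011-01-01", periods=4, freq=pd.offsets.Minute(140))

print(by_alias)
print(by_alias.equals(by_offset))
```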

Anchored Offsets

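For some frequencies you can specify an anchoring suffix:

Alias Description
W-SUN weekly frequency (Sundays). Same as 'W'
W-MON weekly frequency (Mondays)
W-TUE weekly frequency (Tuesdays)
W-WED weekly frequency (Wednesdays)
W-THU weekly frequency (Thursdays)
W-FRI weekly frequency (Fridays)
W-SAT weekly frequency (Saturdays)
(B)Q(S)-DEC quarterly frequency, year ends in December. Same as 'Q'
(B)Q(S)-JAN quarterly frequency, year ends in January
(B)Q(S)-FEB quarterly frequency, year ends in February
(B)Q(S)-MAR quarterly frequency, year ends in March
(B)Q(S)-APR quarterly frequency, year ends in April
(B)Q(S)-MAY quarterly frequency, year ends in May
(B)Q(S)-JUN quarterly frequency, year ends in June
(B)Q(S)-JUL quarterly frequency, year ends in July
(B)Q(S)-AUG quarterly frequency, year ends in August
(B)Q(S)-SEP quarterly frequency, year ends in September
(B)Q(S)-OCT quarterly frequency, year ends in October
(B)Q(S)-NOV quarterly frequency, year ends in November
(B)A(S)-DEC annual frequency, anchored end of December. Same as 'A'
(B)A(S)-JAN annual frequency, anchored end of January
(B)A(S)-FEB annual frequency, anchored end of February
(B)A(S)-MAR annual frequency, anchored end of March
(B)A(S)-APR annual frequency, anchored end of April
(B)A(S)-MAY annual frequency, anchored end of May
(B)A(S)-JUN annual frequency, anchored end of June
(B)A(S)-JUL annual frequency, anchored end of July
(B)A(S)-AUG annual frequency, anchored end of August
(B)A(S)-SEP annual frequency, anchored end of September
(B)A(S)-OCT annual frequency, anchored end of October
(B)A(S)-NOV annual frequency, anchored end of November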

These can be used as arguments to date_range, bdate_range, constructors for DatetimeIndex, as well as various other time series-related functions in pandas.
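
A short sketch of an anchored weekly alias passed to date_range. The start date is arbitrary (2011-01-01 is a Saturday), so the first generated point is the following Wednesday.

```python
import pandas as pd

# Weekly frequency anchored on Wednesdays.
wednesdays = pd.date_range("2011-01-01", periods=3, freq="W-WED")

print(wednesdays)
print(list(wednesdays.dayofweek))   # Monday is 0, so Wednesday is 2
```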

Anchored Offset Semantics

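For those offsets that are anchored to the start or end of a specific frequency (MonthEnd, MonthBegin, WeekEnd, etc.), the following rules apply to rolling forward and backward.

When n is not 0, if the given date is not on an anchor point, it is snapped to the next (previous) anchor point and then moved |n|-1 additional steps forward or backward.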

In [177]: pd.Timestamp('2014-01-02') + MonthBegin(n=1)
Out[177]: Timestamp('2014-02-01 00:00:00')

In [178]: pd.Timestamp('2014-01-02') + MonthEnd(n=1)
Out[178]: Timestamp('2014-01-31 00:00:00')

In [179]: pd.Timestamp('2014-01-02') - MonthBegin(n=1)
Out[179]: Timestamp('2014-01-01 00:00:00')

In [180]: pd.Timestamp('2014-01-02') - MonthEnd(n=1)
Out[180]: Timestamp('2013-12-31 00:00:00')

In [181]: pd.Timestamp('2014-01-02') + MonthBegin(n=4)
Out[181]: Timestamp('2014-05-01 00:00:00')

In [182]: pd.Timestamp('2014-01-02') - MonthBegin(n=4)
Out[182]: Timestamp('2013-10-01 00:00:00')

If the given date is on an anchor point, it is moved |n| points forward or backward.

In [183]: pd.Timestamp('2014-01-01') + MonthBegin(n=1)
Out[183]: Timestamp('2014-02-01 00:00:00')

In [184]: pd.Timestamp('2014-01-31') + MonthEnd(n=1)
Out[184]: Timestamp('2014-02-28 00:00:00')

In [185]: pd.Timestamp('2014-01-01') - MonthBegin(n=1)
Out[185]: Timestamp('2013-12-01 00:00:00')

In [186]: pd.Timestamp('2014-01-31') - MonthEnd(n=1)
Out[186]: Timestamp('2013-12-31 00:00:00')

In [187]: pd.Timestamp('2014-01-01') + MonthBegin(n=4)
Out[187]: Timestamp('2014-05-01 00:00:00')

In [188]: pd.Timestamp('2014-01-31') - MonthBegin(n=4)
Out[188]: Timestamp('2013-10-01 00:00:00')

For the case when n=0, the date is not moved if it is on an anchor point; otherwise it is rolled forward to the next anchor point.

In [189]: pd.Timestamp('2014-01-02') + MonthBegin(n=0)
Out[189]: Timestamp('2014-02-01 00:00:00')

In [190]: pd.Timestamp('2014-01-02') + MonthEnd(n=0)
Out[190]: Timestamp('2014-01-31 00:00:00')

In [191]: pd.Timestamp('2014-01-01') + MonthBegin(n=0)
Out[191]: Timestamp('2014-01-01 00:00:00')

In [192]: pd.Timestamp('2014-01-31') + MonthEnd(n=0)
Out[192]: Timestamp('2014-01-31 00:00:00')
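
The three rules can be put side by side in one table of results. The sketch below only re-runs MonthBegin cases already shown above, keyed by whether the starting date is on an anchor point; the dictionary and its key names are purely illustrative.

```python
import pandas as pd

MonthBegin = pd.offsets.MonthBegin

off_anchor = pd.Timestamp("2014-01-02")   # not a month start
on_anchor = pd.Timestamp("2014-01-01")    # a month start

results = {
    # n != 0, off anchor: snap to the next/previous anchor, then |n|-1 more steps
    "off +1": off_anchor + MonthBegin(1),
    "off -1": off_anchor - MonthBegin(1),
    # n != 0, on anchor: move exactly |n| anchors
    "on +1": on_anchor + MonthBegin(1),
    "on -1": on_anchor - MonthBegin(1),
    # n == 0: stay put if on an anchor, otherwise roll forward
    "off 0": off_anchor + MonthBegin(0),
    "on 0": on_anchor + MonthBegin(0),
}

for label, value in results.items():
    print(label, value)
```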

Holidays / Holiday Calendars

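Holidays and calendars provide a simple way to define holiday rules to be used with CustomBusinessDay or in other analysis that requires a predefined set of holidays. The AbstractHolidayCalendar class provides all the methods necessary to return a list of holidays, and only rules need to be defined in a specific holiday calendar class. Further, the start_date and end_date class attributes determine over what date range holidays are generated. These should be overwritten on the AbstractHolidayCalendar class to have the range apply to all calendar subclasses. USFederalHolidayCalendar is the only calendar that exists and primarily serves as an example for developing other calendars.

For holidays that occur on fixed dates (e.g., US Memorial Day or July 4th), an observance rule determines when that holiday is observed if it falls on a weekend or some other non-observed day. Defined observance rules are:

Rule Description
nearest_workday move Saturday to Friday and Sunday to Monday
sunday_to_monday move Sunday to the following Monday
next_monday_or_tuesday move Saturday to Monday and Sunday/Monday to Tuesday
previous_friday move Saturday and Sunday to the previous Friday
next_monday move Saturday and Sunday to the following Monday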
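
The observance rules are plain functions that take a date and return the observed date, so they can be tried directly. The sketch below applies a few of them to two July 4th dates chosen for illustration: 2015-07-04 (a Saturday) and 2021-07-04 (a Sunday).

```python
from datetime import datetime

from pandas.tseries.holiday import (
    nearest_workday,
    next_monday,
    previous_friday,
    sunday_to_monday,
)

saturday = datetime(2015, 7, 4)
sunday = datetime(2021, 7, 4)

observed = {
    "nearest_workday(sat)": nearest_workday(saturday),    # back to Friday
    "nearest_workday(sun)": nearest_workday(sunday),      # on to Monday
    "sunday_to_monday(sat)": sunday_to_monday(saturday),  # Saturdays are left alone
    "sunday_to_monday(sun)": sunday_to_monday(sunday),
    "previous_friday(sun)": previous_friday(sunday),
    "next_monday(sat)": next_monday(saturday),
}

for rule, day in observed.items():
    print(rule, day.date())
```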

An example of how holidays and holiday calendars are defined:

In [193]: from pandas.tseries.holiday import Holiday, USMemorialDay,\
   .....:     AbstractHolidayCalendar, nearest_workday, MO
   .....: 

In [194]: class ExampleCalendar(AbstractHolidayCalendar):
   .....:     rules = [
   .....:         USMemorialDay,
   .....:         Holiday('July 4th', month=7, day=4, observance=nearest_workday),
   .....:         Holiday('Columbus Day', month=10, day=1,
   .....:             offset=DateOffset(weekday=MO(2))), #same as 2*Week(weekday=2)
   .....:         ]
   .....: 

In [195]: cal = ExampleCalendar()

In [196]: cal.holidays(datetime(2012, 1, 1), datetime(2012, 12, 31))
Out[196]: DatetimeIndex(['2012-05-28', '2012-07-04', '2012-10-08'], dtype='datetime64[ns]', freq=None)

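Using this calendar, creating an index or doing offset arithmetic skips weekends and holidays (i.e., Memorial Day/July 4th). For example, the code below defines a custom business day offset using the ExampleCalendar. Like any other offset, it can be used to create a DatetimeIndex or added to datetime or Timestamp objects.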

In [197]: from pandas.tseries.offsets import CDay

In [198]: pd.DatetimeIndex(start='7/1/2012', end='7/10/2012',
   .....:     freq=CDay(calendar=cal)).to_pydatetime()
   .....: 
Out[198]: 
array([datetime.datetime(2012, 7, 2, 0, 0),
       datetime.datetime(2012, 7, 3, 0, 0),
       datetime.datetime(2012, 7, 5, 0, 0),
       datetime.datetime(2012, 7, 6, 0, 0),
       datetime.datetime(2012, 7, 9, 0, 0),
       datetime.datetime(2012, 7, 10, 0, 0)], dtype=object)

In [199]: offset = CustomBusinessDay(calendar=cal)

In [200]: datetime(2012, 5, 25) + offset
Out[200]: Timestamp('2012-05-29 00:00:00')

In [201]: datetime(2012, 7, 3) + offset
Out[201]: Timestamp('2012-07-05 00:00:00')

In [202]: datetime(2012, 7, 3) + 2 * offset
Out[202]: Timestamp('2012-07-06 00:00:00')

In [203]: datetime(2012, 7, 6) + offset
Out[203]: Timestamp('2012-07-09 00:00:00')
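
The same idea in a self-contained form: the sketch below defines a one-rule calendar (the class name JulyFourthCalendar is hypothetical and used only for illustration) and builds the index with date_range instead of the DatetimeIndex constructor, on the assumption that date_range is the more portable spelling across pandas versions.

```python
from datetime import datetime

import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday


class JulyFourthCalendar(AbstractHolidayCalendar):
    """Hypothetical calendar with a single rule, for illustration only."""

    rules = [Holiday("July 4th", month=7, day=4, observance=nearest_workday)]


cal = JulyFourthCalendar()
cbd = pd.offsets.CustomBusinessDay(calendar=cal)

# Tuesday 2012-07-03 plus one custom business day skips Wednesday the 4th.
next_day = datetime(2012, 7, 3) + cbd

# Business days in early July 2012, with weekends and the holiday removed.
days = pd.date_range("2012-07-01", "2012-07-10", freq=cbd)

print(next_day)
print(days)
```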

Ranges are defined by the start_date and end_date class attributes of AbstractHolidayCalendar. The defaults are shown below.

In [204]: AbstractHolidayCalendar.start_date
Out[204]: Timestamp('1970-01-01 00:00:00')

In [205]: AbstractHolidayCalendar.end_date
Out[205]: Timestamp('2030-12-31 00:00:00')

These dates can be overwritten by setting the attributes as datetime/Timestamp/string.

In [206]: AbstractHolidayCalendar.start_date = datetime(2012, 1, 1)

In [207]: AbstractHolidayCalendar.end_date = datetime(2012, 12, 31)

In [208]: cal.holidays()
Out[208]: DatetimeIndex(['2012-05-28', '2012-07-04', '2012-10-08'], dtype='datetime64[ns]', freq=None)

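Every calendar class is accessible by name using the get_calendar function, which returns a holiday class instance. Any imported calendar class will automatically be available through this function. Also, HolidayCalendarFactory provides an easy interface to create calendars that are combinations of calendars or calendars with additional rules.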

In [209]: from pandas.tseries.holiday import get_calendar, HolidayCalendarFactory,\
   .....:     USLaborDay
   .....: 

In [210]: cal = get_calendar('ExampleCalendar')

In [211]: cal.rules
Out[211]: 
[Holiday: MemorialDay (month=5, day=31, offset=<DateOffset: kwds={'weekday': MO(-1)}>),
 Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7ff271135aa0>),
 Holiday: Columbus Day (month=10, day=1, offset=<DateOffset: kwds={'weekday': MO(+2)}>)]

In [212]: new_cal = HolidayCalendarFactory('NewExampleCalendar', cal, USLaborDay)

In [213]: new_cal.rules
Out[213]: 
[Holiday: Labor Day (month=9, day=1, offset=<DateOffset: kwds={'weekday': MO(+1)}>),
 Holiday: Columbus Day (month=10, day=1, offset=<DateOffset: kwds={'weekday': MO(+2)}>),
 Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7ff271135aa0>),
 Holiday: MemorialDay (month=5, day=31, offset=<DateOffset: kwds={'weekday': MO(-1)}>)]

Resampling

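Warning

The interface to .resample has changed in 0.18.0 to be more groupby-like and hence more flexible. See the whatsnew docs for a comparison with prior versions.

Pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (e.g., converting secondly data into 5-minutely data). This is extremely common in, but not limited to, financial applications.

.resample() is a time-based groupby, followed by a reduction method on each of its groups.

As of version 0.18.1, the resample() function can be used directly from DataFrameGroupBy objects; see the groupby docs.

Note

.resample() is similar to using a .rolling() operation with a time-based offset; see the discussion of rolling versus resampling in the window functions documentation.

See some cookbook examples for some advanced strategies.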

In [224]: rng = pd.date_range('1/1/2012', periods=100, freq='S')

In [225]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)

In [226]: ts.resample('5Min').sum()
Out[226]: 
2012-01-01    24390
Freq: 5T, dtype: int64

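The resample function is very flexible and allows you to specify many different parameters to control the frequency conversion and resampling operation.

The how parameter can be a function name or a numpy array function that takes an array and produces aggregated values: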

In [227]: ts.resample('5Min').mean()
Out[227]: 
2012-01-01    243.9
Freq: 5T, dtype: float64

In [228]: ts.resample('5Min').ohlc()
Out[228]: 
            open  high  low  close
2012-01-01   161   495    1    245

In [229]: ts.resample('5Min').max()
Out[229]: 
2012-01-01    495
Freq: 5T, dtype: int64

Any function available via dispatching can be given to the how parameter by name, including sum, mean, std, sem, max, min, median, first, last, ohlc.

For downsampling, closed can be set to 'left' or 'right' to specify which end of the interval is closed:

In [230]: ts.resample('5Min', closed='right').mean()
Out[230]: 
2011-12-31 23:55:00    161.000000
2012-01-01 00:00:00    244.737374
Freq: 5T, dtype: float64

In [231]: ts.resample('5Min', closed='left').mean()
Out[231]: 
2012-01-01    243.9
Freq: 5T, dtype: float64

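Parameters like label and loffset are used to manipulate the resulting labels. label specifies whether the result is labeled with the beginning or the end of the interval. loffset performs a time adjustment on the output labels.

In [232]: ts.resample('5Min').mean() # by default label='left'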
Out[232]: 
2012-01-01    243.9
Freq: 5T, dtype: float64

In [233]: ts.resample('5Min', label='left').mean()
Out[233]: 
2012-01-01    243.9
Freq: 5T, dtype: float64

In [234]: ts.resample('5Min', label='left', loffset='1s').mean()
Out[234]: 
2012-01-01 00:00:01    243.9
dtype: float64
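
Because the random series above falls into a single bin, the effect of closed and label is easier to see on a small deterministic series. The sketch below uses ten one-minute points with values 0 to 9; the data are made up for illustration.

```python
import pandas as pd

rng = pd.date_range("2012-01-01", periods=10, freq="min")
ts = pd.Series(range(10), index=rng)

# Bins [00:00, 00:05) and [00:05, 00:10), labeled by their left edge.
left = ts.resample("5min", closed="left").sum()

# Bins (23:55, 00:00], (00:00, 00:05], (00:05, 00:10], still labeled on the left.
right = ts.resample("5min", closed="right").sum()

# Same bins as `right`, but labeled by their right edge.
right_labeled = ts.resample("5min", closed="right", label="right").sum()

print(left)
print(right)
print(right_labeled)
```

The point at 00:00 sits alone in the first right-closed bin, which is why that variant has one more row than the left-closed one.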

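The axis parameter can be set to 0 or 1 and allows you to resample the specified axis for a DataFrame.

kind can be set to 'timestamp' or 'period' to convert the resulting index to/from time-stamp and time-span representations. By default resample retains the input representation.

convention can be set to 'start' or 'end' when resampling period data (detail below). It specifies how low frequency periods are converted to higher frequency periods.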

Up Sampling

For upsampling, you can specify a way to upsample and the limit parameter to interpolate over the gaps that are created:

# from secondly to every 250 milliseconds
In [235]: ts[:2].resample('250L').asfreq()
Out[235]: 
2012-01-01 00:00:00.000    161.0
2012-01-01 00:00:00.250      NaN
2012-01-01 00:00:00.500      NaN
2012-01-01 00:00:00.750      NaN
2012-01-01 00:00:01.000    199.0
Freq: 250L, dtype: float64

In [236]: ts[:2].resample('250L').ffill()
Out[236]: 
2012-01-01 00:00:00.000    161
2012-01-01 00:00:00.250    161
2012-01-01 00:00:00.500    161
2012-01-01 00:00:00.750    161
2012-01-01 00:00:01.000    199
Freq: 250L, dtype: int64

In [237]: ts[:2].resample('250L').ffill(limit=2)
Out[237]: 
2012-01-01 00:00:00.000    161.0
2012-01-01 00:00:00.250    161.0
2012-01-01 00:00:00.500    161.0
2012-01-01 00:00:00.750      NaN
2012-01-01 00:00:01.000    199.0
Freq: 250L, dtype: float64
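
The three upsampling variants can be compared on a fixed two-point series. The sketch below reuses the values 161 and 199 from the output above and spells the target frequency as '250ms', which the alias table lists as an equivalent of 'L'.

```python
import pandas as pd

ts = pd.Series([161, 199], index=pd.date_range("2012-01-01", periods=2, freq="s"))
r = ts.resample("250ms")

gaps = r.asfreq()            # new points are left as NaN
filled = r.ffill()           # every gap takes the previous value
limited = r.ffill(limit=2)   # at most two consecutive gaps are filled

print(gaps)
print(filled)
print(limited)
```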

Sparse Resampling

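Sparse timeseries are ones where you have far fewer points relative to the amount of time you are looking to resample. Naively upsampling a sparse series can potentially generate lots of intermediate values. When you don't want to use a method to fill these values, e.g. fill_method is None, then intermediate values will be filled with NaN.

Since resample is a time-based groupby, the following is a method to efficiently resample only the groups that are not all NaN.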

In [238]: rng = pd.date_range('2014-1-1', periods=100, freq='D') + pd.Timedelta('1s')

In [239]: ts = pd.Series(range(100), index=rng)

If we want to resample to the full range of the series:

In [240]: ts.resample('3T').sum()
Out[240]: 
2014-01-01 00:00:00     0.0
2014-01-01 00:03:00     NaN
2014-01-01 00:06:00     NaN
2014-01-01 00:09:00     NaN
2014-01-01 00:12:00     NaN
2014-01-01 00:15:00     NaN
2014-01-01 00:18:00     NaN
                       ... 
2014-04-09 23:42:00     NaN
2014-04-09 23:45:00     NaN
2014-04-09 23:48:00     NaN
2014-04-09 23:51:00     NaN
2014-04-09 23:54:00     NaN
2014-04-09 23:57:00     NaN
2014-04-10 00:00:00    99.0
Freq: 3T, dtype: float64

We can instead resample only those groups where we have points, as follows:

In [241]: from functools import partial

In [242]: from pandas.tseries.frequencies import to_offset

In [243]: def round(t, freq):
   .....:     freq = to_offset(freq)
   .....:     return pd.Timestamp((t.value // freq.delta.value) * freq.delta.value)
   .....: 

In [244]: ts.groupby(partial(round, freq='3T')).sum()
Out[244]: 
2014-01-01     0
2014-01-02     1
2014-01-03     2
2014-01-04     3
2014-01-05     4
2014-01-06     5
2014-01-07     6
              ..
2014-04-04    93
2014-04-05    94
2014-04-06    95
2014-04-07    96
2014-04-08    97
2014-04-09    98
2014-04-10    99
dtype: int64
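
A variation on the same idea, offered as an alternative rather than as the method used above: instead of a hand-written rounding helper, the sketch below groups on DatetimeIndex.floor, assuming that method is available in the installed pandas.

```python
import pandas as pd

rng = pd.date_range("2014-01-01", periods=100, freq="D") + pd.Timedelta("1s")
ts = pd.Series(range(100), index=rng)

# Floor each timestamp to its 3-minute bin and aggregate only the bins
# that actually contain data; empty bins are never materialized.
sparse = ts.groupby(ts.index.floor("3min")).sum()

print(sparse.head())
print(len(sparse))
```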

Aggregation

Similar to groupby aggregates and the window functions, a Resampler can be selectively resampled.

When resampling a DataFrame, the default is to act on all columns with the same function.

In [245]: df = pd.DataFrame(np.random.randn(1000, 3),
   .....:                   index=pd.date_range('1/1/2012', freq='S', periods=1000),
   .....:                   columns=['A', 'B', 'C'])
   .....: 

In [246]: r = df.resample('3T')

In [247]: r.mean()
Out[247]: 
                            A         B         C
2012-01-01 00:00:00 -0.220339  0.034854 -0.073757
2012-01-01 00:03:00  0.037070  0.040013  0.053754
2012-01-01 00:06:00 -0.041597 -0.144562 -0.007614
2012-01-01 00:09:00  0.043127 -0.076432 -0.032570
2012-01-01 00:12:00 -0.027609  0.054618  0.056878
2012-01-01 00:15:00 -0.014181  0.043958  0.077734

We can select a specific column or columns using standard getitem.

In [248]: r['A'].mean()
Out[248]: 
2012-01-01 00:00:00   -0.220339
2012-01-01 00:03:00    0.037070
2012-01-01 00:06:00   -0.041597
2012-01-01 00:09:00    0.043127
2012-01-01 00:12:00   -0.027609
2012-01-01 00:15:00   -0.014181
Freq: 3T, Name: A, dtype: float64

In [249]: r[['A','B']].mean()
Out[249]: 
                            A         B
2012-01-01 00:00:00 -0.220339  0.034854
2012-01-01 00:03:00  0.037070  0.040013
2012-01-01 00:06:00 -0.041597 -0.144562
2012-01-01 00:09:00  0.043127 -0.076432
2012-01-01 00:12:00 -0.027609  0.054618
2012-01-01 00:15:00 -0.014181  0.043958

You can pass a list or dict of functions to do aggregation with, outputting a DataFrame:

In [250]: r['A'].agg([np.sum, np.mean, np.std])
Out[250]: 
                           sum      mean       std
2012-01-01 00:00:00 -39.660974 -0.220339  1.033912
2012-01-01 00:03:00   6.672559  0.037070  0.971503
2012-01-01 00:06:00  -7.487453 -0.041597  1.018418
2012-01-01 00:09:00   7.762901  0.043127  1.025842
2012-01-01 00:12:00  -4.969624 -0.027609  0.961649
2012-01-01 00:15:00  -1.418119 -0.014181  0.978847

If a dict is passed, the keys will be used to name the columns. Otherwise the function's name (stored in the function object) will be used.

In [251]: r['A'].agg({'result1' : np.sum,
   .....:             'result2' : np.mean})
   .....: 
Out[251]: 
                      result2    result1
2012-01-01 00:00:00 -0.220339 -39.660974
2012-01-01 00:03:00  0.037070   6.672559
2012-01-01 00:06:00 -0.041597  -7.487453
2012-01-01 00:09:00  0.043127   7.762901
2012-01-01 00:12:00 -0.027609  -4.969624
2012-01-01 00:15:00 -0.014181  -1.418119

On a resampled DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index:

In [252]: r.agg([np.sum, np.mean])
Out[252]: 
                             A                    B                    C  \
                           sum      mean        sum      mean        sum   
2012-01-01 00:00:00 -39.660974 -0.220339   6.273786  0.034854 -13.276324   
2012-01-01 00:03:00   6.672559  0.037070   7.202361  0.040013   9.675632   
2012-01-01 00:06:00  -7.487453 -0.041597 -26.021155 -0.144562  -1.370600   
2012-01-01 00:09:00   7.762901  0.043127 -13.757837 -0.076432  -5.862640   
2012-01-01 00:12:00  -4.969624 -0.027609   9.831208  0.054618  10.237970   
2012-01-01 00:15:00  -1.418119 -0.014181   4.395766  0.043958   7.773442   

                               
                         mean  
2012-01-01 00:00:00 -0.073757  
2012-01-01 00:03:00  0.053754  
2012-01-01 00:06:00 -0.007614  
2012-01-01 00:09:00 -0.032570  
2012-01-01 00:12:00  0.056878  
2012-01-01 00:15:00  0.077734  

By passing a dict to aggregate you can apply a different aggregation to the columns of a DataFrame:

In [253]: r.agg({'A' : np.sum,
   .....:        'B' : lambda x: np.std(x, ddof=1)})
   .....: 
Out[253]: 
                             A         B
2012-01-01 00:00:00 -39.660974  1.004756
2012-01-01 00:03:00   6.672559  0.963559
2012-01-01 00:06:00  -7.487453  0.950766
2012-01-01 00:09:00   7.762901  0.949182
2012-01-01 00:12:00  -4.969624  1.093736
2012-01-01 00:15:00  -1.418119  1.028869

The function names can also be strings. In order for a string to be valid, it must be implemented on the resampled object:

In [254]: r.agg({'A' : 'sum', 'B' : 'std'})
Out[254]: 
                             A         B
2012-01-01 00:00:00 -39.660974  1.004756
2012-01-01 00:03:00   6.672559  0.963559
2012-01-01 00:06:00  -7.487453  0.950766
2012-01-01 00:09:00   7.762901  0.949182
2012-01-01 00:12:00  -4.969624  1.093736
2012-01-01 00:15:00  -1.418119  1.028869

Furthermore, you can also specify multiple aggregation functions for each column separately.

In [255]: r.agg({'A' : ['sum','std'], 'B' : ['mean','std'] })
Out[255]: 
                             A                   B          
                           sum       std      mean       std
2012-01-01 00:00:00 -39.660974  1.033912  0.034854  1.004756
2012-01-01 00:03:00   6.672559  0.971503  0.040013  0.963559
2012-01-01 00:06:00  -7.487453  1.018418 -0.144562  0.950766
2012-01-01 00:09:00   7.762901  1.025842 -0.076432  0.949182
2012-01-01 00:12:00  -4.969624  0.961649  0.054618  1.093736
2012-01-01 00:15:00  -1.418119  0.978847  0.043958  1.028869
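
Since the frames above are random, the aggregation forms are easier to verify on a small deterministic frame. The sketch below uses six one-minute rows with made-up values and string function names only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": np.arange(6), "B": np.arange(6) * 2.0},
    index=pd.date_range("2012-01-01", periods=6, freq="min"),
)
r = df.resample("3min")

# A different aggregation per column, selected with a dict.
per_column = r.agg({"A": "sum", "B": "mean"})

# The same list of functions for every column gives hierarchical columns.
multi = r.agg(["sum", "mean"])

print(per_column)
print(multi)
```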

If a DataFrame does not have a datetimelike index, but instead you want to resample based on a datetimelike column in the frame, it can be passed to the on keyword.

In [256]: df = pd.DataFrame({'date': pd.date_range('2015-01-01', freq='W', periods=5),
   .....:                    'a': np.arange(5)},
   .....:                   index=pd.MultiIndex.from_arrays([
   .....:                            [1,2,3,4,5],
   .....:                            pd.date_range('2015-01-01', freq='W', periods=5)],
   .....:                        names=['v','d']))
   .....: 

In [257]: df
Out[257]: 
              a       date
v d                       
1 2015-01-04  0 2015-01-04
2 2015-01-11  1 2015-01-11
3 2015-01-18  2 2015-01-18
4 2015-01-25  3 2015-01-25
5 2015-02-01  4 2015-02-01

In [258]: df.resample('M', on='date').sum()
Out[258]: 
            a
date         
2015-01-31  6
2015-02-28  4

Similarly, if you instead want to resample by a datetimelike level of a MultiIndex, its name or location can be passed to the level keyword.

In [259]: df.resample('M', level='d').sum()
Out[259]: 
            a
d            
2015-01-31  6
2015-02-28  4
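
The two keywords can be sketched on separate, smaller frames. The example below is an illustration under two assumptions made here: it bins by month start ('MS') rather than month end, because the month-end alias is spelled differently in some pandas releases, and it keeps the date column out of the MultiIndex frame so that only numeric data is summed.

```python
import pandas as pd

dates = pd.date_range("2015-01-04", periods=5, freq="7D")   # five Sundays

# Resample on a datetime column.
by_column = pd.DataFrame({"date": dates, "a": range(5)})
monthly_on = by_column.resample("MS", on="date").sum()

# Resample on a datetime level of a MultiIndex.
idx = pd.MultiIndex.from_arrays([[1, 2, 3, 4, 5], dates], names=["v", "d"])
by_level = pd.DataFrame({"a": range(5)}, index=idx)
monthly_level = by_level.resample("MS", level="d").sum()

print(monthly_on)
print(monthly_level)
```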

Time Span Representation

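Regular intervals of time are represented by Period objects in pandas, while sequences of Period objects are collected in a PeriodIndex, which can be created with the convenience function period_range.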

Period

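A Period represents a span of time (e.g., a day, a month, a quarter, etc.). You can specify the span via the freq keyword using a frequency alias like the ones below. Because freq represents the span of the Period, it cannot be negative like '-3D'.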

In [260]: pd.Period('2012', freq='A-DEC')
Out[260]: Period('2012', 'A-DEC')

In [261]: pd.Period('2012-1-1', freq='D')
Out[261]: Period('2012-01-01', 'D')

In [262]: pd.Period('2012-1-1 19:00', freq='H')
Out[262]: Period('2012-01-01 19:00', 'H')

In [263]: pd.Period('2012-1-1 19:00', freq='5H')
Out[263]: Period('2012-01-01 19:00', '5H')

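Adding and subtracting integers from periods shifts the period by its own frequency. Arithmetic is not allowed between Periods with different freq (span).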

In [264]: p = pd.Period('2012', freq='A-DEC')

In [265]: p + 1
Out[265]: Period('2013', 'A-DEC')

In [266]: p - 3
Out[266]: Period('2009', 'A-DEC')

In [267]: p = pd.Period('2012-01', freq='2M')

In [268]: p + 2
Out[268]: Period('2012-05', '2M')

In [269]: p - 1
Out[269]: Period('2011-11', '2M')

In [270]: p == pd.Period('2012-01', freq='3M')
---------------------------------------------------------------------------
IncompatibleFrequency                     Traceback (most recent call last)
<ipython-input-270-ff54ce3238f5> in <module>()
----> 1 p == pd.Period('2012-01', freq='3M')

/home/joris/scipy/pandas/pandas/src/period.pyx in pandas._period._Period.__richcmp__ (pandas/src/period.c:11340)()
    729             if other.freq != self.freq:
    730                 msg = _DIFFERENT_FREQ.format(self.freqstr, other.freqstr)
--> 731                 raise IncompatibleFrequency(msg)
    732             return PyObject_RichCompareBool(self.ordinal, other.ordinal, op)
    733         elif other is tslib.NaT:

IncompatibleFrequency: Input has different freq=3M from Period(freq=2M)

If the Period freq is daily or higher (D, H, T, S, L, U, N), offsets and timedelta-like objects can be added if the result can have the same freq. Otherwise, a ValueError will be raised.

In [271]: p = pd.Period('2014-07-01 09:00', freq='H')

In [272]: p + Hour(2)
Out[272]: Period('2014-07-01 11:00', 'H')

In [273]: p + timedelta(minutes=120)
Out[273]: Period('2014-07-01 11:00', 'H')

In [274]: p + np.timedelta64(7200, 's')
Out[274]: Period('2014-07-01 11:00', 'H')

In [1]: p + Minute(5)
Traceback
   ...
ValueError: Input has different freq from Period(freq=H)

If the Period has other freqs, only the same offsets can be added. Otherwise, a ValueError will be raised.

In [275]: p = pd.Period('2014-07', freq='M')

In [276]: p + MonthEnd(3)
Out[276]: Period('2014-10', 'M')

In [1]: p + MonthBegin(3)
Traceback
   ...
ValueError: Input has different freq from Period(freq=M)

Taking the difference of Period instances with the same frequency returns the number of frequency units between them:

In [277]: pd.Period('2012', freq='A-DEC') - pd.Period('2002', freq='A-DEC')
Out[277]: 10
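
The addition rules can be gathered into one short sketch. It catches ValueError for the mismatched case, as described above, and spells the hourly frequency in lowercase, which is an assumption about the preferred alias spelling in newer pandas releases.

```python
from datetime import timedelta

import pandas as pd

monthly = pd.Period("2012-01", freq="M")
hourly = pd.Period("2014-07-01 09:00", freq="h")

# Integers shift a period by its own frequency.
shifted = monthly + 1

# For daily-or-finer periods, offsets and timedeltas that are whole
# multiples of the frequency are accepted.
via_offset = hourly + pd.offsets.Hour(2)
via_timedelta = hourly + timedelta(minutes=120)

# Five minutes is not a whole number of hours, so the addition is rejected.
try:
    hourly + pd.offsets.Minute(5)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(shifted, via_offset, via_timedelta, mismatch_raised)
```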

PeriodIndex and period_range

Regular sequences of Period objects can be collected in a PeriodIndex, which can be constructed using the period_range convenience function:

In [278]: prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')

In [279]: prng
Out[279]: 
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='period[M]', freq='M')

The PeriodIndex constructor can also be used directly:

In [280]: pd.PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M')
Out[280]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')

Passing a multiplied frequency outputs a sequence of Periods that have a multiplied span.

In [281]: pd.PeriodIndex(start='2014-01', freq='3M', periods=4)
Out[281]: PeriodIndex(['2014-01', '2014-04', '2014-07', '2014-10'], dtype='period[3M]', freq='3M')

Just like DatetimeIndex, a PeriodIndex can also be used to index pandas objects:

In [282]: ps = pd.Series(np.random.randn(len(prng)), prng)

In [283]: ps
Out[283]: 
2011-01   -1.022670
2011-02    1.371155
2011-03    1.035277
2011-04    1.694400
2011-05   -1.659733
2011-06    0.511432
2011-07    0.433176
2011-08   -0.317955
2011-09   -0.517114
2011-10   -0.310466
2011-11    0.543957
2011-12    0.492003
2012-01    0.193420
Freq: M, dtype: float64

PeriodIndex supports addition and subtraction with the same rules as Period.

In [284]: idx = pd.period_range('2014-07-01 09:00', periods=5, freq='H')

In [285]: idx
Out[285]: 
PeriodIndex(['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00',
             '2014-07-01 12:00', '2014-07-01 13:00'],
            dtype='period[H]', freq='H')

In [286]: idx + Hour(2)
Out[286]: 
PeriodIndex(['2014-07-01 11:00', '2014-07-01 12:00', '2014-07-01 13:00',
             '2014-07-01 14:00', '2014-07-01 15:00'],
            dtype='period[H]', freq='H')

In [287]: idx = pd.period_range('2014-07', periods=5, freq='M')

In [288]: idx
Out[288]: PeriodIndex(['2014-07', '2014-08', '2014-09', '2014-10', '2014-11'], dtype='period[M]', freq='M')

In [289]: idx + MonthEnd(3)
Out[289]: PeriodIndex(['2014-10', '2014-11', '2014-12', '2015-01', '2015-02'], dtype='period[M]', freq='M')
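
A compact sketch of the same operations using period_range only. The integer shift is an illustrative choice made here; it follows the integer rule described for Period.

```python
import pandas as pd

# Monthly periods from January 2011 through January 2012, endpoints included.
prng = pd.period_range("2011-01", "2012-01", freq="M")

# Adding an integer shifts every element by that many months.
shifted = prng + 3

# A multiplied frequency yields periods that each span three months.
wide = pd.period_range(start="2014-01", periods=4, freq="3M")

print(len(prng))
print(shifted[:2])
print(wide)
```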

PeriodIndex has its own dtype named period; refer to Period Dtypes.

Period Dtypes

New in version 0.19.0.

PeriodIndex has a custom period dtype. This is a pandas extension dtype similar to the timezone-aware dtype (datetime64[ns, tz]).

The period dtype holds the freq attribute and is represented with period[freq] like period[D] or period[M], using frequency strings.

In [290]: pi = pd.period_range('2016-01-01', periods=3, freq='M')

In [291]: pi
Out[291]: PeriodIndex(['2016-01', '2016-02', '2016-03'], dtype='period[M]', freq='M')

In [292]: pi.dtype
Out[292]: period[M]

The period dtype can be used in .astype(...). It allows one to change the freq of a PeriodIndex like .asfreq() and to convert a DatetimeIndex to a PeriodIndex like to_period():

# change monthly freq to daily freq
In [293]: pi.astype('period[D]')
Out[293]: PeriodIndex(['2016-01-31', '2016-02-29', '2016-03-31'], dtype='period[D]', freq='D')

# convert to DatetimeIndex
In [294]: pi.astype('datetime64[ns]')
Out[294]: DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01'], dtype='datetime64[ns]', freq='MS')

# convert to PeriodIndex
In [295]: dti = pd.date_range('2011-01-01', freq='M', periods=3)

In [296]: dti
Out[296]: DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'], dtype='datetime64[ns]', freq='M')

In [297]: dti.astype('period[M]')
Out[297]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')
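
The two period conversions can be checked together in a short sketch. The DatetimeIndex is built from explicit month-end strings, a choice made here to avoid depending on a month-end frequency alias.

```python
import pandas as pd

pi = pd.period_range("2016-01", periods=3, freq="M")

# Monthly to daily: each month maps to its last day, as with asfreq.
daily = pi.astype("period[D]")

# Timestamps to monthly periods, as with to_period.
dti = pd.DatetimeIndex(["2011-01-31", "2011-02-28", "2011-03-31"])
monthly = dti.astype("period[M]")

print(pi.dtype)
print(daily)
print(monthly)
```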

PeriodIndex Partial String Indexing

You can pass in dates and strings to Series and DataFrame with a PeriodIndex, in the same manner as with a DatetimeIndex. For details, refer to DatetimeIndex Partial String Indexing.

In [298]: ps['2011-01']
Out[298]: -1.022669594890105

In [299]: ps[datetime(2011, 12, 25):]
Out[299]: 
2011-12    0.492003
2012-01    0.193420
Freq: M, dtype: float64

In [300]: ps['10/31/2011':'12/31/2011']
Out[300]: 
2011-10   -0.310466
2011-11    0.543957
2011-12    0.492003
Freq: M, dtype: float64

Passing a string representing a lower frequency than the PeriodIndex returns partial sliced data.

In [301]: ps['2011']
Out[301]: 
2011-01   -1.022670
2011-02    1.371155
2011-03    1.035277
2011-04    1.694400
2011-05   -1.659733
2011-06    0.511432
2011-07    0.433176
2011-08   -0.317955
2011-09   -0.517114
2011-10   -0.310466
2011-11    0.543957
2011-12    0.492003
Freq: M, dtype: float64

In [302]: dfp = pd.DataFrame(np.random.randn(600,1),
   .....:                    columns=['A'],
   .....:                    index=pd.period_range('2013-01-01 9:00', periods=600, freq='T'))
   .....: 

In [303]: dfp
Out[303]: 
                         A
2013-01-01 09:00  0.197720
2013-01-01 09:01 -0.284769
2013-01-01 09:02  0.061491
2013-01-01 09:03  1.630257
2013-01-01 09:04  2.042442
2013-01-01 09:05 -0.804392
2013-01-01 09:06  0.212760
...                    ...
2013-01-01 18:53  0.150586
2013-01-01 18:54 -0.679569
2013-01-01 18:55 -0.910216
2013-01-01 18:56 -0.413168
2013-01-01 18:57 -0.247752
2013-01-01 18:58  1.590875
2013-01-01 18:59 -2.005294

[600 rows x 1 columns]

In [304]: dfp['2013-01-01 10H']
Out[304]: 
                         A
2013-01-01 10:00 -0.569936
2013-01-01 10:01 -1.179183
2013-01-01 10:02 -0.838602
2013-01-01 10:03 -1.727539
2013-01-01 10:04  1.334027
2013-01-01 10:05  0.417423
2013-01-01 10:06 -0.221189
...                    ...
2013-01-01 10:53 -0.375925
2013-01-01 10:54  0.212750
2013-01-01 10:55 -0.592417
2013-01-01 10:56 -0.466064
2013-01-01 10:57 -1.715347
2013-01-01 10:58 -0.634913
2013-01-01 10:59 -0.809471

[60 rows x 1 columns]

As with DatetimeIndex, the endpoints will be included in the result. The example below slices data starting from 10:00 to 11:59.

In [305]: dfp['2013-01-01 10H':'2013-01-01 11H']
Out[305]: 
                         A
2013-01-01 10:00 -0.569936
2013-01-01 10:01 -1.179183
2013-01-01 10:02 -0.838602
2013-01-01 10:03 -1.727539
2013-01-01 10:04  1.334027
2013-01-01 10:05  0.417423
2013-01-01 10:06 -0.221189
...                    ...
2013-01-01 11:53  0.616198
2013-01-01 11:54  2.843156
2013-01-01 11:55  0.572537
2013-01-01 11:56  1.709706
2013-01-01 11:57 -0.205490
2013-01-01 11:58  1.759719
2013-01-01 11:59 -1.181485

[120 rows x 1 columns]
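
The slicing rules can be verified on a small integer series. The sketch below goes through .loc, which is an assumption made here about the most portable spelling of row selection; the data are made up.

```python
import pandas as pd

ps = pd.Series(range(13), index=pd.period_range("2011-01", periods=13, freq="M"))

# A year string selects every monthly period inside that year.
year_2011 = ps.loc["2011"]

# String slices include both endpoints.
last_quarter = ps.loc["2011-10":"2011-12"]

print(year_2011)
print(last_quarter)
```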

Frequency Conversion and Resampling with PeriodIndex

The frequency of Period and PeriodIndex can be converted via the asfreq method. Let's start with the fiscal year 2011, ending in December:

In [306]: p = pd.Period('2011', freq='A-DEC')

In [307]: p
Out[307]: Period('2011', 'A-DEC')

We can convert it to a monthly frequency. Using the how parameter, we can specify whether to return the starting or ending month:

In [308]: p.asfreq('M', how='start')
Out[308]: Period('2011-01', 'M')

In [309]: p.asfreq('M', how='end')
Out[309]: Period('2011-12', 'M')

The shorthands 's' and 'e' are provided for convenience:

In [310]: p.asfreq('M', 's')
Out[310]: Period('2011-01', 'M')

In [311]: p.asfreq('M', 'e')
Out[311]: Period('2011-12', 'M')

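Converting to a "super-period" (e.g., annual frequency is a super-period of quarterly frequency) automatically returns the super-period that includes the input period: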

In [312]: p = pd.Period('2011-12', freq='M')

In [313]: p.asfreq('A-NOV')
Out[313]: Period('2012', 'A-NOV')

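Note that since we converted to an annual frequency that ends the year in November, the monthly period of December 2011 is actually in the 2012 A-NOV period.

Period conversions with anchored frequencies are particularly useful for working with various quarterly data common to economics, business, and other fields. Many organizations define quarters relative to the month in which their fiscal year starts and ends. Thus, the first quarter of 2011 could start in 2010 or a few months into 2011. Via anchored frequencies, pandas works for all quarterly frequencies Q-JAN through Q-DEC.

Q-DEC defines regular calendar quarters: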

In [314]: p = pd.Period('2012Q1', freq='Q-DEC')

In [315]: p.asfreq('D', 's')
Out[315]: Period('2012-01-01', 'D')

In [316]: p.asfreq('D', 'e')
Out[316]: Period('2012-03-31', 'D')

Q-MAR defines a fiscal year ending in March:

In [317]: p = pd.Period('2011Q4', freq='Q-MAR')

In [318]: p.asfreq('D', 's')
Out[318]: Period('2011-01-01', 'D')

In [319]: p.asfreq('D', 'e')
Out[319]: Period('2011-03-31', 'D')
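
To see why 2011Q4 under Q-MAR covers January through March 2011, recall that the fiscal year is named after the calendar year in which it ends. The sketch below repeats the conversion with the how keyword spelled out.

```python
import pandas as pd

# Fiscal year 2011 ends in March 2011, so its fourth quarter is Jan-Mar 2011.
q = pd.Period("2011Q4", freq="Q-MAR")

first_day = q.asfreq("D", how="start")
last_day = q.asfreq("D", how="end")

print(first_day, last_day)
```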

Converting between Representations

Timestamped data can be converted to PeriodIndex-ed data using to_period, and vice versa using to_timestamp:

In [320]: rng = pd.date_range('1/1/2012', periods=5, freq='M')

In [321]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [322]: ts
Out[322]: 
2012-01-31    2.167674
2012-02-29   -1.505130
2012-03-31    1.005802
2012-04-30    0.481525
2012-05-31   -0.352151
Freq: M, dtype: float64

In [323]: ps = ts.to_period()

In [324]: ps
Out[324]: 
2012-01    2.167674
2012-02   -1.505130
2012-03    1.005802
2012-04    0.481525
2012-05   -0.352151
Freq: M, dtype: float64

In [325]: ps.to_timestamp()
Out[325]: 
2012-01-01    2.167674
2012-02-01   -1.505130
2012-03-01    1.005802
2012-04-01    0.481525
2012-05-01   -0.352151
Freq: MS, dtype: float64

Remember that 's' and 'e' can be used to return the timestamps at the start or end of the period:

In [326]: ps.to_timestamp('D', how='s')
Out[326]: 
2012-01-01    2.167674
2012-02-01   -1.505130
2012-03-01    1.005802
2012-04-01    0.481525
2012-05-01   -0.352151
Freq: MS, dtype: float64

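Converting between period and timestamp enables some convenient arithmetic functions to be used. In the following example, we convert a quarterly frequency with year ending in November to 9am of the end of the month following the quarter end: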

In [327]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')

In [328]: ts = pd.Series(np.random.randn(len(prng)), prng)

In [329]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9

In [330]: ts.head()
Out[330]: 
1990-03-01 09:00   -0.608988
1990-06-01 09:00    0.412294
1990-09-01 09:00   -0.715938
1990-12-01 09:00    1.297773
1991-03-01 09:00   -2.260765
Freq: H, dtype: float64
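
A round trip between the two representations, sketched on a three-point series with made-up values. The timestamps are given as explicit month-end strings and the target frequency is passed to to_period, both choices made here to keep the snippet independent of frequency inference.

```python
import pandas as pd

ts = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2012-01-31", "2012-02-29", "2012-03-31"]),
)

# Month-end timestamps become monthly periods...
ps = ts.to_period("M")

# ...and the periods convert back to timestamps at the start of each month.
back = ps.to_timestamp(how="start")

print(ps)
print(back)
```

Note that the round trip does not restore the original month-end timestamps: the day within the month is lost once the data are expressed as monthly periods.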

Representing out-of-bounds spans

If you have data that is outside of the Timestamp bounds, see Timestamp limitations, then you can use a PeriodIndex and/or Series of Periods to do computations.

In [331]: span = pd.period_range('1215-01-01', '1381-01-01', freq='D')

In [332]: span
Out[332]: 
PeriodIndex(['1215-01-01', '1215-01-02', '1215-01-03', '1215-01-04',
             '1215-01-05', '1215-01-06', '1215-01-07', '1215-01-08',
             '1215-01-09', '1215-01-10',
             ...
             '1380-12-23', '1380-12-24', '1380-12-25', '1380-12-26',
             '1380-12-27', '1380-12-28', '1380-12-29', '1380-12-30',
             '1380-12-31', '1381-01-01'],
            dtype='period[D]', length=60632, freq='D')
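
As a rough illustration that ordinary computations still work on such a span, the sketch below counts the days, shifts a Period by whole days, and filters by year. The variable names are illustrative. Nanosecond-resolution timestamps only cover roughly the years 1677 to 2262, which is why periods are used here.

```python
import pandas as pd

# Daily periods in years that nanosecond-resolution timestamps cannot reach.
span = pd.period_range('1215-01-01', '1381-01-01', freq='D')

n_days = len(span)                              # both endpoints are included
thirty_days_on = span[0] + 30                   # Period arithmetic in whole days
days_in_1300 = int((span.year == 1300).sum())   # field access works as usual
print(n_days, thirty_days_on, days_in_1300)
```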

To convert from an int64-based YYYYMMDD representation:

In [333]: s = pd.Series([20121231, 20141130, 99991231])

In [334]: s
Out[334]: 
0    20121231
1    20141130
2    99991231
dtype: int64

In [335]: def conv(x):
   .....:     return pd.Period(year = x // 10000, month = x//100 % 100, day = x%100, freq='D')
   .....: 

In [336]: s.apply(conv)
Out[336]: 
0   2012-12-31
1   2014-11-30
2   9999-12-31
dtype: object

In [337]: s.apply(conv)[2]
Out[337]: Period('9999-12-31', 'D')

These can easily be converted to a PeriodIndex:

In [338]: span = pd.PeriodIndex(s.apply(conv))

In [339]: span
Out[339]: PeriodIndex(['2012-12-31', '2014-11-30', '9999-12-31'], dtype='period[D]', freq='D')
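
The integer arithmetic inside conv can be easier to follow when written with divmod. The following is a minimal sketch of the same idea, not a replacement for the function above. The name yyyymmdd_to_period is hypothetical.

```python
import pandas as pd

def yyyymmdd_to_period(x):
    """Convert an integer such as 20121231 into a daily Period."""
    year, rest = divmod(x, 10000)    # 20121231 -> (2012, 1231)
    month, day = divmod(rest, 100)   # 1231 -> (12, 31)
    return pd.Period(year=year, month=month, day=day, freq='D')

p = yyyymmdd_to_period(99991231)
print(p.year, p.month, p.day)
```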

Time Zone Handling

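Pandas provides rich support for working with timestamps in different time zones using the pytz and dateutil libraries. dateutil support is new in 0.14.1 and is currently only supported for fixed offset and tzfile zones. The default library is pytz. Support for dateutil is provided for compatibility with other applications, e.g. if you use dateutil in other Python packages.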

Working with Time Zones

By default, pandas objects are time zone unaware:

In [340]: rng = pd.date_range('3/6/2012 00:00', periods=15, freq='D')

In [341]: rng.tz is None
Out[341]: True

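To supply the time zone, you can use the tz keyword to date_range and other functions. dateutil time zone strings are distinguished from pytz time zones by starting with dateutil/.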

# pytz
In [342]: rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
   .....:                          tz='Europe/London')
   .....: 

In [343]: rng_pytz.tz
Out[343]: <DstTzInfo 'Europe/London' LMT-1 day, 23:59:00 STD>

# dateutil
In [344]: rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
   .....:                              tz='dateutil/Europe/London')
   .....: 

In [345]: rng_dateutil.tz
Out[345]: tzfile('/usr/share/zoneinfo/Europe/London')

# dateutil - utc special case
In [346]: rng_utc = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
   .....:                         tz=dateutil.tz.tzutc())
   .....: 

In [347]: rng_utc.tz
Out[347]: tzutc()

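Note that the UTC time zone is a special case in dateutil and should be constructed explicitly as an instance of dateutil.tz.tzutc. You can also construct other time zones explicitly first, which gives you more control over which time zone is used: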

# pytz
In [348]: tz_pytz = pytz.timezone('Europe/London')

In [349]: rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
   .....:                          tz=tz_pytz)
   .....: 

In [350]: rng_pytz.tz == tz_pytz
Out[350]: True

# dateutil
In [351]: tz_dateutil = dateutil.tz.gettz('Europe/London')

In [352]: rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
   .....:                              tz=tz_dateutil)
   .....: 

In [353]: rng_dateutil.tz == tz_dateutil
Out[353]: True

Timestamps, like Python's datetime.datetime object, can be either time zone naive or time zone aware. Naive time series and DatetimeIndex objects can be localized using tz_localize:

In [354]: ts = pd.Series(np.random.randn(len(rng)), rng)

In [355]: ts_utc = ts.tz_localize('UTC')

In [356]: ts_utc
Out[356]: 
2012-03-06 00:00:00+00:00    0.679135
2012-03-07 00:00:00+00:00    0.345668
2012-03-08 00:00:00+00:00   -1.143903
2012-03-09 00:00:00+00:00    0.487087
2012-03-10 00:00:00+00:00   -1.421073
2012-03-11 00:00:00+00:00   -0.327463
2012-03-12 00:00:00+00:00    0.169899
2012-03-13 00:00:00+00:00    0.867568
2012-03-14 00:00:00+00:00   -0.834122
2012-03-15 00:00:00+00:00   -1.698494
2012-03-16 00:00:00+00:00    0.974717
2012-03-17 00:00:00+00:00    0.966771
2012-03-18 00:00:00+00:00   -0.754168
2012-03-19 00:00:00+00:00   -1.434246
2012-03-20 00:00:00+00:00    0.848935
Freq: D, dtype: float64

Again, you can explicitly construct the time zone object first. You can use the tz_convert method to convert tz-aware pandas objects to another time zone:

In [357]: ts_utc.tz_convert('US/Eastern')
Out[357]: 
2012-03-05 19:00:00-05:00    0.679135
2012-03-06 19:00:00-05:00    0.345668
2012-03-07 19:00:00-05:00   -1.143903
2012-03-08 19:00:00-05:00    0.487087
2012-03-09 19:00:00-05:00   -1.421073
2012-03-10 19:00:00-05:00   -0.327463
2012-03-11 20:00:00-04:00    0.169899
2012-03-12 20:00:00-04:00    0.867568
2012-03-13 20:00:00-04:00   -0.834122
2012-03-14 20:00:00-04:00   -1.698494
2012-03-15 20:00:00-04:00    0.974717
2012-03-16 20:00:00-04:00    0.966771
2012-03-17 20:00:00-04:00   -0.754168
2012-03-18 20:00:00-04:00   -1.434246
2012-03-19 20:00:00-04:00    0.848935
Freq: D, dtype: float64
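
The difference between the two methods is easy to blur: tz_localize attaches a zone to naive values without moving the wall clock, while tz_convert keeps the instant and changes the wall clock. A minimal sketch with made-up values:

```python
import pandas as pd

rng = pd.date_range('2012-03-06', periods=3, freq='D')
ts = pd.Series([1.0, 2.0, 3.0], index=rng)      # naive: no time zone attached

ts_utc = ts.tz_localize('UTC')                  # attach a zone, wall clock unchanged
ts_eastern = ts_utc.tz_convert('US/Eastern')    # same instants, new wall clock

# Element-wise comparison of aware indexes compares the underlying instants.
same_instants = bool((ts_utc.index == ts_eastern.index).all())
print(ts_utc.index[0], ts_eastern.index[0], same_instants)
```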

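Warning

Be aware of conversions between libraries. For some zones pytz and dateutil have different definitions of the zone. This is more of a problem for unusual time zones than for 'standard' zones like US/Eastern.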

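Warning

Be aware that a time zone definition across versions of time zone libraries may not be considered equal. This may cause problems when working with stored data that is localized using one version and operated on with a different version. See here for how to handle such a situation.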

Warning

It is incorrect to pass a time zone directly into the datetime.datetime constructor (e.g., datetime.datetime(2011, 1, 1, tzinfo=timezone('US/Eastern'))). Instead, the datetime needs to be localized using the localize method on the time zone.

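Under the hood, all timestamps are stored in UTC. Scalar values from a DatetimeIndex with a time zone will have their fields (day, hour, minute) localized to the time zone. However, timestamps with the same UTC value are still considered to be equal even if they are in different time zones: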

In [358]: rng_eastern = rng_utc.tz_convert('US/Eastern')

In [359]: rng_berlin = rng_utc.tz_convert('Europe/Berlin')

In [360]: rng_eastern[5]
Out[360]: Timestamp('2012-03-10 19:00:00-0500', tz='US/Eastern', freq='D')

In [361]: rng_berlin[5]
Out[361]: Timestamp('2012-03-11 01:00:00+0100', tz='Europe/Berlin', freq='D')

In [362]: rng_eastern[5] == rng_berlin[5]
Out[362]: True
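
The same point can be checked on two standalone scalars. This is a small sketch that assumes the Timestamp.value attribute, which exposes the stored integer offset from the epoch:

```python
import pandas as pd

eastern = pd.Timestamp('2012-03-10 19:00', tz='US/Eastern')
berlin = pd.Timestamp('2012-03-11 01:00', tz='Europe/Berlin')

# Fields follow the local zone, but the underlying UTC instant is shared.
print(eastern.hour, berlin.hour)
print(eastern.value == berlin.value)
```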

Like Series, DataFrame, and DatetimeIndex, Timestamps can be converted to other time zones using tz_convert:

In [363]: rng_eastern[5]
Out[363]: Timestamp('2012-03-10 19:00:00-0500', tz='US/Eastern', freq='D')

In [364]: rng_berlin[5]
Out[364]: Timestamp('2012-03-11 01:00:00+0100', tz='Europe/Berlin', freq='D')

In [365]: rng_eastern[5].tz_convert('Europe/Berlin')
Out[365]: Timestamp('2012-03-11 01:00:00+0100', tz='Europe/Berlin')

Localization of a Timestamp works just like it does for DatetimeIndex and Series:

In [366]: rng[5]
Out[366]: Timestamp('2012-03-11 00:00:00', freq='D')

In [367]: rng[5].tz_localize('Asia/Shanghai')
Out[367]: Timestamp('2012-03-11 00:00:00+0800', tz='Asia/Shanghai')

Operations between Series in different time zones will yield a UTC Series, aligning the data on the UTC timestamps:

In [368]: eastern = ts_utc.tz_convert('US/Eastern')

In [369]: berlin = ts_utc.tz_convert('Europe/Berlin')

In [370]: result = eastern + berlin

In [371]: result
Out[371]: 
2012-03-06 00:00:00+00:00    1.358269
2012-03-07 00:00:00+00:00    0.691336
2012-03-08 00:00:00+00:00   -2.287805
2012-03-09 00:00:00+00:00    0.974174
2012-03-10 00:00:00+00:00   -2.842146
2012-03-11 00:00:00+00:00   -0.654926
2012-03-12 00:00:00+00:00    0.339798
2012-03-13 00:00:00+00:00    1.735136
2012-03-14 00:00:00+00:00   -1.668245
2012-03-15 00:00:00+00:00   -3.396988
2012-03-16 00:00:00+00:00    1.949435
2012-03-17 00:00:00+00:00    1.933541
2012-03-18 00:00:00+00:00   -1.508335
2012-03-19 00:00:00+00:00   -2.868493
2012-03-20 00:00:00+00:00    1.697870
Freq: D, dtype: float64

In [372]: result.index
Out[372]: 
DatetimeIndex(['2012-03-06', '2012-03-07', '2012-03-08', '2012-03-09',
               '2012-03-10', '2012-03-11', '2012-03-12', '2012-03-13',
               '2012-03-14', '2012-03-15', '2012-03-16', '2012-03-17',
               '2012-03-18', '2012-03-19', '2012-03-20'],
              dtype='datetime64[ns, UTC]', freq='D')
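
A compact version of the same behaviour, with made-up values so the result is easy to check by eye:

```python
import pandas as pd

rng = pd.date_range('2012-03-06', periods=3, freq='D', tz='UTC')
ts_utc = pd.Series([1.0, 2.0, 3.0], index=rng)

eastern = ts_utc.tz_convert('US/Eastern')
berlin = ts_utc.tz_convert('Europe/Berlin')

result = eastern + berlin    # rows are matched by instant, not by wall-clock label
print(result.index.tz, result.tolist())
```

Because both operands describe the same three instants, no missing values appear and each value is simply doubled.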

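To remove time zone information from a tz-aware DatetimeIndex, use tz_localize(None) or tz_convert(None). tz_localize(None) removes the time zone while keeping the local time representation. tz_convert(None) removes the time zone after converting to UTC time.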

In [373]: didx = pd.date_range(start='2014-08-01 09:00', freq='H', periods=10, tz='US/Eastern')

In [374]: didx
Out[374]: 
DatetimeIndex(['2014-08-01 09:00:00-04:00', '2014-08-01 10:00:00-04:00',
               '2014-08-01 11:00:00-04:00', '2014-08-01 12:00:00-04:00',
               '2014-08-01 13:00:00-04:00', '2014-08-01 14:00:00-04:00',
               '2014-08-01 15:00:00-04:00', '2014-08-01 16:00:00-04:00',
               '2014-08-01 17:00:00-04:00', '2014-08-01 18:00:00-04:00'],
              dtype='datetime64[ns, US/Eastern]', freq='H')

In [375]: didx.tz_localize(None)
Out[375]: 
DatetimeIndex(['2014-08-01 09:00:00', '2014-08-01 10:00:00',
               '2014-08-01 11:00:00', '2014-08-01 12:00:00',
               '2014-08-01 13:00:00', '2014-08-01 14:00:00',
               '2014-08-01 15:00:00', '2014-08-01 16:00:00',
               '2014-08-01 17:00:00', '2014-08-01 18:00:00'],
              dtype='datetime64[ns]', freq='H')

In [376]: didx.tz_convert(None)
Out[376]: 
DatetimeIndex(['2014-08-01 13:00:00', '2014-08-01 14:00:00',
               '2014-08-01 15:00:00', '2014-08-01 16:00:00',
               '2014-08-01 17:00:00', '2014-08-01 18:00:00',
               '2014-08-01 19:00:00', '2014-08-01 20:00:00',
               '2014-08-01 21:00:00', '2014-08-01 22:00:00'],
              dtype='datetime64[ns]', freq='H')

# tz_convert(None) is identical to tz_convert('UTC').tz_localize(None)
In [377]: didx.tz_convert('UTC').tz_localize(None)
Out[377]: 
DatetimeIndex(['2014-08-01 13:00:00', '2014-08-01 14:00:00',
               '2014-08-01 15:00:00', '2014-08-01 16:00:00',
               '2014-08-01 17:00:00', '2014-08-01 18:00:00',
               '2014-08-01 19:00:00', '2014-08-01 20:00:00',
               '2014-08-01 21:00:00', '2014-08-01 22:00:00'],
              dtype='datetime64[ns]', freq='H')
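
The two ways of dropping a time zone can be compared on a short index. The sketch below builds the index from explicit strings. The names local_naive, utc_naive and two_step are illustrative.

```python
import pandas as pd

didx = pd.DatetimeIndex(['2014-08-01 09:00', '2014-08-01 10:00'], tz='US/Eastern')

local_naive = didx.tz_localize(None)   # keep the wall clock, drop the zone
utc_naive = didx.tz_convert(None)      # shift to UTC first, then drop the zone
two_step = didx.tz_convert('UTC').tz_localize(None)

print(local_naive[0], utc_naive[0])
```

Which one is appropriate depends on whether downstream code expects local wall-clock times or UTC instants.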

Ambiguous Times when Localizing

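In some cases, localize cannot determine the DST and non-DST hours when there are duplicates. This often happens when reading files or database records that simply duplicate the hours. Passing ambiguous='infer' (the infer_dst argument in prior releases) into tz_localize will attempt to determine the right offset. Of the two examples below, the first fails because it contains ambiguous times, and the second infers the right offset.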

In [378]: rng_hourly = pd.DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00',
   .....:                                '11/06/2011 01:00', '11/06/2011 02:00',
   .....:                                '11/06/2011 03:00'])
   .....: 

This will fail as there are ambiguous times:

In [2]: rng_hourly.tz_localize('US/Eastern')
AmbiguousTimeError: Cannot infer dst time from Timestamp('2011-11-06 01:00:00'), try using the 'ambiguous' argument

Infer the ambiguous times:

In [379]: rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', ambiguous='infer')

In [380]: rng_hourly_eastern.tolist()
Out[380]: 
[Timestamp('2011-11-06 00:00:00-0400', tz='US/Eastern'),
 Timestamp('2011-11-06 01:00:00-0400', tz='US/Eastern'),
 Timestamp('2011-11-06 01:00:00-0500', tz='US/Eastern'),
 Timestamp('2011-11-06 02:00:00-0500', tz='US/Eastern'),
 Timestamp('2011-11-06 03:00:00-0500', tz='US/Eastern')]

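In addition to 'infer', there are several other arguments supported. Passing an array-like of bools or 0s/1s, where True represents a DST hour and False a non-DST hour, allows for distinguishing more than one DST transition (e.g., if you have multiple records in a database, each with its own DST transition). Alternatively, passing 'NaT' will fill in the transition times with not-a-time values. These methods are available in the DatetimeIndex constructor as well as tz_localize.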

In [381]: rng_hourly_dst = np.array([1, 1, 0, 0, 0])

In [382]: rng_hourly.tz_localize('US/Eastern', ambiguous=rng_hourly_dst).tolist()
Out[382]: 
[Timestamp('2011-11-06 00:00:00-0400', tz='US/Eastern'),
 Timestamp('2011-11-06 01:00:00-0400', tz='US/Eastern'),
 Timestamp('2011-11-06 01:00:00-0500', tz='US/Eastern'),
 Timestamp('2011-11-06 02:00:00-0500', tz='US/Eastern'),
 Timestamp('2011-11-06 03:00:00-0500', tz='US/Eastern')]

In [383]: rng_hourly.tz_localize('US/Eastern', ambiguous='NaT').tolist()
Out[383]: 
[Timestamp('2011-11-06 00:00:00-0400', tz='US/Eastern'),
 NaT,
 NaT,
 Timestamp('2011-11-06 02:00:00-0500', tz='US/Eastern'),
 Timestamp('2011-11-06 03:00:00-0500', tz='US/Eastern')]
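
The options can be compared in one self-contained sketch. It uses a boolean array rather than 0s and 1s, and it catches the failure broadly because the exact exception class may depend on the installed pandas and time zone library. Both choices are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

rng_hourly = pd.DatetimeIndex(['2011-11-06 00:00', '2011-11-06 01:00',
                               '2011-11-06 01:00', '2011-11-06 02:00',
                               '2011-11-06 03:00'])

# Default behaviour: refuse to guess when 01:00 occurs twice.
try:
    rng_hourly.tz_localize('US/Eastern')
    raised = False
except Exception:
    raised = True

inferred = rng_hourly.tz_localize('US/Eastern', ambiguous='infer')

# True marks a DST hour, False a standard-time hour.
is_dst = np.array([True, True, False, False, False])
flagged = rng_hourly.tz_localize('US/Eastern', ambiguous=is_dst)

as_nat = rng_hourly.tz_localize('US/Eastern', ambiguous='NaT')

offsets_in_hours = [ts.utcoffset().total_seconds() / 3600 for ts in inferred]
print(raised, offsets_in_hours, as_nat.isna().tolist())
```

Here 'infer' and the explicit flags agree, and the UTC offset moves from -4 to -5 hours between the two 01:00 entries.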

TZ aware Dtypes

New in version 0.17.0.

A Series/DatetimeIndex with time zone naive values is represented with a dtype of datetime64[ns].

In [389]: s_naive = pd.Series(pd.date_range('20130101',periods=3))

In [390]: s_naive
Out[390]: 
0   2013-01-01
1   2013-01-02
2   2013-01-03
dtype: datetime64[ns]

A Series/DatetimeIndex with time zone aware values is represented with a dtype of datetime64[ns, tz].

In [391]: s_aware = pd.Series(pd.date_range('20130101',periods=3,tz='US/Eastern'))

In [392]: s_aware
Out[392]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]

Both of these Series can be manipulated via the .dt accessor, see here.

For example, to localize a naive stamp and convert it to a time zone aware one:

In [393]: s_naive.dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
Out[393]: 
0   2012-12-31 19:00:00-05:00
1   2013-01-01 19:00:00-05:00
2   2013-01-02 19:00:00-05:00
dtype: datetime64[ns, US/Eastern]

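Further, you can .astype(...) time zone aware (and naive) data. This operation is effectively a localize AND convert on a naive stamp, and a convert on an aware stamp.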

# localize and convert a naive timezone
In [394]: s_naive.astype('datetime64[ns, US/Eastern]')
Out[394]: 
0   2012-12-31 19:00:00-05:00
1   2013-01-01 19:00:00-05:00
2   2013-01-02 19:00:00-05:00
dtype: datetime64[ns, US/Eastern]

# make an aware tz naive
In [395]: s_aware.astype('datetime64[ns]')
Out[395]: 
0   2013-01-01 05:00:00
1   2013-01-02 05:00:00
2   2013-01-03 05:00:00
dtype: datetime64[ns]

# convert to a new timezone
In [396]: s_aware.astype('datetime64[ns, CET]')
Out[396]: 
0   2013-01-01 06:00:00+01:00
1   2013-01-02 06:00:00+01:00
2   2013-01-03 06:00:00+01:00
dtype: datetime64[ns, CET]
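
Each of the three conversions can also be written with explicit .dt calls, as sketched below with illustrative variable names. Later pandas releases are understood to have moved away from astype for naive/aware conversions in favour of these calls; treat that as something to verify against your installed version.

```python
import pandas as pd

s_naive = pd.Series(pd.date_range('2013-01-01', periods=3))

# naive -> aware: state which zone the naive values are in, then convert.
s_aware = s_naive.dt.tz_localize('UTC').dt.tz_convert('US/Eastern')

# aware -> naive: express the values in UTC and drop the zone.
s_back = s_aware.dt.tz_convert(None)

# aware -> aware: only the displayed wall clock changes.
s_berlin = s_aware.dt.tz_convert('Europe/Berlin')

print(s_aware.iloc[0], s_back.iloc[0], s_berlin.iloc[0])
```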

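Note

Using the .values accessor on a Series returns a NumPy array of the data. These values are converted to UTC, as NumPy does not currently support time zones (even though it prints in the local time zone!).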

In [397]: s_naive.values
Out[397]: 
array(['2013-01-01T00:00:00.000000000', '2013-01-02T00:00:00.000000000',
       '2013-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

In [398]: s_aware.values
Out[398]: 
array(['2013-01-01T05:00:00.000000000', '2013-01-02T05:00:00.000000000',
       '2013-01-03T05:00:00.000000000'], dtype='datetime64[ns]')

Further note that once converted to a NumPy array, these values lose their time zone information.

In [399]: pd.Series(s_aware.values)
Out[399]: 
0   2013-01-01 05:00:00
1   2013-01-02 05:00:00
2   2013-01-03 05:00:00
dtype: datetime64[ns]

However, these can be easily converted back:

In [400]: pd.Series(s_aware.values).dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
Out[400]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]
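
The round trip through NumPy can be condensed into a few lines. This is a minimal sketch; raw and restored are illustrative names.

```python
import pandas as pd

s_aware = pd.Series(pd.date_range('2013-01-01', periods=3, tz='US/Eastern'))

raw = s_aware.values    # plain NumPy datetime64 array: UTC instants, no zone

# Rebuild the aware Series: declare the raw values as UTC, then convert back.
restored = pd.Series(raw).dt.tz_localize('UTC').dt.tz_convert('US/Eastern')

print(raw.dtype, restored.dt.tz)
```

The zone itself is not stored in the array, so it has to be supplied again when rebuilding the Series.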