PyTorch Fractional Differencing

If you want to predict financial markets, you will eventually have to normalize your data. Many people use simple min-max normalization or returns. There is a better way, though: in Advances in Financial Machine Learning, Marcos Lopez de Prado explains that a transformation such as returns discards more information than is necessary to make the dataset stationary. I'm not going to go into much detail here. What I want to do is expand on the implementations of fractional differencing that I have seen on the web and show how they can be sped up significantly using PyTorch.
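For readers who want the one-line version of the idea, here is a brief sketch in standard notation. The symbols $B$ (the backshift operator, $B X_t = X_{t-1}$), $X_t$ and $w_k$ are notation assumed for this sketch; the recursion for $w_k$ is the same one used in the docstrings of the code in this post.

```latex
(1 - B)^d X_t = \sum_{k=0}^{\infty} w_k X_{t-k},
\qquad w_0 = 1,
\qquad w_k = -w_{k-1}\,\frac{d - k + 1}{k}
```

With $d = 1$ the weights are $1, -1, 0, 0, \dots$, which is ordinary first differencing $X_t - X_{t-1}$. With $0 < d < 1$ the weights decay slowly instead of cutting off, so each differenced value still carries some memory of older observations.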

I'm starting from the blog post GPU Fractional Differencing, in which the author gives a CPU-based implementation of fractional differencing followed by a GPU-based one. I think two things can be improved by moving this implementation to PyTorch.

  1. The code will run much faster on a CPU
  2. It will require few changes to move to a GPU

First, let's install some libraries:

!pip install torch numpy pandas python-binance

First, the original implementation:

import pandas as pd
import numpy as np

def get_weights_floored(d, num_k, floor=1e-3):
    r"""Calculate weights ($w$) for each lag ($k$) through
    $w_k = -w_{k-1} \frac{d - k + 1}{k}$, stopping once a weight falls to or
    below a minimum value (floor) so that weights are not computed for the
    entire time series.

    Args:
        d (float): differencing value.
        num_k (int): number of lags (typically length of timeseries) to calculate w.
        floor (float): minimum value for the weights for computational efficiency.
    """
    w_k = np.array([1])
    k = 1

    while k < num_k:
        w_k_latest = -w_k[-1] * ((d - k + 1)) / k
        if abs(w_k_latest) <= floor:
            break

        w_k = np.append(w_k, w_k_latest)

        k += 1

    w_k = w_k.reshape(-1, 1) 

    return w_k

def frac_diff(df, d, floor=1e-3):
    r"""Fractionally difference time series via CPU.

    Args:
        df (pd.DataFrame): dataframe of raw time series values.
        d (float): differencing order, typically between 0 and 1.
        floor (float): minimum value of weights, ignoring anything smaller.
    """
    # Get weights window
    weights = get_weights_floored(d=d, num_k=len(df), floor=floor)
    weights_window_size = len(weights)

    # Reverse weights
    weights = weights[::-1]

    # Blank fractionally differenced series to be filled
    df_fd = []

    # Slide a window over the time series to calculate the fractionally
    # differenced value for each window
    for idx in range(weights_window_size, df.shape[0]):
        # Dot product of weights and original values
        # to get fractionally differenced values.
        # Note: the slice ends at idx - 1, so the last row of df is never used.
        df_fd.append(np.dot(weights.T, df.iloc[idx - weights_window_size:idx]).item())

    # Return FD values and weights
    df_fd = pd.DataFrame(df_fd)

    return df_fd, weights
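
To make the weight recursion concrete, here is a small self-contained sketch. The helper name `fracdiff_weights` is hypothetical and introduced only for illustration; it uses the same recursion and floor cut-off as `get_weights_floored` but returns a flat array.

```python
import numpy as np


def fracdiff_weights(d, num_k, floor=1e-3):
    """Hypothetical helper: same recursion as get_weights_floored, as a flat array."""
    weights = [1.0]
    for k in range(1, num_k):
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) <= floor:
            # Remaining weights are too small to matter, so stop early
            break
        weights.append(w)
    return np.array(weights)


# d = 1 collapses to ordinary first differencing: weights 1 and -1
w_one = fracdiff_weights(1.0, num_k=10)
print(w_one)

# d = 0.5 keeps a tail of small negative weights: 1, -0.5, -0.125, -0.0625, ...
w_half = fracdiff_weights(0.5, num_k=1000)
print(w_half[:4])
print(len(w_half))  # number of weights kept above the floor
```

The floor is what keeps the window short: for d = 0.5 the weights shrink steadily, so only the first few dozen lags survive the default cut-off of 1e-3.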

Now the same code converted to use PyTorch:

import torch


def get_weights_floored(d, num_k, floor=1e-3):
    r"""Calculate weights ($w$) for each lag ($k$) through
    $w_k = -w_{k-1} \frac{d - k + 1}{k}$, stopping once a weight falls to or
    below a minimum value (floor) so that weights are not computed for the
    entire time series.

    The weights are returned reversed (largest lag first, $w_0$ last) so they
    line up with a window of values in chronological order.

    Args:
        d (float): differencing value.
        num_k (int): number of lags (typically length of timeseries) to calculate w.
        floor (float): minimum value for the weights for computational efficiency.
    """
    w_k = torch.DoubleTensor([1.])
    k = 1
    
    while k < num_k:
        w_k_latest = -w_k[-1] * ((d - k + 1)) / k
        if abs(w_k_latest) <= floor:
            break

        w_k = torch.cat((w_k, w_k_latest.unsqueeze(0)))
        
        k += 1
    
    return w_k.flip(0)

def frac_diff_fast(df, d, floor=1e-3, weights=None):
    r"""Fractionally difference time series with PyTorch.

    Args:
        df (torch.Tensor): 1-D float64 tensor of raw time series values.
        d (float): differencing order, typically between 0 and 1.
        floor (float): minimum value of weights, ignoring anything smaller.
        weights (torch.Tensor, optional): precomputed, already reversed weights.
    """
    # Get weights window
    if weights is None:
        weights = get_weights_floored(d=d, num_k=len(df), floor=floor)
    weights_window_size = len(weights)

    # Slide the window over the series, one dot product per position
    df_fd = torch.DoubleTensor([])
    for idx in range(weights_window_size, df.shape[0] + 1):
        df_fd = torch.cat((df_fd, torch.dot(weights, df[idx - weights_window_size:idx]).unsqueeze(0)))

    # Return FD values and weights
    return df_fd, weights
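
The loop above computes one dot product per window and grows the result with `torch.cat` on every iteration. One possible alternative, which is not part of the code in this post and is not benchmarked here, is to build all sliding windows at once with `Tensor.unfold` and do a single matrix-vector product. The function name `frac_diff_unfold` is hypothetical, and the sketch assumes the weights are already reversed, as returned by the PyTorch `get_weights_floored`.

```python
import torch


def frac_diff_unfold(series, weights):
    """Hypothetical vectorized variant: all window dot products in one matmul.

    series:  1-D tensor of values in chronological order.
    weights: 1-D tensor of already reversed weights (largest lag first).
    """
    window = weights.shape[0]
    # unfold builds a (num_windows, window) view of every sliding window
    windows = series.unfold(0, window, 1)
    return windows @ weights


# Tiny check against an explicit loop, using first-difference weights [-1, 1]
series = torch.tensor([1.0, 3.0, 6.0, 10.0], dtype=torch.float64)
weights = torch.tensor([-1.0, 1.0], dtype=torch.float64)
looped = torch.stack(
    [torch.dot(weights, series[i - 2:i]) for i in range(2, len(series) + 1)]
)
vectorized = frac_diff_unfold(series, weights)
print(vectorized)                          # the differences 2, 3, 4
print(torch.allclose(looped, vectorized))  # both approaches agree
```

Because `unfold` returns a view rather than a copy, this avoids the repeated reallocation caused by concatenating inside the loop.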

    

To test this out, let's get some Bitcoin prices from Binance:

from binance.client import Client
client = Client()
klines = client.get_historical_klines('BTCUSDT', Client.KLINE_INTERVAL_1HOUR, "1 Jan, 2019")
df = pd.DataFrame(klines, columns=['opentime', 'open', 'high', 'low', 'close', 'volume', 'closetime', 'assetvol', 'numbertrades', 'takerbasevol', 'takerquotevol', 'ignore']).astype(np.float64)
df.head()
       opentime     open     high      low    close       volume     closetime      assetvol  numbertrades  takerbasevol  takerquotevol  ignore
0  1.546301e+12  3701.23  3713.00  3689.88  3700.31   686.367420  1.546304e+12  2.539069e+06        5534.0    370.855314   1.371962e+06     0.0
1  1.546304e+12  3700.20  3702.73  3684.22  3689.69   613.539115  1.546308e+12  2.266700e+06        5086.0    320.644448   1.184519e+06     0.0
2  1.546308e+12  3689.67  3695.95  3675.04  3690.00   895.302181  1.546312e+12  3.302044e+06        6391.0    471.857118   1.740469e+06     0.0
3  1.546312e+12  3690.00  3699.77  3685.78  3693.13   796.714818  1.546315e+12  2.942422e+06        5709.0    459.948381   1.698857e+06     0.0
4  1.546315e+12  3692.32  3720.00  3685.94  3692.71  1317.452909  1.546319e+12  4.872937e+06        7908.0    770.995533   2.852106e+06     0.0

Now let's benchmark the two methods:

import time
start = time.time()
frac_diff(df[['close']], d=0.5)
duration = time.time() - start
f"Numpy Frac Diff took {duration}"
'Numpy Frac Diff took 1.0498089790344238'
start = time.time()
data = torch.DoubleTensor(df[['close']].values).flatten()
frac_diff_fast(data, d=0.5)
duration = time.time() - start
f"Pytorch Frac Diff took {duration}"
'Pytorch Frac Diff took 0.3252909183502197'
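
A single `time.time()` measurement can be noisy. As a rough sketch of a slightly more robust approach, the hypothetical helper `best_of` below runs a callable several times with `time.perf_counter` and keeps the fastest run. The workload shown is a stand-in so the snippet runs on its own.

```python
import time


def best_of(fn, repeats=5):
    """Hypothetical helper: run fn several times, return the fastest time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload; swap in e.g. lambda: frac_diff(df[['close']], d=0.5)
# to time the functions from this post
elapsed = best_of(lambda: sum(i * i for i in range(10000)))
print(f"best of 5 runs: {elapsed:.6f} s")
```

Taking the minimum rather than the mean reduces the influence of one-off slowdowns such as other processes competing for the CPU.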

You can see that with a change of a few lines you get roughly a 3x speedup (about 1.05 s versus 0.33 s in this run), and moving to a GPU should need only minor changes. Given this speed increase, I'm going to consider moving a lot of what I've been doing in NumPy to PyTorch in the future.
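
As a minimal sketch of what the move to a GPU could look like, the snippet below picks a device and places both operands on it. The data is a stand-in, and the snippet falls back to the CPU when no GPU is visible. Note that `frac_diff_fast` as written creates its accumulator with `torch.DoubleTensor([])`, which lives on the CPU, so that tensor (and the weights) would also need to be created on the target device.

```python
import torch

# Use a GPU when one is visible to PyTorch, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in data: in this post these would be the close prices and the
# reversed weights returned by get_weights_floored
series = torch.tensor([1.0, 3.0, 6.0, 10.0], dtype=torch.float64).to(device)
weights = torch.tensor([-1.0, 1.0], dtype=torch.float64).to(device)

# Both operands live on the same device, so the same ops work unchanged
window = weights.shape[0]
result = torch.stack([
    torch.dot(weights, series[i - window:i])
    for i in range(window, series.shape[0] + 1)
])

# Bring the result back to the CPU before handing it to NumPy or pandas
print(result.cpu().numpy())
```

Mixing CPU and GPU tensors in one operation raises an error, which is why every tensor involved has to be moved, not just the input series.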