econometrics — Statistical Testing

Statistical testing and regression tools. Run these tests before building any price model so that the choice of model is supported by the data.

Standard pre-modelling test sequence:

  1. adfTest + kpssTest → stationarity decision

  2. ljungBox → serial correlation check

  3. grangerCausality → proxy/factor selection

  4. ols → factor regression and grade surfaces

Key Functions

ols(y, X)

OLS regression with robust (HC3) standard errors. Returns {coefficients, intercept, rSquared, tStats, pValues, residuals}.

adfTest(series)

Augmented Dickey-Fuller unit root test (null = unit root). Returns {adfStat, pValue, isStationary}. isStationary=True (unit root rejected) → use a mean-reverting model such as OU/ARMA; False → use GBM or difference the series.

kpssTest(series)

KPSS stationarity test (null = stationary). Returns {kpssStat, pValue, isStationary}. Use together with ADF: ADF rejects unit root AND KPSS fails to reject → strong stationarity evidence.
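The joint reading of the two tests can be sketched as a small decision helper. The function name classify_stationarity and its labels are hypothetical and not part of sipQuant. The sketch assumes that each isStationary flag means "this test's verdict is stationary" (ADF rejected the unit root, KPSS failed to reject stationarity).

```python
def classify_stationarity(adf_is_stationary: bool, kpss_is_stationary: bool) -> str:
    """Combine the ADF and KPSS verdicts into a single label (hypothetical helper)."""
    if adf_is_stationary and kpss_is_stationary:
        # ADF rejects the unit root and KPSS does not reject stationarity.
        return "stationary"
    if not adf_is_stationary and not kpss_is_stationary:
        # ADF cannot reject the unit root and KPSS rejects stationarity.
        return "non-stationary"
    # The two tests disagree; inspect trends or difference and re-test.
    return "inconclusive"


# Example: plug in the flags returned by adfTest and kpssTest.
print(classify_stationarity(True, True))
print(classify_stationarity(True, False))
```

Treating disagreement as "inconclusive" is a deliberately conservative choice: it prompts a closer look at the series rather than forcing a model family.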

grangerCausality(y, x, maxlag=4)

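Tests whether x Granger-causes y, i.e. whether lagged values of x improve the prediction of y beyond y's own lags. Returns {fStat, pValue, granger_causes} per lag. Use for proxy selection: choose proxies that Granger-cause the target series. A significant result indicates predictive precedence, not a causal mechanism.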

ljungBox(series, lags=10)

Ljung-Box autocorrelation test (null = no autocorrelation up to the given lag). Returns {qStats, pValues} at each lag. Significant autocorrelation → use ARMA or include lagged terms in regression.
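For reference, the Ljung-Box statistic at lag h is Q(h) = n(n+2) Σ_{k=1..h} ρ_k² / (n−k), where ρ_k is the sample autocorrelation at lag k. A minimal pure-Python sketch of that formula follows. The name ljung_box_q is hypothetical and this is not sipQuant's implementation. It returns only the Q statistics; p-values would come from comparing Q(h) with a chi-square distribution with h degrees of freedom, which is omitted here.

```python
def ljung_box_q(series, lags):
    """Return [Q(1), ..., Q(lags)] for a sequence of floats (hypothetical helper).

    Requires lags < len(series) and a non-constant series.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)  # n times the sample variance

    q_stats = []
    running = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        running += rho_k ** 2 / (n - k)
        q_stats.append(n * (n + 2) * running)
    return q_stats


print(ljung_box_q([1.0, 2.0, 3.0, 4.0, 5.0], lags=2))
```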

whiteTest(y, X)

White’s heteroskedasticity test (null = homoskedastic errors). Returns {testStat, pValue, isHomoskedastic}.

breuschPaganTest(y, X)

Breusch-Pagan heteroskedasticity test (null = homoskedastic errors).

durbinWatson(residuals)

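Durbin-Watson autocorrelation statistic, ranging from 0 to 4. Values near 2.0 indicate no first-order autocorrelation; values toward 0 indicate positive autocorrelation and values toward 4 indicate negative autocorrelation.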
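The statistic itself is the sum of squared successive differences of the residuals divided by their sum of squares. A minimal pure-Python sketch follows. The name durbin_watson is hypothetical and this is not sipQuant's implementation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals (hypothetical helper)."""
    # Numerator: squared differences between consecutive residuals.
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    # Denominator: residual sum of squares.
    den = sum(e * e for e in residuals)
    return num / den


# Residuals that flip sign every step give a value well above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```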

import sipQuant as sq
import numpy as np

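# Toy series for illustration only; with so few observations the tests
# below have very little power, so use a much longer history in practice.
prices = np.array([182.0, 184.5, 187.0, 186.0, 185.5, 183.0, 187.5, 189.0])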

# Stationarity tests
adf  = sq.econometrics.adfTest(prices)
kpss = sq.econometrics.kpssTest(prices)
print(f"ADF stationary:  {adf['isStationary']}  (p={adf['pValue']:.4f})")
print(f"KPSS stationary: {kpss['isStationary']}  (p={kpss['pValue']:.4f})")

# Autocorrelation check
returns = np.diff(np.log(prices))
lb = sq.econometrics.ljungBox(returns, lags=5)
print(f"Ljung-Box p-values: {lb['pValues'].round(4)}")

# Proxy selection via Granger causality
# e.g. test if CME Corn Granger-causes local feed grain price
corn_prices = np.array([420.0, 425.0, 422.0, 430.0, 428.0, 435.0, 432.0, 440.0])
gc = sq.econometrics.grangerCausality(prices, corn_prices, maxlag=2)
print(f"Corn Granger-causes local price? {gc[1]['granger_causes']}  (p={gc[1]['pValue']:.4f})")

# Grade surface regression
# Regress adjusted price on grade factors
moisture = np.array([14.0, 15.0, 13.0, 16.0, 14.5, 15.5, 13.5, 14.0])
X = np.column_stack([moisture])
result = sq.econometrics.ols(prices, X)
print(f"Moisture coefficient: {result['coefficients'][0]:.4f}")
print(f"R-squared:            {result['rSquared']:.4f}")