econometrics — Statistical Testing

Statistical tests and regression tools. Run these tests before building any price model so that the model choice (for example OU/ARMA versus GBM) is supported by the data.

Standard pre-modelling test sequence:

adfTest + kpssTest → stationarity decision
ljungBox → serial correlation check
grangerCausality → proxy/factor selection
ols → factor regression and grade surfaces
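
The first step of the sequence combines two test outcomes into one decision. A minimal sketch of that logic is shown here, assuming only that each result is a dict with the documented isStationary field. The helper name stationarity_decision and its return labels are hypothetical and not part of sipQuant. Mixed outcomes are reported as inconclusive rather than forced into either model family.

```python
def stationarity_decision(adf_result, kpss_result):
    """Combine ADF and KPSS outcomes into a modelling hint.

    Hypothetical helper: both arguments are dicts carrying an
    'isStationary' boolean, as in the documented return values.
    """
    adf_stationary = bool(adf_result["isStationary"])    # ADF rejected the unit root
    kpss_stationary = bool(kpss_result["isStationary"])  # KPSS did not reject stationarity

    if adf_stationary and kpss_stationary:
        # Both tests agree: strong stationarity evidence -> OU/ARMA
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        # Both tests agree on a unit root -> GBM, or difference the series
        return "unit_root"
    # The tests disagree: inspect the series (trend, breaks, sample size)
    return "inconclusive"


# Example with hand-written result dicts
print(stationarity_decision({"isStationary": True}, {"isStationary": True}))
```

With very short samples both tests have low power, so an inconclusive result is a prompt to gather more data or inspect the series rather than a verdict.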

Key Functions

ols(y, X): OLS regression with robust (HC3) standard errors. Returns {coefficients, intercept, rSquared, tStats, pValues, residuals}.
adfTest(series): Augmented Dickey-Fuller unit root test (null = unit root). Returns {adfStat, pValue, isStationary}. isStationary=True → use OU/ARMA; False → use GBM or difference the series.
kpssTest(series): KPSS stationarity test (null = stationary). Returns {kpssStat, pValue, isStationary}. Use together with ADF: ADF rejects unit root AND KPSS fails to reject → strong stationarity evidence.
grangerCausality(y, x, maxlag=4): Tests whether x Granger-causes y. Returns {fStat, pValue, granger_causes} per lag. Use for proxy selection: choose proxies that Granger-cause the target series.
ljungBox(series, lags=10): Ljung-Box autocorrelation test. Returns {qStats, pValues} at each lag. Significant autocorrelation → use ARMA or include lagged terms in regression.
whiteTest(y, X): White’s heteroskedasticity test. Returns {testStat, pValue, isHomoskedastic}.
breuschPaganTest(y, X): Breusch-Pagan heteroskedasticity test.
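durbinWatson(residuals): Durbin-Watson autocorrelation statistic, ranging from 0 to 4. Values near 2.0 indicate no first-order autocorrelation; values toward 0 indicate positive autocorrelation and values toward 4 indicate negative autocorrelation.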
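
For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The NumPy sketch below illustrates that textbook formula. The function name durbin_watson is an assumption for illustration, and this is not necessarily how sq.econometrics.durbinWatson is implemented.

```python
import numpy as np


def durbin_watson(residuals):
    """Textbook Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Residuals that flip sign every step are negatively autocorrelated (DW above 2)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# Residuals that never change are positively autocorrelated (DW of 0)
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```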

import sipQuant as sq
import numpy as np

prices = np.array([182.0, 184.5, 187.0, 186.0, 185.5, 183.0, 187.5, 189.0])

# Stationarity tests
adf  = sq.econometrics.adfTest(prices)
kpss = sq.econometrics.kpssTest(prices)
print(f"ADF stationary:  {adf['isStationary']}  (p={adf['pValue']:.4f})")
print(f"KPSS stationary: {kpss['isStationary']}  (p={kpss['pValue']:.4f})")

# Autocorrelation check
returns = np.diff(np.log(prices))
lb = sq.econometrics.ljungBox(returns, lags=5)
print(f"Ljung-Box p-values: {lb['pValues'].round(4)}")

# Proxy selection via Granger causality
# e.g. test if CME Corn Granger-causes local feed grain price
corn_prices = np.array([420.0, 425.0, 422.0, 430.0, 428.0, 435.0, 432.0, 440.0])
gc = sq.econometrics.grangerCausality(prices, corn_prices, maxlag=2)
print(f"Corn Granger-causes local price? {gc[1]['granger_causes']}  (p={gc[1]['pValue']:.4f})")

# Grade surface regression
# Regress adjusted price on grade factors
moisture = np.array([14.0, 15.0, 13.0, 16.0, 14.5, 15.5, 13.5, 14.0])
X = np.column_stack([moisture])
result = sq.econometrics.ols(prices, X)
print(f"Moisture coefficient: {result['coefficients'][0]:.4f}")
print(f"R-squared:            {result['rSquared']:.4f}")
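
To show what robust (HC3) standard errors mean for a regression like the one above, here is a minimal NumPy sketch of OLS with the standard HC3 sandwich estimator. The function ols_hc3 is hypothetical and written only for illustration; it is not sipQuant's implementation, and its return shape differs from the documented ols dict.

```python
import numpy as np


def ols_hc3(y, X):
    """OLS with an intercept and HC3 heteroskedasticity-robust standard errors.

    Returns (beta, se, residuals); beta[0] is the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column

    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    residuals = y - design @ beta

    # Leverage h_i = x_i' (X'X)^-1 x_i, the diagonal of the hat matrix
    leverage = np.einsum("ij,jk,ik->i", design, xtx_inv, design)

    # HC3 weights each squared residual by 1 / (1 - h_i)^2
    weights = (residuals / (1.0 - leverage)) ** 2
    meat = design.T @ (design * weights[:, None])
    cov = xtx_inv @ meat @ xtx_inv  # sandwich covariance estimate
    return beta, np.sqrt(np.diag(cov)), residuals


# Same illustrative data as the grade surface regression above
prices = np.array([182.0, 184.5, 187.0, 186.0, 185.5, 183.0, 187.5, 189.0])
moisture = np.array([14.0, 15.0, 13.0, 16.0, 14.5, 15.5, 13.5, 14.0])
beta, se, resid = ols_hc3(prices, moisture)
print(f"Moisture coefficient: {beta[1]:.4f}  (HC3 s.e. {se[1]:.4f})")
```

HC3 inflates the contribution of high-leverage observations, which tends to make it more conservative than the classical standard errors in small samples such as this one.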