econometrics — Statistical Testing
====================================

.. module:: sipQuant.econometrics

Statistical tests and regression tools. Run these diagnostics before fitting
any price model so that the model class (mean-reverting, random-walk, or
autoregressive) matches the behaviour of the data.

**Standard pre-modelling test sequence:**

1. ``adfTest`` + ``kpssTest`` → stationarity decision
2. ``ljungBox`` → serial correlation check
3. ``grangerCausality`` → proxy/factor selection
4. ``ols`` → factor regression and grade surfaces

Key Functions
-------------

``ols(y, X)``
    OLS regression with robust (HC3) standard errors.
    Returns ``{coefficients, intercept, rSquared, tStats, pValues, residuals}``.
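
    To make the HC3 adjustment concrete, the following NumPy sketch fits OLS with an
    intercept and computes HC3 standard errors, which rescale each squared residual by
    ``1 / (1 - h_i)**2``, where ``h_i`` is the leverage of observation ``i``. The name
    ``ols_hc3`` and the ``stdErrors`` key are hypothetical and used only for illustration;
    whether ``sq.econometrics.ols`` is implemented this way is an assumption.

    .. code-block:: python

       import numpy as np

       def ols_hc3(y, X):
           """Illustrative OLS with HC3 standard errors (hypothetical helper, not sipQuant)."""
           y = np.asarray(y, dtype=float)
           X = np.asarray(X, dtype=float)
           if X.ndim == 1:
               X = X.reshape(-1, 1)
           Xc = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
           XtX_inv = np.linalg.inv(Xc.T @ Xc)
           beta = XtX_inv @ Xc.T @ y                       # OLS coefficients
           resid = y - Xc @ beta
           # Leverage h_i = x_i' (X'X)^-1 x_i for each row x_i
           h = np.einsum("ij,jk,ik->i", Xc, XtX_inv, Xc)
           weights = (resid / (1.0 - h)) ** 2              # HC3 residual scaling
           meat = Xc.T @ (Xc * weights[:, None])
           cov = XtX_inv @ meat @ XtX_inv                  # sandwich covariance
           se = np.sqrt(np.diag(cov))
           return {
               "intercept": beta[0],
               "coefficients": beta[1:],
               "stdErrors": se,            # hypothetical key: [intercept, slopes...]
               "tStats": beta / se,
               "residuals": resid,
           }

       prices = np.array([182.0, 184.5, 187.0, 186.0, 185.5, 183.0, 187.5, 189.0])
       moisture = np.array([14.0, 15.0, 13.0, 16.0, 14.5, 15.5, 13.5, 14.0])
       fit = ols_hc3(prices, moisture)
       print(fit["coefficients"], fit["stdErrors"])

    HC3 inflates the contribution of high-leverage points, which makes it a conservative
    choice for small samples such as grade-factor regressions.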

``adfTest(series)``
    Augmented Dickey-Fuller unit root test (null = unit root). Returns ``{adfStat, pValue, isStationary}``.
    ``isStationary=True`` → model in levels with OU/ARMA; ``False`` → use GBM or difference the series.

``kpssTest(series)``
    KPSS stationarity test (null = stationary). Returns ``{kpssStat, pValue, isStationary}``.
    Use together with ADF: if ADF rejects the unit-root null AND KPSS fails to reject stationarity, the evidence for stationarity is strong.
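
    The joint reading of the two tests can be sketched as a small helper. This is an
    illustrative sketch, not part of sipQuant: ``classify_stationarity`` is a hypothetical
    name, and it assumes ``isStationary`` means "unit-root null rejected" for ADF and
    "stationarity null not rejected" for KPSS. The two dicts are stand-in values, not
    real test output.

    .. code-block:: python

       def classify_stationarity(adf_is_stationary, kpss_is_stationary):
           """Combine the ADF and KPSS verdicts into one label (hypothetical helper)."""
           if adf_is_stationary and kpss_is_stationary:
               return "stationary"    # both agree: model in levels (OU/ARMA)
           if not adf_is_stationary and not kpss_is_stationary:
               return "unit_root"     # both agree: difference the series or use GBM
           return "conflicting"       # tests disagree: check trend terms, breaks, sample size

       # Stand-in values shaped like the documented return dicts (hypothetical)
       adf = {"isStationary": True}
       kpss = {"isStationary": False}
       verdict = classify_stationarity(adf["isStationary"], kpss["isStationary"])
       print(verdict)

    Treating disagreement as its own outcome avoids silently picking a model class when
    the evidence is mixed.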

``grangerCausality(y, x, maxlag=4)``
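    Tests whether lagged values of ``x`` improve forecasts of ``y`` beyond ``y``'s own lags. Returns ``{fStat, pValue, granger_causes}`` per lag.
    Use for proxy selection: prefer proxies that Granger-cause the target series. A positive result indicates predictive precedence, not structural causation.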
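
    The idea behind the test can be sketched with two nested regressions: ``y`` on its own
    lags (restricted) versus ``y`` on its own lags plus lags of ``x`` (unrestricted),
    compared with an F-statistic. This is a NumPy sketch of that textbook formulation;
    ``granger_f_stat`` is a hypothetical helper, the data are synthetic, and the p-value
    step is omitted. It is not claimed to match the sipQuant implementation.

    .. code-block:: python

       import numpy as np

       def granger_f_stat(y, x, lag):
           """F-statistic for 'x Granger-causes y' at a given lag (hypothetical helper)."""
           y = np.asarray(y, dtype=float)
           x = np.asarray(x, dtype=float)
           n = len(y)
           target = y[lag:]
           ones = np.ones(n - lag)
           # Column k holds the series shifted back by k periods, aligned with target
           y_lags = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
           x_lags = np.column_stack([x[lag - k:n - k] for k in range(1, lag + 1)])
           restricted = np.column_stack([ones, y_lags])
           unrestricted = np.column_stack([ones, y_lags, x_lags])

           def rss(design):
               beta, *_ = np.linalg.lstsq(design, target, rcond=None)
               resid = target - design @ beta
               return float(resid @ resid)

           rss_r, rss_u = rss(restricted), rss(unrestricted)
           dof = len(target) - unrestricted.shape[1]
           return ((rss_r - rss_u) / lag) / (rss_u / dof)

       # Synthetic example: y follows x with a one-period delay
       rng = np.random.default_rng(0)
       x = rng.normal(size=200)
       y = np.zeros(200)
       y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=199)

       f_xy = granger_f_stat(y, x, lag=1)   # x -> y: expected to be large
       f_yx = granger_f_stat(x, y, lag=1)   # y -> x: expected to be small
       print(f_xy, f_yx)

    Running the test in both directions, as here, helps distinguish a useful leading
    proxy from a series that merely moves together with the target.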

``ljungBox(series, lags=10)``
    Ljung-Box autocorrelation test (null = no autocorrelation up to the tested lag). Returns ``{qStats, pValues}`` at each lag.
    Small p-values indicate significant autocorrelation → use ARMA or include lagged terms in the regression.
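
    For intuition, the Q statistic behind the test can be computed by hand. The following
    is a minimal pure-Python sketch assuming the standard formula
    ``Q(m) = n(n+2) * sum_{k=1..m} rho_k**2 / (n-k)``, where ``rho_k`` is the lag-``k``
    sample autocorrelation. ``ljung_box_q`` is a hypothetical helper, not the sipQuant
    implementation, and it omits the p-value, which would come from a chi-square
    distribution with ``m`` degrees of freedom.

    .. code-block:: python

       def ljung_box_q(series, lags):
           """Cumulative Ljung-Box Q statistics for lags 1..lags (hypothetical helper)."""
           n = len(series)
           if lags >= n:
               raise ValueError("lags must be smaller than the series length")
           mean = sum(series) / n
           dev = [v - mean for v in series]
           denom = sum(d * d for d in dev)
           if denom == 0:
               raise ValueError("series is constant")
           q_stats, running = [], 0.0
           for k in range(1, lags + 1):
               # Lag-k sample autocorrelation
               rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
               running += rho_k ** 2 / (n - k)
               q_stats.append(n * (n + 2) * running)
           return q_stats

       # A strictly alternating series is strongly (negatively) autocorrelated
       series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
       q_stats = ljung_box_q(series, lags=3)
       print(q_stats[0])   # 8.75, well above the 5% chi-square critical value of 3.841 for 1 d.o.f.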

``whiteTest(y, X)``
    White's heteroskedasticity test. Returns ``{testStat, pValue, isHomoskedastic}``.

``breuschPaganTest(y, X)``
    Breusch-Pagan heteroskedasticity test.

``durbinWatson(residuals)``
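    Durbin-Watson statistic for first-order autocorrelation, ranging from 0 to 4. Values near 2.0 indicate no autocorrelation;
    values toward 0 indicate positive autocorrelation and values toward 4 indicate negative autocorrelation.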
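
    The statistic is simple enough to sketch in a few lines: the sum of squared successive
    differences of the residuals divided by their sum of squares. ``durbin_watson`` is a
    hypothetical stand-alone helper for illustration, not the sipQuant implementation.

    .. code-block:: python

       def durbin_watson(residuals):
           """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
           num = sum((residuals[t] - residuals[t - 1]) ** 2
                     for t in range(1, len(residuals)))
           den = sum(e * e for e in residuals)
           return num / den

       dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])   # 3.0: negative autocorrelation
       dw_persistent = durbin_watson([1.0, 1.0, -1.0, -1.0])    # 1.0: positive autocorrelation
       print(dw_alternating, dw_persistent)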

.. code-block:: python

   import sipQuant as sq
   import numpy as np

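   # Toy series for illustration only; eight observations are far too few
   # for reliable inference from any of the tests below.
   prices = np.array([182.0, 184.5, 187.0, 186.0, 185.5, 183.0, 187.5, 189.0])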

   # Stationarity tests
   adf  = sq.econometrics.adfTest(prices)
   kpss = sq.econometrics.kpssTest(prices)
   print(f"ADF stationary:  {adf['isStationary']}  (p={adf['pValue']:.4f})")
   print(f"KPSS stationary: {kpss['isStationary']}  (p={kpss['pValue']:.4f})")

   # Autocorrelation check
   returns = np.diff(np.log(prices))
   lb = sq.econometrics.ljungBox(returns, lags=5)
   print(f"Ljung-Box p-values: {lb['pValues'].round(4)}")

   # Proxy selection via Granger causality
   # e.g. test if CME Corn Granger-causes local feed grain price
   corn_prices = np.array([420.0, 425.0, 422.0, 430.0, 428.0, 435.0, 432.0, 440.0])
   gc = sq.econometrics.grangerCausality(prices, corn_prices, maxlag=2)
   print(f"Corn Granger-causes local price? {gc[1]['granger_causes']}  (p={gc[1]['pValue']:.4f})")

   # Grade surface regression
   # Regress adjusted price on grade factors
   moisture = np.array([14.0, 15.0, 13.0, 16.0, 14.5, 15.5, 13.5, 14.0])
   X = np.column_stack([moisture])
   result = sq.econometrics.ols(prices, X)
   print(f"Moisture coefficient: {result['coefficients'][0]:.4f}")
   print(f"R-squared:            {result['rSquared']:.4f}")