This vignette reproduces the industrial-production (IP) growth column of Table 4 in Huang, Jiang, Li, Tong, and Zhou (2022). The exercise forecasts one-month-ahead IP growth using factors extracted from 123 transformed FRED-MD macro variables, and compares two dimension-reduction routes:
- PCA, the standard principal-component factors of the predictor matrix, which ignore the forecast target.
- sPCA, the scaled variant introduced by Huang et al. (2022). Before extracting components, each predictor is scaled by the OLS slope from regressing the target on that predictor. Predictors with stronger predictive content for the target receive larger weight; the irrelevant ones get shrunk toward zero. PCA on the scaled predictors therefore concentrates forecasting-relevant information into the first few factors.
The reported metric is the out-of-sample \(R^2\), \(R^2_{OS}\) (%), relative to an AR benchmark with SIC-selected lag order.
Data
The package ships the authors’ original data:
-
huang2022_macro: a \(720 \times 123\) matrix of transformed FRED-MD predictors (January 1960 to December 2019). -
huang2022_ip: a 720-vector of monthly IP growth (log-differences of the IP index).
Methodology
The forecasting exercise is an expanding-window pseudo out-of-sample experiment that mirrors the paper:
- Initial estimation window: January 1960 to December 1984 (300 observations).
- Out-of-sample period: January 1985 to December 2019 (420 forecasts).
- Benchmark: AR with lag order chosen by SIC (maximum lag 1).
- Factor models: ARDL with the AR lags plus one lag of the PCA or sPCA factors.
For sPCA, the scaling regression uses the predictive
alignment \(y_{t+1} \sim X_{i,t}\) —
that is, today’s predictor is regressed on next month’s target. To
dampen extreme slopes the absolute scaling coefficients are winsorised
at the 90th percentile, matching the authors’ MATLAB code.
spca_est() supports this directly: when
length(target) < nrow(X), the first
length(target) rows are used for the scaling regression
while all rows of X are standardised and used for
factor extraction. This is what makes the predictive alignment possible
without losing observations from the factor panel.
Out-of-sample loop
The function below runs the recursive forecast. At each step it:
- selects an AR lag order via
select_ar_lag_sic()and produces the AR-only forecast; - extracts up to
nfac_maxPCA factors from the standardised predictors and up tonfac_maxsPCA factors with the predictive alignment and winsorisation just described; - builds an ARDL forecast for each number of factors using both factor
sets, with the AR coefficients estimated jointly via
estimate_ardl_multi().
The loop runs ~420 iterations and takes a few minutes, so it is set
to eval = FALSE here:
run_oos <- function(y, Z, h = 1, p_max = 1, nfac_max = 5) {
TT <- length(y)
M <- (1984 - 1959) * 12
NN <- TT - M
FC_AR <- rep(NA, NN - (h - 1))
FC_PCA <- matrix(NA, NN - (h - 1), nfac_max)
FC_sPCA <- matrix(NA, NN - (h - 1), nfac_max)
actual_y <- rep(NA, NN - (h - 1))
for (n in seq_len(NN - (h - 1))) {
actual_y[n] <- mean(y[(M + n):(M + n + h - 1)])
y_n <- y[1:(M + n - 1)]
Z_n <- Z[1:(M + n - 1), ]
Zs_n <- oos_standardize(Z_n)
T_n <- length(y_n)
y_n_h <- vapply(
seq_len(T_n - (h - 1)),
function(t) mean(y_n[t:(t + h - 1)]),
numeric(1)
)
# --- AR benchmark with SIC lag selection ---
p_ar <- select_ar_lag_sic(y_n, h, p_max)
if (p_ar > 0L) {
ar_out <- estimate_ar_res(y_n, h, p_ar)
y_n_last <- rev(y_n[(T_n - p_ar + 1):T_n])
FC_AR[n] <- sum(c(1, y_n_last) * ar_out$a_hat)
} else {
FC_AR[n] <- mean(y_n)
}
# --- PCA factors ---
pca_fit <- pca_est(X = Zs_n, nfac = nfac_max)
z_pc_n <- predict(pca_fit, Zs_n)
# --- sPCA factors (predictive alignment + winsorization) ---
spca_fit <- spca_est(
target = y_n_h[2:length(y_n_h)],
X = Z_n,
nfac = nfac_max,
winsorize = TRUE,
winsor_probs = c(0, 90)
)
z_trans_n <- predict(spca_fit, Z_n)
# --- ARDL forecast for each number of factors ---
for (cc in seq_len(nfac_max)) {
for (jj in 1:2) {
z_f <- if (jj == 1) {
z_pc_n[, 1:cc, drop = FALSE]
} else {
z_trans_n[, 1:cc, drop = FALSE]
}
p_ardl <- c(p_ar, 1)
if (p_ar > 0L) {
c_hat <- estimate_ardl_multi(y_n, z_f, h, p_ardl)
y_n_last <- rev(y_n[(T_n - p_ar + 1):T_n])
fc <- sum(c(1, y_n_last, z_f[T_n, ]) * c_hat)
} else {
dep <- y_n_h[2:length(y_n_h)]
reg <- cbind(1, z_f[1:(length(y_n_h) - 1 - (h - 1)), 1:cc])
c_hat <- lm.fit(x = reg, y = dep)$coefficients
fc <- sum(c(1, z_f[T_n, 1:cc]) * c_hat)
}
if (jj == 1) FC_PCA[n, cc] <- fc
if (jj == 2) FC_sPCA[n, cc] <- fc
}
}
}
# R²_OS for each number of factors
r2_pca <- r2_spca <- numeric(nfac_max)
sse_ar <- sum((actual_y - FC_AR)^2)
for (cc in seq_len(nfac_max)) {
r2_pca[cc] <- 100 * (1 - sum((actual_y - FC_PCA[, cc])^2) / sse_ar)
r2_spca[cc] <- 100 * (1 - sum((actual_y - FC_sPCA[, cc])^2) / sse_ar)
}
data.frame(K = seq_len(nfac_max), PCA = round(r2_pca, 2), sPCA = round(r2_spca, 2))
}
# Run
res <- run_oos(huang2022_ip, huang2022_macro, h = 1, p_max = 1, nfac_max = 5)
print(res)Results
Running the code above produces:
K PCA sPCA
1 8.97 9.65
2 8.06 10.68
3 8.22 11.09
4 7.99 11.97
5 7.88 13.17
With five factors, PCA reaches \(R^2_{OS}\) = 7.88% and sPCA reaches \(R^2_{OS}\) = 13.17% — both matching the paper to two decimals. sPCA dominates PCA at every factor count, which is precisely the result the paper highlights: when many candidate predictors are weak or irrelevant, weighting each one by its target-predictive slope lets PCA focus on the components that actually carry forecasting power.
Key spca_est() features used
Predictive alignment. Passing a
targetthat is one observation shorter thanX(i.e. \(T-1\) versus \(T\)) makes the scaling regression pair \(X_{i,t}\) with \(y_{t+1}\), while factors are still extracted from the full \(T\)-row predictor matrix.Winsorisation.
winsorize = TRUEwithwinsor_probs = c(0, 90)caps the absolute scaling slopes at their 90th percentile, reproducing the trimming used in the authors’ MATLAB code.predict(). Projects the trainingXonto the estimated sPCA loadings using the training-window standardisation and scaling, so in-sample and out-of-sample factor draws are constructed consistently.
