Time Series Forecasting Through the Lens of Supervised Learning
A portfolio-style guide: problem formulation, multi-step strategies, preprocessing, baselines, equations, and code.
For recruiters (quick scan)
I treat forecasting as a supervised learning problem with strict respect for temporal ordering. I build robust pipelines around:
- Clear problem formulation: univariate/multivariate inputs and outputs, single-step vs multi-step horizons.
- Strong baselines: ARIMA/ETS or simple persistence benchmarks before deep learning.
- Time-aware validation: walk-forward / rolling evaluation (not random cross-validation).
- Data readiness: missing-value handling, normalization, detrending, differencing when needed.
- Deployable modeling choices: prefer simpler classical methods when accuracy is comparable.
Forecasting as a supervised learning problem
Forecasting can be formulated as regression (continuous targets like temperature, prices) or as classification (discrete targets like “sunny” vs “not sunny”). The learning setup is supervised: we map historical inputs to future outputs—while preserving time order.
Input–output structures in time series
Definitions
- Univariate input: one sequence is used as input.
- Multivariate input: several sequences are used as input.
- Univariate output: one sequence is predicted as output.
- Multivariate output: several sequences are predicted as output.
Concrete example
Predicting the number of sunny days in the coming month from the histories of sunny days, temperature, and atmospheric pressure is a multivariate-input, univariate-output task.
In multivariate input–univariate output forecasting, one series is often treated as the primary target and others as secondary explanatory signals.
For multivariate input–multivariate output tasks, two broad modeling patterns are common: many-to-one vs many-to-many (depending on whether we output one step or a sequence).
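A minimal sketch of the multivariate-input, univariate-output case, assuming the inputs arrive as a 2-D array of shape (time, n_series) and the target is one selected column (the helper name and toy data are illustrative):

import numpy as np

def make_multivariate_window(data, window=24, horizon=1, target_col=0):
    # data: shape (time, n_series); target_col selects the series to predict
    data = np.asarray(data, dtype=float)
    X, y = [], []
    for t in range(window, len(data) - horizon + 1):
        X.append(data[t-window:t, :].ravel())    # flatten all input series over the window
        y.append(data[t:t+horizon, target_col])  # predict only the target series
    return np.array(X), np.array(y)

# Example: 3 input series (e.g. sunny days, temperature, pressure), predict series 0
data = np.random.randn(200, 3)
X, y = make_multivariate_window(data, window=12, horizon=1)
print(X.shape, y.shape)  # (samples, window*3), (samples, 1)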
Single-step vs multi-step forecasting
A single-step model predicts only the next value. A multi-step model predicts several future values, which is usually harder because errors accumulate across the horizon.
Three ways to build multi-step forecasts
- Single multi-step model: one model outputs all horizons at once.
- Multiple single-step models (direct strategy): one model per horizon (sketched after this list).
- Recursive single-step model: reuse one model repeatedly, feeding predictions back as inputs.
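A minimal sketch of the direct strategy, assuming scikit-learn's LinearRegression as the single-step learner (the function names and window/horizon settings are illustrative, not a prescribed implementation):

# Direct strategy: fit one single-step model per forecast horizon
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_direct_models(series, window=24, horizons=3):
    series = np.asarray(series, dtype=float)
    models = []
    for h in range(1, horizons + 1):
        X, y = [], []
        # model h is trained to predict the value h steps after its input window
        for t in range(window, len(series) - h + 1):
            X.append(series[t-window:t])
            y.append(series[t + h - 1])
        models.append(LinearRegression().fit(np.array(X), np.array(y)))
    return models

def direct_forecast(models, last_window):
    last_window = np.asarray(last_window, dtype=float).reshape(1, -1)
    return np.array([m.predict(last_window)[0] for m in models])

# Example
ts_demo = np.sin(np.linspace(0, 10, 200))
models = fit_direct_models(ts_demo, window=20, horizons=3)
print(direct_forecast(models, ts_demo[-20:]))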
Ways to fill missing values (N/A)
Common strategies include (a short pandas sketch follows this list):
- Linear interpolation
- Fill with time series mean
- Fill with zeros (only if zero is meaningful)
- Fill with the next known value (back-fill)
- Fill with the last known value (forward-fill)
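A minimal sketch of these strategies with pandas (the toy series is illustrative):

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

filled_interp = s.interpolate(method="linear")  # linear interpolation
filled_mean   = s.fillna(s.mean())              # fill with the series mean
filled_zero   = s.fillna(0.0)                   # fill with zeros (only if zero is meaningful)
filled_bfill  = s.bfill()                       # back-fill: next known value
filled_ffill  = s.ffill()                       # forward-fill: last known value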
Core preprocessing methods
- Normalization / scaling (stabilizes optimization, comparable feature ranges)
- Trend removal (detrending) to focus on stationary behavior
- Differencing to reduce non-stationarity
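A minimal sketch of z-score scaling and linear detrending with NumPy (differencing itself is shown in snippet 4 at the end; the toy series is illustrative):

import numpy as np

ts_raw = np.linspace(0, 5, 200) + np.random.randn(200)  # toy series with an upward trend

# Normalization (z-score): in practice, fit the statistics on training data only
mean, std = ts_raw.mean(), ts_raw.std()
ts_scaled = (ts_raw - mean) / std

# Detrending: fit a linear trend and subtract it
t = np.arange(len(ts_raw))
slope, intercept = np.polyfit(t, ts_raw, deg=1)
ts_detrended = ts_raw - (slope * t + intercept)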
Loss functions for regression forecasting (equations)
Two standard losses are:
Mean Absolute Error (MAE): \[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^n \left|y_i - \hat{y}_i\right| \]
Mean Squared Error (MSE): \[ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^n \left(y_i - \hat{y}_i\right)^2 \]
MAE is more robust to outliers; MSE penalizes large errors more strongly.
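The same two losses written out in NumPy, matching the equations above:

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mse(y_true, y_pred):
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

print(mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.5
print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # ~0.417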
Evaluation: what works for time series (and what doesn’t)
Standard random cross-validation is misleading for forecasting because shuffling leaks future information into the training folds and breaks temporal dependence. Instead, use walk-forward (rolling) evaluation, and always compare against simple baselines:
- Naive baseline (persistence): \( \hat{y}_{t+1} = y_t \)
- Classical baselines: ARIMA / ETS / seasonal naive
- ML baselines: linear regression on lagged features, tree-based models, small neural nets
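A minimal sketch of the persistence and seasonal-naive baselines (the season length of 12 is an assumption for illustration):

import numpy as np

def persistence_forecast(series):
    # predict y_{t+1} = y_t for every step; returns (predictions, actuals)
    series = np.asarray(series, dtype=float)
    return series[:-1], series[1:]

def seasonal_naive_forecast(series, season=12):
    # predict y_{t+1} = y_{t+1-season}; returns (predictions, actuals)
    series = np.asarray(series, dtype=float)
    return series[:-season], series[season:]

ts_demo = np.sin(np.linspace(0, 10, 200)) + 0.1 * np.random.randn(200)
preds, actuals = persistence_forecast(ts_demo)
print("Persistence MAE:", np.mean(np.abs(actuals - preds)))
preds, actuals = seasonal_naive_forecast(ts_demo, season=12)
print("Seasonal-naive MAE:", np.mean(np.abs(actuals - preds)))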
Feature engineering & sliding windows
Feature engineering means enriching the input with useful signals (lags, rolling mean/variance, calendar features, etc.).
A sliding window turns sequences into supervised samples: each input window becomes one training example.
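A minimal pandas sketch of common engineered features (lags, rolling statistics, calendar features); the column names and daily frequency are illustrative:

import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=200, freq="D")
df = pd.DataFrame({"y": np.sin(np.arange(200) / 7.0)}, index=idx)

# Lag features
df["lag_1"] = df["y"].shift(1)
df["lag_7"] = df["y"].shift(7)

# Rolling statistics (shifted so they use only past values)
df["roll_mean_7"] = df["y"].shift(1).rolling(7).mean()
df["roll_std_7"]  = df["y"].shift(1).rolling(7).std()

# Calendar features
df["dayofweek"] = df.index.dayofweek
df["month"]     = df.index.month

df = df.dropna()  # drop rows at the start that lack full history
print(df.head())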
Python snippets
1) Sliding window dataset (univariate)
# Turn a 1D series into (X, y) for supervised learning
import numpy as np
def make_sliding_window(series, window=24, horizon=1):
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(window, len(series) - horizon + 1):
        X.append(series[t-window:t])
        y.append(series[t:t+horizon])  # horizon=1 gives a single-step target
    return np.array(X), np.array(y)
# Example
ts = np.sin(np.linspace(0, 10, 200)) + 0.1*np.random.randn(200)
X, y = make_sliding_window(ts, window=20, horizon=1)
print(X.shape, y.shape) # (samples, window), (samples, horizon)
2) Recursive multi-step forecast using a single-step model
# If you only have a single-step model but need k-step forecasts:
# predict -> append prediction -> predict again (recursive strategy)
def recursive_forecast(model_predict_one, last_window, steps=5):
    window = np.array(last_window, dtype=float).copy()
    preds = []
    for _ in range(steps):
        yhat = float(model_predict_one(window))  # returns next step
        preds.append(yhat)
        window = np.roll(window, -1)
        window[-1] = yhat
    return np.array(preds)

# Example "model": naive persistence
def persistence_model(window):
    return window[-1]
last = ts[-20:]
print(recursive_forecast(persistence_model, last, steps=3))
3) Walk-forward validation (time-aware evaluation)
# Rolling evaluation: train on past, test on future, repeat
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
def walk_forward_mae(series, window=24, start=100, step=1):
    series = np.asarray(series, dtype=float)
    maes = []
    for split in range(start, len(series) - 1, step):
        train = series[:split]
        test_next = series[split]  # "future" point
        Xtr, ytr = make_sliding_window(train, window=window, horizon=1)
        model = LinearRegression().fit(Xtr, ytr.ravel())
        last_window = train[-window:]
        pred = model.predict(last_window.reshape(1, -1))[0]
        maes.append(abs(test_next - pred))
    return float(np.mean(maes))
print("Walk-forward MAE:", walk_forward_mae(ts, window=20, start=120))
4) Simple preprocessing: differencing and inverse transform
# Differencing can reduce trend/non-stationarity
def difference(series):
    series = np.asarray(series, dtype=float)
    return series[1:] - series[:-1], series[0]

def invert_difference(diff_series, first_value):
    out = [first_value]
    for d in diff_series:
        out.append(out[-1] + d)
    return np.array(out)
diff_ts, first = difference(ts)
recovered = invert_difference(diff_ts, first)
print(np.max(np.abs(recovered - ts))) # should be ~0 (numerical error only)