Time Series Forecasting Through the Lens of Supervised Learning
A portfolio-style guide: problem formulation, multi-step strategies, preprocessing, baselines, equations, and code.
For recruiters (quick scan)
I treat forecasting as a supervised learning problem with strict respect for temporal ordering. I build robust pipelines around:
- Clear problem formulation: univariate/multivariate inputs and outputs, single-step vs multi-step horizons.
- Strong baselines: ARIMA/ETS or simple persistence benchmarks before deep learning.
- Time-aware validation: walk-forward / rolling evaluation (not random cross-validation).
- Data readiness: missing-value handling, normalization, detrending, differencing when needed.
- Deployable modeling choices: prefer simpler classical methods when accuracy is comparable.
Forecasting as a supervised learning problem
Forecasting can be formulated as regression (continuous targets like temperature, prices) or as classification (discrete targets like “sunny” vs “not sunny”). The learning setup is supervised: we map historical inputs to future outputs—while preserving time order.
Input–output structures in time series
Definitions
- Univariate input: one sequence is used as input.
- Multivariate input: several sequences are used as input.
- Univariate output: one sequence is predicted as output.
- Multivariate output: several sequences are predicted as output.
Concrete example
Predicting the number of sunny days in the coming month from the histories of sunny days, temperature, and atmospheric pressure is a multivariate-input, univariate-output task.
In multivariate input–univariate output forecasting, one series is often treated as the primary target and others as secondary explanatory signals.
For multivariate input–multivariate output tasks, two broad modeling patterns are common: many-to-one vs many-to-many (depending on whether we output one step or a sequence).
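A minimal sketch of the multivariate-input, univariate-output case, assuming the inputs arrive as a 2-D array of shape (time, n_series) and the target is one selected column (the helper name and toy data are illustrative):

import numpy as np

def make_multivariate_window(data, window=24, horizon=1, target_col=0):
    # data: shape (time, n_series); target_col selects the series to predict
    data = np.asarray(data, dtype=float)
    X, y = [], []
    for t in range(window, len(data) - horizon + 1):
        X.append(data[t-window:t, :].ravel())    # flatten all input series over the window
        y.append(data[t:t+horizon, target_col])  # predict only the target series
    return np.array(X), np.array(y)

# Example: 3 input series (e.g. sunny days, temperature, pressure), predict series 0
data = np.random.randn(200, 3)
X, y = make_multivariate_window(data, window=12, horizon=1)
print(X.shape, y.shape)  # (samples, window*3), (samples, 1)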
Single-step vs multi-step forecasting
A single-step model predicts only the next value. A multi-step model predicts several future values, which is usually harder because errors accumulate across the horizon.
Three ways to build multi-step forecasts
- Single multi-step model: one model outputs all horizons at once.
- Multiple single-step models (direct strategy): one model per horizon (sketched after this list).
- Recursive single-step model: reuse one model repeatedly, feeding predictions back as inputs.
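A minimal sketch of the direct strategy, assuming scikit-learn's LinearRegression as the single-step learner (the function names and window/horizon settings are illustrative, not a prescribed implementation):

# Direct strategy: fit one single-step model per forecast horizon
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_direct_models(series, window=24, horizons=3):
    series = np.asarray(series, dtype=float)
    models = []
    for h in range(1, horizons + 1):
        X, y = [], []
        # model h is trained to predict the value h steps after its input window
        for t in range(window, len(series) - h + 1):
            X.append(series[t-window:t])
            y.append(series[t + h - 1])
        models.append(LinearRegression().fit(np.array(X), np.array(y)))
    return models

def direct_forecast(models, last_window):
    last_window = np.asarray(last_window, dtype=float).reshape(1, -1)
    return np.array([m.predict(last_window)[0] for m in models])

# Example
ts_demo = np.sin(np.linspace(0, 10, 200))
models = fit_direct_models(ts_demo, window=20, horizons=3)
print(direct_forecast(models, ts_demo[-20:]))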
Ways to fill missing values (N/A)
Common strategies include (a short pandas sketch follows this list):
- Linear interpolation
- Fill with time series mean
- Fill with zeros (only if zero is meaningful)
- Fill with the next known value (back-fill)
- Fill with the last known value (forward-fill)
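A minimal sketch of these strategies with pandas (the toy series is illustrative):

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

filled_interp = s.interpolate(method="linear")  # linear interpolation
filled_mean   = s.fillna(s.mean())              # fill with the series mean
filled_zero   = s.fillna(0.0)                   # fill with zeros (only if zero is meaningful)
filled_bfill  = s.bfill()                       # back-fill: next known value
filled_ffill  = s.ffill()                       # forward-fill: last known value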
Core preprocessing methods
- Normalization / scaling (stabilizes optimization, comparable feature ranges)
- Trend removal (detrending) to focus on stationary behavior
- Differencing to reduce non-stationarity
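A minimal sketch of z-score scaling and linear detrending with NumPy (differencing itself is shown in snippet 4 at the end; the toy series is illustrative):

import numpy as np

ts_raw = np.linspace(0, 5, 200) + np.random.randn(200)  # toy series with an upward trend

# Normalization (z-score): in practice, fit the statistics on training data only
mean, std = ts_raw.mean(), ts_raw.std()
ts_scaled = (ts_raw - mean) / std

# Detrending: fit a linear trend and subtract it
t = np.arange(len(ts_raw))
slope, intercept = np.polyfit(t, ts_raw, deg=1)
ts_detrended = ts_raw - (slope * t + intercept)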
Loss functions for regression forecasting (equations)
Two standard losses are:
Mean Absolute Error (MAE): \[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^n \left|y_i - \hat{y}_i\right| \]
Mean Squared Error (MSE): \[ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^n \left(y_i - \hat{y}_i\right)^2 \]
MAE is more robust to outliers; MSE penalizes large errors more strongly.
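The same two losses written out in NumPy, matching the equations above:

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mse(y_true, y_pred):
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

print(mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.5
print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # ~0.417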
Evaluation: what works for time series (and what doesn’t)
Standard random cross-validation is misleading for forecasting because shuffling leaks future information into the training folds and breaks temporal dependence. Instead, use walk-forward (rolling) evaluation, and always compare against simple baselines:
- Naive baseline (persistence): \( \hat{y}_{t+1} = y_t \)
- Classical baselines: ARIMA / ETS / seasonal naive
- ML baselines: linear regression on lagged features, tree-based models, small neural nets
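A minimal sketch of the persistence and seasonal-naive baselines (the season length of 12 is an assumption for illustration):

import numpy as np

def persistence_forecast(series):
    # predict y_{t+1} = y_t for every step; returns (predictions, actuals)
    series = np.asarray(series, dtype=float)
    return series[:-1], series[1:]

def seasonal_naive_forecast(series, season=12):
    # predict y_{t+1} = y_{t+1-season}; returns (predictions, actuals)
    series = np.asarray(series, dtype=float)
    return series[:-season], series[season:]

ts_demo = np.sin(np.linspace(0, 10, 200)) + 0.1 * np.random.randn(200)
preds, actuals = persistence_forecast(ts_demo)
print("Persistence MAE:", np.mean(np.abs(actuals - preds)))
preds, actuals = seasonal_naive_forecast(ts_demo, season=12)
print("Seasonal-naive MAE:", np.mean(np.abs(actuals - preds)))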
Feature engineering & sliding windows
Feature engineering means enriching the input with useful signals (lags, rolling mean/variance, calendar features, etc.).
A sliding window turns sequences into supervised samples: each input window becomes one training example.
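A minimal pandas sketch of common engineered features (lags, rolling statistics, calendar features); the column names and daily frequency are illustrative:

import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=200, freq="D")
df = pd.DataFrame({"y": np.sin(np.arange(200) / 7.0)}, index=idx)

# Lag features
df["lag_1"] = df["y"].shift(1)
df["lag_7"] = df["y"].shift(7)

# Rolling statistics (shifted so they use only past values)
df["roll_mean_7"] = df["y"].shift(1).rolling(7).mean()
df["roll_std_7"]  = df["y"].shift(1).rolling(7).std()

# Calendar features
df["dayofweek"] = df.index.dayofweek
df["month"]     = df.index.month

df = df.dropna()  # drop rows at the start that lack full history
print(df.head())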
Python snippets
1) Sliding window dataset (univariate)
# Turn a 1D series into (X, y) for supervised learning
import numpy as np
def make_sliding_window(series, window=24, horizon=1):
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(window, len(series) - horizon + 1):
        X.append(series[t-window:t])
        y.append(series[t:t+horizon])  # horizon=1 gives a single-step target
    return np.array(X), np.array(y)
# Example
ts = np.sin(np.linspace(0, 10, 200)) + 0.1*np.random.randn(200)
X, y = make_sliding_window(ts, window=20, horizon=1)
print(X.shape, y.shape) # (samples, window), (samples, horizon)
2) Recursive multi-step forecast using a single-step model
# If you only have a single-step model but need k-step forecasts:
# predict -> append prediction -> predict again (recursive strategy)
def recursive_forecast(model_predict_one, last_window, steps=5):
    window = np.array(last_window, dtype=float).copy()
    preds = []
    for _ in range(steps):
        yhat = float(model_predict_one(window))  # returns next step
        preds.append(yhat)
        window = np.roll(window, -1)
        window[-1] = yhat
    return np.array(preds)

# Example "model": naive persistence
def persistence_model(window):
    return window[-1]
last = ts[-20:]
print(recursive_forecast(persistence_model, last, steps=3))
3) Walk-forward validation (time-aware evaluation)
# Rolling evaluation: train on past, test on future, repeat
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
def walk_forward_mae(series, window=24, start=100, step=1):
    series = np.asarray(series, dtype=float)
    maes = []
    for split in range(start, len(series) - 1, step):
        train = series[:split]
        test_next = series[split]  # "future" point
        Xtr, ytr = make_sliding_window(train, window=window, horizon=1)
        model = LinearRegression().fit(Xtr, ytr.ravel())
        last_window = train[-window:]
        pred = model.predict(last_window.reshape(1, -1))[0]
        maes.append(abs(test_next - pred))
    return float(np.mean(maes))
print("Walk-forward MAE:", walk_forward_mae(ts, window=20, start=120))
4) Simple preprocessing: differencing and inverse transform
# Differencing can reduce trend/non-stationarity
def difference(series):
    series = np.asarray(series, dtype=float)
    return series[1:] - series[:-1], series[0]

def invert_difference(diff_series, first_value):
    out = [first_value]
    for d in diff_series:
        out.append(out[-1] + d)
    return np.array(out)
diff_ts, first = difference(ts)
recovered = invert_difference(diff_ts, first)
print(np.max(np.abs(recovered - ts))) # should be ~0 (numerical error only)