[WORK EXPERIENCE] Python - OVS Sales Forecasting System

This project is a production-grade machine learning system that forecasts end-of-day sales at multiple checkpoints during the day for a large-scale retail network.
The system generates 5 intraday predictions (12:00, 14:00, 17:00, 19:00, 21:00) using transaction data accumulated up to each timestamp.
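
As a rough illustration of "transaction data accumulated up to each timestamp", the sketch below sums one store's sales up to each of the five checkpoints. The checkpoint times come from the description above; the function name and the (timestamp, amount) input shape are assumptions made for the example, not the production code.

```python
from datetime import datetime, time

# Checkpoint times taken from the description; everything else is illustrative.
CHECKPOINTS = [time(12, 0), time(14, 0), time(17, 0), time(19, 0), time(21, 0)]


def cumulative_sales_at_checkpoints(transactions):
    """Return {checkpoint: total sales recorded up to and including that time}.

    `transactions` is a list of (timestamp, amount) pairs for a single store
    and a single day (a hypothetical input shape).
    """
    totals = {}
    for checkpoint in CHECKPOINTS:
        totals[checkpoint] = sum(
            amount for ts, amount in transactions if ts.time() <= checkpoint
        )
    return totals


# Made-up transactions for one store on one day.
day = [
    (datetime(2024, 3, 1, 9, 30), 120.0),
    (datetime(2024, 3, 1, 11, 45), 80.0),
    (datetime(2024, 3, 1, 13, 10), 200.0),
    (datetime(2024, 3, 1, 18, 5), 150.0),
    (datetime(2024, 3, 1, 20, 40), 50.0),
]
features = cumulative_sales_at_checkpoints(day)
# features[time(12, 0)] is 200.0 and features[time(21, 0)] is 600.0
```

Each checkpoint only sees transactions stamped at or before it, so the 12:00 feature never leaks afternoon sales into the forecast.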

The main goal was to replace a legacy rule-based algorithm that relied on static historical averages and required manual adjustments for special cases such as holidays, promotions, and new stores.

Key Features

  • Near real-time predictions: sales data refreshed every 30 minutes
  • Scalable coverage: deployed across 1,000+ stores
  • Multi-checkpoint forecasting: dynamic updates throughout the day
  • Robustness: handles cold-start scenarios and non-recurring calendar events (e.g. Easter, Black Friday)
  • Model explainability: SHAP values used to interpret feature contributions at prediction level
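
Events such as Easter and Black Friday fall on a different date each year, so a static lookup by day and month cannot flag them. A minimal sketch of how such dates could be computed is shown below, assuming Western (Gregorian) Easter and Black Friday defined as the day after the fourth Thursday of November. The function names and the flag layout are hypothetical and are not taken from the production system.

```python
from datetime import date, timedelta


def easter_sunday(year):
    """Western Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def black_friday(year):
    """Day after the fourth Thursday of November (assumed definition)."""
    first = date(year, 11, 1)
    # date.weekday(): Monday is 0, so Thursday is 3.
    days_to_first_thursday = (3 - first.weekday()) % 7
    fourth_thursday = first + timedelta(days=days_to_first_thursday + 21)
    return fourth_thursday + timedelta(days=1)


def calendar_flags(day):
    """Binary calendar features for a given date (illustrative layout)."""
    return {
        "is_easter": day == easter_sunday(day.year),
        "is_black_friday": day == black_friday(day.year),
    }


print(easter_sunday(2024))  # 2024-03-31
print(black_friday(2024))   # 2024-11-29
```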

Machine Learning Approach

The core model is based on XGBoost, chosen for its efficiency and ability to model non-linear relationships in structured data.

The system relies on extensive feature engineering, including:

  • Intraday signals: cumulative sales up to prediction time
  • Lag features: historical sales from similar and previous days (the most important features)
  • Calendar features: holidays, seasonal patterns, weekday groupings
  • Store metadata: location, cluster, and operational characteristics
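
The lag features above can be sketched as follows. The example reads "similar days" as the same weekday in earlier weeks, which is an assumption, and the function and feature names are hypothetical. Missing history, as for a newly opened store, is returned as None, since XGBoost can treat absent values as missing rather than requiring imputation.

```python
from datetime import date, timedelta


def lag_features(daily_sales, target_day, weeks=3):
    """Build lag features for `target_day` from a {date: end-of-day sales} dict.

    Returns None for any lag with no history (e.g. a new store).
    """
    feats = {"lag_prev_day": daily_sales.get(target_day - timedelta(days=1))}
    for w in range(1, weeks + 1):
        # Same weekday, w weeks earlier.
        feats[f"lag_same_weekday_{w}w"] = daily_sales.get(
            target_day - timedelta(weeks=w)
        )
    return feats


# Made-up history for one store; 2024-02-23 is deliberately absent.
history = {
    date(2024, 3, 14): 900.0,
    date(2024, 3, 8): 1000.0,
    date(2024, 3, 1): 1100.0,
}
lags = lag_features(history, date(2024, 3, 15))
# lags["lag_same_weekday_1w"] is 1000.0 and lags["lag_same_weekday_3w"] is None
```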

Unlike the previous rule-based system, the model is data-driven: it learns patterns from historical data and revises its end-of-day forecast at each checkpoint as new intraday sales arrive.

Engineering & MLOps

The entire pipeline is built on Databricks, with a strong focus on production reliability and scalability:

  • Data processing: PySpark for large-scale feature computation
  • Model lifecycle: MLflow for experiment tracking, versioning, and reproducibility
  • Deployment: fully code-driven using Databricks Asset Bundles (no notebooks)
  • Inference: automated batch jobs generating predictions at scheduled checkpoints

The system manages the full ML lifecycle:

  • training
  • hyperparameter tuning
  • validation
  • deployment
  • monitoring

Impact

  • Replaced a rigid rule-based system, eliminating manual interventions
  • Improved robustness on edge cases such as new stores and irregular holidays
  • Enabled near real-time business monitoring for high-level stakeholders

Notes

Due to company constraints, source code is not publicly available.

python xgboost pyspark mlflow databricks forecasting mlops explainability