[WORK EXPERIENCE] Python - OVS Sales Forecasting System
This project is a production-grade machine learning system that forecasts end-of-day sales at multiple checkpoints during the day for a large-scale retail network.
The system generates 5 intraday predictions (12:00, 14:00, 17:00, 19:00, 21:00) using transaction data accumulated up to each timestamp.
The main goal was to replace a legacy rule-based algorithm that relied on static historical averages and required manual adjustments for special cases such as holidays, promotions, and new stores.
Key Features
- Near real-time predictions: sales data refreshed every 30 minutes
- Scalable coverage: deployed across 1,000+ stores
- Multi-checkpoint forecasting: dynamic updates throughout the day
- Robustness: handles cold-start scenarios and non-recurring calendar events (e.g. Easter, Black Friday)
- Model explainability: SHAP values used to interpret feature contributions at prediction level
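To illustrate what a prediction-level feature contribution means, here is a minimal sketch that computes exact Shapley values for a toy model by brute force. It is not the SHAP library and not the project's code: the toy model, the feature names (`sales_so_far`, `is_holiday`), and the baseline are all hypothetical, and the brute-force enumeration is only practical for a handful of features.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from their baseline value to the
    instance value.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical toy model with an interaction term (assumption, for illustration only).
def toy_model(x):
    return (2.0 * x["sales_so_far"]
            + 500.0 * x["is_holiday"]
            + 0.1 * x["sales_so_far"] * x["is_holiday"])


baseline = {"sales_so_far": 800.0, "is_holiday": 0.0}   # reference input
instance = {"sales_so_far": 1000.0, "is_holiday": 1.0}  # prediction to explain

contributions = shapley_values(toy_model, instance, baseline)
prediction_gap = toy_model(instance) - toy_model(baseline)
```

The contributions sum to the gap between the explained prediction and the baseline prediction, which is the additivity property that makes such values readable as a per-feature breakdown of a single forecast.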
Machine Learning Approach
The core model is based on XGBoost, chosen for its efficiency and ability to model non-linear relationships in structured data.
The system relies on extensive feature engineering, including:
- Intraday signals: cumulative sales up to prediction time
- Lag features: historical sales from similar and previous days (the most important features in the model)
- Calendar features: holidays, seasonal patterns, weekday groupings
- Store metadata: location, cluster, and operational characteristics
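The feature groups above can be sketched for a single store and checkpoint as follows. This is a plain-Python illustration under stated assumptions, not the production PySpark code: the function names, the feature names, and the choice of "same weekday one week earlier" as the similar day are all hypothetical.

```python
from datetime import date, datetime, time, timedelta

# The five intraday checkpoints named in the project description.
CHECKPOINTS = [time(12, 0), time(14, 0), time(17, 0), time(19, 0), time(21, 0)]


def cumulative_sales(transactions, day, checkpoint):
    """Sum the amounts of transactions on `day` up to and including `checkpoint`."""
    start = datetime.combine(day, time.min)
    cutoff = datetime.combine(day, checkpoint)
    return sum(amount for ts, amount in transactions if start <= ts <= cutoff)


def build_features(transactions, daily_totals, day, checkpoint):
    """Assemble one feature row for a store at a given checkpoint.

    `transactions` is a list of (timestamp, amount) pairs and `daily_totals`
    maps past dates to end-of-day sales. Missing lags are returned as None.
    """
    return {
        # Intraday signal: only data available at prediction time is used.
        "sales_so_far": cumulative_sales(transactions, day, checkpoint),
        # Lag features: previous day and, as an assumed "similar day",
        # the same weekday one week earlier.
        "lag_1d_total": daily_totals.get(day - timedelta(days=1)),
        "lag_7d_total": daily_totals.get(day - timedelta(days=7)),
        # Calendar features.
        "weekday": day.weekday(),
        "checkpoint_hour": checkpoint.hour,
    }


# Made-up example data for one store.
transactions = [
    (datetime(2024, 3, 5, 10, 15), 120.0),
    (datetime(2024, 3, 5, 13, 40), 80.0),
    (datetime(2024, 3, 5, 18, 5), 200.0),
]
daily_totals = {date(2024, 3, 4): 950.0, date(2024, 2, 27): 1010.0}

features = build_features(transactions, daily_totals, date(2024, 3, 5), CHECKPOINTS[1])
```

Filtering on the checkpoint cutoff is the important part: a row built for the 14:00 forecast must not see the 18:05 transaction, otherwise the model would be trained on information it will not have at inference time.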
Unlike the previous rule-based system, which relied on static averages, the model is data-driven: its forecast is revised at each checkpoint using the sales accumulated so far that day.
Engineering & MLOps
The entire pipeline is built on Databricks, with a strong focus on production reliability and scalability:
- Data processing: PySpark for large-scale feature computation
- Model lifecycle: MLflow for experiment tracking, versioning, and reproducibility
- Deployment: fully code-driven using Databricks Asset Bundles (no notebooks)
- Inference: automated batch jobs generating predictions at scheduled checkpoints
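As a rough illustration of checkpoint-driven batch inference, the sketch below shows how a job run could decide which checkpoint it should score. The helper name is hypothetical, and in practice the schedule would more likely live in the job configuration than in code like this.

```python
from datetime import datetime, time

# The five intraday checkpoints named in the project description.
CHECKPOINTS = [time(12, 0), time(14, 0), time(17, 0), time(19, 0), time(21, 0)]


def latest_due_checkpoint(now):
    """Return the most recent checkpoint at or before `now`, or None if
    the first checkpoint of the day has not been reached yet."""
    due = [cp for cp in CHECKPOINTS if cp <= now.time()]
    return due[-1] if due else None


before_first = latest_due_checkpoint(datetime(2024, 3, 5, 11, 59))
mid_afternoon = latest_due_checkpoint(datetime(2024, 3, 5, 14, 30))
last_of_day = latest_due_checkpoint(datetime(2024, 3, 5, 21, 0))
```

A run started at 14:30 would therefore score the 14:00 checkpoint, using only transactions recorded up to that time.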
The system manages the full ML lifecycle:
- training
- hyperparameter tuning
- validation
- deployment
- monitoring
Impact
- Replaced a rigid rule-based system, eliminating manual interventions
- Improved robustness on edge cases such as new stores and irregular holidays
- Enabled near real-time business monitoring for high-level stakeholders
Notes
Due to company constraints, the source code is not publicly available.
python xgboost pyspark mlflow databricks forecasting mlops explainability
