
An End-to-End Data Platform
The system is built to answer a simple question:
“If I am a coffee producer, is today a good day to sell?”
To do that, it must connect multiple data sources, produce robust forecasts and translate them into actionable strategies.
01
Automated global data ingestion
02
Unified hierarchical dataset across regions and commodities
03
14-day forecasting engine
04
Strategy simulator that reflects real producer constraints
05
Daily automation across AWS and Databricks

What We Built
A robust, scalable infrastructure that handles data with the rigor of a top-tier financial institution.
Our ingestion layer runs on AWS:
-
Lambda functions perform scheduled API calls
-
EventBridge orchestrates daily jobs
-
S3 acts as our central data lake
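As a rough illustration of the ingestion step, the sketch below shows how a scheduled Lambda function might fetch one API response and write it to S3 under a date-partitioned key. The event fields (`source`, `url`, `bucket`), the key layout and all function names are assumptions made for this example, not the platform's actual code.

```python
import json
import urllib.request
from datetime import date, datetime, timezone


def build_object_key(source: str, day: date) -> str:
    """Date-partitioned key layout (an assumed convention, not the real one)."""
    return f"raw/{source}/{day:%Y/%m/%d}/data.json"


def ingest(source, url, day, fetch, store):
    """Fetch one payload and store it; fetch/store are injected so this is testable."""
    payload = fetch(url)
    key = build_object_key(source, day)
    store(key, payload)
    return key


def fetch_bytes(url):
    """Plain HTTP GET using only the standard library."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def lambda_handler(event, context):
    """Entry point invoked on a schedule; the event shape here is hypothetical."""
    import boto3  # imported lazily; bundled with the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    bucket = event["bucket"]

    def store(key, body):
        s3.put_object(Bucket=bucket, Key=key, Body=body)

    today = datetime.now(timezone.utc).date()
    key = ingest(event["source"], event["url"], today, fetch_bytes, store)
    return {"statusCode": 200, "body": json.dumps({"key": key})}
```

Passing `fetch` and `store` as arguments keeps the core logic independent of AWS, so it can be exercised locally with stand-in functions.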
In Databricks, we:
-
Run scheduled ETL pipelines
-
Use Delta Live Tables for incremental processing
-
Track experiments and models with MLflow
-
Multimodal Data and a Unified Hierarchical Model
The platform integrates:
-
Climate data (temperature, rainfall, humidity)
-
Coffee and sugar futures (OHLCV)
-
Volatility indices such as VIX
-
FX rates
-
Global news and sentiment via GDELT
-
29 coffee regions and 38 sugar regions
These inputs are organized into a hierarchical model, so each forecasting configuration can pick the features that best fit its needs, whether global averages or region-level signals.
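To make the idea of choosing between global averages and region-level signals concrete, here is a minimal sketch. The nested dictionary layout, the region names, the feature names and the `select_features` helper are all hypothetical and use toy values.

```python
from statistics import mean

# Toy observations for one day: commodity -> region -> feature -> value.
observations = {
    "coffee": {
        "region_a": {"temperature_c": 22.0, "rainfall_mm": 4.0},
        "region_b": {"temperature_c": 26.0, "rainfall_mm": 0.0},
    },
}


def global_average(obs, commodity, feature):
    """Average one feature over every region of a commodity."""
    return mean(region[feature] for region in obs[commodity].values())


def select_features(obs, commodity, spec):
    """Build a feature dict from (feature, level) pairs.

    level is either "global" (average over regions) or a region name.
    """
    selected = {}
    for feature, level in spec:
        if level == "global":
            selected[f"{feature}__global"] = global_average(obs, commodity, feature)
        else:
            selected[f"{feature}__{level}"] = obs[commodity][level][feature]
    return selected


# One configuration mixes a global signal with a region-level one.
features = select_features(
    observations,
    "coffee",
    [("temperature_c", "global"), ("rainfall_mm", "region_a")],
)
```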
-
Configurable Model Registry and Automated ML Workflow
Instead of hardcoding a single model, we use a configuration-based model registry. Each model is stored as data, including its type, parameters, feature functions and forecast horizon.
This enables:
-
Rapid experimentation across many configurations
-
Parallel training and evaluation
-
Clean, maintainable architecture
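A configuration-based registry of this kind could look roughly like the sketch below, where each model is a plain data record holding its type, parameters, feature functions and forecast horizon. The class name `ModelConfig`, the registry entries, the parameter values and the feature functions are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Data = Dict[str, Sequence[float]]
FeatureFn = Callable[[Data], Sequence[float]]


@dataclass(frozen=True)
class ModelConfig:
    """One model described purely as data."""
    name: str
    model_type: str
    params: dict
    feature_fns: Tuple[FeatureFn, ...]
    horizon_days: int = 14


# Hypothetical feature functions: each extracts one column from the dataset.
def fx_rate(data: Data) -> Sequence[float]:
    return data["fx_rate"]


def rainfall(data: Data) -> Sequence[float]:
    return data["rainfall_mm"]


# Adding a model means adding an entry here, not writing new training code.
REGISTRY = {
    cfg.name: cfg
    for cfg in [
        ModelConfig("sarimax_fx", "sarimax", {"order": (1, 1, 1)}, (fx_rate,)),
        ModelConfig("sarimax_fx_weather", "sarimax", {"order": (2, 1, 1)}, (fx_rate, rainfall)),
    ]
}


def build_features(cfg: ModelConfig, data: Data):
    """Apply a configuration's feature functions to the raw data."""
    return [list(fn(data)) for fn in cfg.feature_fns]
```

Because every entry is independent, a training job can iterate over `REGISTRY.values()` and fit the configurations in parallel.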
Pipeline:
-
Train – Fit SARIMAX using weather, market and FX features
-
Backtest – Walk-forward validation across rolling windows
-
Publish – Generate daily forecasts and 2,000 sample paths
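The backtest step can be sketched as a generic walk-forward loop: fit on a rolling window, forecast the next horizon, score, then slide forward. The window sizes, the mean-absolute-error score and the last-value forecaster below are assumptions standing in for the real SARIMAX models.

```python
def walk_forward_splits(n_obs, train_size, horizon, step):
    """Yield (train_indices, test_indices) for fixed-size rolling windows."""
    start = 0
    while start + train_size + horizon <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + horizon)
        yield train, test
        start += step


def backtest(series, forecaster, train_size, horizon, step):
    """Return the mean absolute error of each walk-forward window."""
    errors = []
    for train_idx, test_idx in walk_forward_splits(len(series), train_size, horizon, step):
        history = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        predicted = forecaster(history, horizon)
        errors.append(sum(abs(a - p) for a, p in zip(actual, predicted)) / horizon)
    return errors


def last_value_forecaster(history, horizon):
    """Naive stand-in model: repeat the last observed value."""
    return [history[-1]] * horizon


# Toy series; each window trains on 60 points and is scored on the next 14.
toy_series = [float(i) for i in range(100)]
window_errors = backtest(toy_series, last_value_forecaster, train_size=60, horizon=14, step=14)
```

Each test window only ever sees data that comes after its training window, which avoids look-ahead bias in the reported errors.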
-
Forecasts, Risk Distributions and Scenario Analysis
Each model generates approximately 2,000 simulated price paths over a 14-day horizon. From these paths, we derive:
-
Good, bad and extreme market scenarios
-
Expected price levels
-
Volatility and dispersion
-
Risk metrics such as VaR and CVaR
For a producer, this translates into a simple but powerful question:
Is it statistically better to sell now, or to wait?
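As an illustration of how such metrics can be read off a set of simulated paths, the sketch below generates paths with a simple random walk and summarizes the day-14 outcomes relative to selling today. The random-walk generator, the volatility value and the 95% level are assumptions for the example; the platform's paths come from its fitted models.

```python
import random
from statistics import mean, pstdev


def simulate_paths(start_price, n_paths, horizon, daily_vol, seed=0):
    """Toy path generator: independent Gaussian daily returns (an assumption)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        price, path = start_price, []
        for _ in range(horizon):
            price *= 1.0 + rng.gauss(0.0, daily_vol)
            path.append(price)
        paths.append(path)
    return paths


def risk_summary(paths, current_price, alpha=0.95):
    """Summarize end-of-horizon prices relative to selling at current_price."""
    terminal = [path[-1] for path in paths]
    returns = sorted(price / current_price - 1.0 for price in terminal)
    n_tail = max(1, int(round((1.0 - alpha) * len(returns))))
    tail = returns[:n_tail]  # the worst outcomes of waiting
    return {
        "expected_price": mean(terminal),
        "dispersion": pstdev(terminal),
        "var": -tail[-1],     # loss at the edge of the worst (1 - alpha) share of paths
        "cvar": -mean(tail),  # average loss within that worst share
        "prob_higher": sum(r > 0 for r in returns) / len(returns),
    }


paths = simulate_paths(start_price=200.0, n_paths=2000, horizon=14, daily_vol=0.02)
summary = risk_summary(paths, current_price=200.0)
```

A producer-facing rule could then compare `prob_higher` and `cvar` against the cost of holding inventory for the same period.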
-
From Forecasts to Decisions: Strategy Simulator
The strategy simulator recreates the life of a coffee producer within the model:
-
Harvested coffee enters inventory gradually
-
Storage costs increase over time
-
Coffee must be sold before the next harvest
-
Nine strategies are tested: four baselines and five forecast-enhanced variants
We run these strategies over eight years of historical data to evaluate whether forecasts lead to materially better income.
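The constraints above can be sketched as a small season simulation. Everything here is a simplified assumption: harvest arrives in equal daily amounts, storage is charged per kilogram per day on unsold inventory, the last day stands in for the next harvest, and the two strategies are hypothetical examples of a baseline and a forecast-enhanced rule.

```python
def simulate_season(prices, forecasts, strategy, harvest_kg, harvest_days, storage_cost_per_kg_day):
    """Return net income for one season under a given selling strategy."""
    inventory = 0.0
    income = 0.0
    last_day = len(prices) - 1
    for day, price in enumerate(prices):
        if day < harvest_days:
            inventory += harvest_kg / harvest_days  # coffee enters inventory gradually
        # Everything must be sold before the next harvest (here: the last day).
        fraction = 1.0 if day == last_day else strategy(day, price, forecasts[day])
        sold = inventory * min(max(fraction, 0.0), 1.0)
        income += sold * price
        inventory -= sold
        income -= inventory * storage_cost_per_kg_day  # holding cost accumulates daily
    return income


def sell_immediately(day, price, forecast):
    """Baseline: sell all available inventory every day."""
    return 1.0


def make_wait_if_forecast_higher(min_gain):
    """Forecast-enhanced rule: hold only if the forecast beats today's price by min_gain."""
    def strategy(day, price, forecast):
        return 0.0 if forecast > price * (1.0 + min_gain) else 1.0
    return strategy


# Toy inputs; forecasts[d] is the price expected shortly after day d.
prices = [100.0, 100.0, 110.0, 90.0]
forecasts = [100.0, 110.0, 90.0, 90.0]
baseline_income = simulate_season(prices, forecasts, sell_immediately, 20.0, 2, 1.0)
enhanced_income = simulate_season(prices, forecasts, make_wait_if_forecast_higher(0.05), 20.0, 2, 1.0)
```

Running each strategy over every historical season yields one income figure per strategy per year, which is the input to the statistical comparison.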
-
Current Results
t-statistic: -0.26
p-value: 0.80
Effect size: Negligible
Our current models do not yet outperform the best baseline strategy in a statistically significant way (p-value ≈ 0.80, negligible effect size). This is expected at an early stage and highlights the need for more expressive models and richer features.
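For readers unfamiliar with these figures, the sketch below shows one common way to obtain a t-statistic and an effect size when comparing two strategies year by year. It assumes a paired t-test and Cohen's d on the yearly income differences; the source does not state which test was used, and the income numbers are toy values, not the project's results.

```python
from math import sqrt
from statistics import mean, stdev


def paired_comparison(strategy_income, baseline_income):
    """Return (t_statistic, cohens_d, degrees_of_freedom) for paired yearly incomes."""
    diffs = [s - b for s, b in zip(strategy_income, baseline_income)]
    n = len(diffs)
    sd = stdev(diffs)                       # sample standard deviation of the differences
    t_statistic = mean(diffs) / (sd / sqrt(n))
    cohens_d = mean(diffs) / sd             # standardized effect size
    return t_statistic, cohens_d, n - 1


# Toy yearly incomes for eight seasons (made-up numbers for illustration only).
forecast_strategy = [101.0, 98.0, 103.0, 97.0, 100.0, 99.0, 102.0, 100.0]
best_baseline = [100.0, 99.0, 102.0, 98.0, 101.0, 98.0, 101.0, 101.0]
t_stat, effect, dof = paired_comparison(forecast_strategy, best_baseline)
```

Turning the t-statistic into a p-value requires the Student t distribution with `dof` degrees of freedom; `scipy.stats.ttest_rel` returns both values directly if SciPy is available.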
Limitations
-
SARIMAX is sensitive to extreme shocks
-
Climate data has limited regional resolution
-
Sentiment features are not fully integrated
-
Forecast horizon is restricted to 14 days
-
Producers still lack direct, low-bandwidth access (e.g., WhatsApp)
Road Map
-
Integrate global sentiment from GDELT into production models
-
Experiment with LSTM and TimesFM for sequence forecasting
-
Explore ensemble forecasting across multiple model families
-
Automate end-to-end daily runs
-
Prototype a WhatsApp interface for producers with limited internet access


Social Impact
This system is not only about data and infrastructure. It is about economic dignity and fairness. By making market intelligence accessible to small producers, we move one step closer to a future where the value of their work is protected with the same tools used by global traders.