Architecture & Platform

Databricks DAIS 2026 Hackathon

Technical reference: pipeline, ML models, Genie AI/BI, consumption layers, and CI/CD.

70 Customers · $9.2M Revenue · 12 SKUs · 24mo History · 8 Gold Views · 3 ML Models

Databricks · Delta Lake · MLflow · Unity Catalog · XGBoost · scikit-learn · Optuna · Lakeflow DLT · Genie AI/BI · Next.js 14 · OpenAI GPT-4o · Vercel · GitHub Actions

Data Pipeline — Bronze → Silver → Gold → ML

1. Data Sources: Raw inputs
SAP HANA: KNA1, VBAK, VBAP, ZCUST_INTERACTIONS, MARD_STOCK
Salesforce CRM: Accounts, Opportunities, Cases
2. Bronze Layer: Weekly ingestion · sap_bronze schema
bronze_sap_kna1_customers · bronze_sap_vbak_orders · bronze_sap_vbap_order_items · bronze_sap_zcust_interactions · bronze_sap_mard_stock
3. Silver (Lakeflow DLT): Spark Declarative Pipeline · serverless
dim_customer_unified · fact_sap_orders · fact_customer_interactions · fact_opportunity · fact_case
4. Gold Layer: Business views · gold_pres schema
gold_customer_360 · gold_sales_to_fulfillment_pipeline · gold_demand_vs_supply_gap · metrics_customer_health · metrics_sales_performance · metrics_product_trends
5. ML Models: MLflow · Unity Catalog · @champion alias
deal2delivery_churn_model @champion · deal2delivery_demand_forecast @champion · deal2delivery_customer_segments @champion
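
Stage 5 registers each model in Unity Catalog under an @champion alias. As a minimal sketch, a consumer might build the alias-based MLflow model URI as shown here. The catalog and schema names (`main`, `ml`) are hypothetical placeholders, since the page does not say where the models are registered, and `champion_uri` is an illustrative helper, not part of the project.

```python
# Sketch: build an MLflow model URI that points at the @champion alias.
# ASSUMPTION: catalog "main" and schema "ml" are hypothetical placeholders.

MODEL_NAMES = (
    "deal2delivery_churn_model",
    "deal2delivery_demand_forecast",
    "deal2delivery_customer_segments",
)


def champion_uri(model_name: str, catalog: str = "main", schema: str = "ml") -> str:
    """Return a 'models:/<catalog>.<schema>.<name>@champion' URI."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model: {model_name}")
    return f"models:/{catalog}.{schema}.{model_name}@champion"
```

On a workspace where MLflow is configured for Unity Catalog, a URI of this shape could be passed to `mlflow.pyfunc.load_model` to load whichever version currently holds the alias, so consumers do not need to track version numbers.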

Next.js App on Vercel

Public · business stakeholders

Dashboard + KPIs · Demand Forecast · Scenario Simulator · Inventory Gap · Customer Risk + Segments · OpenAI GPT-4o insights

Databricks Genie AI/BI

Internal · data teams

Natural language queries · AI/BI Dashboards · LLM-as-a-Judge evaluation · Claude Opus remediation

ML Models

Customer Segmentation

K-Means Clustering (k=5)

deal2delivery_customer_segments@champion

5 RFM segments: Champions, Loyal, At-Risk, Hibernating, Prospects
Features: revenue, order count, days silent, sentiment, open cases, engagement
StandardScaler — prevents revenue dominating clustering
RFM score (0–100): recency 30% + frequency 30% + monetary 40%
Inertia tracked · Silhouette score · 6 features · customer_segments table
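
The RFM weighting and the scaling step above can be sketched in plain Python. This assumes each RFM component has already been scaled to 0–100 (the page does not say how the components are derived); `rfm_score` and `standardize` are illustrative names. The standardization mirrors what scikit-learn's StandardScaler computes (zero mean, unit population standard deviation), which keeps a large-valued feature such as revenue from dominating Euclidean distances in K-Means.

```python
from statistics import mean, pstdev

# Weights taken from the text: recency 30% + frequency 30% + monetary 40%.
RFM_WEIGHTS = {"recency": 0.30, "frequency": 0.30, "monetary": 0.40}


def rfm_score(recency: float, frequency: float, monetary: float) -> float:
    """Weighted RFM score. ASSUMPTION: each input is pre-scaled to 0-100."""
    parts = {"recency": recency, "frequency": frequency, "monetary": monetary}
    for name, value in parts.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {value}")
    return sum(RFM_WEIGHTS[name] * value for name, value in parts.items())


def standardize(values: list[float]) -> list[float]:
    """Z-score one feature column: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

For example, `rfm_score(80, 50, 20)` weights a recent but low-spending customer as 0.3·80 + 0.3·50 + 0.4·20.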

Churn Prediction v2

XGBoost Classifier

deal2delivery_churn_model@champion

Composite behavioral label — 5 weighted signals
Inactivity 35% · Cases 20% · Sentiment 20% · Neg ratio 15% · Revenue 10%
Optuna Bayesian hyperparameter search — 15 trials
Stratified 5-fold CV · Output: probability + High/Medium/Low tier
CV AUC tracked · CV F1 tracked · 30+ features · MLflow Traced
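
One way to read the composite behavioral label is as a weighted sum of five normalized risk signals. The sketch below uses the weights listed above; the signal normalization to [0, 1], the labeling threshold of 0.5, and the tier cutoffs of 0.7 and 0.4 are all assumptions for illustration, since the page does not state them.

```python
# Weights taken from the text; they sum to 1.0.
CHURN_WEIGHTS = {
    "inactivity": 0.35,
    "cases": 0.20,
    "sentiment": 0.20,
    "neg_ratio": 0.15,
    "revenue": 0.10,
}


def composite_risk(signals: dict[str, float]) -> float:
    """Weighted sum of the five signals.

    ASSUMPTION: every signal is normalized to [0, 1], where 1 is most churn-like.
    """
    missing = set(CHURN_WEIGHTS) - set(signals)
    if missing:
        raise KeyError(f"missing signals: {sorted(missing)}")
    return sum(weight * signals[name] for name, weight in CHURN_WEIGHTS.items())


def churn_label(signals: dict[str, float], threshold: float = 0.5) -> int:
    """Binary training label. ASSUMPTION: the 0.5 threshold is hypothetical."""
    return int(composite_risk(signals) >= threshold)


def risk_tier(probability: float, high: float = 0.7, medium: float = 0.4) -> str:
    """Map a predicted probability to a tier. ASSUMPTION: cutoffs are hypothetical."""
    if probability >= high:
        return "High"
    if probability >= medium:
        return "Medium"
    return "Low"
```

Under these assumptions, a customer who is fully inactive but otherwise healthy scores 0.35 and is not labeled as churned, which shows why no single signal decides the label on its own.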

Demand Forecast

XGBoost Regressor

deal2delivery_demand_forecast@champion

Per-SKU monthly quantity forecast using lag features
Lag features: prev month, prev quarter, rolling 3 & 6 month avg
Time-based train/test split — last 15% of months as test
6-month forward predictions → demand_forecast_predictions gold table
MAE tracked · RMSE tracked · MAPE tracked · 12 SKUs × 6M
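
The lag features and time-based split described above could look roughly like this for a single SKU. It is a sketch under stated assumptions: "prev quarter" is read as the quantity three months back, rolling averages use only prior months to avoid leaking the target, and the test size is rounded from 15% of the available rows. The function and column names are illustrative.

```python
from statistics import mean


def build_lag_rows(monthly_qty: list[float]) -> list[dict[str, float]]:
    """Build one feature row per month that has six months of history behind it."""
    rows = []
    for t in range(6, len(monthly_qty)):
        rows.append({
            "month_index": t,
            "lag_1": monthly_qty[t - 1],             # previous month
            "lag_3": monthly_qty[t - 3],             # ASSUMPTION: "prev quarter" = 3 months back
            "roll_3": mean(monthly_qty[t - 3:t]),    # mean of the 3 prior months
            "roll_6": mean(monthly_qty[t - 6:t]),    # mean of the 6 prior months
            "target": monthly_qty[t],
        })
    return rows


def time_split(rows: list[dict[str, float]], test_fraction: float = 0.15):
    """Hold out the most recent rows as the test set; no shuffling."""
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]
```

Splitting by time rather than at random means the model is always evaluated on months later than anything it was trained on, which matches how the 6-month forward forecast is used.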

Genie AI/BI — Natural Language Analytics

Databricks Genie space backed by all Gold views. Business users ask questions in plain English — Genie generates SQL and returns data-driven answers. A self-improving evaluation loop runs LLM-as-a-Judge scorers on every conversation.

"Which customers haven't ordered in 90 days?"
"What's the revenue trend for computing products?"
"Show me top 5 customers by lifetime value"
"Which SKUs are critically understocked?"

LLM-as-a-Judge Scorers

RelevanceToQuery: Response addresses the question
RetrievalGroundedness: Answer grounded in actual data
genie_response_quality: Data-driven, not vague
genie_sql_quality: Correct aggregations, no SELECT *
Safety: No harmful content
has_response: Non-empty answer returned
no_error: Completed without errors
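
Some of the checks above are deterministic and do not need an LLM. The sketch below shows how `has_response`, `no_error`, and the "no SELECT *" part of `genie_sql_quality` might be expressed as plain functions. The signatures and the helper name `avoids_select_star` are assumptions; the project's actual scorers may be registered differently, and the semantic scorers (relevance, groundedness, response quality, safety) are not covered here.

```python
import re
from typing import Optional

# Matches "SELECT *" and "SELECT DISTINCT *", but not "SELECT COUNT(*)".
_SELECT_STAR = re.compile(r"\bselect\s+(?:distinct\s+)?\*", re.IGNORECASE)


def has_response(answer: Optional[str]) -> bool:
    """True when the answer contains something other than whitespace."""
    return bool(answer and answer.strip())


def no_error(error: Optional[str]) -> bool:
    """True when no error message was recorded for the conversation."""
    return not error


def avoids_select_star(sql: str) -> bool:
    """Heuristic check that the generated SQL names its columns.

    Qualified wildcards such as "t.*" are not detected by this pattern.
    """
    return _SELECT_STAR.search(sql) is None
```

Cheap checks like these can run on every conversation and gate the more expensive LLM-judged scorers.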

CI/CD — GitHub Actions + Databricks Asset Bundles

feature/* → PR to develop
develop → dev · Push (auto-deploy)
main → staging · Push (auto-deploy)
manual → prod · workflow_dispatch
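
The promotion flow above amounts to a small routing rule from GitHub event and branch to a deployment target. The real workflow would live in GitHub Actions YAML; this Python sketch only illustrates the decision logic. The function name is hypothetical, and treating pull requests from feature/* as "no deployment" is an assumption, since the page lists no target for them.

```python
from typing import Optional


def deploy_target(event: str, branch: str) -> Optional[str]:
    """Return the target environment name, or None when nothing is deployed."""
    if event == "workflow_dispatch":
        return "prod"      # manual trigger only
    if event == "push" and branch == "develop":
        return "dev"       # auto-deploy on push
    if event == "push" and branch == "main":
        return "staging"   # auto-deploy on push
    # ASSUMPTION: pull requests (e.g. feature/* into develop) deploy nothing.
    return None
```

With Databricks Asset Bundles, the returned name would typically be passed as the target of a deploy command such as `databricks bundle deploy -t dev`. Keeping prod behind a manual trigger means a push can never reach production on its own.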

Performance — Three-Layer Cache

Layer 1

Next.js ISR

5 min

API responses are cached at the Vercel CDN. Tab switches within 5 minutes trigger no Databricks queries.

Layer 2

Databricks SQL Result Cache

24 h

Identical queries return in ~200ms from warehouse cache — no re-execution on Delta.

Layer 3

Delta Cache

Local SSD

Hot table data is cached on the cluster's local SSDs. Repeated reads skip cloud storage entirely.
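
The effect of Layer 1 can be illustrated with a small time-to-live cache: within the 5-minute window, repeated requests are served from the cache and the backend query never runs. This is a simplified Python sketch, not the Next.js implementation; `TTLCache` is a hypothetical class, and real ISR additionally serves stale content while regenerating in the background, which is not modeled here.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache values per key and recompute them once they are older than the TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]             # fresh: skip the backend entirely
        value = compute()             # miss or expired: run the query
        self._store[key] = (now, value)
        return value
```

A cache created as `TTLCache(300)` corresponds to the 5-minute window; the warehouse result cache and the disk cache only come into play when this layer misses.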

Deal2Delivery

Databricks DAIS 2026 Community Virtual Hackathon