This project demonstrates the design and implementation of a complete data pipeline for e-commerce analytics.
It transforms raw transactional data into business-ready insights using scalable data engineering practices.
The pipeline follows the Medallion Architecture:
Raw Layer: CSV files ingested from the source system
Bronze Layer: Standardized data with ingestion metadata
Silver Layer: Cleaned, validated, and joined data
Gold Layer: Aggregated metrics for business analysis
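The layer progression above can be sketched without a Spark cluster. The following is a minimal pure-Python illustration of the Raw → Bronze → Silver → Gold logic; the column names (order_id, customer_id, amount, status) and the source file name are hypothetical, not taken from the actual pipeline:

```python
from datetime import datetime, timezone

# Raw layer: rows exactly as they arrive from the source CSV (all strings).
raw = [
    {"order_id": "1", "customer_id": "c1", "amount": "120.50", "status": "paid"},
    {"order_id": "2", "customer_id": "c2", "amount": "", "status": "paid"},  # invalid row
    {"order_id": "3", "customer_id": "c1", "amount": "79.99", "status": "cancelled"},
]

def to_bronze(rows):
    # Bronze layer: keep data as-is but attach ingestion metadata.
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{**r, "_ingested_at": ingested_at, "_source": "orders.csv"} for r in rows]

def to_silver(rows):
    # Silver layer: clean and validate (drop rows with a missing amount, cast types).
    return [{**r, "amount": float(r["amount"])} for r in rows if r["amount"]]

def to_gold(rows):
    # Gold layer: aggregate business-ready metrics over paid orders.
    paid = [r for r in rows if r["status"] == "paid"]
    revenue = sum(r["amount"] for r in paid)
    orders = len(paid)
    return {"revenue": revenue, "orders": orders,
            "avg_order_value": revenue / orders if orders else 0.0}

gold = to_gold(to_silver(to_bronze(raw)))
print(gold)
```

In the real pipeline each function would be a PySpark transformation writing a Delta table, but the shape of the logic per layer is the same.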
The tech stack:
Databricks
PySpark
Delta Lake
SQL
Databricks Jobs (orchestration)
Key features:
End-to-end data pipeline (RAW → GOLD)
Data modeling using a fact table
Business metrics: revenue, orders, average order value
Automated workflow using Databricks Jobs
Dashboard for business insights
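The business metrics listed above (revenue, orders, average order value) fall straight out of the fact table. A minimal sketch, assuming a hypothetical fact table named fact_orders with one row per order, grouped by month:

```python
from collections import defaultdict

# Hypothetical fact table: grain = one row per order.
fact_orders = [
    {"order_id": 1, "month": "2024-01", "amount": 100.0},
    {"order_id": 2, "month": "2024-01", "amount": 50.0},
    {"order_id": 3, "month": "2024-02", "amount": 200.0},
]

# Group by month and compute revenue, order count, and average order value,
# mirroring a groupBy/agg over the gold-layer fact table.
metrics = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
for row in fact_orders:
    m = metrics[row["month"]]
    m["revenue"] += row["amount"]
    m["orders"] += 1
for m in metrics.values():
    m["avg_order_value"] = m["revenue"] / m["orders"]

print(dict(metrics))
```

Because average order value is derived (revenue / orders), only the two base measures need to be stored in the aggregate.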
This pipeline enables:
Faster access to reliable data
Better decision-making through clear metrics
Scalable data processing workflows
My contributions:
Designed the data architecture
Developed the pipeline using PySpark
Implemented data transformations and joins
Created analytical datasets
Built dashboard and orchestrated workflows
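The transformations-and-joins step can be illustrated with a small sketch of silver-layer enrichment. The tables and keys here (orders, customers, customer_id) are hypothetical; in the pipeline this would be a PySpark DataFrame join:

```python
# Hypothetical silver-layer inputs.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.5},
    {"order_id": 2, "customer_id": "c2", "amount": 80.0},
    {"order_id": 3, "customer_id": "c9", "amount": 15.0},  # no matching customer
]
customers = {"c1": {"country": "DE"}, "c2": {"country": "US"}}

# Inner-join semantics on customer_id: keep only orders with a known customer,
# merging the customer attributes into each order row.
enriched = [
    {**o, **customers[o["customer_id"]]}
    for o in orders
    if o["customer_id"] in customers
]
print(enriched)  # two rows survive the join
```

Choosing an inner join here silently drops unmatched orders; a left join plus a data-quality check is the safer choice when every order must be preserved.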