I build the pipelines that turn raw, messy data into something a business can trust and act on — from PySpark ingestion to validated, dashboard-ready insight.
Raw inputs to the pipeline — who I am and where I'm coming from.
I'm a B.Tech Computer Engineering student who'd rather ship something real than just study the theory behind it. That instinct produced DataForge ETL, a full-stack enterprise ETL platform combining PySpark, Great Expectations, and Claude AI, and RetailPulse360, an end-to-end retail analytics build that segments 540,000+ transactions with K-Means clustering and reports the results in Power BI. Both started as personal projects and ended up looking like production systems, which is exactly the kind of work I want to keep doing. I'm looking for a Software Developer, Data Analyst, or Data Engineer role where I can turn data into decisions people actually act on.
Every skill below is run through the same kind of expectation suite I built into DataForge — type inferred, checked, scored.
| skill | type | expectation | result | score |
| --- | --- | --- | --- | --- |
| python | type: language | expect proficiency in {advanced, expert} | passed | 98% |
| sql | type: language | expect query_optimization to be true | passed | 94% |
| pl_sql | type: language | expect procedural_db_logic to be true | passed | 88% |
| java | type: language | expect not_null | passed | 80% |
| pandas / numpy | type: library | expect daily_use to be true | passed | 97% |
| scikit_learn | type: library | expect production_model to exist | passed | 92% |
| k_means_clustering | type: ml_algorithm | expect applied_at_scale >= 500000 rows | passed | 93% |
| rfm_analysis | type: method | expect segments_to_be_actionable | passed | 91% |
| feature_engineering | type: method | expect not_null | passed | 90% |
| power_bi | type: tool | expect dashboards_to_be_dynamic | passed | 95% |
| tableau | type: tool | expect not_null | passed | 82% |
| matplotlib | type: library | expect chart_type in {line, bar, scatter} | passed | 89% |
| database_design | type: skill | expect schema_to_be_normalized | passed | 90% |
| git | type: tool | expect history_to_be_clean | passed | 93% |
| linux | type: platform | expect not_null | passed | 85% |
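For readers curious what "type inferred, checked, scored" can look like in code, here is a minimal plain-Python sketch. It is an illustration only, not DataForge's implementation and not the Great Expectations API: the `Expectation` class, `run_suite`, `score`, and the sample record are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Expectation:
    """One rule applied to one column (hypothetical structure)."""
    column: str
    description: str
    check: Callable[[Any], bool]


def run_suite(record: dict, suite: list) -> list:
    """Infer each value's type, run its check, and collect the results."""
    results = []
    for exp in suite:
        value = record.get(exp.column)
        try:
            passed = bool(exp.check(value))
        except Exception:
            # A check that raises (e.g. comparing None to an int) counts as a failure.
            passed = False
        results.append({
            "column": exp.column,
            "inferred_type": type(value).__name__,
            "expectation": exp.description,
            "passed": passed,
        })
    return results


def score(results: list) -> float:
    """Percentage of expectations that passed."""
    if not results:
        return 0.0
    return 100.0 * sum(r["passed"] for r in results) / len(results)


# Hypothetical record and suite, loosely mirroring rows of the table above.
record = {"python": "advanced", "k_means_rows": 540_000, "java": None}
suite = [
    Expectation("python", "proficiency in {advanced, expert}",
                lambda v: v in {"advanced", "expert"}),
    Expectation("k_means_rows", "applied_at_scale >= 500000 rows",
                lambda v: v >= 500_000),
    Expectation("java", "not_null", lambda v: v is not None),
]

results = run_suite(record, suite)
for r in results:
    print(r["column"], r["inferred_type"], "passed" if r["passed"] else "failed")
print(f"score: {score(results):.1f}%")
```

In this sketch the score is simply the share of expectations that pass, so the `java` column, deliberately left as `None` here, fails the `not_null` check and pulls the score down. A real suite would more likely run checks column-wise over a whole dataset rather than over a single record.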
Where raw inputs get reshaped into something usable — two end-to-end builds.
Where the pipeline gets tested against a real, time-boxed problem.
Clean output, ready for the next step.
Open to Software Developer, Data Analyst, and Data Engineer roles. Reach out any way that's easiest for you.