ved_madurwar — pipeline
$
source: B.Tech Computer Engineering, Vishwakarma University
stack:  Python · SQL · PySpark · Scikit-Learn · Power BI
status: 5/5 stages passed — ready to deploy

Ved Madurwar

software developer · data analyst · data engineer

I build the pipelines that turn raw, messy data into something a business can trust and act on — from PySpark ingestion to validated, dashboard-ready insight.

01 / ingest

About

Raw inputs to the pipeline — who I am and where I'm coming from.

I'm a B.Tech Computer Engineering student who'd rather ship something real than just study the theory behind it. That instinct produced DataForge ETL, a full-stack enterprise ETL platform combining PySpark, Great Expectations, and Claude AI, and RetailPulse360, an end-to-end retail analytics build that segments 540,000+ transactions with K-Means clustering and Power BI. Both started as personal projects and ended up looking like production systems — which is exactly the kind of work I want to keep doing. I'm looking for a Software Developer, Data Analyst, or Data Engineer role where I can turn data into decisions people actually act on.

02 / validate

Skills

Every skill below is run through the same kind of expectation suite I built into DataForge — type inferred, checked, scored.

suite programming_languages
  • python · type: language · expect proficiency in {advanced, expert} · passed · 98%
  • sql · type: language · expect query_optimization to be true · passed · 94%
  • pl_sql · type: language · expect procedural_db_logic to be true · passed · 88%
  • java · type: language · expect not_null · passed · 80%
suite data_analysis_and_ml
  • pandas / numpy · type: library · expect daily_use to be true · passed · 97%
  • scikit_learn · type: library · expect production_model to exist · passed · 92%
  • k_means_clustering · type: ml_algorithm · expect applied_at_scale >= 500000 rows · passed · 93%
  • rfm_analysis · type: method · expect segments_to_be_actionable · passed · 91%
  • feature_engineering · type: method · expect not_null · passed · 90%
suite visualization
  • power_bi · type: tool · expect dashboards_to_be_dynamic · passed · 95%
  • tableau · type: tool · expect not_null · passed · 82%
  • matplotlib · type: library · expect chart_type in {line, bar, scatter} · passed · 89%
suite databases_and_tools
  • database_design · type: skill · expect schema_to_be_normalized · passed · 90%
  • git · type: tool · expect history_to_be_clean · passed · 93%
  • linux · type: platform · expect not_null · passed · 85%
03 / transform

Projects

Where raw inputs get reshaped into something usable — two end-to-end builds.

enterprise data engineering platform · personal project · 2025
dataforge-etl.onrender.com ↗
ingest · transform · validate · analyze · deliver
  • Built a production-grade full-stack ETL platform in Python and Flask, ingesting CSV, JSON, XLSX, and Parquet through configurable end-to-end pipeline workflows.
  • Combined PySpark and Pandas for scalable transformation, with real-time multi-step execution and automated column profiling.
  • Wired in Great Expectations for automated data quality validation — pass/fail reports and expectation suites across 24,000+ record datasets.
  • Integrated Claude AI for column analysis, PII detection, semantic type inference, and natural-language queries translated straight into executable Pandas code.
  • Added Slack and email pipeline-completion alerts, plus multi-format export to clean CSV and Parquet.
96.4% data quality score
24,000+ records validated
6+ tools integrated
4 file formats supported
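
The validation step described above boils down to running named expectations over every record and reporting pass/fail per expectation. Here is a minimal sketch of that idea in plain Python. It is not the DataForge code and it does not use the Great Expectations API; `run_suite`, the sample rows, and the expectation names are all hypothetical, chosen only for illustration.

```python
from typing import Callable

Row = dict[str, object]
Check = Callable[[Row], bool]


def run_suite(rows: list[Row], suite: dict[str, Check]) -> dict[str, dict]:
    """Apply each named check to every row and report pass/fail per expectation."""
    report = {}
    for name, check in suite.items():
        failures = sum(1 for row in rows if not check(row))
        total = len(rows)
        report[name] = {
            "passed": failures == 0,
            "failures": failures,
            # Share of rows that satisfy the expectation, as a percentage.
            "success_pct": round(100 * (total - failures) / total, 1) if total else 100.0,
        }
    return report


# Hypothetical sample records (assumption: not taken from any real dataset).
sample_rows = [
    {"order_id": 1, "country": "UK", "amount": 12.5},
    {"order_id": 2, "country": "FR", "amount": 8.0},
    {"order_id": None, "country": "UK", "amount": -3.0},
    {"order_id": 4, "country": "DE", "amount": 20.0},
]

# Hypothetical expectations, each a predicate over a single row.
suite = {
    "order_id_not_null": lambda r: r["order_id"] is not None,
    "country_in_set": lambda r: r["country"] in {"UK", "FR", "DE"},
    "amount_non_negative": lambda r: isinstance(r["amount"], (int, float)) and r["amount"] >= 0,
}

report = run_suite(sample_rows, suite)
for name, result in report.items():
    print(name, result)
```

Keeping each expectation as a small named predicate makes the report readable on its own: a failed name says which rule broke, and the failure count says how badly.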

advanced retail data analytics · personal project · June 2025
github.com/Ved2705/RetailPulse360 ↗
ingest · transform · validate · analyze · deliver
  • Ran end-to-end preprocessing, cleaning, and feature engineering on the Online Retail II dataset — roughly 540,000 records.
  • Applied RFM (Recency, Frequency, Monetary) analysis with K-Means clustering to segment customers into actionable groups.
  • Designed dynamic Power BI dashboards covering sales trends, product performance, customer segments, and return patterns.
  • Used Scikit-Learn and Pandas for encoding, scaling, and transformation to prepare the data for modeling.
  • Version-controlled the full project on GitHub, with an emphasis on reproducibility and clean analysis.
540K+ records processed
RFM + K-Means segmentation
4 dashboard views shipped
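
For readers unfamiliar with RFM, the sketch below shows how the three features can be derived per customer before any clustering step. It is an illustration using only the standard library, not the RetailPulse360 code (which used Pandas and Scikit-Learn). The transactions, the `rfm_table` helper, and the snapshot date are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, invoice_date, amount).
transactions = [
    ("C1", date(2011, 12, 1), 50.0),
    ("C1", date(2011, 12, 8), 30.0),
    ("C2", date(2011, 6, 15), 200.0),
    ("C3", date(2011, 11, 20), 15.0),
    ("C3", date(2011, 11, 25), 25.0),
    ("C3", date(2011, 12, 5), 10.0),
]


def rfm_table(rows, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Recency is days between the snapshot date and the customer's last purchase,
    frequency counts transaction rows (a simplification; distinct invoices
    would be another reasonable choice), and monetary is total spend.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        last_seen[customer] = max(last_seen.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


# Assumed snapshot date, set shortly after the last transaction above.
rfm = rfm_table(transactions, snapshot=date(2011, 12, 10))
for customer, features in sorted(rfm.items()):
    print(customer, features)
```

The three columns sit on very different scales (days, counts, currency), so they would normally be scaled before being passed to a distance-based algorithm such as K-Means.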
6+ industry-standard tools integrated into one live platform
540K+ and 24,000+ records processed end-to-end across the two projects
Selected for NASA Space Apps Challenge, a global hackathon
04 / analyze

Experience

Where the pipeline gets tested against a real, time-boxed problem.

Technical Team Member — NASA Space Apps Challenge

Sep 2024 – Nov 2024 · Virtual
  • Took part in a globally recognized hackathon where cross-functional teams tackled real-world technology challenges.
  • Designed and managed a relational database to store, track, and query all registered participants for the event.
  • Built and deployed a Python-based Discord bot to automate new participant registrations on the test platform.
  • Worked in an agile team environment, applying database and software development skills under tight deadlines.
05 / deliver

Contact

Clean output, ready for the next step.

pipeline_output.json

Let's talk data

Open to Software Developer, Data Analyst, and Data Engineer roles. Reach out any way that's easiest for you.