Ved Madurwar — Data Engineer & Analyst

01 / ingest

About

Raw inputs to the pipeline — who I am and where I'm coming from.

I'm a B.Tech Computer Engineering student who'd rather ship something real than just study the theory behind it. That instinct produced two projects: DataForge ETL, a full-stack enterprise ETL platform combining PySpark, Great Expectations, and Claude AI, and RetailPulse360, an end-to-end retail analytics build that segments customers across 540,000+ transactions with K-Means clustering and presents the results in Power BI dashboards. Both started as personal projects and ended up looking like production systems, which is exactly the kind of work I want to keep doing. I'm looking for a Software Developer, Data Analyst, or Data Engineer role where I can turn data into decisions people actually act on.

02 / validate

Skills

Every skill below is run through the same kind of expectation suite I built into DataForge — type inferred, checked, scored.

suite programming_languages

python	type: language	expect proficiency in {advanced, expert}	passed	98%
sql	type: language	expect query_optimization to be true	passed	94%
pl_sql	type: language	expect procedural_db_logic to be true	passed	88%
java	type: language	expect not_null	passed	80%

suite data_analysis_and_ml

pandas / numpy	type: library	expect daily_use to be true	passed	97%
scikit_learn	type: library	expect production_model to exist	passed	92%
k_means_clustering	type: ml_algorithm	expect applied_at_scale >= 500000 rows	passed	93%
rfm_analysis	type: method	expect segments_to_be_actionable	passed	91%
feature_engineering	type: method	expect not_null	passed	90%

suite visualization

power_bi	type: tool	expect dashboards_to_be_dynamic	passed	95%
tableau	type: tool	expect not_null	passed	82%
matplotlib	type: library	expect chart_type in {line, bar, scatter}	passed	89%

suite databases_and_tools

database_design	type: skill	expect schema_to_be_normalized	passed	90%
git	type: tool	expect history_to_be_clean	passed	93%
linux	type: platform	expect not_null	passed	85%

03 / transform

Projects

Where raw inputs get reshaped into something usable — two end-to-end builds.

DataForge ETL · enterprise data engineering platform · personal project · 2025

dataforge-etl.onrender.com ↗

ingest transform validate analyze deliver

Built a production-grade full-stack ETL platform in Python and Flask, ingesting CSV, JSON, XLSX, and Parquet through configurable end-to-end pipeline workflows.
Combined PySpark and Pandas for scalable transformation, with real-time multi-step execution and automated column profiling.
Wired in Great Expectations for automated data quality validation — pass/fail reports and expectation suites across 24,000+ record datasets.
Integrated Claude AI for column analysis, PII detection, semantic type inference, and natural-language queries translated straight into executable Pandas code.
Added Slack and email pipeline-completion alerts, plus multi-format export to clean CSV and Parquet.
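The pass/fail validation step described above can be sketched in plain Pandas. This is a minimal illustration, not the DataForge source and not the Great Expectations API: `run_expectations`, the sample `orders` table, and the three checks are all hypothetical names and made-up data, and the score here is simply the share of row-level checks that passed.

```python
import pandas as pd


def run_expectations(df, expectations):
    """Run each (name, check) pair; return per-expectation results and a score.

    Each check is a function that takes the DataFrame and returns a boolean
    Series, True where the row satisfies the expectation.
    """
    results = []
    for name, check in expectations:
        passed_rows = int(check(df).sum())
        results.append({
            "expectation": name,
            "passed_rows": passed_rows,
            "total_rows": len(df),
            "success": passed_rows == len(df),
        })
    total = sum(r["total_rows"] for r in results)
    passed = sum(r["passed_rows"] for r in results)
    # Score = percentage of row-level checks that passed (assumed definition).
    score = 100.0 * passed / total if total else 100.0
    return results, round(score, 1)


# Made-up sample data with one negative amount and one missing country.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount": [19.99, -5.00, 42.50, 7.25],
    "country": ["IN", "US", None, "UK"],
})

expectations = [
    ("order_id is unique", lambda df: ~df["order_id"].duplicated(keep=False)),
    ("amount is non-negative", lambda df: df["amount"] >= 0),
    ("country is not null", lambda df: df["country"].notna()),
]

report, quality_score = run_expectations(orders, expectations)
for row in report:
    print(row["expectation"], "passed" if row["success"] else "failed")
print("quality score:", quality_score)
```

Keeping each check as a function that returns a boolean Series means the same loop can report both a pass/fail flag and a row-level score without special cases per rule.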

96.4% data quality score

24,000+ records validated

6+ tools integrated

4 file formats supported

RetailPulse360 · advanced retail data analytics · personal project · June 2025

github.com/Ved2705/RetailPulse360 ↗

ingest transform validate analyze deliver

Ran end-to-end preprocessing, cleaning, and feature engineering on the Online Retail II dataset — roughly 540,000 records.
Applied RFM (Recency, Frequency, Monetary) analysis with K-Means clustering to segment customers into actionable groups.
Designed dynamic Power BI dashboards covering sales trends, product performance, customer segments, and return patterns.
Used Scikit-Learn and Pandas for encoding, scaling, and transformation to prepare the data for modeling.
Version-controlled the full project on GitHub, with an emphasis on reproducibility and clean analysis.
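The RFM and K-Means steps above can be sketched on a toy table. This is a hedged illustration rather than the RetailPulse360 notebook: the column names (`customer_id`, `invoice_date`, `amount`), the seven sample rows, and the choice of two clusters are assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up transactions; a real run would load the cleaned retail dataset.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3, 4],
    "invoice_date": pd.to_datetime([
        "2011-11-01", "2011-12-05", "2011-06-15",
        "2011-12-01", "2011-12-07", "2011-12-09", "2011-03-20",
    ]),
    "amount": [120.0, 80.0, 15.0, 300.0, 250.0, 410.0, 22.0],
})

# Reference date: one day after the last purchase in the data.
snapshot = transactions["invoice_date"].max() + pd.Timedelta(days=1)

# One row per customer: last purchase date, purchase count, total spend.
rfm = transactions.groupby("customer_id").agg(
    last_purchase=("invoice_date", "max"),
    frequency=("invoice_date", "count"),
    monetary=("amount", "sum"),
)
rfm["recency"] = (snapshot - rfm["last_purchase"]).dt.days

# Scale the three features so no single unit dominates the distance metric.
features = rfm[["recency", "frequency", "monetary"]]
scaled = StandardScaler().fit_transform(features)

# Two clusters is an arbitrary choice for this tiny example.
model = KMeans(n_clusters=2, n_init=10, random_state=42)
rfm["segment"] = model.fit_predict(scaled)

print(rfm[["recency", "frequency", "monetary", "segment"]])
```

On real data the number of clusters would normally be chosen with something like an elbow plot or silhouette score instead of being fixed up front.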

540K+ records processed

RFM + K-Means segmentation

4 dashboard views shipped

Industry-standard tools integrated into one live platform

Records processed end-to-end across both projects

Selected for NASA Space Apps Challenge, a global hackathon

04 / analyze

Experience

Where the pipeline gets tested against a real, time-boxed problem.

Technical Team Member — NASA Space Apps Challenge

Sep 2024 – Nov 2024 · Virtual

Took part in a globally recognized hackathon where cross-functional teams tackled real-world technology challenges.
Designed and managed a relational database to store, track, and query all registered participants for the event.
Built and deployed a Python-based Discord bot to automate new participant registrations on the test platform.
Worked in an agile team environment, applying database and software development skills under tight deadlines.
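A participant registry like the one described above might look something like this in SQLite. The schema, column names, and helper functions are assumptions for illustration only, the sample participants are made up, and the Discord side is left out: a bot command handler would simply call `register` with the values it collects.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical participants table if it does not exist yet."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS participants (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            discord_handle TEXT NOT NULL UNIQUE,
            full_name TEXT NOT NULL,
            team TEXT,
            registered_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
    """)


def register(conn, handle, name, team=None):
    """Insert a participant; return False if the handle is already registered."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO participants (discord_handle, full_name, team) "
                "VALUES (?, ?, ?)",
                (handle, name, team),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def team_sizes(conn):
    """Return a {team: member_count} mapping for participants with a team."""
    rows = conn.execute(
        "SELECT team, COUNT(*) FROM participants "
        "WHERE team IS NOT NULL GROUP BY team ORDER BY team"
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
create_schema(conn)

first = register(conn, "asha#1001", "Asha Rao", "orbit")
second = register(conn, "ben#2002", "Ben Ito", "orbit")
duplicate = register(conn, "asha#1001", "Asha Rao", "apollo")  # rejected

sizes = team_sizes(conn)
print(first, second, duplicate, sizes)
```

Letting the `UNIQUE` constraint reject repeat sign-ups keeps the duplicate check in the database, so the calling code does not need its own lookup before inserting.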

05 / deliver

Contact

Clean output, ready for the next step.

pipeline_output.json

Let's talk data

Open to Software Developer, Data Analyst, and Data Engineer roles. Reach out any way that's easiest for you.

Email: ved.madurwar27@gmail.com
Phone: +91-9819200207
GitHub: github.com/Ved2705
LinkedIn: linkedin.com/in/ved-madurwar27