I build finance-grade data platforms.
Throughput
0 TB/day
0M events/day
Reliability
SLO 0%
p95 freshness < 15 min
Adoption
0 Teams
0 dashboards supported
Platform Modernization→
On-prem environment lacked interactive compute. Enabled scalable data science with significant cost avoidance vs Databricks.
Ingestion Reliability→
Missing/late events in payments data streaming. Implemented idempotency achieving zero data loss during ingestion spikes.
Enterprise Metadata & Lineage→
High change risk during migrations. Implemented catalog + lineage + governance saving hours/week across teams.
Vesta Corp
Principal Data Engineer
Owned payments and finance data platform modernization processing 5–7M txns/day. Built streaming durability on Redis Streams, migrated analytics workloads to a hybrid on-prem Kubernetes/Trino ecosystem (cutting compute spend ~85%), and implemented OpenMetadata for automated lineage and governance. Led a 5-engineer squad across multiple projects.
Nationwide Insurance
Staff Data Engineer
Led enterprise migration from Hive Metastore to Unity Catalog, standardizing RBAC and policy controls. Implemented Lakehouse Federation patterns to enable cross-source discovery and developed an automated PySpark data quality framework across Delta Lake and Snowflake feeds.
Hyundai Autoever America
Lead Big Data Engineer
Architected ingestion pipelines supporting 65 on-prem Hadoop clusters for connected car and AI services. Rolled out Trino/Presto distributed SQL layers to reduce transformation overhead and empower ad-hoc analytics at scale.
Nationwide Insurance
Senior Data Engineer
Built multi-TB/day ETL into S3 and Delta Lake on AWS. Created a standalone PySpark validation framework to guarantee post-run integrity, and tuned Spark clusters for massive throughput and memory stability.
Santander Bank
Data Engineer
Built scalable Hadoop and Spark environments for model development workflows and production data processing. Modernized legacy Hive and SQL logic by converting pipelines into PySpark and Spark SQL.
Platforms & Paved RoadsOnboarding new datasets in < 1 day. Building frameworks that allow analysts to ship safely and independently.
Streaming ReliabilityIdempotency, replayable queues, and late-event handling. Utilizing Redis Streams and Kafka to ensure exact correctness.
Governance & LineageImplementing OpenMetadata and Unity Catalog alongside automated data quality checks tailored for finance-grade compliance.
Safaai
Civic TechGamified Civic Engagement Platform
A closed-loop platform solving municipal maintenance. Citizens report issues, cities fix them, citizens verify them. Driven by gamification and rigorous anti-abuse architecture.
Lessons Learned
- Designing reputation systems requires severe rate-limiting to prevent sybil attacks.
- Geospatial querying at scale demands specialized indexing (PostGIS).
Nestmates
Consumer AppHousehold Coordination & Squads
A coordination app turning household management multiplayer. Translates chores into missions and roommate groups into squads, complete with real-time state sync and rules engines.
Lessons Learned
- Optimistic UI updates are critical for a responsive feeling in multi-player state apps.
- Complex household rules engines scale best with event-driven architecture.
The standard, on purpose.
- Freshness SLO: < 15 min (depending on tier)
- Correctness: Automated PySpark validation frameworks deployed inline
- Cost: Optimized cost per TB avoiding vendor lock-in
- Onboarding: New dataset pipelines deployed in < 1 day