About Me

I build finance-grade data platforms.

Data Architect & Data Engineering Lead — reliability, governance, cost, adoption.

Throughput

0 TB/day

0M events/day

Reliability

SLO 0%

p95 freshness < 15 min

Adoption

0 Teams

0 dashboards supported

Case Studies
Hybrid Analytics

Platform Modernization

The on-prem environment lacked interactive compute. Enabled scalable data science with significant cost avoidance vs. Databricks.

Kubernetes, Trino, Airflow
Streaming

Ingestion Reliability

Payments data streams had missing and late events. Implemented idempotent ingestion, achieving zero data loss during ingestion spikes.

Redis Streams, PostgreSQL, Python
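
A minimal sketch of the idempotent-write idea behind this case study, not the production implementation. It assumes each event carries a unique `event_id` (a hypothetical field name) and uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, where the equivalent statement would be `INSERT ... ON CONFLICT DO NOTHING`. Table and function names are illustrative.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory table keyed by event_id (stand-in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        "event_id TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    return conn


def ingest(conn: sqlite3.Connection, events: list) -> int:
    """Write a batch, skipping any event_id already stored.

    Returns the number of rows actually written, so a redelivered
    batch is a harmless no-op rather than a duplicate.
    """
    written = 0
    with conn:  # one transaction per batch
        for event in events:
            cur = conn.execute(
                "INSERT OR IGNORE INTO payments (event_id, amount_cents) "
                "VALUES (?, ?)",
                (event["event_id"], event["amount_cents"]),
            )
            written += cur.rowcount  # 1 if inserted, 0 if ignored
    return written


conn = make_store()
batch = [
    {"event_id": "evt-1", "amount_cents": 1250},
    {"event_id": "evt-2", "amount_cents": 900},
]
first = ingest(conn, batch)
replayed = ingest(conn, batch)  # simulated redelivery after a spike or retry
total = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

Because the write is keyed on the event identifier, the consumer can acknowledge a stream entry only after the transaction commits and safely reprocess anything left unacknowledged.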
Governance

Enterprise Metadata & Lineage

Migrations carried high change risk. Implemented catalog, lineage, and governance tooling, saving hours per week across teams.

OpenMetadata, Unity Catalog
Work Experience
07/2024 - Present

Vesta Corp

Principal Data Engineer

Owned payments and finance data platform modernization processing 5–7M txns/day. Built streaming durability on Redis Streams, migrated analytics workloads to a hybrid on-prem Kubernetes/Trino ecosystem (cutting compute spend ~85%), and implemented OpenMetadata for automated lineage and governance. Led a 5-engineer squad across multiple projects.

11/2023 - 07/2024

Nationwide Insurance

Staff Data Engineer

Led enterprise migration from Hive Metastore to Unity Catalog, standardizing RBAC and policy controls. Implemented Lakehouse Federation patterns to enable cross-source discovery and developed an automated PySpark data quality framework across Delta Lake and Snowflake feeds.

10/2022 - 11/2023

Hyundai Autoever America

Lead Big Data Engineer

Architected ingestion pipelines supporting 65 on-prem Hadoop clusters for connected car and AI services. Rolled out Trino/Presto distributed SQL layers to reduce transformation overhead and empower ad-hoc analytics at scale.

01/2020 - 10/2022

Nationwide Insurance

Senior Data Engineer

Built multi-TB/day ETL into S3 and Delta Lake on AWS. Created a standalone PySpark validation framework to guarantee post-run integrity, and tuned Spark clusters for massive throughput and memory stability.

12/2018 - 01/2020

Santander Bank

Data Engineer

Built scalable Hadoop and Spark environments for model development workflows and production data processing. Modernized legacy Hive and SQL logic by converting pipelines into PySpark and Spark SQL.

What I Build
Streaming Reliability
Data Governance
Cost Optimization

Platforms & Paved Roads

Onboarding new datasets in < 1 day. Building frameworks that allow analysts to ship safely and independently.

Streaming Reliability

Idempotency, replayable queues, and late-event handling on Redis Streams and Kafka, so results stay correct under retries and replays.
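
As a rough illustration of late-event handling, the sketch below counts events in tumbling event-time windows and uses a watermark to decide when a window is closed. The window size, allowed lateness, and the `WindowedCounter` class are all assumptions made for the example, not details of any specific pipeline described here.

```python
from collections import defaultdict

WINDOW_SECONDS = 60             # hypothetical tumbling window size
ALLOWED_LATENESS_SECONDS = 120  # hypothetical grace period for stragglers


class WindowedCounter:
    """Counts events per event-time window and sets aside too-late events."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)  # window start -> event count
        self.max_event_time = 0         # highest event time seen so far
        self.late_events = []           # ids routed to a correction path

    def watermark(self) -> int:
        """Event time before which windows are considered closed."""
        return self.max_event_time - ALLOWED_LATENESS_SECONDS

    def add(self, event_time: int, event_id: str) -> bool:
        window_start = event_time - event_time % WINDOW_SECONDS
        window_end = window_start + WINDOW_SECONDS
        if window_end <= self.watermark():
            # The window already closed: keep the event for a replay or
            # correction job instead of silently dropping it.
            self.late_events.append(event_id)
            return False
        self.counts[window_start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        return True


counter = WindowedCounter()
counter.add(10, "a")   # on time
counter.add(70, "b")   # on time
counter.add(400, "c")  # advances the watermark to 280
counter.add(30, "d")   # its window [0, 60) is closed, so it is set aside
counter.add(300, "e")  # out of order but within the allowed lateness
```

Events that miss their window are retained rather than discarded, which is what makes a replayable queue useful: a correction job can re-read them and amend the affected windows.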

Governance & Lineage

Implementing OpenMetadata and Unity Catalog alongside automated data quality checks tailored for finance-grade compliance.

Ventures & Personal Projects

Safaai

Civic Tech

Gamified Civic Engagement Platform

A closed-loop platform for municipal maintenance: citizens report issues, cities fix them, and citizens verify the fixes. Driven by gamification and a rigorous anti-abuse architecture.

Report
Triage & Fix
Verify & Reward

Lessons Learned

  • Reputation systems need strict rate limiting to blunt Sybil attacks.
  • Geospatial querying at scale demands specialized indexing (PostGIS).
RBAC, Automated Triaging, Real-time Computation
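
To make the rate-limiting lesson concrete, here is a small per-account token bucket. It is a sketch under assumed limits (three reports in a burst, one new token every two seconds), and `TokenBucket` and `ReportLimiter` are hypothetical names. Timestamps are passed in explicitly so the behavior is deterministic; a real service would more likely read `time.monotonic()`.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float           # maximum burst size
    refill_per_second: float  # sustained rate
    tokens: float             # tokens currently available
    last_refill: float        # timestamp of the last refill

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(
            self.capacity, self.tokens + elapsed * self.refill_per_second
        )
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ReportLimiter:
    """Keeps one bucket per account so limits are enforced independently."""

    def __init__(self, capacity: float = 3.0, refill_per_second: float = 0.5):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets = {}

    def allow(self, account_id: str, now: float) -> bool:
        bucket = self.buckets.get(account_id)
        if bucket is None:
            # New accounts start with a full bucket.
            bucket = TokenBucket(
                self.capacity, self.refill_per_second, self.capacity, now
            )
            self.buckets[account_id] = bucket
        return bucket.allow(now)


limiter = ReportLimiter()
burst = [limiter.allow("acct-1", now=0.0) for _ in range(4)]
after_wait = limiter.allow("acct-1", now=2.0)  # one token refilled by then
other_account = limiter.allow("acct-2", now=0.0)
```

Per-account limits only raise the cost of abuse for a single identity, so in a reputation system they would typically sit alongside signals that make creating many accounts expensive.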

Nestmates

Consumer App

Household Coordination & Squads

A coordination app that makes household management multiplayer. It turns chores into missions and roommate groups into squads, with real-time state sync and a rules engine.

Form Squad
Assign Mission
State Sync

Lessons Learned

  • Optimistic UI updates are critical to a responsive feel in multiplayer state apps.
  • Complex household rules engines scale best with event-driven architecture.
Entity Resolution, Rules Engine, State Sync
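
The optimistic-update lesson can be sketched as a client-side store that applies a change immediately and rolls it back if the server later rejects it. `OptimisticStore` and its method names are hypothetical, and the rollback is deliberately naive: it assumes at most one pending change per key at a time.

```python
_MISSING = object()  # marks "key did not exist before the change"


class OptimisticStore:
    """Client-side state that applies changes before the server confirms."""

    def __init__(self) -> None:
        self.state = {}
        self.pending = {}  # mutation id -> (key, previous value)
        self._next_id = 0

    def apply(self, key: str, value: str) -> int:
        """Update local state right away and remember how to undo it."""
        self._next_id += 1
        self.pending[self._next_id] = (key, self.state.get(key, _MISSING))
        self.state[key] = value
        return self._next_id

    def confirm(self, mutation_id: int) -> None:
        """Server accepted the change: drop the undo record."""
        self.pending.pop(mutation_id, None)

    def reject(self, mutation_id: int) -> None:
        """Server refused the change: restore the previous value."""
        key, previous = self.pending.pop(mutation_id)
        if previous is _MISSING:
            self.state.pop(key, None)
        else:
            self.state[key] = previous


store = OptimisticStore()
m1 = store.apply("dishes", "done")
store.confirm(m1)                   # server agreed
m2 = store.apply("dishes", "todo")  # shown to the user immediately
store.reject(m2)                    # server refused, so roll back to "done"
m3 = store.apply("trash", "done")
store.reject(m3)                    # key did not exist before, so remove it
```

The user sees the change at once, and the undo record keeps the local view honest when the authoritative state disagrees.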
Platform Guarantees

The standard, on purpose.