Senior Data Engineer
Quantiphi Analytics Solutions Pvt Ltd
While technology is the heart of our business, a global and diverse culture is the heart of our success. We love our people, and we take pride in offering them a culture built on transparency, diversity, integrity, learning, and growth.
If an environment that encourages you to innovate and excel, in your personal life as well as your professional one, interests you, you will enjoy your career with Quantiphi!
Senior Data Engineer
Experience Range: 4-8 years
Location: Mumbai, Bangalore, Trivandrum
Role overview:
The Data Engineer is the implementation backbone of the platform. You will build and operate the ingestion pipelines, the dbt transformation layer, the FHIR serialization pipelines, the FHIR-Repository integration components, and the data products that depend on them. You will work within a spec-driven development framework — every component begins with an approved specification, and your work is generated, refined, and verified against that specification.
This is a hands-on role. You will write Python, PyFlink, PySpark, dbt SQL, and supporting Java for FHIR-Repository interceptor work, with Code agents as coding assistants under your direction. Code review, test authorship, and operational ownership of the components you build are part of the role. The platform handles regulated healthcare data at high volume, and operational discipline matters.
Key Responsibilities
Build and maintain the ingestion pipelines — Apache Flink streaming jobs on Dataproc for HL7v2 and FHIR feeds, PySpark batch jobs for CCDA and CSV bulk loads. Implement against the CDM_ingest_mapping shared Python library that defines SourceSpec instances per source × format combination
Implement source-format parsers (HL7v2, CCDA, CSV, FHIR) as Python classes per the parser component spec. Write fixture-driven tests covering both well-formed inputs and DLQ-routing scenarios (an illustrative SourceSpec and parser sketch follows this list)
Implement and maintain the synchronous Informatica MDM call pattern in the ingestion path — batched calls, timeout handling, circuit breaker behavior, DLQ routing for MDM failures (a circuit-breaker sketch follows this list). Implement the asynchronous MDM event consumer that applies ECI changes to CDM
Build the dbt transformation layer end to end — staging models per source, intermediate models that union and resolve REFs, CDM target models (DIM/FACT/BRIDGE/REF) that apply SCD2 via shared macros, and data product models. Write the dbt YAML schemas, tests, and documentation that accompany every model
Implement and maintain the shared dbt macro library — hash_key, scd2_merge, attribute_hash, restate_merge, audit_columns. Macros are the most-reused code; their correctness is non-negotiable and they require golden tests
Build the FHIR serialization layer — flat FHIR Iceberg tables (one per resource type) materialized via dbt, the PySpark bundling pipeline that produces FHIR Bundles for Kafka publication (a bundling-job sketch follows this list), and the FHIR validator integration that gates publication on US Core 6.1 conformance
Build and maintain FHIR-Repository integration components — the Java/Kotlin egress interceptor that captures client-originated FHIR changes, the Flink loopback consumer that merges those changes into CDM, the bundle consumer that ingests CDM-originated bundles into FHIR-Repository. Implement origin-tag-based loop prevention (a loop-prevention sketch follows this list)
Implement Cloud Composer DAGs to orchestrate dbt runs, batch ingestion jobs, maintenance operations (Iceberg compaction, snapshot expiration, orphan file cleanup), and data product refresh schedules (a DAG sketch follows this list)
Work within the spec-driven development framework — draft unit specs for new components, work with peers on spec review, generate implementation and tests using Code agents with the spec as primary context, iterate until tests pass, and submit code review packages that include the spec, tests, and implementation together
Implement and monitor data quality checks at every layer — dbt tests for staging and CDM, FHIR validator output for serialization, Iceberg metadata observations for storage health, freshness monitors at the source-to-CDM boundary
Participate in code reviews, on-call rotations, and incident response. The platform serves regulated healthcare workloads; operational stability is part of the engineering responsibility, not a separate function
Optimize pipeline performance — Flink TaskManager sizing, Iceberg compaction tuning, dbt incremental strategy selection, Starburst cluster scaling decisions. Profile production performance and propose changes when SLOs are at risk
Document the components you build through the SDD framework — every code file references its spec; every spec change is reviewed; every acceptance criterion has a corresponding test
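The sketches below illustrate a few of the components described above. They are minimal, and every name, field, threshold, and topic in them is an assumption for illustration, not the platform's actual contract. First, one possible shape for a SourceSpec and a source-format parser class; the real CDM_ingest_mapping library defines its own fields and registration API:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass(frozen=True)
    class SourceSpec:
        # Hypothetical shape; the real CDM_ingest_mapping library defines its own fields.
        source_system: str              # e.g. "hospital_a"
        source_format: str              # "hl7v2" | "ccda" | "csv" | "fhir"
        parse: Callable[[bytes], dict]
        dlq_topic: str

    class HL7v2ADTParser:
        """Illustrative parser: flattens a raw HL7v2 ADT message into a dict."""

        def parse(self, raw: bytes) -> dict:
            text = raw.decode("utf-8")
            segments = [s.split("|") for s in text.splitlines() if s]
            msh = next(s for s in segments if s[0] == "MSH")
            pid = next(s for s in segments if s[0] == "PID")
            return {
                "message_type": msh[8],    # MSH-9, e.g. "ADT^A01"
                "patient_id": pid[3],      # PID-3 identifier field
                "segment_count": len(segments),
            }

    # A malformed message raises StopIteration or IndexError here; the
    # pipeline catches the error and routes the raw payload to dlq_topic.
    SPEC = SourceSpec(
        source_system="hospital_a",        # assumed source name
        source_format="hl7v2",
        parse=HL7v2ADTParser().parse,
        dlq_topic="dlq.hospital_a.hl7v2",  # assumed topic name
    )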
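Second, a sketch of the synchronous MDM call pattern, assuming a hypothetical MdmCircuitBreaker with invented thresholds; the real timeout, batching, and DLQ behavior are defined by the component spec:

    import time

    class CircuitOpenError(Exception):
        """Raised while the breaker is open; callers route the batch to the DLQ."""

    class MdmCircuitBreaker:
        # Thresholds are assumptions for this sketch, not the spec's values.
        def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0):
            self.failure_threshold = failure_threshold
            self.reset_after_s = reset_after_s
            self.failures = 0
            self.opened_at = None

        def call(self, mdm_lookup, record_batch: list) -> list:
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_after_s:
                    raise CircuitOpenError("MDM circuit open")
                self.opened_at = None  # half-open: allow one trial call
            try:
                enriched = mdm_lookup(record_batch, timeout_s=2.0)  # assumed client API
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                raise
            self.failures = 0
            return enriched

The ingestion job would catch CircuitOpenError, along with the underlying call failure, and route the affected batch to the MDM DLQ instead of blocking the stream.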
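Third, a sketch of the PySpark bundling job, with assumed table, column, broker, and topic names. Real Bundle entries also carry request metadata and must pass US Core 6.1 validation before publication; this only shows the overall shape of the job:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("fhir-bundle-publish").getOrCreate()

    # Assumed flat FHIR Iceberg table with one serialized resource per row.
    patients = spark.read.table("fhir_flat.patient")

    # Group per-resource JSON into one Bundle per patient. A real entry is an
    # object with resource and request fields; the array of raw JSON strings
    # here is a simplification.
    bundles = (
        patients
        .groupBy("patient_id")
        .agg(F.collect_list("resource_json").alias("resources"))
        .select(
            F.col("patient_id").cast("string").alias("key"),
            F.to_json(
                F.struct(
                    F.lit("Bundle").alias("resourceType"),
                    F.lit("transaction").alias("type"),
                    F.col("resources").alias("entry"),
                )
            ).alias("value"),
        )
    )

    (bundles.write
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
        .option("topic", "fhir.bundles.cdm")               # assumed topic
        .save())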
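Fourth, a sketch of origin-tag-based loop prevention on the loopback-consumer side, with an assumed tag system URI:

    ORIGIN_TAG_SYSTEM = "https://example.org/fhir/origin"  # assumed system URI
    CDM_ORIGIN_CODE = "cdm"

    def originated_in_cdm(resource: dict) -> bool:
        """True if the resource carries the CDM-origin tag in meta.tag."""
        tags = resource.get("meta", {}).get("tag", [])
        return any(
            t.get("system") == ORIGIN_TAG_SYSTEM and t.get("code") == CDM_ORIGIN_CODE
            for t in tags
        )

    def keep_for_cdm_merge(change_event: dict) -> bool:
        # The loopback consumer merges only client-originated changes, so a
        # CDM-originated write to the FHIR repository never loops back into CDM.
        return not originated_in_cdm(change_event["resource"])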
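Finally, a sketch of a Cloud Composer DAG sequencing batch ingestion, the dbt run, and Iceberg maintenance. Task commands and the dbt selector are placeholders:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="cdm_daily_refresh",            # assumed DAG name
        start_date=datetime(2024, 1, 1),
        schedule="0 2 * * *",                  # placeholder cadence
        catchup=False,
        default_args={"retries": 1, "retry_delay": timedelta(minutes=10)},
    ) as dag:
        batch_ingest = BashOperator(
            task_id="pyspark_batch_ingest",
            bash_command="gcloud dataproc jobs submit pyspark ingest.py",  # placeholder
        )
        dbt_build = BashOperator(
            task_id="dbt_build_cdm",
            bash_command="dbt build --select tag:cdm",  # placeholder selector
        )
        iceberg_maintenance = BashOperator(
            task_id="iceberg_maintenance",
            bash_command="spark-submit iceberg_maintenance.py",  # placeholder
        )

        # Maintenance (compaction, snapshot expiration) runs after the dbt build.
        batch_ingest >> dbt_build >> iceberg_maintenance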
Required Skills and Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field
4+ years of hands-on data engineering experience
Strong proficiency in Python and SQL. PySpark and PyFlink familiarity strongly preferred. Some Java or Kotlin exposure is useful for FHIR-Repository interceptor work (one or two engineers on the team will lead this; the rest contribute as needed)
Hands-on experience with Google Cloud Platform — Cloud Storage, Dataproc, Cloud Composer, Cloud Run or GKE for container workloads, Secret Manager, IAM. Experience with the Dataproc Flink optional component is a strong plus
Production experience with dbt — incremental materialization strategies, custom macros, tests, sources, sequencing, and project organization for large model graphs. dbt-trino adapter experience is a plus
Production experience with Apache Iceberg — table creation, partitioning, compaction, snapshot operations, schema evolution. Familiarity with reading and writing Iceberg from multiple engines (Spark, Flink, Trino) is valuable
Experience with Apache Kafka — producers, consumers, partitioning, consumer-group semantics, retention and compaction, and integration with stream processors. Confluent Cloud experience preferred
Experience with streaming data processing — Apache Flink in production preferred; Apache Spark Structured Streaming acceptable as adjacent experience
Familiarity with healthcare data standards — at minimum, HL7v2 message structure and FHIR R4 resource shapes. Hands-on parsing experience for one or both is preferred
Experience with version control (Git), branch-based development workflows, pull request reviews, and CI/CD pipelines (GitHub Actions, GitLab CI, or Cloud Build)
Comfortable working with AI coding assistants (Code agents, Cursor, Copilot) as collaborators. The team uses Code agents as part of the spec-driven development workflow; ability to write effective prompts, review generated code critically, and iterate constructively is a meaningful part of the role
Strong problem-solving skills, debugging discipline, attention to detail, and ability to operate in an agile team environment
Nice-to-Have Skills
FHIR-Repository or HAPI FHIR experience — interceptor authorship, MDM module configuration, FHIR API customization
Informatica MDM experience
Atlan experience for governance and lineage integration
Java or Kotlin proficiency for FHIR-Repository interceptor and HAPI FHIR work
Apache Flink production operational experience including stateful jobs, exactly-once semantics, and savepoint/checkpoint management
Experience with FHIR profile validation tools (HL7 FHIR validator, Inferno, or HAPI's validation modules)
Experience contributing to or operating spec-driven or contract-first development workflows
Experience with Iceberg catalogs such as Apache Polaris, Project Nessie, or Snowflake Open Catalog as alternatives to BigLake (useful for portability discussions)
If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!
Experience Level: Senior