Technical Lead - Data Engineering with Databricks and PySpark
CRISIL Ltd
Technical Lead – Databricks & PySpark
Job Description
We are seeking a highly skilled Technical Lead with strong expertise in Databricks, Python, and PySpark to lead data engineering initiatives. The ideal candidate will drive the design, development, and optimization of scalable data pipelines while mentoring a team of engineers and collaborating with cross-functional stakeholders.
Key Responsibilities
- Lead the design and development of data pipelines and ETL/ELT workflows using Databricks and PySpark
- Architect and implement scalable, high-performance data solutions on cloud platforms (AWS/GCP)
- Collaborate with data architects, analysts, and business teams to translate requirements into technical solutions
- Optimize data processing jobs for performance, reliability, and cost efficiency
- Ensure data quality, governance, and security standards are followed
- Mentor and guide junior engineers; perform code reviews and enforce best practices
- Drive adoption of CI/CD, DevOps, and automated testing in data engineering workflows
- Troubleshoot and resolve production issues, ensuring high availability of data systems
Required Skills & Qualifications
- Strong experience in Python and PySpark development
- Hands-on expertise with Databricks (workflows, Delta Lake, notebooks, cluster management)
- Solid understanding of data engineering concepts, distributed computing, and big data processing
- Experience with SQL and relational/NoSQL databases
- Expertise in data modeling, partitioning, and performance tuning
- Proficiency with cloud platforms (AWS, GCP, or equivalents)
- Familiarity with Delta Lake, streaming (Structured Streaming), and batch workloads
- Strong knowledge of Git, CI/CD pipelines, and DevOps practices
- Experience with workflow orchestration tools (Airflow, Temporal, etc.)
Preferred Qualifications
- Experience with data warehousing and lakehouse architecture
- Knowledge of ML pipelines or MLOps integration
- Exposure to data governance tools and frameworks
- Certification in Databricks is a plus
Leadership & Soft Skills
- Proven experience in technical leadership and team management
- Strong problem-solving and analytical abilities
- Excellent communication and stakeholder management skills
- Ability to work in an agile environment and handle multiple priorities
Key Deliverables
- High-quality, scalable data pipelines
- Optimized data workflows in Databricks
- Well-documented architecture and processes
- Mentored and productive engineering team
Case Study: Financial Data Engineering Solution on Databricks
Background
A financial services company processes large volumes of data from multiple systems:
- Trade transactions (Equities, Derivatives, FX)
- Market data feeds (real-time stock prices, indices)
- Customer/account data (KYC, portfolios)
- Risk and compliance data
The existing system suffers from:
- High latency in risk reporting
- Data inconsistency across systems
- Lack of real-time insights
- Scalability challenges
The company wants to implement a modern lakehouse architecture using Databricks to enable real-time risk analytics, regulatory reporting, and portfolio insights.
Objective
Design and build a scalable, secure, and high-performance financial data platform using Databricks and PySpark to support:
- Near real-time trade and risk analytics
- Regulatory reporting (e.g., daily reporting, audit trails)
- Historical analysis for portfolio performance
Task Requirements
1. Data Ingestion
- Ingest data from:
  - Trade data (batch files / APIs)
  - Real-time market feeds (Kafka/Event Hub)
  - Reference data (customer, instruments)
- Use:
  - Databricks Auto Loader for batch ingestion
  - Structured Streaming for real-time feeds
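As a rough illustration of the two ingestion paths, the sketch below builds reader options for Auto Loader (the `cloudFiles` source) and for a Kafka feed. The function names, the CSV file format, the paths, and the topic are all hypothetical; `spark` is assumed to be an active SparkSession on a Databricks cluster.

```python
def auto_loader_options(file_format: str, schema_location: str) -> dict:
    """Reader options for Databricks Auto Loader (the 'cloudFiles' source)."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def kafka_options(bootstrap_servers: str, topic: str) -> dict:
    """Reader options for the Structured Streaming Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def read_trade_files(spark, source_path: str, schema_location: str):
    """Incrementally pick up new trade files (assumed here to be CSV)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options("csv", schema_location))
        .load(source_path)
    )


def read_market_feed(spark, bootstrap_servers: str, topic: str):
    """Subscribe to a market data topic (hypothetical topic name)."""
    return (
        spark.readStream.format("kafka")
        .options(**kafka_options(bootstrap_servers, topic))
        .load()
    )
```

Keeping the options in plain functions lets them be unit-tested without a cluster; only the two `read_*` functions need a live Spark session.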
2. Data Transformation
- Perform:
  - Data cleansing (nulls, incorrect formats)
  - Trade enrichment (join with instrument & customer data)
  - Currency conversion using FX rates
- Implement key business logic:
  - Daily P&L calculations
  - Exposure aggregation (by asset class, customer, region)
  - Risk metrics (VaR, notional exposure)
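The business logic above is not defined in detail, so the following is only a sketch under stated assumptions: daily P&L is taken as a mark-to-market move (quantity times price change, converted at an FX rate), exposure is the absolute notional in base currency, and VaR uses simple historical simulation. It is written in plain Python so the arithmetic can be checked without a cluster; in a PySpark pipeline the same logic would typically become column expressions and `groupBy` aggregations. All field names are hypothetical.

```python
import math
from collections import defaultdict


def daily_pnl(quantity, prev_close, close, fx_rate=1.0):
    """Mark-to-market P&L in base currency (assumed definition)."""
    return quantity * (close - prev_close) * fx_rate


def aggregate_exposure(trades, key):
    """Sum absolute notional in base currency, grouped by `key`
    (for example 'asset_class', 'customer', or 'region')."""
    totals = defaultdict(float)
    for trade in trades:
        totals[trade[key]] += abs(trade["notional"]) * trade["fx_rate"]
    return dict(totals)


def historical_var(pnl_history, confidence=0.95):
    """Historical-simulation VaR: the loss that is not exceeded in
    `confidence` of the observed days. Returned as a non-negative number."""
    if not pnl_history:
        raise ValueError("pnl_history must not be empty")
    losses = sorted(-p for p in pnl_history)  # ascending losses
    # round() guards against floating-point fuzz before taking the ceiling.
    rank = math.ceil(round(confidence * len(losses), 9))
    index = min(max(rank - 1, 0), len(losses) - 1)
    return max(losses[index], 0.0)
```

Other VaR conventions (interpolated quantiles, parametric VaR) would give slightly different numbers; the choice here is for illustration only.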
3. Data Storage (Lakehouse Design)
- Implement Medallion Architecture:
  - Bronze: Raw ingested data
  - Silver: Cleaned & standardized data
  - Gold: Aggregated datasets for reporting
- Use Delta Lake features:
  - ACID transactions
  - Time travel (for audit and compliance)
  - Schema evolution
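One possible way to lay out the three layers and use the Delta Lake features listed above is sketched below. The storage root, dataset names, and helper names are hypothetical; `mergeSchema` and `versionAsOf` are standard Delta Lake reader/writer options, and `spark` and `df` are assumed to be an active SparkSession and a DataFrame.

```python
LAYERS = ("bronze", "silver", "gold")


def table_path(root: str, layer: str, dataset: str) -> str:
    """Build a storage path such as '<root>/silver/trades'."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{root.rstrip('/')}/{layer}/{dataset}"


def append_with_schema_evolution(df, path: str) -> None:
    """Append to a Delta table, letting new columns extend the schema."""
    (
        df.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .save(path)
    )


def read_as_of_version(spark, path: str, version: int):
    """Time travel: read the table as it was at a given commit version,
    for example to reproduce a figure during an audit."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)
        .load(path)
    )
```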
4. Performance Optimization
- Optimize PySpark pipelines:
  - Partitioning by trade date, asset class
  - Z-ordering on frequently queried columns (e.g., account_id)
  - Cache intermediate datasets
- Tune cluster configurations (autoscaling, job clusters)
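A minimal sketch of the partitioning and Z-ordering steps follows. The table and column names are hypothetical. The guard reflects the usual Delta Lake restriction that Z-order columns must be data columns rather than partition columns; treat that as an assumption to verify against your runtime.

```python
def optimize_statement(table: str, zorder_cols, partition_cols=()) -> str:
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    overlap = set(zorder_cols) & set(partition_cols)
    if overlap:
        # Assumed restriction: partition columns cannot be Z-ordered.
        raise ValueError(f"cannot Z-order by partition columns: {sorted(overlap)}")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


def write_partitioned(df, table: str, partition_cols) -> None:
    """Write a Delta table partitioned by, e.g., trade_date and asset_class."""
    (
        df.write.format("delta")
        .mode("overwrite")
        .partitionBy(*partition_cols)
        .saveAsTable(table)
    )


def optimize_table(spark, table: str, zorder_cols, partition_cols=()) -> None:
    """Run the compaction and Z-ordering on the cluster."""
    spark.sql(optimize_statement(table, zorder_cols, partition_cols))
```

Partition columns should stay low-cardinality (dates, asset classes); high-cardinality lookup keys such as `account_id` are the typical Z-order candidates.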
5. Data Quality & Governance
- Implement:
  - Data validation rules (e.g., missing trade IDs, invalid prices)
  - Reconciliation checks (trade counts vs source)
- Ensure:
  - Data lineage tracking
  - Role-based access control (RBAC)
  - Sensitive data masking (PII, financial data)
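The validation, reconciliation, and masking items can be sketched as small pure functions, shown below. The field names (`trade_id`, `price`), the rule that prices must be positive, and salted SHA-256 hashing as the masking technique are all assumptions chosen for illustration; a real platform might instead rely on catalog-level column masks.

```python
import hashlib


def validate_trade(trade: dict) -> list:
    """Return a list of rule violations; an empty list means the row passes."""
    errors = []
    if not trade.get("trade_id"):
        errors.append("missing trade_id")
    price = trade.get("price")
    if not isinstance(price, (int, float)) or price <= 0:
        errors.append("invalid price")
    return errors


def reconcile_counts(source_count: int, loaded_count: int) -> int:
    """Difference between source and loaded trade counts (0 means reconciled)."""
    return source_count - loaded_count


def mask_account_id(account_id: str, salt: str) -> str:
    """Replace an identifier with a deterministic salted hash so joins
    still work on the masked value but the original is not exposed."""
    digest = hashlib.sha256((salt + account_id).encode("utf-8")).hexdigest()
    return digest[:16]
```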
6. Streaming & Real-Time Processing
- Build streaming pipelines for:
  - Real-time market data ingestion
  - Intraday risk calculations
- Ensure:
  - Low latency processing
  - Fault-tolerant design (checkpointing, retries)
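For the fault-tolerance item, a streaming write with a checkpoint location plus a generic retry wrapper might look like the sketch below. The paths, the 10-second trigger interval, and both helper names are hypothetical; `stream_df` is assumed to be a streaming DataFrame.

```python
import time


def start_market_data_sink(stream_df, target_path, checkpoint_path,
                           interval="10 seconds"):
    """Write a stream to Delta. The checkpoint lets the query resume
    from its last committed offsets after a restart."""
    return (
        stream_df.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime=interval)
        .start(target_path)
    )


def with_retries(action, attempts=3, delay_seconds=0.0):
    """Call `action` until it succeeds or `attempts` is exhausted,
    then re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

A shorter trigger interval lowers latency at the cost of more, smaller commits; the right value depends on the feed volume.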
7. Orchestration
- Implement pipeline orchestration using:
  - Databricks Workflows / Airflow / Azure Data Factory
- Handle:
  - Dependencies (e.g., reference data before trade enrichment)
  - Job retries and alerts
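Whichever orchestrator is chosen, the dependency handling reduces to running tasks in topological order. The sketch below shows that idea with Python's standard `graphlib` module rather than any orchestrator's API; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
PIPELINE = {
    "ingest_reference_data": set(),
    "ingest_trades": set(),
    "enrich_trades": {"ingest_reference_data", "ingest_trades"},
    "compute_risk": {"enrich_trades"},
}


def execution_order(dependencies: dict) -> list:
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())
```

In Databricks Workflows or Airflow the same graph would be declared as task dependencies, with retries and alerts configured per task.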
8. CI/CD & Deployment
- Use Git-based workflows:
  - Branching strategy
  - Code reviews
- Implement CI/CD pipelines for:
  - Automated testing
  - Deployment to environments (Dev/Test/Prod)
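Automated testing is easiest when transformation logic lives in small functions that a CI job can exercise without a cluster. The sketch below pairs a hypothetical currency-conversion helper with pytest-style tests; the function name, the rate table shape, and the use of pytest are assumptions.

```python
def to_base_currency(amount: float, currency: str, fx_rates: dict) -> float:
    """Convert an amount into the base currency using a rate table."""
    if currency not in fx_rates:
        raise KeyError(f"no FX rate for {currency}")
    return amount * fx_rates[currency]


def test_converts_with_known_rate():
    assert to_base_currency(100.0, "EUR", {"EUR": 1.5}) == 150.0


def test_rejects_unknown_currency():
    try:
        to_base_currency(100.0, "XXX", {"EUR": 1.5})
    except KeyError:
        return
    raise AssertionError("expected KeyError for a missing rate")
```

A CI pipeline would run such tests on every pull request before promoting the code from Dev to Test and Prod.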
Open Positions
1
Mandatory Skills
PySpark, Databricks, Data Engineer, Lead Data Engineer, Python
Education Qualification
Postgraduate or graduate degree in Computers or its equivalent
Experience
10 to 12 years
The salary offered will depend on your skills, experience, and performance in the interview.
Candidates who have completed the required education and have 10 to 12 years of experience are eligible to apply for this job.
The candidate should have sound communication skills.
Both male and female candidates can apply for this job.
This is not a work-from-home job and cannot be done online.
No work-related deposit needs to be made during your employment with the company.
To apply, go to the apna app, click the apply button, and call HR directly to schedule your interview.
For more details, download the apna app, which lists Full Time jobs in Mumbai/Bombay.