PySpark Data Engineer with Cloudera and Cloud Expertise

Synechron Technologies

Bengaluru/Bangalore

Not disclosed

Work from Office

Full Time

Min. 5 years

Job Description

Job Summary
Synechron is seeking a highly experienced PySpark Data Engineer to develop, optimize, and maintain scalable data pipelines within the Cloudera Data Platform (CDP). This role is essential in ensuring high data quality, availability, and performance across enterprise data ecosystems. The successful candidate will leverage extensive big data and cloud-native processing expertise to support business analytics, reporting, and data science initiatives, driving impactful insights and operational efficiency.

Software Requirements

Required:
- Advanced proficiency in PySpark, including handling DataFrames, RDDs, and optimization techniques for large-scale data processing
- Strong experience with Cloudera Data Platform components such as Cloudera Manager, Hive, Impala, HDFS, and HBase
- In-depth knowledge of Hadoop ecosystem technologies (Hadoop, Kafka) and distributed computing frameworks
- SQL expertise and experience with data warehousing concepts (Hive, Impala)
- Linux scripting skills (Bash, Python) for automation and operational workflows
- Experience with orchestration tools like Apache Oozie or Apache Airflow

Preferred:
- Cloud data services (AWS EMR, Azure HDInsight, GCP Dataproc) for scalable data processing
- Data modeling, metadata management, and data governance tools
- CI/CD pipelines setup using Jenkins, GitLab, or similar tools

Overall Responsibilities

Design, develop, and optimize highly scalable data pipelines using PySpark within the Cloudera Data Platform to support business intelligence and analytics.
Manage end-to-end data ingestion processes from various sources such as relational databases, APIs, and file systems.
Execute data transformation, cleansing, and aggregation processes on large datasets to facilitate reporting and data science activities.
Conduct performance tuning of PySpark jobs and optimize cluster resource utilization.
Implement data quality checks, validation routines, and monitoring to ensure data accuracy and consistency.
Automate data workflows and pipeline orchestration to reduce manual intervention and improve efficiency.
Troubleshoot data pipeline issues and drive operational stability across data ecosystems.
Collaborate with data analysts, data scientists, and platform engineers to understand data requirements and improve system performance.
Maintain detailed documentation for data pipelines, workflows, configurations, and operational procedures.
Support data governance, security, and compliance initiatives aligned with enterprise standards.

Technical Skills (By Category)

Programming & Data Processing (Essential):
- PySpark (DataFrames, RDDs, optimization)
- SQL (Hive, Impala, relational databases)
- Linux scripting (Bash, Python) for automation

Data Ecosystem & Storage (Essential):
- Hadoop ecosystem (HDFS, Hive, Impala, HBase)
- Kafka or similar messaging systems for data streaming

Cloud & Orchestration (Preferred):
- Cloud-native data processing (AWS EMR, Azure HDInsight, GCP Dataproc)
- Orchestration tools (Apache Airflow, Oozie)

Tools & Frameworks (Preferred):
- CI/CD with Jenkins, GitLab CI
- Data governance and metadata tools (e.g., Apache Atlas, Collibra)

Experience Requirements

Minimum of 5 years working in data engineering roles with significant PySpark expertise.
Proven experience building and managing large-scale data pipelines in enterprise environments.
Strong background in big data ecosystems, cloud data services, and data warehousing.
Demonstrated ability to optimize Spark jobs and troubleshoot distributed data processing issues.
Experience supporting financial or regulated industries is advantageous.
Extensive hands-on experience in large data ecosystems supporting analytics and reporting is also considered.

Day-to-Day Activities

Develop, optimize, and monitor scalable data pipelines for ingestion, transformation, and redistribution of data.
Troubleshoot data processing issues proactively, perform root cause analysis, and implement fixes.
Collaborate with data analysts, data scientists, and platform teams to design data models and pipelines based on business needs.
Automate operational workflows using orchestration tools to enhance pipeline reliability.
Conduct performance tuning, cluster management, and resource optimization for Spark jobs.
Validate data quality, correctness, and completeness through routine reviews and monitoring.
Document architecture, workflows, and procedures for operational governance.
Support data privacy, security, and compliance measures within data ecosystems.

Qualifications

Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
5+ years of hands-on experience with PySpark, big data ecosystems, and distributed processing.
Proven expertise supporting large-scale data pipelines in enterprise or financial industry environments.
Experience with Cloudera Data Platform components (Hive, Impala, HDFS, HBase).
Strong SQL and data modeling skills.
Experience supporting cloud data processing environments (AWS, Azure, GCP) is advantageous.
Relevant certifications (e.g., AWS Big Data Specialty, Cloudera Certified Data Engineer) are preferred.

Professional Competencies

Strong analytical and troubleshooting skills for complex data pipeline issues.
Ability to work independently and collaboratively across teams.
Effective communication skills to convey technical details to non-technical stakeholders.
Adaptability to evolving technologies and data processing requirements.
Focus on operational excellence, data quality, and process automation.
Ownership mindset to ensure data integrity, performance, and reliability.

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and is an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Experience Level

Senior Level

Job role

Work location: Bengaluru - BCIT, India

Department: Data Science & Analytics

Role / Category: DBA / Data warehousing

Employment type: Full Time

Shift: Day Shift

Job requirements

Experience: Min. 5 years

About company

Name: Synechron Technologies

Job posted by Synechron Technologies

The salary for this role is not disclosed. The salary offered will depend on your skills, experience, and performance in the interview.

Candidates should have completed the required education, and those with 5 to 31 years of experience are eligible to apply for this job.

The candidate should have sound communication skills for this job.

Both male and female candidates can apply for this job.

This is not a work-from-home job and cannot be done online.

No work-related deposit needs to be made during your employment with the company.

Go to the apna app and apply for this job. Click on the apply button and call HR directly to schedule your interview.

