Synechron Technologies

PySpark Data Engineer with Cloudera and Cloud Expertise

Bengaluru/Bangalore
Not disclosed
Work from Office
Full Time
Min. 5 years

Job Description

Job Summary
Synechron is seeking a highly experienced PySpark Data Engineer to develop, optimize, and maintain scalable data pipelines within the Cloudera Data Platform (CDP). This role is essential in ensuring high data quality, availability, and performance across enterprise data ecosystems. The successful candidate will leverage extensive big data and cloud-native processing expertise to support business analytics, reporting, and data science initiatives, driving impactful insights and operational efficiency.

Software Requirements

  • Required:

    • Advanced proficiency in PySpark, including handling DataFrames, RDDs, and optimization techniques for large-scale data processing

    • Strong experience with Cloudera Data Platform components such as Cloudera Manager, Hive, Impala, HDFS, and HBase

    • In-depth knowledge of Hadoop ecosystem technologies (Hadoop, Kafka) and distributed computing frameworks

    • SQL expertise and experience with data warehousing concepts (Hive, Impala)

    • Linux scripting skills (Bash, Python) for automation and operational workflows

    • Experience with orchestration tools like Apache Oozie or Apache Airflow

  • Preferred:

    • Cloud data services (AWS EMR, Azure HDInsight, GCP Dataproc) for scalable data processing

    • Data modeling, metadata management, and data governance tools

    • CI/CD pipelines setup using Jenkins, GitLab, or similar tools

Overall Responsibilities

  • Design, develop, and optimize highly scalable data pipelines using PySpark within the Cloudera Data Platform to support business intelligence and analytics.

  • Manage end-to-end data ingestion processes from various sources such as relational databases, APIs, and file systems.

  • Execute data transformation, cleansing, and aggregation processes on large datasets to facilitate reporting and data science activities.

  • Conduct performance tuning of PySpark jobs and optimize cluster resource utilization.

  • Implement data quality checks, validation routines, and monitoring to ensure data accuracy and consistency.

  • Automate data workflows and pipeline orchestration to reduce manual intervention and improve efficiency.

  • Troubleshoot data pipeline issues and drive operational stability across data ecosystems.

  • Collaborate with data analysts, data scientists, and platform engineers to understand data requirements and improve system performance.

  • Maintain detailed documentation for data pipelines, workflows, configurations, and operational procedures.

  • Support data governance, security, and compliance initiatives aligned with enterprise standards.

Technical Skills (By Category)

  • Programming & Data Processing (Essential):

    • PySpark (DataFrames, RDDs, optimization)

    • SQL (Hive, Impala, relational databases)

    • Linux scripting (Bash, Python) for automation

  • Data Ecosystem & Storage (Essential):

    • Hadoop ecosystem (HDFS, Hive, Impala, HBase)

    • Kafka or similar messaging systems for data streaming

  • Cloud & Orchestration (Preferred):

    • Cloud-native data processing (AWS EMR, Azure HDInsight, GCP Dataproc)

    • Orchestration tools (Apache Airflow, Oozie)

  • Tools & Frameworks (Preferred):

    • CI/CD with Jenkins, GitLab CI

    • Data governance and metadata tools (e.g., Apache Atlas, Collibra)

Experience Requirements

  • Minimum of 5 years working in data engineering roles with significant PySpark expertise.

  • Proven experience building and managing large-scale data pipelines in enterprise environments.

  • Strong background in big data ecosystems, cloud data services, and data warehousing.

  • Demonstrated ability to optimize Spark jobs and troubleshoot distributed data processing issues.

  • Experience supporting financial or regulated industries is advantageous.

  • Alternative pathways include extensive hands-on experience in large data ecosystems supporting analytics and reporting.

Day-to-Day Activities

  • Develop, optimize, and monitor scalable data pipelines for ingestion, transformation, and redistribution of data.

  • Troubleshoot data processing issues proactively, perform root cause analysis, and implement fixes.

  • Collaborate with data analysts, data scientists, and platform teams to design data models and pipelines based on business needs.

  • Automate operational workflows using orchestration tools to enhance pipeline reliability.

  • Conduct performance tuning, cluster management, and resource optimization for Spark jobs.

  • Validate data quality, correctness, and completeness through routine reviews and monitoring.

  • Document architecture, workflows, and procedures for operational governance.

  • Support data privacy, security, and compliance measures within data ecosystems.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.

  • 5+ years of hands-on experience with PySpark, big data ecosystems, and distributed processing.

  • Proven expertise supporting large-scale data pipelines in enterprise or financial industry environments.

  • Experience with Cloudera Data Platform components (Hive, Impala, HDFS, HBase).

  • Strong SQL and data modeling skills.

  • Experience supporting cloud data processing environments (AWS, Azure, GCP) is advantageous.

  • Relevant certifications (e.g., AWS Big Data Specialty, Cloudera Certified Data Engineer) are preferred.

Professional Competencies

  • Strong analytical and troubleshooting skills for complex data pipeline issues.

  • Ability to work independently and collaboratively across teams.

  • Effective communication skills to convey technical details to non-technical stakeholders.

  • Adaptability to evolving technologies and data processing requirements.

  • Focus on operational excellence, data quality, and process automation.

  • Ownership mindset to ensure data integrity, performance, and reliability.

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.


All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Experience Level

Senior Level

Job role

Work location: Bengaluru - BCIT, India
Department: Data Science & Analytics
Role / Category: DBA / Data warehousing
Employment type: Full Time
Shift: Day Shift

Job requirements

Experience: Min. 5 years

About company

Name: Synechron Technologies
Job posted by Synechron Technologies
