Big Data Engineer - PySpark & Hadoop Specialist

Synechron Technologies

Pune

Not disclosed

Work from Office

Full Time

Min. 4 years

Job Description

Big Data Engineer | PySpark, Hadoop Ecosystem, Cloud Integration & Data Migration

Job Summary
Synechron is seeking a seasoned Big Data Engineer specialized in PySpark to support complex data processing and ETL workflows within enterprise environments. The role involves designing, developing, and optimizing scalable data pipelines supporting analytics, data migration, and high-volume processing needs. The candidate will leverage their expertise in Hadoop ecosystem components, distributed computing, and storage formats to deliver high-performance, maintainable solutions aligned with business and regulatory requirements.

Software Requirements

Required Software Proficiency:

SQL (T-SQL, HiveQL, or ANSI SQL) — strong skills supporting data validation, query optimization, and data management (4+ years)
Hadoop Ecosystem: HDFS, Hive, Pig, Sqoop, Spark, or Impala — extensive experience supporting large-scale data processing and pipeline development (4+ years)
Data Ingestion and ETL tools supporting enterprise workflows — proven ability to develop and optimize data pipelines (4+ years)
Distributed computing concepts (MapReduce, Spark) supporting high-volume data processing
Knowledge of file formats: Parquet, ORC, Avro, JSON, CSV — supporting data storage and retrieval efficiency
Performance tuning for queries and data pipelines supporting operational and analytical workloads
Scripting skills: Python, Shell, or Scala for automation and pipeline scripting (preferred)

Preferred Software Skills:

Cloud data platforms (Azure, AWS, GCP) for scalable deployment, storage, and processing
Data workflow orchestration tools supporting automation of data pipelines (e.g., Apache Airflow, Oozie)

Overall Responsibilities

Design, develop, and optimize scalable data pipelines supporting analytics, migration, and operational reporting
Build high-performance ETL workflows using PySpark, Spark SQL, and Hadoop ecosystem components
Support data ingestion, transformation, and validation activities ensuring data quality and consistency
Collaborate with data science, data engineering, and business teams to translate requirements into technical solutions
Tune performance of data queries, Spark jobs, and storage formats to support high-volume workloads
Implement data governance, security, and compliance practices supporting industry standards and regulations
Maintain operational documentation, data lineage, and best practices for pipeline management
Lead efforts to improve automation, pipeline reliability, and system scalability supporting enterprise growth

Technical Skills (By Category)

Languages & Data Tools (Essential):
- Python, Spark SQL, HiveQL, or ANSI SQL supporting scalable data transformations and queries
- Hadoop ecosystem components: HDFS, Hive, Pig, Sqoop, Impala supporting large-scale data pipelines

Databases & Data Management:
- Relational: SQL Server, Oracle, PostgreSQL for transactional and reference data validation
- Data storage formats: Parquet, ORC, Avro for efficient data management and retrieval

Cloud & Infrastructure:
- Cloud platforms (Azure, AWS, GCP) for scalable storage and processing (preferred)
- Data orchestration tools supporting automation (e.g., Airflow, Oozie) (preferred)

Frameworks & Libraries:
- PySpark and Spark SQL for large-scale data transformation and processing

Tools & Methodologies:
- ETL/ELT development, workflow automation, performance tuning practices supporting agile environments

Security & Governance:
- Data masking, encryption, and access controls aligned with compliance standards (HIPAA, GDPR)

Experience Requirements

4+ years of experience supporting large-scale data processing, data pipelines, and ETL workflows in enterprise environments
Proven expertise in Hadoop ecosystem components, Spark, and distributed data processing
Experience in data validation, reconciliation, and storage optimization supporting analytics and migration
Knowledge of regulated environments and their compliance, security, and data governance standards (preferred)
Alternative pathways include extensive experience in data engineering, supporting high-volume data systems, and automation

Day-to-Day Activities

Develop, test, and optimize data pipelines using PySpark, Hive, and Hadoop ecosystem components
Support data ingestion, transformation, and validation for business analytics and migration projects
Monitor system performance, troubleshoot data processing issues, and implement optimizations
Collaborate with data analysts, data scientists, and enterprise data teams on technical solutions
Support cloud or on-premises data warehouse environments used for enterprise analytics
Implement and support data governance practices, security controls, and compliance measures
Maintain detailed documentation supporting operational procedures, data flows, and data lineage
Automate workflows and iteratively improve pipeline reliability and performance

Qualifications

Bachelor’s or Master’s degree in Data Engineering, Computer Science, or a related field
4+ years supporting big data solutions, ETL workflows, and data migration in enterprise settings
Experience with Hadoop ecosystem, Spark, and distributed data processing platforms
Experience with cloud data services for large-scale, high-volume workloads (preferred)
Certifications in Hadoop, Spark, or cloud platforms (e.g., AWS, GCP, Azure) are a plus

Professional Competencies

Strong analytical and troubleshooting skills supporting complex data workflows
Leadership skills to guide junior team members and promote best practices in data engineering
Excellent communication for stakeholder engagement, documentation, and reporting
Adaptability to evolving data standards, tools, and regulatory frameworks
Commitment to data quality, security, and operational efficiency
Time management and organizational skills for handling multiple data projects in a fast-paced environment

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and is an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Experience Level

Senior Level

Job role

Work location: Pune - Hinjewadi (Ascendas), India

Department: Data Science & Analytics

Role / Category: DBA / Data warehousing

Employment type: Full Time

Shift: Day Shift

Job requirements

Experience: Min. 4 years

About company

Name: Synechron Technologies

Job posted by Synechron Technologies

FAQs about this job

How much salary can I expect as a Big Data Engineer - PySpark & Hadoop Specialist at Synechron Technologies in Pune?
The salary is not disclosed. The salary offered will depend on your skills, experience, and performance in the interview.

What are the eligibility criteria to apply for Big Data Engineer - PySpark & Hadoop Specialist at Synechron Technologies in Pune?
Candidates should have completed the required education. Candidates with 4 to 31 years of experience are eligible to apply.

Is there any specific skill required for this job?
Candidates should have sound communication skills.

Who can apply for this job?
Both male and female candidates can apply.

Is it a work from home job?
No. It is not a work from home job and cannot be done online.

Are there any charges or deposits required while applying for the role or while joining?
No work-related deposit needs to be made during your employment with the company.

How can I apply for this job?
Go to the apna app and apply for this job. Click the apply button and call HR directly to schedule your interview.

What is the last date to apply?
The last date to apply is not specified in this posting.