Data Engineer with Databricks and Python Skills

KPMG India Services LLP

Bengaluru/Bangalore

Not disclosed

Work from Office

Full Time

Min. 2 years

Job Details

Job Description

Engineer (A2 DES, Databricks, PySpark, Python)

Roles & responsibilities

Role Overview: The Associate 2 - “Data Engineer with Databricks/Python skills” will be part of the GDC Technology Solutions (GTS) team, working in a technical role in the Audit Data & Analytics domain that requires developing expertise in KPMG proprietary D&A (Data and Analytics) tools and audit methodology. He/she will be part of the team responsible for extracting and processing datasets from client ERP systems (SAP/Oracle/Microsoft Dynamics) or other sources to deliver insights to Audit and internal teams through data warehousing, ETL, and dashboarding solutions, and will be involved in developing solutions using a variety of tools and technologies.

The Associate 2 - “Data Engineer” will be predominantly responsible for:

Data Engineering

· Understand requirements, validate assumptions, and develop solutions using Azure Databricks, Azure Data Factory, or Python; handle data mapping changes and customizations within Databricks using PySpark
· Build Azure Databricks notebooks to perform data transformations, create tables, and ensure data quality and consistency (see the PySpark sketch after this list); leverage Unity Catalog for data governance and a unified data view across the organization
· Analyze large volumes of data using Azure Databricks and Apache Spark; create pipelines and workflows to support data analytics, machine learning, and other data-driven applications
· Integrate Azure Databricks with ERP or third-party systems using APIs, and build Python or PySpark notebooks that apply business transformation logic per the common data model
· Debug, optimize, performance-tune, and resolve issues with limited guidance when processing large datasets, and propose possible solutions
· Apply concepts such as partitioning, optimization, and performance tuning to improve process performance
· Implement best practices for Azure Databricks design, development, testing, and documentation
· Work with Audit engagement teams to interpret results and provide meaningful audit insights from the reports
· Participate in team meetings, brainstorming sessions, and project planning activities
· Stay up to date with the latest advancements in Azure Databricks, cloud, and AI development to drive innovation and maintain a competitive edge
· Show enthusiasm to learn, adapt, and integrate Generative AI into business processes; experience working with Azure AI services is expected
· Write production-ready code
· Design, develop, and maintain scalable, efficient data pipelines to process large datasets from various sources using Azure Data Factory (ADF)
· Integrate data from multiple sources and ensure data consistency, quality, and accuracy, leveraging Azure Data Lake Storage (ADLS)
· Design and implement ETL (Extract, Transform, Load) processes on Azure to ensure seamless data flow across systems
· Optimize data storage and retrieval processes to enhance system performance and reduce latency
· Work experience with Microsoft Fabric is an added advantage
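
As a rough illustration only, the PySpark sketch below shows the kind of Databricks notebook work described above: read a raw ERP extract from ADLS, apply business transformation logic, and write a Delta table partitioned for query performance. The storage paths, column names, and table names are hypothetical assumptions, not part of KPMG's actual tooling.

    # Hypothetical sketch of a Databricks/PySpark transformation; all paths,
    # columns, and table names below are illustrative assumptions only.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # supplied automatically in Databricks

    # Read a raw ERP extract landed in ADLS (hypothetical container and path).
    raw = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/erp/gl_entries/")

    # Apply business transformation logic: normalize types, derive a reporting
    # period column, and drop incomplete records.
    clean = (
        raw.withColumn("posting_date", F.to_date("posting_date"))
           .withColumn("fiscal_period", F.date_format("posting_date", "yyyy-MM"))
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .filter(F.col("amount").isNotNull())
    )

    # Write a Delta table partitioned by the period column; partitioning on a
    # low-cardinality column like this is one common performance-tuning lever.
    (clean.write.format("delta")
          .mode("overwrite")
          .partitionBy("fiscal_period")
          .saveAsTable("audit_analytics.gl_entries_clean"))
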

Technical Skills

Primary Skills:

· 2-4 years of experience in data engineering, with a strong focus on Databricks, PySpark, Python, and Spark SQL
· Proven experience in implementing ETL processes and data pipelines
· Hands-on experience with Azure Databricks, Azure Data Factory (ADF), and Azure Data Lake Storage (ADLS)
· Ability to write reusable, testable, and efficient code (illustrated in the sketch after this list)
· Ability to develop low-latency, high-availability, high-performance applications
· Understanding of the fundamental design principles behind a scalable application
· Good knowledge of Azure cloud services
· Familiarity with Generative AI and its applications in data engineering
· Knowledge of Microsoft Fabric and Azure AI services is an added advantage
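
As a hedged example of "reusable, testable, and efficient code" in this stack, the sketch below factors a transformation into a pure function that can be unit-tested against a small local DataFrame. The function, column names, and sample data are hypothetical.

    # Hypothetical sketch: a transformation factored into a pure function so it
    # can be unit-tested on a tiny local DataFrame (all names are illustrative).
    from pyspark.sql import DataFrame, SparkSession, functions as F
    from pyspark.sql.window import Window

    def dedupe_latest(df: DataFrame, key: str, ts_col: str) -> DataFrame:
        # Keep only the most recent record per key.
        w = Window.partitionBy(key).orderBy(F.col(ts_col).desc())
        return (df.withColumn("_rn", F.row_number().over(w))
                  .filter(F.col("_rn") == 1)
                  .drop("_rn"))

    # Tiny local check, reusable as a pytest-style unit test.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    sample = spark.createDataFrame(
        [("A", "2024-01-01"), ("A", "2024-02-01"), ("B", "2024-01-15")],
        ["vendor_id", "updated_at"],
    )
    assert dedupe_latest(sample, "vendor_id", "updated_at").count() == 2
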

Enabling Skills

· Excellent analytical and problem-solving skills
· Quick learning ability and adaptability
· Effective communication skills
· Attention to detail and a good team player
· Willingness and ability to deliver within tight timelines
· Flexibility in work timings and willingness to work on different projects/technologies

Education Requirements

· B.Tech/B.E./MCA (Computer Science / Information Technology)

Job role

Work location

Bangalore, Karnataka, India

Department

Software Engineering

Role / Category

Data Science & Machine Learning

Employment type

Full Time

Shift

Day Shift

Job requirements

Experience

Min. 2 years

About company

Name

KPMG India Services LLP

Job posted by KPMG India Services LLP
