Data Engineer

Virtusa Consulting Services

Chennai

Not disclosed

Work from Office

Full Time

Min. 9 years

Job Details

Job Description

Data Engineer (Python, AWS)

Description

Skillset required for this project: Python, AWS, Starburst, Neo4j
Roles and Responsibilities: Data Engineer
1. Data Pipeline Development:
Design, build, and maintain scalable data pipelines in AWS Glue using Python to ingest, transform, and load data from sources (AWS S3 buckets) into data lakes (AWS S3 buckets).
2. Data Integration:
Integrate data from multiple sources and systems (AWS S3 buckets) to enable unified, comprehensive views of clinical data.
Implement ETL (Extract, Transform, Load) processes to prepare raw data for analysis.
3. Database Management:
Manage and optimize databases (e.g., Starburst) for performance, scalability, and reliability.
Implement database schemas and indexes to support efficient data querying and retrieval.
4. Data Governance and Security:
Implement data governance policies and procedures to ensure data security, compliance, and privacy in line with the AZ GxP process.
Establish access controls, encryption, and auditing mechanisms to protect sensitive data.
5. Monitoring and Optimization:
Monitor data pipelines and systems for performance issues, bottlenecks, and anomalies.
Implement optimizations to improve data processing efficiency and reduce latency.
6. Collaboration and Documentation:
Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and support analytics initiatives.
Document data engineering processes, workflows, and architectures for knowledge sharing and future reference.
Collaborate with testers to test the developed pipelines, then promote them to higher environments.
7. Deployment and Jira Test Execution:
Deploy code from the master branch of the GitHub repository to higher environments such as SIT, PPT, and PROD.
During deployment, execute the deployment scripts step by step and record the corresponding evidence for each step in Jira.
8. Automation:
Automate code execution using AWS resources so that pipelines are triggered without manual intervention.