Associate Director - Global Technology Services (Run)
KPMG India Services LLP
Job Description
Associate Director-GTS Run
Roles & responsibilities
Own end-to-end engineering, reliability, availability, scalability, performance, and capacity planning for mission-critical Audit portfolio platforms, ensuring enterprise-grade operational excellence.
Provide 24x7x365 production support, including weekends and holidays, participating in a global follow-the-sun on-call model and ensuring rapid incident response and service restoration.
Ensure ≥99.0% (target 99.9%+) availability through SRE best practices including SLIs/SLOs, error budgets, proactive monitoring, intelligent alerting, and automated remediation.
Architect and operate secure, scalable, and resilient Azure cloud environments leveraging AKS, App Services, VM Scale Sets, Azure SQL, Data Lake, Azure Storage, and Microsoft Fabric for large-scale data and analytics workloads.
Implement and manage Infrastructure as Code using Terraform, ensuring consistent, repeatable, and compliant infrastructure provisioning across environments.
Drive DevOps and platform engineering practices using Azure DevOps, CI/CD pipelines, GitOps, and release automation, enabling faster and more reliable deployments.
Design and manage containerized and microservices architectures using AKS, including scaling, networking, security, service mesh integration, and zero-downtime deployments.
Implement deep observability using Azure Monitor, Application Insights, Log Analytics, KQL, and Azure Managed Prometheus, enabling full-stack monitoring, distributed tracing, and performance insights.
Proactively monitor production systems to detect early signals of failures, prevent performance degradation, and eliminate capacity bottlenecks using predictive and AI-driven insights.
Build and integrate AIOps and AI-powered automation, including anomaly detection, predictive alerting, automated incident triage, and self-healing infrastructure.
Lead major incident management, root cause analysis (RCA), and problem management, ensuring blameless postmortems and continuous reliability improvements.
Design and validate high availability and disaster recovery architectures, including multi-region deployments, failover strategies, backup/restore, and RTO/RPO adherence.
Plan and execute ITDR drills, disaster recovery testing, and audit readiness activities, including evidence collection and compliance validation.
Manage environment lifecycle including infrastructure upgrades, secure deployments, vulnerability remediation, patching, and end-of-life transitions.
Implement strong security and secrets management using Azure Key Vault, managed identities, RBAC, and zero-trust architecture principles.
Ensure compliance with enterprise and regulatory standards through policy enforcement, audit controls, and governance frameworks.
Optimize cloud spend using FinOps practices, including cost allocation, tagging, rightsizing, reserved instances, and continuous cost-performance optimization.
Manage and optimize data platforms including Data Lake, Azure Storage, and Fabric, ensuring high availability, scalability, data integrity, and performance.
Establish advanced capacity planning and forecasting models, leveraging historical telemetry and AI-driven predictions.
Automate operational workflows using scripting (PowerShell, Python, Bash) and orchestration tools to minimize manual intervention and reduce MTTR.
Drive resilience engineering practices including chaos engineering, failure testing, and system hardening to improve overall platform reliability.
Collaborate with engineering, architecture, cloud, security, and business teams to design and operate scalable, secure, and compliant solutions.
Build and maintain comprehensive technical documentation, runbooks, and operational playbooks for consistent and efficient support.
Act as a senior technical leader and individual contributor, influencing architecture decisions, driving innovation, and mentoring engineers across teams without direct people management.
This role is for you if you have the below
Educational qualifications
Bachelor's degree in Computer Science
Work experience
10+ Years of Experience
Experience Level
Mid Level
The salary offered will depend on your skills, experience and performance in the interview.
Candidates should have completed the required education, and those with 10 to 31 years of experience are eligible to apply for this job. You can apply for more jobs in Bengaluru/Bangalore to get hired quickly.
The candidate should have sound communication skills for this job.
Both Male and Female candidates can apply for this job.
This is not a work-from-home job and cannot be done online. You can explore and apply for other work-from-home jobs in Bengaluru/Bangalore at apna.
No work-related deposit needs to be made during your employment with the company.
Go to the apna app and apply for this job. Click on the apply button and call HR directly to schedule your interview.
For more details, download the apna app and find Full Time jobs in Bengaluru/Bangalore. Through apna, you can find jobs in 64 cities across India.