Site Reliability Engineering Manager

Ford Motor

Chennai

Not disclosed

Work from Office

Full Time

Min. 10 years

Job Details

Job Description

Manager, SRE

Collaboration & Communication Excellence: The SRE Manager has strong communication and influencing skills and collaborates effectively with senior leadership, engineers, and cross-functional teams. They articulate complex technical concepts clearly and lead geographically distributed teams while building strong partnerships across engineering, product, operations, security, and business functions.

Team Development & Cultural Stewardship: The SRE Manager promotes an inclusive and psychologically safe environment, mentors team members, encourages innovation, and builds a culture of continuous learning and blameless improvement.

Technical Acumen & Innovation Driver: They demonstrate deep technical curiosity, attention to detail, and adaptability, guiding teams through evolving technologies and challenges.

Accountability & Ownership: They take full responsibility for the reliability, performance, and health of critical applications while driving measurable outcomes and team accountability.

SRE Strategy & Best Practices: Expertise in SLOs/SLIs, error budgets, incident response, and reliability improvements.

Architecture & Modern Platform Engineering: Cloud-native, microservices, Kubernetes, hybrid cloud.

Automation, CI/CD & Observability: IaC, CI/CD, monitoring, AIOps.

Infrastructure & Security: Cloud security, networking, databases, disaster recovery.


Define SRE Strategy & Vision: Develop and drive the long-term SRE strategy and roadmap for the Marketing and Sales technology portfolio, aligning reliability goals with business objectives. Establish enterprise SRE standards, including SLOs, SLIs, and error budgets, and translate technical metrics into meaningful business health indicators.

Lead the "Paved Road" Initiative & Platform Engineering: Build and enhance shared SRE platforms, tools, and services that enable secure, reliable deployments. Promote automation, self-service capabilities, and an automation-first culture to reduce operational toil and improve efficiency.

Drive Observability, AIOps & Performance Strategy: Lead observability initiatives with robust monitoring, logging, and alerting while advancing AIOps capabilities using AI/ML for anomaly detection, predictive insights, and proactive risk mitigation.

Architectural Leadership & Collaboration for Reliability: Partner with engineering and architecture teams to design scalable, secure, and resilient systems that follow SRE best practices.

Oversee Incident Management & Resilience: Lead incident response, promote blameless post-mortems, improve MTTR, drive resilience testing, and oversee 24x7 first-responder operations.

Cross-Functional Engagement & Governance: Collaborate across teams to embed SRE practices, ensure compliance, and lead the SRE Community of Practice.

Reporting, Vendor & Budget Management: Deliver executive reporting on system health, manage vendor partnerships, and optimize SRE budgets and cloud spend.

  • Bachelor's degree in Computer Science, Engineering, or a related technical field (Master's degree preferred).
  • Progressive Leadership: 10+ years of progressive experience in Site Reliability Engineering, including at least 5 years of proven leadership experience managing and mentoring SRE teams.
  • Cloud Expertise: Extensive experience designing, deploying, and operating mid- to large-scale public cloud environments. GCP expertise is required; experience with AWS or Azure is a significant advantage.
  • Infrastructure as Code (IaC): Demonstrated hands-on expertise in implementing and driving IaC strategies, particularly with Terraform Enterprise.
  • SRE Frameworks & Observability: Strong track record of defining and implementing comprehensive SRE frameworks, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets. Proven experience in developing and implementing robust observability solutions (monitoring, logging, tracing) using tools such as Dynatrace, Grafana, Prometheus, and native cloud monitoring services.
  • Modern Application Architectures: Experience with microservices architectures, Spring Boot, and both NoSQL and SQL datastores.
  • Enterprise CMS (Plus): Familiarity with Adobe Experience Manager (AEM) or similar enterprise Content Management System (CMS) platforms.

Experience Level

Senior Level

Job role

Work location

Chennai, Tamil Nadu, India

Department

Software Engineering

Role / Category

Software Development

Employment type

Full Time

Shift

Day Shift

Job requirements

Experience

Min. 10 years

About company

Name

Ford Motor

Job posted by Ford Motor
