Senior Machine Learning Operations Architect
Ernst & Young LLP ( EY India )
Thiruvananthapuram
Not disclosed
Job Details
Job Description
EY - GDS Consulting - AIA - ML Ops - Senior Manager
At EY, we’re all in to shape your future with confidence.
We’ll help you succeed in a globally connected powerhouse of diverse teams and take your career wherever you want it to go.
Join EY and help to build a better working world.
Job Title: Senior ML Ops Architect
Job Type: Full-time
Job Description
We are seeking a visionary, senior ML Ops Architect to define and govern the multi-cloud architecture (AWS & Azure), long-term strategy, and organizational adoption model for all Machine Learning Operations (ML Ops) and AI Operations (AI Ops) across the enterprise. This role has ultimate responsibility for ensuring that AI delivery is compliant, highly available, and cost-optimized, and that it drives competitive advantage within the highly regulated banking and insurance sectors.
The ideal candidate possesses 15+ years of strategic leadership experience, exceptional technical depth in both AWS and Azure, and a proven ability to influence C-suite decisions, manage large-scale technical risk, and institute enterprise-wide Model Risk Management (MRM) frameworks.
Key Responsibilities
Architecture & Strategic Vision
- Design the comprehensive, 5–10-year architectural vision for a unified ML Ops platform that strategically leverages both AWS (SageMaker, EKS) and Azure (Azure ML, AKS) services to maximize resilience and capability.
- Establish and lead the ML/AI Architecture Review Board (ARB), setting global standards for technology stack selection, architectural patterns, and security guardrails for all AI production deployments.
- Direct the enterprise-wide adoption and governance of IaC using Terraform or equivalent tools to ensure consistent, auditable, and secure provisioning of multi-cloud infrastructure (compute, networking, security groups, data plane).
- Serve as the top-tier subject matter expert, mentoring and influencing a large cohort of technical leads and engineers on advanced architectural concepts, technical debt management, and engineering excellence.
Automation, Scalability & Resilience
- Architect and oversee the implementation of automated, end-to-end Continuous Integration, Continuous Delivery, and Continuous Training pipelines that facilitate rapid, zero-downtime model deployments and rollbacks across hybrid/multi-cloud environments.
- Design the architecture for containerized ML workloads and inference services using enterprise-scale Kubernetes (AKS/EKS) clusters, focusing on service mesh implementation, efficient autoscaling strategies, and network isolation.
- Ensure the ML platform architecture can handle the massive scale and high throughput required for real-time risk, fraud, and customer interaction models within financial services.
Governance, Observability & Security (AI Ops)
- Architect and enforce robust Model Risk Management (MRM) frameworks, embedding regulatory compliance, audit trails, model versioning, and explainability (XAI) requirements directly into the ML Ops pipelines to meet banking/insurance sector mandates.
- Define the enterprise standard for AI Ops observability, leveraging unified monitoring tools (e.g., Prometheus/Grafana) to track multi-cloud system health and to proactively detect and auto-remediate model drift, data quality issues, and excessive prediction latency.
- Implement strategic architectural patterns and governance policies to drive maximum cost-efficiency and transparency across all Azure and AWS ML/compute resources, including chargeback and budget enforcement.
- Design and mandate secure data governance, Role-Based Access Control (RBAC), and Secrets Management across the multi-cloud architecture, ensuring data isolation and secure cross-cloud communication.
Required Skills & Experience
- 15+ years of professional experience in Enterprise Architecture, Software Engineering, or Strategic IT Leadership.
- 7+ years in a dedicated ML Ops Architect or Chief Architect role with direct responsibility for enterprise-wide platform governance.
- Deep expertise in designing and implementing enterprise-grade ML Ops platforms, preferably in the banking and insurance sectors.
- Expert-level architectural proficiency and hands-on experience in both AWS and Azure:
  - Azure: Azure Machine Learning, AKS, Azure DevOps, Azure Security Center, Azure Governance.
  - AWS: AWS SageMaker, EKS, Lambda, S3, IAM, AWS Code Services.
- Demonstrated success in designing and deploying highly regulated, production-grade ML Ops solutions at enterprise scale.
- Mastery of Infrastructure as Code (IaC), specifically Terraform, for consistent multi-cloud deployment.
- Expert knowledge of Kubernetes orchestration and containerization (Docker).
- Proven experience implementing Model Risk Management (MRM) and XAI frameworks in a regulated environment.
- Strategic understanding of programming, especially Python and the major ML frameworks (TensorFlow, PyTorch), sufficient to set and govern enterprise coding and model packaging standards.
- Proven experience designing and governing robust monitoring solutions for production ML systems (e.g., Prometheus, Grafana, Datadog) for enterprise-wide AI Ops.
- Master’s degree in Computer Science, Engineering, or a related quantitative field.
Preferred Skills
- Cloud Architecture Certifications in both clouds (e.g., Azure Solutions Architect Expert AND AWS Certified Solutions Architect – Professional).
- Experience with advanced MLOps tools like MLflow/DVC for experiment tracking and data versioning.
- Familiarity with governance standards for Generative AI/LLM Ops in a production setting.
Soft Skills
- Proven ability to conceptualize, articulate, and drive the long-term technical vision for the ML platform, anticipating future needs, disruptive technologies (e.g., Generative AI), and technological trends.
- Proven track record of driving cultural and process transformation across large, multi-disciplinary organizations.
- Proven ability to communicate complex architectural concepts effectively to technical teams, senior leadership, and non-technical stakeholders (including Risk and Compliance), driving consensus and influencing strategic decisions.
- Strong commitment to defining, documenting, and enforcing enterprise-wide engineering standards, architectural patterns, and security policies across multiple, geographically dispersed teams.
- Demonstrated ability to lead without direct authority, effectively bridging the gap between Data Science, Software Engineering, Risk Management, and Business units to ensure alignment and platform adoption.
- Excellent judgment in assessing technical debt, balancing architectural trade-offs (e.g., speed vs. stability vs. cost), and strategically mitigating high-impact production and regulatory risks.
EY | Building a better working world
EY is building a better working world by creating new value for clients, people, society and the planet, while building trust in capital markets.
Enabled by data, AI and advanced technology, EY teams help clients shape the future with confidence and develop answers for the most pressing issues of today and tomorrow.
EY teams work across a full spectrum of services in assurance, consulting, tax, strategy and transactions. Fueled by sector insights, a globally connected, multi-disciplinary network and diverse ecosystem partners, EY teams can provide services in more than 150 countries and territories.
Job role
Work location: Trivandrum, KL, IN, 695581 +3 more…
Department: Consulting
Role / Category: Data Science & Machine Learning
Employment type: Full Time
Shift: Day Shift
Job requirements
Experience: Min. 15 years
About company
Name: Ernst & Young LLP ( EY India )
Job posted by Ernst & Young LLP ( EY India )