Senior Agentic AI Technical Operations Engineer

Larsen & Toubro Infotech Ltd (LTI)

Mumbai/Bombay

Not disclosed

Work from Office

Full Time

Min. 4 years

Job Details

Job Description

Senior Specialist - Architecture

Senior Agentic AI Tech Ops Engineer, AI Center of Excellence

Company: Ingram Micro

Location: North America (US); North America (Remote)

Reports to: Head of AI Center of Excellence (AI CoE)

About Us

Ingram Micro is a leading global IT distributor connecting technology solution providers with vendors worldwide. We are at the forefront of leveraging cutting-edge technologies to drive innovation and efficiency across the IT ecosystem. Our AI Center of Excellence (CoE) is a dynamic and strategic group dedicated to solving complex business problems and creating new value streams through the application of Artificial Intelligence, with a strong focus on developing and operationalizing autonomous and intelligent Agentic AI systems.

Role Summary

We are seeking a proactive and technically skilled Agentic AI Tech Ops Engineer to join our AI CoE. This role is crucial for ensuring the reliability, scalability, and efficient operation of our cutting-edge AI and Agentic AI systems in production environments. You will be responsible for deploying, monitoring, maintaining, and troubleshooting our AI agents and the underlying infrastructure. Working closely with AI developers, architects, and data scientists, you will implement and manage Agentic Ops/MLOps practices, automate operational tasks, and contribute to building a robust and resilient operational framework for our AI initiatives.

Key Responsibilities

Deployment & Infrastructure Management

- Deploy, configure, and manage AI models, agentic systems, and supporting infrastructure in cloud (e.g., GCP) and on-premises environments

- Implement and maintain CI/CD pipelines for AI/ML models and agentic applications (MLOps/Agent Ops)

- Manage and optimize cloud resources, ensuring cost-effectiveness and scalability for AI workloads

- Collaborate with infrastructure teams to ensure network, storage, and compute resources meet the demands of AI systems

Monitoring, Logging & Alerting

- Develop and implement comprehensive monitoring, logging, and alerting solutions for AI agents and infrastructure to ensure high availability and performance

- Proactively identify and address potential issues, performance bottlenecks, and anomalies in production AI systems

- Track key operational metrics and create dashboards for system health and performance

Incident Response & Troubleshooting

- Provide operational support for production AI systems, including incident response, root cause analysis, and resolution of technical issues

- Develop and maintain runbooks and standard operating procedures for common operational tasks and incident management

- Participate in on-call rotations as needed to support critical AI services

Automation & Operational Excellence

- Automate routine operational tasks, deployment processes, and system maintenance activities using scripting (e.g., Python, Bash) and automation tools

- Contribute to the development and enforcement of operational best practices, security standards, and compliance requirements for AI systems

- Work with development teams to improve the deployability, manageability, and observability of AI applications

Collaboration & Documentation

- Collaborate effectively with AI developers, data scientists, AI architects, and other stakeholders to ensure smooth transitions from development to production

- Maintain clear and comprehensive documentation for system configurations, operational procedures, and troubleshooting guides

- Provide feedback to development teams on operational aspects and system performance

Required Qualifications & Experience

Bachelor's degree in Computer Science, Information Technology, Engineering, or a related technical field

4-7 years of experience in an MLOps or Agent Ops role, preferably supporting AI/ML or data-intensive applications

Hands-on experience with cloud computing platforms (e.g., Google Cloud Platform, especially Vertex AI) and managing cloud-based infrastructure

Proficiency in scripting languages such as Python, Bash, or PowerShell for automation

Experience with CI/CD tools and practices (e.g., Bitbucket, GitLab CI, GitHub Actions)

Familiarity with containerization technologies (e.g., Docker, Kubernetes) and orchestration

Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack, Datadog, Google Cloud Monitoring, Langfuse)

Understanding of networking concepts, security best practices, and infrastructure-as-code (IaC) principles (e.g., Terraform, Ansible)

Strong troubleshooting and problem-solving skills, with an analytical mindset

Excellent communication skills and ability to work collaboratively in a team environment

A proactive approach to identifying and resolving issues and improving system reliability

Preferred Qualifications & Experience

Master's degree in a relevant field

Specific experience in MLOps or Agent Ops including deploying and managing machine learning models or large language model applicatio

Job role

Work location

Navi Mumbai

Department

IT & Information Security

Role / Category

IT Security

Employment type

Full Time

Shift

Day Shift

Job requirements

Experience

Min. 4 years

About company

Name

Larsen & Toubro Infotech Ltd (LTI)

Job posted by Larsen & Toubro Infotech Ltd (LTI)
