Senior Agentic AI Technical Operations Engineer
Larsen & Toubro Infotech Ltd (LTI)
Mumbai/Bombay
Not disclosed
Job Details
Job Description
Senior Specialist - Architecture
Senior Agentic AI Tech Ops Engineer, AI Center of Excellence
Company: Ingram Micro
Location: North America (US); North America (Remote)
Reports to: Head of AI Center of Excellence (AI CoE)
About Us
Ingram Micro is a leading global IT distributor, connecting technology solution providers with vendors worldwide. We are at the forefront of leveraging cutting-edge technologies to drive innovation and efficiency across the IT ecosystem. Our AI Center of Excellence (CoE) is a dynamic and strategic group dedicated to solving complex business problems and creating new value streams through the application of Artificial Intelligence, with a strong focus on developing and operationalizing autonomous and intelligent Agentic AI systems.
Role Summary
We are seeking a proactive and technically skilled Agentic AI Tech Ops Engineer to join our AI CoE. This role is crucial for ensuring the reliability, scalability, and efficient operation of our cutting-edge AI and Agentic AI systems in production environments. You will be responsible for deploying, monitoring, maintaining, and troubleshooting our AI agents and the underlying infrastructure. Working closely with AI developers, architects, and data scientists, you will implement and manage Agentic Ops/MLOps practices, automate operational tasks, and contribute to building a robust and resilient operational framework for our AI initiatives.
Key Responsibilities
Deployment & Infrastructure Management
o Deploy, configure, and manage AI models, agentic systems, and supporting infrastructure in cloud (e.g., GCP) and on-premises environments.
o Implement and maintain CI/CD pipelines for AI/ML models and agentic applications (MLOps/Agent Ops).
o Manage and optimize cloud resources, ensuring cost-effectiveness and scalability for AI workloads.
o Collaborate with infrastructure teams to ensure network, storage, and compute resources meet the demands of AI systems.
Monitoring, Logging & Alerting
o Develop and implement comprehensive monitoring, logging, and alerting solutions for AI agents and infrastructure to ensure high availability and performance.
o Proactively identify and address potential issues, performance bottlenecks, and anomalies in production AI systems.
o Track key operational metrics and create dashboards for system health and performance.
Incident Response & Troubleshooting
o Provide operational support for production AI systems, including incident response, root cause analysis, and resolution of technical issues.
o Develop and maintain runbooks and standard operating procedures for common operational tasks and incident management.
o Participate in on-call rotations as needed to support critical AI services.
Automation & Operational Excellence
o Automate routine operational tasks, deployment processes, and system maintenance activities using scripting (e.g., Python, Bash) and automation tools.
o Contribute to the development and enforcement of operational best practices, security standards, and compliance requirements for AI systems.
o Work with development teams to improve the deployability, manageability, and observability of AI applications.
Collaboration & Documentation
o Collaborate effectively with AI developers, data scientists, AI architects, and other stakeholders to ensure smooth transitions from development to production.
o Maintain clear and comprehensive documentation for system configurations, operational procedures, and troubleshooting guides.
o Provide feedback to development teams on operational aspects and system performance.
Required Qualifications & Experience
Bachelor's degree in Computer Science, Information Technology, Engineering, or a related technical field.
4-7 years of experience in an MLOps or Agent Ops role, preferably supporting AI/ML or data-intensive applications.
Hands-on experience with cloud computing platforms (e.g., Google Cloud Platform, especially Vertex AI) and managing cloud-based infrastructure.
Proficiency in scripting languages such as Python, Bash, or PowerShell for automation.
Experience with CI/CD tools and practices (e.g., Bitbucket, GitLab CI, GitHub Actions).
Familiarity with containerization technologies (e.g., Docker, Kubernetes) and orchestration.
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack, Datadog, Google Cloud Monitoring, Langfuse).
Understanding of networking concepts, security best practices, and infrastructure-as-code (IaC) principles (e.g., Terraform, Ansible).
Strong troubleshooting and problem-solving skills with an analytical mindset.
Excellent communication skills and ability to work collaboratively in a team environment.
A proactive approach to identifying and resolving issues and improving system reliability.
Preferred Qualifications & Experience
Master's degree in a relevant field.
Specific experience in MLOps or Agent Ops, including deploying and managing machine learning models or large language model applicatio
Job role
Work location
Navi Mumbai
Department
IT & Information Security
Role / Category
IT Security
Employment type
Full Time
Shift
Day Shift
Job requirements
Experience
Min. 4 years
About company
Name
Larsen & Toubro Infotech Ltd (LTI)
Job posted by Larsen & Toubro Infotech Ltd (LTI)