Google India Pvt Ltd

Software Engineer - TPU Infrastructure

Hyderabad
Not disclosed
Work from Office
Full Time
Min. 2 years

Job Description

Software Engineer, TPU Infrastructure, Google Cloud

Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 2 years of experience in backend infrastructure development.
  • Experience coding in general-purpose languages such as C++, Go, or Python.
  • Experience with algorithms, data structures, software development, and distributed computing.

Preferred qualifications:

  • Experience designing reliable, fault-tolerant, and high-performance distributed systems.
  • Experience building cloud-based services, ideally with GCP.
  • Experience with large-scale distributed systems or Machine Learning (ML) systems (training and serving for computer vision, speech recognition, natural language processing, machine translation models).
  • Experience with reliability, large-scale distributed systems, Go, Google Cloud Platform, tensor processing unit (TPU), and service level objectives.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design, and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs, with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities, and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

The TPU Infra team is the engine behind Google’s AI Hypercomputer, responsible for the technical strategy and execution of the Machine Learning (ML) Compute IaaS platforms.


In this role, you will be architecting, implementing, and leading the infrastructure software solutions that manage the massive global fleet.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.


Responsibilities

  • Design and build scalable software capabilities to manage the availability, scheduling, and reliability of the Cloud TPU Hypercomputer stack (VMs, networking, storage, GKE, etc.).
  • Architect infrastructure solutions to ensure industry-leading availability guarantees for large-scale training and inference workloads.
  • Develop telemetry and tooling to establish service level objectives (SLOs) and service level agreements (SLAs), and to enable rapid debugging of complex infrastructure issues across the fleet.
  • Collaborate with platform, hardware, networking, and SRE teams to scale and manage accelerator capacity, including new TPU generations, and ensure a seamless experience for customers.
  • Design and implement reliable ML infrastructure that enables training and serving cutting-edge models at massive scale; troubleshoot complex distributed system issues across the stack (hardware, kernel, network); and build the automation, tooling, and telemetry needed to turn operational findings into permanent software fixes and improved SLOs.

Experience Level

Mid Level

Job role

Work location: Hyderabad, Telangana, India
Department: Software Engineering
Role / Category: Software Development
Employment type: Full Time
Shift: Day Shift

Job requirements

Experience: Min. 2 years

About company

Name: Google India Pvt Ltd
Job posted by Google India Pvt Ltd
