Nvidia Graphics Pvt Ltd

DevOps Engineer - High Performance Computing and Job Scheduler Management

Bengaluru/Bangalore
Not disclosed
Work from Office
Full Time
Min. 3 years

Job Description

DevOps Engineer, HPC and LSF

NVIDIA is the leader in AI, machine learning and datacenter acceleration. NVIDIA is expanding that leadership into datacenter networking with Ethernet switches, NICs and DPUs. NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing.

As a member of the Hardware Infrastructure Farm team, you will provide leadership in the design and implementation of groundbreaking compute clusters that power all silicon development across NVIDIA. We seek an expert to build and operate these clusters at high reliability, efficiency, and performance, and to drive foundational improvements and automation that improve engineers' productivity. As a Site Reliability Engineer, you are responsible for the big picture of how our systems relate to each other; we use a breadth of tools and approaches to tackle a broad spectrum of problems. Practices such as limiting time spent on reactive operational work, blameless postmortems, and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting, dynamic day-to-day work. SRE's culture of diversity, intellectual curiosity, problem solving and openness is important to our success.

What you’ll be doing:

  • Manage and support workload and resource schedulers in a large-scale HPC environment.

  • Automate Everything: Develop scripts to automate deployment, configuration management, and operational monitoring.

  • Develop solutions for complex computing resource management requirements.

  • Extract and leverage grid performance metrics for troubleshooting and performance optimization.

  • Troubleshoot Complex Issues: Perform comprehensive troubleshooting from bare metal to application level, ensuring system reliability and efficiency.

  • Develop, define and document standard methodologies to share with internal teams.

  • Collaborate with domain experts to improve how our chip development process utilizes our infrastructure.

  • Directly contribute to the overall quality of our next-generation chips and improve their time to market.

What we need to see:

  • Extensive knowledge of job scheduler administration (e.g. IBM Spectrum LSF or SLURM).

  • Proficient in administering CentOS/RHEL Linux distributions.

  • In-depth understanding of container technologies like Docker.

  • Proficiency in UNIX scripting languages and Python.

  • Excellent problem-solving skills, with the ability to analyze complex systems, identify bottlenecks, and implement scalable solutions.

  • Excellent communication and teamwork skills, with the ability to work effectively with diverse teams and individuals.

  • 3+ years of experience in a large, distributed Linux environment.

  • BS in Computer Science, similar degree or equivalent experience.

Ways to stand out from the crowd:

  • Experience analyzing and tuning performance for a variety of HPC or EDA workloads.

  • Solid understanding of cluster configuration management tools such as Ansible.

  • Proficiency in Perl for maintaining legacy automation scripts.

  • Deep understanding of distributed system principles.

Experience Level

Mid Level

Job role

Work location: India, Bengaluru
Department: Software Engineering
Role / Category: DevOps
Employment type: Full Time
Shift: Day Shift

Job requirements

Experience: Min. 3 years

About company

Name: Nvidia Graphics Pvt Ltd
Job posted by Nvidia Graphics Pvt Ltd
