Senior Systems Software Engineer C++ DCGM

Job Description

NVIDIA is looking for outstanding software engineers to work on NVIDIA’s Data Center GPU Manager (DCGM) software. In this role you will work closely with the broader NVIDIA team to design and build Linux-based management agents, CLI tools and end-to-end integration solutions that combine GPUs with the rest of the data center software management ecosystem. We are focused on supporting NVIDIA products across HPC, cloud and enterprise on both bare metal and virtualized platforms as the role of GPUs in all of these environments expands rapidly. Your contributions will span many aspects of GPU system integration, including telemetry and metrics, health checks, diagnostics, configuration, accounting and policy.

These tools serve both passive background monitoring and active online management, with a core emphasis on operational transparency and seamless integration in customer environments. Your code will support systems ranging from single-node developer machines to large clusters with thousands of nodes. To be successful, you will need a strong Linux C/C++ background, familiarity with distributed software development, and a proven work ethic. You will be expected to jump in quickly and provide important contributions from day one.

This is a dynamic work environment with many exciting opportunities awaiting. NVIDIA GPUs are central to many hot trends in the enterprise, cloud and datacenter. Come join us as we craft the future of accelerated computing and AI!

What you’ll be doing:

- Develop robust, scalable C++ user-space data center management system software under Linux
- Build and maintain user-space libraries, agents, plugins, bindings and CLI tools
- Enable GPU management integration with the OSS ecosystem, including Kubernetes and Docker
- Support internal and external users through bug fixes, documentation and feature improvements
- Maintain high-quality products through robust test coverage and smart design

What we need to see:

- BS or higher in Computer Science or equivalent experience
- 5+ years of meaningful industry experience with a strong C++ development background
- Familiarity with modern C++ standards (C++17/C++20)
- User-space development and debugging expertise under Linux environments
- Experience with APIs and interface design
- Experience with IPC and multi-threading
- Outstanding written and verbal interpersonal skills
- Strong motivation and commitment to learn new skills
- Ability to implement all aspects of the software development lifecycle
- Ability to manage time in a fast, heavily multitasked environment
- Experience writing unit and system tests to ensure the correctness of fixes and new features

Ways to stand out from the crowd:

- Development experience with Python, Go, and Rust
- Experience with Jenkins and GitHub/GitLab CI/CD pipelines
- Experience with containers, common orchestration frameworks and common logging/telemetry backends
- Exposure to GPU programming with CUDA
- Experience with enterprise software development
- Experience with cross-language interfaces (FFI, SWIG, etc.) in Go (cgo), Python, and Rust
- Experience with metrics gathering/monitoring best practices
- Experience with OpenTelemetry, Prometheus, Grafana, Datadog, etc.
- Good understanding of large-scale distributed systems and data-center operations/limitations

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us.

If you’re creative and autonomous, we want to hear from you!

The base salary range is 148,000 USD – 276,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.

NVIDIA accepts applications on an ongoing basis. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA is a Learning Machine

NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and the metaverse is transforming the world’s largest industries and profoundly impacting society. Learn more about NVIDIA.


Job Summary

Location: US, WA, Redmond
