AI Data Center Engineer
Pune, India | Experience: 3–8 Years
About the Role
We are hiring an AI Data Center Engineer for a fast-growing, technology-driven organization working on next-generation AI infrastructure.
This role sits at the intersection of high-performance computing (HPC), GPU infrastructure, and advanced networking, enabling large-scale AI/ML workloads in production environments.
If you enjoy building systems from rack to runtime and solving complex performance challenges, this is a high-impact opportunity.
Key Responsibilities
- Lead and support AI data center deployments, including rack integration, cabling, and hardware setup
- Perform GPU server commissioning, validation, and performance benchmarking
- Deploy and manage InfiniBand fabrics for high-throughput, low-latency workloads
- Configure and maintain Spectrum Ethernet networking (NVIDIA/Mellanox)
- Troubleshoot hardware, networking, and performance issues across GPU clusters
- Collaborate with Cloud, ML, and DevOps teams to ensure infrastructure readiness
- Validate cluster performance for AI/ML training and inference workloads
- Maintain documentation for deployments, configurations, and troubleshooting
- Ensure adherence to data center best practices (power, cooling, redundancy)
Required Skills & Experience
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field
- 3+ years of experience in data center, HPC, or infrastructure environments
- Hands-on experience with GPU platforms (NVIDIA A100 / H100 or similar)
- Strong expertise in InfiniBand (RDMA, fabric setup, tuning)
- Experience with Spectrum switches (NVIDIA/Mellanox Ethernet)
- Solid Linux administration skills (RHEL / Ubuntu)
- Strong understanding of networking fundamentals (TCP/IP, VLANs, routing, QoS)
- Exposure to cluster orchestration tools (Kubernetes or Slurm) is a plus
Preferred Skills
- Experience with AI/ML infrastructure environments
- Knowledge of NVLink, GPUDirect, NCCL tuning
- Familiarity with automation tools (Ansible, Terraform, Python scripting)
- Experience in hyperscale or cloud environments
- Exposure to monitoring tools (Prometheus, Grafana)
Key Competencies
- Strong troubleshooting and problem-solving skills
- Ability to work in fast-paced, high-availability environments
- Excellent documentation and communication skills
- Team-oriented mindset suited to cross-functional collaboration
Certifications (Good to Have)
- NVIDIA Certified Professional (NCP)
- CCNA / CCNP or equivalent
- Linux certifications (RHCE or similar)
Why This Role?
- Work on cutting-edge AI infrastructure projects
- Exposure to GPU clusters, HPC networking, and large-scale deployments
- Opportunity to collaborate with high-impact engineering teams
- Be part of building future-ready AI platforms
Apply Now
If you have hands-on experience in GPU infrastructure, InfiniBand networking, and data center environments, we would love to hear from you.
Share your profile at: info@orbitconsultancy.in | 8810363997
Visit: orbitconsultancy.in
