Collaborate with cross-functional teams to understand the unique technical requirements of various AI research projects and translate them into infrastructure design and implementation plans.
Lead the configuration and deployment of hardware resources.
Design and implement software stacks and tools required for AI research, ensuring seamless integration with the infrastructure.
Oversee the implementation and management of virtualization and containerization technologies to optimize resource utilization.
Collaborate with security and compliance teams to enforce data security measures within the AI laboratory environment.
Monitor the performance, availability, and scalability of the AI infrastructure, proactively identifying and addressing any issues.
Stay updated on emerging trends and advancements in AI hardware and software, providing recommendations for continuous improvement.
Mentor and guide junior engineers, providing technical leadership, knowledge sharing, and fostering a culture of innovation.
Collaborate with external vendors and partners to evaluate new technologies and solutions that enhance the AI infrastructure.
Maintain thorough documentation of infrastructure configurations, processes, and best practices.
Implement automation and orchestration solutions to streamline infrastructure provisioning and management.
Contribute to strategic planning for the expansion and enhancement of the AI laboratory's infrastructure capabilities.
Requirements
Bachelor's degree in Computer Science, Information Technology, or a related field.
5+ years of experience in designing, deploying, and managing infrastructure for AI research or related technical fields.
Familiarity with virtualization, containerization, and cloud computing technologies.
Expertise in network configuration, security protocols, and compliance measures in technical environments.
Strong knowledge of hardware components used in AI research, including GPUs, TPUs, and other accelerators.
Proficiency in AI software frameworks and tools (e.g., TensorFlow, PyTorch) and their integration with infrastructure is preferred.
Proven track record of implementing automation and orchestration tools to enhance infrastructure operations.
Strong problem-solving skills and the ability to troubleshoot complex technical issues.
Excellent teamwork and communication skills, with the ability to collaborate effectively across disciplines.
Project management experience, including managing timelines, resources, and deliverables.
Shortlisted candidates will be offered a 2-year direct employment contract.