As a Systems Engineer specializing in High-Performance Computing (HPC), you will be a key contributor to the design, implementation, and optimization of complex computational systems. Leveraging your expertise in HPC technologies, you will collaborate with cross-functional teams to ensure the seamless integration and strong performance of HPC environments.
System Design and Implementation:
Design, implement, and maintain high-performance computing systems to meet the organization's computational needs.
Collaborate with stakeholders to understand performance requirements and hardware specifications.
Parallel Computing:
Implement and optimize parallel computing techniques to enhance system performance.
Leverage parallel programming languages and frameworks for efficient task execution.
Cluster Management:
Manage and optimize HPC clusters, ensuring scalability and reliability.
Implement and maintain cluster management tools for efficient resource utilization.
Performance Tuning:
Analyze and fine-tune system configurations, hardware, and software for optimal performance.
Identify and resolve performance bottlenecks in HPC applications.
Job Scheduling:
Utilize job scheduling systems to allocate computational resources and manage workloads efficiently.
Collaborate with users to understand job requirements and prioritize computing tasks.
Networking and Interconnects:
Configure and optimize high-speed interconnects, such as InfiniBand, for fast data transfer between nodes.
Collaborate with network administrators to ensure seamless communication within HPC environments.
Distributed File Systems:
Implement and manage distributed file systems for efficient data storage and retrieval.
Optimize data access and transfer mechanisms to support large-scale computations.
Fault Tolerance and Reliability:
Implement strategies for fault tolerance to ensure system reliability during long-running computations.
Troubleshoot and resolve system issues to minimize downtime.
Documentation:
Create and maintain detailed documentation of HPC system configurations, processes, and best practices.
Develop user guides and training materials for HPC users.
Stay Updated:
Keep abreast of emerging trends and advancements in HPC technologies.
Evaluate and recommend new hardware and software solutions to enhance system capabilities.
REQUIREMENTS
Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
Proven experience as a Systems Engineer with a focus on High-Performance Computing.
Knowledge of HPC architectures, technologies, and parallel programming languages.
Technical Proficiency:
Familiarity with cluster management tools, job scheduling systems, and distributed file systems.
Experience with high-speed interconnects (e.g., InfiniBand) and networking in HPC environments.
Problem-Solving Skills:
Strong analytical and problem-solving skills to address complex HPC challenges.
Communication:
Excellent communication and collaboration skills to work effectively in interdisciplinary teams.