Job Title: HPC System Admin
Location: Remote (client is in Tuscaloosa, Alabama); some travel to the client site is required, mainly at the start
Duration: 6+ months
Job Description:
- The HPC Systems Administrator will help manage daily operations of HPC clusters, research storage, and other infrastructure and technologies, and will monitor, troubleshoot, and optimize HPC nodes.
- This position will be responsible for hands-on configuration and installation of new nodes, storage, infrastructure components, and HPC software.
- Perform HPC systems administration tasks and user support: provide day-to-day systems administration for UA's HPC clusters, users, software, and research storage. This includes Linux and HPC cluster management systems administration, user support, and troubleshooting of hardware, software, and user issues.
- Install and support new and repurposed HPC hardware (order, rack, cable, configure) for primary and smaller clusters.
- This role will develop and maintain HPC software and tools with the assistance of researchers and other engineers.
- This role will also be responsible for assisting faculty and students with HPC use.
Additional Skills Needed:
- Three or more years of experience in Linux systems administration (building software, run-time environment configuration, performance monitoring, and shell tools/scripting).
- Hands-on systems hardware/storage array installation and support experience.
- Experience compiling, configuring, and managing Linux-based software.
- Experience with Linux/Unix scripting (shell scripting, Perl, Python).
- Requires occasional travel to UA's HPC facility in Birmingham, AL for hands-on cluster tasks.
Preferred Skills:
- A bachelor's degree or higher in a technical field, such as engineering, computer science, or another computational science.
- Experience using or supporting HPC clusters or experience supporting research computing in a scientific or technical environment.
- Experience using or supporting any of the following: GPUs, parallel programming techniques (MPI, OpenMPI), InfiniBand/Mellanox optics, cloud HPC, deploying common frameworks (scikit-learn, TensorFlow, PyTorch, etc.), or data transfer networks (Globus).
- Programming/scripting experience with Conda, MATLAB, C, Java, or Mathematica.