Company

Hewlett Packard Enterprise Development LP

Address: Spring, TX
Form of work: Full-Time
Category: Engineering/Architecture/Scientific

Job description

HPC SRE - Sr Site Reliability Engineer
This role has been designated as 'Remote/Teleworker', which means you will primarily work from home.
Who We Are:
Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today's complex world. Our culture thrives on finding new and better ways to accelerate what's next. We know diverse backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.
Job Description:
HPE is seeking a Senior HPC Systems Engineer to join our AI Cloud team as a Site Reliability Engineer to build, test, and administer large-scale HPC clusters in support of the AI Cloud business. This is an exciting opportunity to have a significant impact on a key business with considerable growth potential. In this role, you will have a great deal of creative freedom to define and develop solutions that will support a scaling customer base.
This role can be performed onsite or remotely within the US.
Primary Responsibilities

  • Administer High-Performance Computing infrastructure composed of Linux systems spanning the world's most powerful compute, storage, and high-speed networking technologies.
  • Maintain the configuration of our resource management system (SLURM) to keep resource allocation efficient and aligned with organizational priorities.
  • Automate configuration management, software updates, and maintenance of system availability using modern DevOps tools (Ansible, GitLab, etc.).
  • Plan and maintain new systems that support the HPC Software Stack and highly-scalable communication protocols (e.g. MPI).
  • Work directly with developers and hardware architects to debug issues, identify new requirements, and improve workflows.
  • Actively communicate with users and management regarding resource planning and allocation.
  • Ensure continuous uptime of HPC systems at large scale.
  • Help design and implement security aspects of the computing infrastructure.
  • Collaborate with project managers and development partners to ensure effective and efficient delivery, deployment, operation, monitoring, and support of HPC engagements.

Required Qualifications:
  • Bachelor's or master's degree in Computer Science or an equivalent field
  • 4-8 years of experience in large-scale HPC systems administration and deployment
  • Experience deploying and managing high-speed networks such as Slingshot and InfiniBand.
  • Experience deploying and managing parallel storage technologies.
  • Experience deploying and maintaining HPC software stacks and scheduling frameworks.
  • Deep knowledge of CPU, GPU, memory and storage hardware for HPC systems.
  • An understanding of the security concerns in a cloud environment
  • Strong analytical and problem-solving skills and good communication skills, including in English

Preferred Qualifications:
  • Experience in high-performance computing applications and workflows
  • Experience with composable supercomputing platforms
  • Experience in leading a team to develop extremely large scale (>Petascale) HPC systems

Additional Skills:
Cloud Architectures, Cross Domain Knowledge, Design Thinking, Development Fundamentals, DevOps, Distributed Computing, Microservices Fluency, Full Stack Development, Security-First Mindset, User Experience (UX)
What We Can Offer You:
Health & Wellbeing
We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.
Personal & Professional Development
We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have - whether you want to become a knowledge expert in your field or apply your skills to another division.
Diversity, Inclusion & Belonging
We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know diverse backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.
Let's Stay Connected:
Follow @HPECareers on Instagram to see the latest on people, culture and tech at HPE.
#unitedstates
#highperformancecompute, #Hplabs, #technologyandsoftware
Job:
Engineering
Job Level:
Expert
States with Pay Range Requirement
The expected salary/wage range for a U.S.-based hire filling this position is provided below. Actual offer may vary from this range based upon geographic location, work experience, education/training, and/or skill level. If this is a sales role, then the listed salary range reflects combined base salary and target-level sales compensation pay. If this is a non-sales role, then the listed salary range reflects base salary only. Variable incentives may also be offered. Information about employee benefits offered can be found at https://myhperewards.com/main/new-hire-enrollment.html.
Annual Salary: $95,100.00 - $218,700.00
HPE is an Equal Employment Opportunity/Veterans/Disabled/LGBT and Affirmative Action employer. We are committed to diversity and building a team that represents a variety of backgrounds, perspectives, and skills. We do not discriminate and all decisions we make are made on the basis of qualifications, merit, and business need. Our goal is to be one global diverse team that is representative of our customers, in an inclusive environment where we can continue to innovate and grow together.
Hewlett Packard Enterprise is EEO F/M/Protected Veteran/ Individual with Disabilities.
HPE will comply with all applicable laws related to employer use of arrest and conviction records, including laws requiring employers to consider for employment qualified applicants with criminal histories.
Refer code: 7199488. Hewlett Packard Enterprise Development LP - The previous day - 2023-12-17 16:56
