Company

IBM Careers

Address: Austin, TX
Form of work: Full-Time
Category: Real Estate

Job description

We are looking for a dynamic Site Reliability Engineer to join our Cloud IaaS Operations Team in Austin, TX, someone who is responsive to market needs and can deliver value to our clients in a fast-changing cloud landscape. An SRE spends 50% of their time on toil and 50% on engineering projects. The role requires full-stack systems thinking and coding skills, with a data-driven focus on application and service availability that draws on AI, including machine learning. The SRE team is dedicated to ensuring that IBM Cloud is at the forefront of cloud technology, from data center design, storage and network architecture, and compute clusters to flexible infrastructure services. We operate IBM's cloud platform and are building IBM's next-generation cloud platform and VMware solutions to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leading efficiency, resiliency, and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients.
Primary Roles & Responsibilities:
In this Site Reliability Engineer role, you will work closely with several data centers, the entire Cloud organization, and IBM vendors to support, maintain, and operationally improve the IBM Cloud infrastructure. You will focus on the following key responsibilities:
  • Monitor the health of production and test systems
  • Respond promptly to production issues and alerts
  • Execute changes in the production environment through automation and AI
  • Partner with other SRE teams and program managers to deliver mission-critical services to the market
  • Support development of new and existing capabilities for our compute, storage, and network infrastructure services
  • Implement and automate infrastructure solutions that support IBM Cloud products and infrastructure
  • Support the compliance and security integrity of the environment
  • Automate health monitoring of the production and test systems
  • Automate return to service procedures for Cloud Service delivery
  • Support the compliance and security integrity of the environment through your work
  • Partner with other teams, functional managers, and program managers to deliver mission-critical services to the market
  • Create Power BI dashboards on historical and predictive data for client use cases, and help design and implement the process for extracting key entities from millions of unstructured files using Python NLP techniques and Apache Spark
  • Expertise in data interpretation and visualization
  • Define problems and opportunities in a complex business area
  • Develop advanced analytics products
  • Create and develop end-to-end data driven solutions to support and monitor the health of production and test systems
  • Extract data from multiple varied sources and integrate it for analytics and application development
  • Experience with machine learning engineering to develop self-running AI software that automates predictive models
  • Experience designing machine learning systems and algorithms that generate accurate predictions
  • Working knowledge of ServiceNow, JIRA, Confluence, and GitHub
  • Working knowledge of container technologies: Kubernetes (preferred), Docker, etc.
  • Hands-on knowledge of log aggregation software such as Splunk or ELK
  • Must be able to debug and analyze problems by examining logs and running Unix commands
Work with Engineering to:
  • Provide an initial assessment of production issues and possible workarounds
  • Troubleshoot and resolve production issues
Work with Support and Development teams to:
  • Identify and resolve issues
  • Discuss and plan integration tasks
  • Provide technical escalation support for other Infrastructure Operations teams
Reference code: 8500672. IBM Careers - The previous day - 2024-03-08 13:22