Site Reliability Cloud Engineer

Company	IBM Careers
Address	Austin, TX
Form of work	Full-Time
Category	Real Estate

Job description

We are looking for a dynamic Site Reliability Engineer who is responsive to market needs to join our Cloud IaaS Operations Team in Austin, TX, and deliver value to our clients in a fast-changing cloud landscape. An SRE spends 50% of their time on toil and 50% on engineering projects. The role requires full-stack systems thinking and coding skills, with a data-driven focus on application and service availability that draws on AI, including machine learning. The SRE team is dedicated to ensuring that IBM Cloud stays at the forefront of cloud technology, from data center design, storage and network architecture, and compute clusters to flexible infrastructure services. We operate IBM's cloud platform and are building IBM's next-generation cloud platform and VMware solutions to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leading efficiency, resiliency, and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients.
Primary Roles & Responsibilities:
In this Site Reliability Engineer role, you will work closely with several data centers, the entire Cloud organization, and IBM vendors to support, maintain, and operationally improve the IBM Cloud infrastructure. You will focus on the following key responsibilities:

Monitor the health of production and test systems
Respond promptly to production issues and alerts
Execute changes in the production environment through automation and AI
Partner with other SRE teams and program managers to deliver mission-critical services to the market
Support development of new and existing capabilities for our compute, storage, and network infrastructure services
Implement and automate infrastructure solutions that support IBM Cloud products and infrastructure
Support the compliance and security integrity of the environment
Automate health monitoring of the production and test systems
Automate return to service procedures for Cloud Service delivery
Partner with other teams, functional managers, and program managers to deliver mission-critical services to the market
Create Power BI dashboards on historical and predictive data for client use cases
Help design and implement the extraction of key entities from millions of unstructured files using Python NLP techniques and Apache Spark
Expertise in data interpretation and visualization
Define problems and opportunities in a complex business area
Develop advanced analytics products
Create and develop end-to-end data driven solutions to support and monitor the health of production and test systems
Extract data from multiple varied sources and integrate it for analytics and application development
Experience with machine learning engineering to develop self-running AI software to automate predictive models
Experience designing machine learning systems and algorithms that generate accurate predictions
Working knowledge of ServiceNow, JIRA, Confluence, and GitHub
Working knowledge of container technologies: Kubernetes (preferred), Docker, etc.
Hands-on knowledge of log aggregation software such as Splunk or ELK
Must have the ability to perform debugging and problem analysis by examining logs and running Unix commands

Work with Engineering to:

Provide initial assessments and possible workarounds for production issues
Troubleshoot and resolve production issues

Work with Support and Development teams to:

Identify and resolve issues
Discuss and plan integration tasks
Provide technical escalation support for other Infrastructure Operations teams

Reference code: 8500672. IBM Careers, 2024-03-08 13:22