Title: Senior Site Reliability Engineer
JOB SUMMARY:
This position is responsible for the design, development, and implementation of cloud-based technologies. It provides technical expertise on complex projects and advanced troubleshooting of existing cloud technology used by the department, including guidance and support for development work at all system layers: data, processing, and back-end systems.
MAJOR DUTIES AND RESPONSIBILITIES
- Implements software solutions to improve reliability and observability
- Performs technical implementations for our CaaS / PKS platform
- Migrates workloads to and from on-prem and off-prem cloud environments
- Recommend settings for applications, operating systems, networks, and cloud services to improve performance, security, and reliability
- Collaborate with a growing team of Cloud Engineers and the Cloud Ops team to develop and support Charter IT Cloud Strategy
- Responsible for the implementation of security best practices and initiatives throughout all layers of the Cloud model
- Operate in an on-call environment
- Actively and consistently supports all cloud efforts to simplify and enhance the customer experience
- Analyze, troubleshoot and resolve system, software, network, and storage failures for a globally distributed cloud infrastructure
- Accountable for helping define and drive best practices for monitoring, security, and platform reliability
REQUIRED QUALIFICATIONS
- Ability to read, write, speak and understand English
- Proven experience with the VMware suite of products
- Proven experience managing both physical and virtual infrastructure
- Experienced with multiple operating systems (e.g. Windows and Linux)
- Hands-on experience with one or more cloud computing services (e.g. AWS, Microsoft Azure, Google Cloud Platform, IBM)
- Hands-on experience with a variety of cloud deployment models (e.g. private, public, multi-cloud)
- Ability to script in one or more languages or tools (e.g. Python, Shell, PowerShell, Ansible, or Perl)
- Familiarity with CI/CD, including experience with Puppet, Ansible, and Jenkins
- Experience managing monitoring and alerting tools
- Familiar with containerized workloads (e.g. Kubernetes, OpenShift, TKGI)
- Experienced with firewalls, routing and load balancing
- Skilled in troubleshooting methodologies
- Excellent written and oral communication skills, including the ability to write technical and process documents
- Requires attention to detail and excellent organizational skills
- Ability to contribute independently as well as be a team player
- Experience managing small projects
- Self-starter, able to manage tasks with little supervision
Education:
Bachelor's degree in Computer Science or related field, or equivalent experience
Related Work Experience (Number of Years):
- Network experience: 4+
- System Administration experience: 4+
- Troubleshooting: 4+
- Container Services: 1+
- Scripting: 2+
PREFERRED QUALIFICATIONS
Related Work Experience (Number of Years):
- VMware System Administration experience: 5+
- TKGI Enterprise Pivotal Container Services: 2+
- VMware NSX-T: 2+
- vROps, Log Insight, vRNI, vRIL: 3+
- Cisco networking: 3+
- Firewall configuration management: 3+
- Load Balancer configuration management: 3+
Education
- Bachelor's degree in Computer Science or related field, or equivalent experience