Job Description
Location: Santa Ana, CA
Position Type: Contract to Hire
Salary/Pay: $65 - $70 per hour
We are unable to sponsor at this time
Our client, a Fortune 500 firm consistently ranked among the Fortune 100 Best Companies to Work For, offers strong career growth prospects, a favorable work-life balance, and a renowned work culture. They are seeking to hire a Senior Site Reliability Engineer on a contract-to-hire basis. If you are looking for these benefits, this role may be a perfect fit for you.
As a Senior Site Reliability Engineer, your responsibilities include:
- Monitoring and assessing the availability and health of systems and environments, making recommendations to improve services.
- Developing and implementing monitoring and recovery tools to ensure optimal delivery and resilience.
- Collaborating with partner groups to establish Service Level Objectives, Indicators, and Error Budgets.
- Providing expert operational support and engineering for multiple large-scale distributed software applications.
- Leading the development and implementation of departmental automation processes and procedures.
- Acting as a technical point of contact for internal and external customers, offering guidance and support for application and service delivery.
- Advising development and engineering teams on automation and optimization of service availability, scalability, performance, monitoring, and alerting.
- Participating in technical evaluations and proof of concept programs to evaluate and introduce new technologies and tools.
- Being available for on-call support during off-duty hours on a rotating schedule, including weekends and holidays.
To be successful in this role, you should have the following:
- Strong understanding of cloud services and architecture, including AWS and Azure.
- Experience with distributed systems (architectures, microservices, high availability) and proficiency in large-scale enterprise environments.
- Knowledge of container computing, including Docker, Kubernetes, and service mesh.
- Ability to build and configure AWS and Azure services, such as AWS Lambda and Azure Functions.
- Understanding of proxies and load balancing, including Nginx, HAProxy, and Envoy.
Monitoring and tools:
- Knowledge of log event aggregation, metric collection, application monitoring, and event handling, including Elastic, SCOM, AppD, Uptrends, AppInsights, and CloudWatch.
- Strong proficiency in Windows and UNIX/Linux technologies, as well as network triage, including packet loss and routing issues.
- Ability to define Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgets, and burn rates.
Development:
- Knowledge of "everything as code" methodologies for configuration, infrastructure, and orchestration.
- Familiarity with programming languages and frameworks, such as .NET, C#, C++, and Python.
- Experience with continuous integration and delivery tooling, including Chef, Ansible, Jenkins, and Stash/Git.
- Experience with configuration management and infrastructure-as-code tools, such as Puppet, Hiera, Terraform, Terragrunt, and Ansible.
- Ability to use scripting languages or other tools for workflow automation.
Additionally:
- Strong analytical and problem-solving skills to troubleshoot infrastructure issues, potentially across multiple technical disciplines.
- Ability to work effectively as a member of a multi-cultural, multi-location team.
So if you are a Senior Site Reliability Engineer looking for a new role with an outstanding company, apply today!