Job Description
Additional Job Details:
- At least 5 years in a Reliability Engineering, DevOps, or infrastructure-focused role
- Advanced experience with programming languages (GoLang, Python, Java)
- Passion for designing and building reliable systems
- Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment
- Deep systems and infrastructure knowledge
- Advanced knowledge and hands-on experience with CI/CD systems
- Automation advocate - you truly believe in removing operational load with software
Description:
Software Engineer - Site Reliability Engineering (Primary Role)
Qualifications:
- A love of solving hard problems
- Putting your customers first, whether they be internal or external, and making them more productive, happy, and successful
- Experience with Azure AKS, AWS
- Experience with Kubernetes, ECS, EKS, or another container orchestration system
- Experience with an infrastructure-as-code system: Ansible, Terraform, CloudFormation, CDK, etc.
- Experience with logging systems: Splunk, EventHub, ELK, etc.
- Bachelor's degree in Computer Science or a similar field, or equivalent experience
- Experience creating automated solutions and an eagerness to automate
Responsibilities:
- Experience monitoring services and infrastructure, log collection, analytics, and application performance monitoring (APM)
- Improve metrics on our main services, and act as a subject matter expert for dev teams
- Recommend and guide improved monitoring and alerting processes
- Identify performance bottlenecks and provide recommendations for improvement
- Proactively identify and solve problems that we didn't even know we had
- Help build, deploy, and scale a load testing environment that is analogous to production
- Enforce security and operational safety controls
- Experience with performance testing or chaos testing is a plus
- Contribute to architectural improvements to meet future scaling and observability requirements
- Strong performance issue triaging skills: log analysis, thread dump analysis, heap dump analysis
- Self-motivated individual who is proactive in driving tasks to completion
- Participate in the on-call rotation (the team is spread across America and Europe, so you can sleep at night!), answer developers' questions, and attend incidents
What are the top 3 skills needed/required?
5 years in a Reliability Engineering, DevOps, or infrastructure-focused role
What skills and/or experience would separate the top candidate?
Advanced experience with programming languages (GoLang, Python, Java). Passion for designing and building reliable systems.
What makes a candidate profile stand out to you?
5 years in a Software Engineer role with an emphasis on reliability engineering
Do they need to be in a certain location/hub or remote?
Hybrid - Sunnyvale, CA, Bentonville, or Atlanta office preferred
Does this contract have the opportunity to extend or convert to an FTE?
The right candidate may open up that door
Required Skills: Golang, Java
Additional Skills: Python Developer
This is a high PRIORITY requisition. This is a PROACTIVE requisition.