Senior Observability Engineer

Company	Latch Llc
Address	Virginia, United States
Form of work	Full-time
Salary	$175,000 - $190,000 a year
Category	Information Technology

Job description

COMPANY PROFILE

Join our dynamic team at Lalaith Astor Technical Consulting House, LLC, where we specialize in providing cutting-edge technical solutions to government agencies. As a woman-owned small business (WOSB) and a member of the SBA 8(a) program, we are a small yet fast-growing Federal IT Contractor. We pride ourselves on a culture of innovation, excellence, and a commitment to delivering high-quality services in complex technical, Internet, and cybersecurity domains.

JOB SUMMARY

The Multi-cloud Site Reliability Engineer (SRE) Subject Matter Expert (SME) will support our customer in providing technical leadership, skills, and solutions necessary to support next generation efforts in this enterprise initiative. The SRE SME will assist the team by leveraging their skills and experience to ensure reliability, availability, and performance of the enterprise services for the client in a high availability environment. The SRE SME will work with the development and operations teams to build and maintain a scalable and robust infrastructure that supports the client’s mission and goals.

RESPONSIBILITIES AND DUTIES

Job responsibilities and duties will include, but are not limited to, the following:

Design and implement highly available and scalable systems, ensuring the reliability and performance of the company's website and applications in a multi-cloud environment.
Collaborate with cross-functional teams to define and establish service level objectives (SLOs) and service level agreements (SLAs) for critical systems.
Participate in system design consulting, platform management, and capacity planning.
Monitor systems and applications, proactively identifying and resolving any performance bottlenecks or availability issues.
Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into the health and performance of systems deployed on AWS, GCP, and Azure.
Develop and maintain automation scripts, configuration management tools, and infrastructure as code (IaC) templates to automate deployment, scaling, and monitoring tasks across multiple cloud platforms.
Develop and implement guidelines for provisioning, configuring, and optimizing cloud resources to meet performance, scalability, and cost requirements.
Conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents.
Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
Create and maintain documentation for system architecture, configuration, and troubleshooting procedures.
Perform capacity planning and resource allocation to ensure optimal system performance and scalability.
Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability and performance standards.
Stay up to date with industry best practices, new technologies, and emerging trends in site reliability engineering across major cloud service providers.

REQUIRED QUALIFICATIONS AND SKILLS

The selected candidate must have the following qualifications and skills:

Strong knowledge of Linux/Unix and Windows systems and command line tools.
Proficiency in scripting languages such as Python, JavaScript, Shell, or Perl.
Experience with configuration management tools like Ansible, Puppet, or Chef.
Familiarity with multiple cloud platforms (AWS, Azure, and/or Google Cloud).
In-depth understanding of and expertise with native cloud tools and solutions.
Deep understanding of the cloud infrastructure provided by various providers, such as AWS, Azure, and GCP.
Understanding of networking principles and protocols (TCP/IP, HTTP, DNS, etc.).
Knowledge of containerization technologies (Docker, Kubernetes) and orchestration tools.
Expertise in monitoring and logging tools such as Prometheus, Grafana, ELK stack, or Splunk.
Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues.
Excellent communication and collaboration skills to work effectively with cross-functional teams.
Strong attention to detail and ability to work in a fast-paced, dynamic environment.

DESIRED QUALIFICATIONS AND SKILLS

Experience in architecting and optimizing highly available and scalable systems specifically tailored for government agencies' needs.
Proven track record of collaborating with government stakeholders to define and establish service level objectives (SLOs) and service level agreements (SLAs) aligned with agency mission objectives.
Demonstrated expertise in leveraging native cloud tools and solutions provided by major cloud service providers to enhance the reliability and performance of enterprise services.
Proficiency in containerization technologies and orchestration tools with a focus on ensuring compliance with government security standards and regulations.
Strong familiarity with federal government compliance requirements and security protocols, including FedRAMP, FISMA, and NIST guidelines, ensuring seamless integration of security measures into multi-cloud environments.

REQUIRED EXPERIENCE

Years of Industry Experience: 15+ years
Proven experience as a Site Reliability Engineer or a similar role.
Solid understanding of software development methodologies and DevOps principles.
Experience with agile and iterative development processes.
Familiarity with continuous integration/continuous deployment (CI/CD) pipelines.
Experience with source control systems such as Git.
Knowledge of security best practices and experience implementing security measures in a production environment.
Ability to work independently and handle multiple projects and priorities simultaneously.
Strong analytical and problem-solving skills, with a focus on continuous improvement and automation.

Job Type: Full-time

Pay: $175,000.00 - $190,000.00 per year

Benefits:

401(k)
401(k) matching
Dental insurance
Health insurance
Paid time off
Parental leave
Professional development assistance
Referral program
Vision insurance

Compensation package:

Yearly pay

Experience level:

11+ years

Schedule:

8 hour shift
Day shift
Monday to Friday
On call

Application Question(s):

Are you willing to obtain and maintain a Public Trust background check?
Do you currently hold legal authorization to work for any employer in the United States? Please note that at this time, we are not in a position to sponsor or assume sponsorship of employment visas.

Experience:

Site Reliability Engineering or Observability Engineering: 8 years (Required)

Work Location: Remote

Refer code: 8855145. Latch Llc - The previous day - 2024-04-03 03:50
