Company: Latch Llc

Address: Virginia, United States
Form of work: Full-time
Salary: $175,000 - $190,000 a year
Category: Information Technology

Job description

COMPANY PROFILE

Join our dynamic team at Lalaith Astor Technical Consulting House, LLC, where we specialize in providing cutting-edge technical solutions to government agencies. As a woman-owned small business (WOSB) and a member of the SBA 8(a) program, we are a small yet fast-growing Federal IT Contractor. We pride ourselves on a culture of innovation, excellence, and a commitment to delivering high-quality services in complex technical, Internet, and cybersecurity domains.

JOB SUMMARY

The Multi-cloud Site Reliability Engineer (SRE) Subject Matter Expert (SME) will support our customer by providing the technical leadership, skills, and solutions needed for next-generation efforts in this enterprise initiative. The SRE SME will apply their skills and experience to ensure the reliability, availability, and performance of the client's enterprise services in a high-availability environment. The SRE SME will work with the development and operations teams to build and maintain a scalable and robust infrastructure that supports the client’s mission and goals.

RESPONSIBILITIES AND DUTIES

Job responsibilities and duties will include, but are not limited to, the following:

  • Design and implement highly available and scalable systems, ensuring the reliability and performance of the company's website and applications in a multi-cloud environment.
  • Collaborate with cross-functional teams to define and establish service level objectives (SLOs) and service level agreements (SLAs) for critical systems.
  • Participate in system design consulting, platform management, and capacity planning.
  • Monitor systems and applications, proactively identifying and resolving any performance bottlenecks or availability issues.
  • Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance across systems deployed on AWS, GCP, and Azure.
  • Develop and maintain automation scripts, configuration management tools, and infrastructure as code (IaC) templates to automate deployment, scaling, and monitoring tasks across multiple cloud platforms.
  • Develop and implement guidelines for provisioning, configuring, and optimizing cloud resources to meet performance, scalability, and cost requirements.
  • Conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents.
  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
  • Create and maintain documentation for system architecture, configuration, and troubleshooting procedures.
  • Perform capacity planning and resource allocation to ensure optimal system performance and scalability.
  • Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability and performance standards.
  • Stay up to date with industry best practices, new technologies, and emerging trends in site reliability engineering across major cloud service providers.

REQUIRED QUALIFICATIONS AND SKILLS

The selected candidate must have the following qualifications and skills:

  • Strong knowledge of Linux/Unix and Windows systems and command line tools.
  • Proficiency in scripting languages such as Python, JavaScript, Shell, or Perl.
  • Experience with configuration management tools like Ansible, Puppet, or Chef.
  • Familiarity with multiple cloud platforms (AWS, Azure, and/or Google Cloud).
  • In-depth understanding of, and expertise with, native cloud tools and solutions.
  • Deep understanding of the cloud infrastructure provided by various providers, such as AWS, Azure, and GCP.
  • Understanding of networking principles and protocols (TCP/IP, HTTP, DNS, etc.).
  • Knowledge of containerization technologies (Docker, Kubernetes) and orchestration tools.
  • Expertise in monitoring and logging tools such as Prometheus, Grafana, ELK stack, or Splunk.
  • Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams.
  • Strong attention to detail and ability to work in a fast-paced, dynamic environment.

DESIRED QUALIFICATIONS AND SKILLS

  • Experience in architecting and optimizing highly available and scalable systems specifically tailored for government agencies' needs.
  • Proven track record of collaborating with government stakeholders to define and establish service level objectives (SLOs) and service level agreements (SLAs) aligned with agency mission objectives.
  • Demonstrated expertise in leveraging native cloud tools and solutions provided by major cloud service providers to enhance the reliability and performance of enterprise services.
  • Proficiency in containerization technologies and orchestration tools with a focus on ensuring compliance with government security standards and regulations.
  • Strong familiarity with federal government compliance requirements and security protocols, including FedRAMP, FISMA, and NIST guidelines, ensuring seamless integration of security measures into multi-cloud environments.

REQUIRED EXPERIENCE

  • Years of Industry Experience: 15+ years
  • Proven experience as a Site Reliability Engineer or a similar role.
  • Solid understanding of software development methodologies and DevOps principles.
  • Experience with agile and iterative development processes.
  • Familiarity with continuous integration/continuous deployment (CI/CD) pipelines.
  • Experience with source control systems such as Git.
  • Knowledge of security best practices and experience implementing security measures in a production environment.
  • Ability to work independently and handle multiple projects and priorities simultaneously.
  • Strong analytical and problem-solving skills, with a focus on continuous improvement and automation.

Job Type: Full-time

Pay: $175,000.00 - $190,000.00 per year

Benefits:

  • 401(k)
  • 401(k) matching
  • Dental insurance
  • Health insurance
  • Paid time off
  • Parental leave
  • Professional development assistance
  • Referral program
  • Vision insurance

Compensation package:

  • Yearly pay

Experience level:

  • 11+ years

Schedule:

  • 8-hour shift
  • Day shift
  • Monday to Friday
  • On call

Application Question(s):

  • Are you willing to obtain and maintain a Public Trust background check?
  • Do you currently hold legal authorization to work for any employer in the United States? Please note that at this time, we are not in a position to sponsor or assume sponsorship of employment visas.

Experience:

  • Site Reliability Engineering or Observability Engineering: 8 years (Required)

Work Location: Remote

Refer code: 8855145. Latch Llc - The previous day - 2024-04-03 03:50

