Company

Peloton

Address: New York, NY
Form of work: Full-Time
Category: Information Technology

Job description

ABOUT THIS ROLE

Peloton is seeking an outstanding Platform Reliability Engineer with a Kubernetes (K8s) focus to join our Platform team. Our team builds and maintains a reliable, highly scalable, multi-cluster, multi-region Kubernetes platform. In this role, you will have a rare opportunity to work with groundbreaking technologies that encourage innovation and keep workloads running reliably in a flexible, scalable, and secure way.

YOUR DAILY IMPACT AT PELOTON

  • You will be a technical leader within your team, influencing and driving technical investments across partner teams with a "Platform Thinking" attitude. You will help others with design, execution, and problem-solving
  • Architect, develop, test, release, and support CI/CD systems such as Jenkins, GitHub Actions, Gradle, and Artifactory
  • Adhere to best practices in architectural design, testing (unit, integration, visual, and regression), and Scrum methodology
  • Assist in planning, execution, and updating of technical roadmaps
  • Host critical infrastructure that gives our developers the best possible experience running multiple Kubernetes pods across multiple clusters
  • Provide fast, automatic scaling for Connected Fitness devices and the eCommerce platform
  • Develop and manage our Container Orchestration Platform, overseeing a diverse ecosystem of over 2,000 applications. This includes Multi-Cluster/Multi-tenant Kubernetes with 15+ clusters per environment, Istio Multi-cluster Mesh, and an AWS multi-account structure
  • Design, improve, and implement additional services for our centralized Observability Platforms, ensuring efficient log management with Splunk and effective monitoring and alerting with Datadog and PagerDuty
  • Provide a platform for machine learning (and other exciting workloads)
  • Allow developers to move quickly and experiment, without getting in the way
  • Promote standard methodologies for building and operating highly reliable systems
  • Consult on code and design reviews, planning, and technical discussions to ensure the resulting work is high quality, efficient, well documented, and meets reliability and capacity requirements
  • Automate everything, from infrastructure down to day-to-day tasks
  • Follow the standard incident management process, conduct timely post-mortems of infrastructure incidents, and exercise sound judgment about when to triage and when to dig into a root-cause analysis
  • Assist with all aspects of operational security and compliance, seek out potential threats to security and reliability, and advocate solutions
  • Participate in a rotating on-call duty schedule, providing support and assistance for the services within the Platform team's responsibility

YOU BRING TO PELOTON

  • A degree in Computer Science, Engineering, or a similar field of study or equivalent work experience
  • 3+ years of experience in software engineering, with a solid understanding of Kubernetes and Infrastructure as Code
  • 1+ years of systems configuration and automation experience (e.g. Ansible, Chef, Puppet, Terraform)
  • Extensive knowledge of and hands-on experience with AWS cloud infrastructure and services, including CI/CD and IaC provisioning tools (Jenkins, ArgoCD, Scalr, Terraform, and GitHub Actions)
  • Experience in a cloud environment like AWS or GCP, and familiarity with running containerized services
  • Experience with a programming language like Python, Golang, or Java
  • Knowledge of standard practices in observability and monitoring for Kubernetes clusters at scale, with experience in cost optimization tools like Kubecost and Goldilocks
  • Knowledge of standard processes for securing a Kubernetes cluster and its deployments at scale

BONUS

  • Passion for helping development teams make the transition to a container-native world
  • Passion for reliable, scalable, observable software with a sense of ownership
  • Experience designing and operating large, reliable, and scalable distributed systems
  • Knowledge of network infrastructure basics, including DNS, DHCP, firewalling, and load balancing, to facilitate cross-functional collaboration

#LI-Hybrid

#LI-SW2

Refer code: 8164396. Peloton - The previous day - 2024-02-08 14:01

