Company: CoreWeave

Location: Roseland, NJ
Employment type: Full-Time
Category: Engineering/Architecture/Scientific

Job Description

CoreWeave is a specialized cloud provider, delivering a massive scale of GPU compute resources on top of the industry's fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute-intensive use cases (VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming) that are up to 35 times faster and 80% less expensive than the large, generalized public clouds. Learn more at www.coreweave.com.

About the role:

The Cloud Operations Team is the heart of CoreWeave's operational practice. In this role, you'll help define and shape how Site Reliability Engineering (SRE) is implemented at CoreWeave. The Cloud Operations Team defines and implements tooling and processes that enable operational best practices and continual improvement across all engineering teams.

As an 'SRE of SREs,' you'll define and implement system and workflow automation so that service owners can rapidly identify and mitigate availability and performance regressions. Collaborating across engineering, you'll support service-owning SREs with the 'picks and shovels' they need to excel at running their services.

You will work with a team of 8-10 mixed-specialization engineers and have the opportunity to work on the full gamut of rewarding challenges that come with building the AI Cloud in a communicative, supportive, and high-performing environment.

As a member of the Cloud Operations Team, you will have the opportunity to:

  • With a customer-first mindset, establish reliability and quality assessment patterns for all CoreWeave systems.
  • Improve the performance, security, reliability, and scalability of internal and external-facing services.
  • Develop dashboards, alerts, automated remediation, and insights into the customer experience using observability tools.
  • Create and maintain Kubernetes operators, custom controllers, and other tools to intelligently scale our operational capability.
  • Establish and integrate incident and change management tools and workflows.
  • Act as Incident Commander for priority incidents and lead postmortems.
  • Participate in the on-call rotation as needed as we establish and operationalize this new team.
  • Enable and evangelize Reliability Engineering across CoreWeave's engineering teams.
  • Grow, change, invest in your teammates, be invested-in, share your ideas, listen to others, be curious, have fun, and, above all, be yourself.

Wondering if you're a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match. Here are some qualities we've found compatible with our team. If a portion of this resonates with you, we'd love to talk.

  • You have experience operating services in production and are interested in driving engineering practices such as reliability at scale, testing (load, recovery, system, etc.), progressive deployments, error budgets, observability, and fault-tolerant design.
  • You have experience automating manual processes and integrating various operations and productivity tools.
  • You've done some Linux shell scripting and/or can navigate a *nix-based operating system (with the right cheat sheet, if required).
  • You are familiar with debugging and administering Linux and Kubernetes environments.
  • You're comfortable with the idea of codifying practices into Kubernetes controllers, operators, and other applications using a modern programming language.
  • You have experience with incident management for your team or an organization.
  • You're comfortable in open source environments.
  • You're excited to join a team with diverse perspectives and backgrounds that believes in tackling challenges, growing hand in hand, and winning together.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $165,000 to $200,000 annually. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.

Hybrid Workplace

If you reside within a 30-mile radius of our New Jersey, New York, or Philadelphia offices, we're excited for you to join us at the office at least three times a week, recognizing the significance we place on fostering connections, collaboration, and creativity within our office culture. Our commitment to operating as a hybrid workplace underscores our dedication to enabling our employees to tailor their work-life balance to their individual preferences.

Why CoreWeave?

At CoreWeave, we work hard, have fun, and move fast! We're in an exciting stage of hyper-growth that you will not want to miss out on. We're not afraid of a little chaos, and we're constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:

  • Be Curious at your Core
  • Act like an Owner
  • Empower Employees
  • Deliver Best-in-Class Client Experience
  • Achieve More Together

We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. As we get set for takeoff, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!

Benefits

We offer a competitive salary and benefits, including:

  • Medical, dental and vision insurance - 100% paid for the employee
  • Company-paid life insurance
  • Voluntary supplemental life insurance
  • Short and long-term disability insurance
  • Flexible Spending Account
  • Tuition Reimbursement
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our offices
  • Weekly massages in NJ office
  • A casual work environment
  • Work culture focused on innovative disruption

California Consumer Privacy Act - California applicants only

CoreWeave is an equal opportunity employer, committed to diversity and inclusiveness. We will consider all qualified applicants without regard to race, color, nationality, gender, gender identity or expression, sexual orientation, religion, disability, or age.

Refer code: 8702260. Coreweave - The previous day - 2024-03-23 20:35
