Company: TEKREQS, Inc.

Address: New York, NY
Form of work: Full-Time
Category: Education/Training

Job description

SRE Engineer - Kubernetes for Advanced Compute
Location: New York, NY

Join a software product firm located in New York City that runs on data. Data is this firm's business and its product, and it is why thousands of companies partner with this firm. The firm has surpassed petabytes of data, with no end in sight.

The Team:
This team provides the infrastructure that supports core Data Services including the firm’s Data Science and Machine Learning platform, Search Infrastructure, and various others. The team's challenges span both software and hardware and the scale this team works on is massive.

This team is trusted to build the systems that run the firm's newest cutting-edge platforms. This team is also depended upon to manage the configuration, deployment, and operation of the systems that power the data backend of the firm. On this team, you’ll have the ability to truly innovate and invent, helping define the technical foundations of groundbreaking systems. Built with containerization and Kubernetes on top of leading-edge hardware, including GPUs and DNN-specific hardware, the team's systems rival supercomputing platforms across the world.

We’ll trust you to (Responsibilities):
• Design, build, and automate new solutions centered around the Kubernetes container orchestration platform and its ecosystem of projects
• Be responsible for solutions which maintain configuration and robustness of systems
• Analyze performance, place and interpret metrics, and plan capacity
• Troubleshoot and debug runtime issues with software and hardware
• Perform OS-level and hardware-level optimizations
• Interact with platform developers to understand and validate their workflows, requirements, application performance, and application resilience

What’s in it for you:
• An opportunity to make key technical decisions which help define the future of data and analytics infrastructure platforms
• The chance to apply your existing experience while gaining cutting-edge new experience in Kubernetes, containerization, GPUs, Data Science, and distributed database systems
• Your solutions will drive new functionality within the Bloomberg Terminal and other client interfaces, which are direct drivers of key decisions around the world

You need to have (Required Skills):
• 2+ years of systems configuration and automation experience (e.g. Ansible, Chef, Puppet, SaltStack; error handling, idempotency, configuration management)
• 2+ years of Linux systems experience (Ubuntu or Debian experience preferred; ideally conversant in Unix networking and C system calls)
• Proven experience in a programming and/or scripting language (e.g. Python, Go / Golang, Java, Ruby)
• A strong familiarity with Continuous Integration and Continuous Deployment methodologies, chat-ops, etc.
• Proven experience building and scaling out mission-critical, elastic, load-distributed, and high-throughput systems

We’d love to see (Any of the following would be considered a plus):
• Networking experience (e.g. packet analysis, routing protocols)
• Open source experience (a well-curated blog, an accepted upstream contribution, or a community presence)

How the team gives back:
This new team will make extensive use of Open Source Software. As part of that, the team commits to upstreaming the features it develops within Kubernetes and its ecosystem. Whether pushing bug fixes upstream, developing new features, giving presentations at conferences and meetups, or collaborating with industry leaders, open source will be at the heart of this new team. It's not something the team just does in its free time; it is how the team works.

Refer code: 7563217. TEKREQS, Inc. - The previous day - 2024-01-02 18:17
