Company

Grid Dynamics

Address: San Ramon, CA
Form of work: Full-Time
Category: Engineering/Architecture/Scientific

Job description

Position at Grid Dynamics
We are seeking a skilled Site Reliability Engineer to set up and maintain a distributed, high-load infrastructure operating at scale. Your expertise will be instrumental in improving overall reliability, and you will work in close partnership with our development teams to design and build new services together. You know how to troubleshoot Spark jobs, set up alerts and monitoring, coordinate fixes across teams, and handle incident management for the services you manage.
Responsibilities:
  • Participate in and provide feedback on the design, development, and implementation of integration processes for the Enterprise Data Lake, Data Warehouse, and BI applications
  • Collaborate with Architects, Engineers, Business Intelligence Developers, and other teammates to achieve common goals
  • Own end-to-end service availability and performance of the data platforms
  • Meet defined organizational SLAs, and continuously track and optimize service availability using established Key Performance Indicators (KPIs)
  • Optimize platform utilization and contain billing costs
  • Lead incident resolution, root cause analyses (RCAs), blameless post-mortems, and problem management
  • Actively participate in disaster recovery and business continuity planning and drills
  • Review incident trends, identify recurring issues, and build automation to eliminate toil
  • Communicate progress and resolution to appropriate stakeholders and leadership
  • Lead by example, mentor the team, and establish credibility through quality technical execution
  • Recommend application changes to improve application performance, reliability, and cost to operate
  • Review existing processes and recommend changes or institute new processes as necessary, in areas such as observability, alerting, operations, engineering, and system tuning
  • Generate high-quality documentation for the data platform, such as common patterns, runbooks, SOPs, and knowledge base articles
  • Work in shifts in a globally distributed team, using a follow-the-sun approach

Requirements:
  • Degree in computer science/engineering or equivalent experience
  • 2+ years' experience (5+ preferred) in SRE or similar roles
  • Hands-on experience with the Big Data stack, including managing Spark jobs
  • 2+ years' experience (5+ preferred) in Kubernetes
  • Experienced in Python, shell scripts, SQL, and PL/SQL scripting
  • Experienced in Change & Release process, GitHub and CI/CD solutions
  • Experience in on-prem and public cloud platforms (AWS preferred)
  • Experienced in process reviews, continuous improvement, automation, and toil elimination
  • Experienced in high availability (HA), high transaction volume environments, backup/recovery, and disaster recovery
  • Strong background in full-lifecycle support across multiple platforms or languages
  • Ability to interact with technical and non-technical teams, including Infrastructure, Network, Development, Business Analysis, and QA
  • Experience in analyzing and recommending solutions for production issues
  • Familiarity with Infrastructure as Code (IaC) and Terraform scripting is a plus

What we offer:
  • Opportunity to work on bleeding-edge projects
  • Work with a highly motivated and dedicated team
  • Competitive salary
  • Flexible schedule
  • Benefits package - medical insurance, sports
  • Corporate social events
  • Professional development opportunities

NB:
Placement and Staffing Agencies need not apply. We do not work with C2C at this time.
At this moment, we are not able to process H1B transfers. Applicants with CPT and OPT visas are welcome to apply.
About Us:
Grid Dynamics (Nasdaq:GDYN) is a digital-native technology services provider that accelerates growth and bolsters competitive advantage for Fortune 1000 companies. Grid Dynamics provides digital transformation consulting and implementation services in omnichannel customer experience, big data analytics, search, artificial intelligence, cloud migration, and application modernization. Grid Dynamics achieves high speed-to-market, quality, and efficiency by using technology accelerators, an agile delivery culture, and its pool of global engineering talent. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the US, UK, Netherlands, Mexico, and Central and Eastern Europe.
To learn more about Grid Dynamics, please visit www.griddynamics.com. Follow us on Facebook, Twitter, and LinkedIn.
Refer code: 6963238. Grid Dynamics - The previous day - 2023-12-14 01:01
