Company

Data Engineer with Kafka - HAN IT Staffing

Address: Wayne, NJ
Form of work: Full-Time
Category: Engineering/Architecture/Scientific

Job description

Site Reliability Engineer
Malvern, PA - Three months remote
Description
• Are you an engineer who loves to solve impactful complex operational problems?
• Are you passionate about finding opportunities to improve system performance and efficiency, scalability, fault tolerance, and self-healing capabilities?
• Are you excited about Chaos Engineering? Do you want to apply these principles and creatively experiment with our systems to uncover hidden weaknesses?
• Are you obsessed with understanding a system's inner state, interactions between systems, or observability-driven development?
If the above holds, then the Lead Site Reliability Engineer opportunity at Vanguard is for you! A successful candidate will likely have experience as a Full Stack Engineer who has supported their applications operationally. You will solve reliability problems across product families and continuously seek opportunities to improve our systems' "-ilities". You will also help define, maintain, and carry out subdivisional Reliability Engineering standards, contribute to enterprise-wide libraries for reliability, and train product SRE and product family SRE leads within the subdivision.
In this role you will:
1. Instrument, enhance, and advocate for system observability. Identify and develop solutions to bridge system observability gaps.
2. Collaborate with internal teams to evaluate the health, stability, and reliability of systems and platforms. Look for opportunities to improve system performance, efficiency, and resiliency.
3. Develop and communicate new standards and newly available tools and frameworks across subdivisions. Enforce reliability standards. Design and develop new automated solutions for reliability.
4. Provide technical leadership, consultancy, and coaching on designing and implementing both traditional and serverless architectures in AWS, with an emphasis on repeatability, scaling options, resilience, reliability, telemetry, and networking, including design patterns for resilient systems.
5. Lead failure mode analysis spanning product families when new features and architecture patterns are introduced. Facilitate post-incident reviews for any high-severity, client-impacting events local to the product family.
6. Lead cross-product or cross-subdivision chaos experimentation.
7. Design, review, and coach others on performance tests using appropriate components (e.g., requests per minute, number of threads, the construction of a request with headers and cookies).
8. Consult, review, coach, and influence architectural decisions, including non-functional aspects, by proposing potential technical solutions or enhancements and explaining convincingly which is better and why.
9. Contribute to or lead Reliability Engineering and Resilience communities of practice. Remain informed about Site Reliability Engineering activities happening within the subdivision.
10. Work with product owners to set subdivision goals for higher availability and SRE impact, and track progress toward achieving them.
11. Provide technical leadership, guidance, consulting, training, and governance on SRE to one or more product families in a subdivision.
12. Identify opportunities to automate away toil and develop solutions, monitor error budget exhaustion rates, configure auto scaling thresholds for the product, and incorporate resilience patterns, such as circuit breakers, into the application code. Develop complex deployment and/or routing strategies for high availability.
13. Maintain and look for opportunities to improve the centralized incident response playbook for the subdivision, which documents standards for managing communication and escalation during an incident.
14. Oversee blameless post-incident reviews for high-severity incidents involving multiple product families.
Core Responsibilities/Qualifications
• Minimum of eight years related work experience, with at least three years of development experience.
• Undergraduate degree or equivalent combination of training and experience. Graduate degree preferred.
• Full stack development: JDK 8+ preferred, with Spring Boot, REST APIs, multithreaded and multiprocessing applications, and GraphQL. Experience with UI development (familiarity with Angular, TypeScript, NodeJS, etc.) is a plus.
• Ability to diagnose and resolve problems in high-throughput applications.
• Experience with one or more observability frameworks or tools, such as OpenTelemetry (Java, JS, etc.), CloudWatch, Grafana, or Splunk.
• Exposure to *nix environments including some shell script development and basic command execution.
• Strong understanding of database principles and working knowledge in distributed storage and infrastructural solutions.
• Experience with container management (such as Docker) and microservices architectures in cloud and on-premises infrastructure.
• Working knowledge of AWS network foundations, application networking, edge, and network security.
• Excellent communication and documentation skills.
Refer code: 6621415. Data Engineer with Kafka - HAN IT Staffing - The previous day - 2023-12-01 05:40
