Accumulus Synergy is a nonprofit trade association working on behalf of industry to address the global need for digital transformation. To help meet this need, Accumulus is developing a transformative data exchange platform to enable enhanced collaboration and efficiency between life sciences organizations and health authorities worldwide. The Accumulus Platform aims to improve efficiencies in the regulatory process by leveraging advanced technology, including data science and AI, as well as tools for secure data exchange to improve patient safety, help reduce the cost of innovation, and ultimately bring patients safe and effective medicines faster. Accumulus is working with key stakeholders in the life sciences regulatory ecosystem to build and sustain a platform that aims to meet regulatory, cybersecurity, and privacy requirements spanning clinical, safety, chemistry and manufacturing, and regulatory exchanges and submissions. Accumulus Synergy sponsors include Amgen, Astellas, AstraZeneca, Bristol Myers Squibb, GSK, Johnson & Johnson, Lilly, Merck, Pfizer, Roche, Sanofi, and Takeda.
Job Description
Accumulus is seeking a Site Reliability Engineer for the Product Security Team. As a Site Reliability Engineer, you will be responsible for monitoring, automating, and improving the reliability, performance, and availability of our product. You will work on tasks such as preventing incidents, managing infrastructure, building effective monitoring systems, and ensuring the smooth operation of computer systems.
The ideal candidate will bridge the gap between development and operations, focusing on building and maintaining scalable and reliable infrastructure to keep our product running smoothly, even in the face of relentless user demand or unexpected disruptions. This will be a hands-on engineering role within the Product Security Team, reporting directly to the Lead Site Reliability Engineer.
Starting day one, you will have the unique opportunity to support the growth of Accumulus Synergy by executing a combination of automation, monitoring, and incident response practices to ensure our product meets stringent reliability requirements.
Responsibilities
Key responsibilities of the role include, but are not limited to:
- System Design and Architecture: Collaborate with software engineers and architects to design and implement highly available and scalable systems. Consider factors like fault tolerance, load balancing, and redundancy to ensure optimal system performance.
- Performance Optimization: Identify performance bottlenecks and implement optimizations to enhance system efficiency. Analyze system resource utilization, conduct capacity planning, and ensure scalability to handle increasing user demand.
- Automation and Infrastructure as Code: Develop and maintain automation frameworks and infrastructure-as-code practices. Automate manual tasks, configuration management, and deployment processes to streamline operations and minimize human error.
- Capacity Planning: Forecast resource requirements based on growth projections and user traffic patterns. Collaborate with cross-functional teams to ensure adequate resource allocation and optimize cost efficiency.
- Service Level Objective (SLO) Monitoring: Define, track, and monitor SLOs to measure and improve system reliability. Develop metrics and dashboards to provide visibility into SLO performance and identify areas for optimization.
- Service Level Agreement (SLA) Management: Collaborate with stakeholders to define and manage SLAs, including availability targets, response times, and performance metrics. Monitor and report on SLA compliance, driving continuous improvements to meet or exceed agreed-upon service levels.
- Monitoring and Incident Response: Establish robust monitoring systems to track system health, performance, and reliability. Implement proactive alerting mechanisms to detect anomalies and respond swiftly to incidents. Conduct post-incident analyses to identify root causes and implement preventive measures.
- On-call Support: Participate in an on-call rotation to provide 24/7 support for critical incidents. Respond promptly to emergency situations, troubleshoot issues, and restore services in a timely manner.
- Security and Compliance: Collaborate with security teams to implement and maintain security measures, including access controls, vulnerability assessments, and incident response protocols. Ensure compliance with regulatory requirements and industry standards.
- Incident Preparedness and Testing: Conduct incident preparedness exercises and perform regular system testing to identify vulnerabilities and validate disaster recovery plans. Continuously improve incident response processes and playbooks.
Qualifications
- Bachelor's degree required. Degree in computer science preferred.
- 3+ years of experience in Site Reliability Engineering or a related field
- Experience in a development role, including lead or architect experience
- Hands-on experience with Azure cloud, microservice architecture, and web application deployment models
- Highly innovative with a drive for operational excellence
- Strong understanding of cloud technologies, including virtualization and containerization
- Strong understanding of automation and monitoring tools and technologies
- Awareness of security and regulatory compliance requirements
- Problem-solving and analytical skills, including the ability to identify and resolve issues quickly
- Excellent communication and interpersonal skills
- Strong desire to be part of a highly dynamic and talented technology team
- Ability to provide emergency support 24/7 as part of an on-call rotation
Benefits
While we hope the Accumulus mission is what really attracts you, we also have a lot to offer. Organizations are built by great people, and to attract great people you need to offer a great employee experience. Accumulus can provide:
- Very competitive compensation with a bonus plan. We must compete with big names in tech and pharma for top talent and compensate accordingly.
- 401(k) matching, immediately vested
- A full benefits package: multiple health plans, vision, dental, life, and disability insurance
- 100% remote work. Accumulus is a fully remote organization, and we intend to remain so.
- Experienced leadership to mentor you. We have drawn successful leaders from the biopharma industry with a deep understanding of regulatory affairs and combined them with similarly successful leaders in SaaS product development. Learning opportunities abound.