Required Visa Status:
Citizen | GC |
US Citizen | Student Visa |
H1B | CPT |
OPT | H4 Spouse of H1B |
GC Green Card |
Employment Type:
Full Time | Part Time |
Permanent | Independent - 1099 |
Contract – W2 | C2H Independent |
C2H W2 | Contract – Corp 2 Corp |
Contract to Hire – Corp 2 Corp |
Description:
SENIOR SITE RELIABILITY ENGINEER
RXMG is a California-based digital advertising company that uses its own state-of-the-art analytical and consumer intelligence platform to match people with the products they need to enrich their financial well-being.
We seek a Senior Site Reliability Engineer to join our engineering team and help develop an inclusive, innovative, and collaborative team environment.
The ideal candidate is an experienced Senior Site Reliability Engineer with a strong technical background. A Site Reliability Engineer (SRE) is a professional uniquely positioned at the crossroads of software engineering and systems operations. Your role is to develop and implement scalable, reliable, and efficient systems, ensuring that both internal and external services meet the highest standards of uptime and performance.
You will be working 100% remotely and should be extremely comfortable working via Slack, Google Meet, Zoom, etc.
REQUIREMENTS:
- 4+ years of experience as a Site Reliability Engineer.
- Deep understanding of containerized ecosystems.
- Expert working knowledge of:
  - Google Cloud Platform (GCP) and Amazon Web Services (AWS) components, monitoring tools, and alerting systems.
  - NGINX/Apache configuration and PHP module installation through apt or PECL.
  - Firewalls, including setting up, managing, and understanding their role in network security.
- Adept at managing user and file permissions across different operating systems, ensuring appropriate access rights without compromising security.
- Proficiency in using ‘.htaccess’ for web server configurations, such as URL redirection and access control.
- Strong understanding of various hashing algorithms, particularly for securing sensitive information and ensuring data integrity.
- Experience hosting blameless postmortems to share findings, discover gaps, embrace transparency, and improve reliability across our services.
- Demonstrated use of configuration management to build and maintain consistency across platform components and services.
- Willingness to work Pacific Standard Time hours as well as off-hours.
Responsibilities:
- Infrastructure Optimization: Tailor our infrastructure for peak performance, which is especially crucial as we transition to the RXP platform.
- Uptime & Platform Support: The candidate’s main focus will be to ensure the uptime and reliability of our internal platform. This includes proactive monitoring, troubleshooting, and timely resolution of any issues to maintain continuous operational functionality.
- Site Monitoring & Support: The role also requires maintaining the uptime of our company websites. The candidate should be capable of managing site performance, addressing downtime promptly, and implementing strategies to enhance overall site stability.
- Security: The candidate must also contribute to the security of our systems. This involves implementing basic security measures, responding to security incidents, and collaborating with the security team to uphold the integrity and safety of our digital assets.
- Incident Management: Develop robust incident response protocols to address and mitigate any issues quickly, maintaining service continuity.
- Lead and participate in weekend testing (e.g., capacity testing, fail-over, etc.).
- Provide 24x7x365 on-call technical support for the Engineering and Operations teams as needed.
- Provide technical leadership, support, and operational oversight to sustain resiliency and high availability of critical business operations.
- Monitor production, disaster recovery, and certification systems for issues. Troubleshoot and drive resolution of issues.
- Analyze and optimize the performance of core platforms.
- Investigate software defects.
- Assist the Engineering team in resolving build/deployment issues.
- Analyze application logs (e.g., GCP GKE and AWS EKS logs and various platform logs) to troubleshoot or explain perceived issues.
- Execute SQL queries against a database to identify potential performance issues and/or create upgrade recommendations.
- Drive capacity planning decisions for RXMG platforms and systems and support capacity planning needs.
- Provide an active voice within Capacity Planning meetings with engineering and technical operations management staff.