Here at Ellucian, we are motivated by a mission - power education so institutions can empower student success. We are the global market leader in EdTech for higher education, serving more than 2,900 customers and reaching over 22 million students in 50 countries. We are dedicated to helping higher education unlock learning for all by providing solutions that support the entire student lifecycle and deliver insights needed now and into the future.
We embrace the power to lead, the courage to innovate, and the determination to grow. At our core, we believe in humanizing our approach, recognizing that our people are our greatest strength. With a shared vision of transformation, we endeavor to shape a brighter future for higher education.
About the Opportunity
Higher education customers are transforming to modern campuses while leveraging technology and solutions to enable student success. We are seeking an innovative Senior Cloud Engineer to join our Platform and Analytics team to help drive the modernization of the Ellucian applications to enable a true SaaS experience. We are in the middle of our journey from a large, largely manually built environment to a fully pipeline-delivered environment using tools like Terraform, Jenkins, and Kubernetes. Our primary tech stack consists of nodeJS microservices running on ECS and Kubernetes with PostgreSQL. Our goal is to automate the abstract work required to manage our applications and services so that customers get a true SaaS experience with 99.99% availability. That’s where you come in!
You will get to learn some of the latest technologies and enjoy creative independence to bring your ideas to life in an open and collaborative team environment. You will support the Ellucian Ethos suite of products and Ellucian Insights.
If you get excited about automation and creation, and about the impact you can make on shaping the future of higher education, then we should talk!
Where you will make an impact
• Identify, create, and maintain SLOs, and take responsibility for delivering on them
• Design, build, and support automation and monitoring that improve system reliability
• Run the infrastructure composed of Ansible, Terraform, and Kubernetes
• Partner with R&D and Operations teams to enhance telemetry and reliability
• Create monitoring to detect symptoms and preempt outages
• Understand, simplify, and automate processes to improve systems and reduce toil
• Debug production issues across services and levels of the stack
• Partner with R&D teams to advance efforts towards containerization and Kubernetes
• Cover assigned rotation as an on-call resource, working to identify systemic problems and their solutions
• Rapidly troubleshoot incidents:
o By leveraging service restorative actions
o By understanding what causes most issues and the actions to mitigate those on your assigned technology stack
o By understanding what actions will systematically eliminate the causes of common issues
o Without doing more harm to currently impacted systems
• Report problems and participate in related root cause analysis or incident postmortems
o Install, set up, and configure third-party tools for collecting and analyzing database performance and stability issues.
o Provide analysis of poor performance and instabilities identified in systems.
o Participate in postmortem meetings and discussions, providing and documenting details from the incident or problem
o Provide technical detail for diagnosing and fixing known bugs and problems. Assist with creating runbooks, automation, or other process improvements to address future occurrences of issues and to automate common tasks; clearly document the steps to execute and resolve
• Design, document, develop, test, and deploy automation
• Act as a mentor for Cloud Engineering colleagues by:
o Providing adoption leadership expertise for new standards and best practices
o Participating as a subject matter expert on process improvement, training, and tool development
o Serving as the subject matter expert for assigned specializations by providing technical leadership and technical engineering expertise
What you will bring
• Proven experience with monitoring development and administration: Datadog, New Relic, AppDynamics
• Containerization experience is highly desired: Docker, ECR, ECS, EKS/KOPS/Kubernetes, Helm
• YAML experience is highly desired: Jenkins, Git, Bitbucket pipelines
• DevOps or engineering role with the AWS platform, with a preference for, but not limited to: AMI, EC2, EBS, ELB, IAM, KMS, RDS, S3, SNS, VPC, Route 53, CloudWatch, Lambda
• Enterprise-scale Linux administration is a plus
• Developing and deploying automation, CLI and API scripting using multiple tools
• Preferred: Terraform, Python, Java, Bash, Jenkins, Git
• Equivalent experience accepted with: Puppet, Chef, Docker, Gradle, JavaScript, Packer, Perl, and PHP
• Experience in web-based application deployment and administration using Nginx, nodeJS, and Python
• Troubleshooting Postgres database performance is a plus
• AWS certification(s) preferred: Developer, DevOps, SysOps, or Solutions Architect
• 3+ years of experience
• Possesses the tenacity to delve to the root of the issue quickly, understand why it happened, and prevent it in the future
• Proven experience collaborating with cross-functional, global, and remote teams with diverse backgrounds
• Strong verbal and written skills, excellent customer service, and high attention to detail
• The selected incumbent must be ready to work the 2 PM to 11 PM IST shift
What makes #Ellucianlife
- 22 days annual leave plus 11 public holidays
- Competitive gratuity policy
- Group insurance and annual health checkup plan with a variety of family and wellness benefits
- Thrive Flex Lifestyle Account (LSA) that allows you to contribute towards your health, financial or learning interests
- 5 charitable days to support the community that supports us
- Wellness:
  - Headspace (mental health)
  - Wellbeats (virtual fitness classes)
  - RethinkCare (caregiver support)
- Diversity and inclusion programs that promote employee resource groups such as Buzzinga and Lean In Team, to name a few
- Parental leave
- Employee referral bonuses to encourage the addition of great new people to the team
- We foster a learning culture with:
  - Education Assistance Program
  - Professional development opportunities
#LI-SK1
#LI-Remote