Job Description
As a Principal DevOps Software Engineer, you will actively interface with software developers, product managers, test engineers, and administrators on projects to design and develop the build, release, and deploy toolchain for DevOps. You should be able to identify, troubleshoot, and resolve issues quickly and effectively. Responsibilities include capacity planning, high availability engineering, performance tuning, IT troubleshooting, and automation/tools development.
Our platforms are at the forefront of driving IoT and OOB connectivity, with intelligent device services that carry our customers' data from collection at the Edge through processing, ingestion, storage, analysis, and search.
- Design and develop the build, release, and deploy toolchain for DevOps
- Set up, manage, and maintain parity across dev, staging, and production application environments in cloud infrastructure
- Install, set up, and maintain in-house server rooms and data centers
- Support local and remote development sites
- Maintain a consistent release cadence across multiple environments
- Prototype and develop cloud native architecture solutions for application needs
- Design and implement monitoring infrastructure
- Excellent communication skills and teamwork are a must!
- Ensure the operational integrity of the global infrastructure
- Build and expand infrastructure capacity at remote Data Centers
- Perform deployments and maintenance to implement code, architecture, and configuration changes
- Support other teams and diagnose issues related to our infrastructure
- Participate in a 24/7 on-call rotation
- Develop and maintain new health checks for system and application-level monitoring
Skills and Competencies
- Bachelor’s degree in Computer Science, Science, Engineering, or a related field, and a minimum of 10 years of experience in a Software DevOps role are required
- Strong ability to architect development toolchains and cloud infrastructure
- Strong knowledge of Linux systems and internals, including customizing Yocto and BitBake build structures.
- Thorough knowledge of Python and build systems, including Android’s Gradle and Jack server
- Experience in creating software to automate production systems in a language such as Go, Python, Ruby, or Java
- Strong working knowledge of AWS Cloud infrastructure (EC2, ECS, EKS, VPC peering, Route53, S3, Autoscaling) or hybrid environments (OpenStack)
- Hands-on experience with VMware products: ESXi, vSphere, vCenter, and High Availability (HA).
- Experience in building AWS AMIs for multiple operating systems, including Amazon Linux and other Linux distributions.
- Experience with container technology including Kubernetes and Docker.
- Good understanding of networking and related protocols; must have a strong understanding of fundamentals (HTTP, DNS, TLS)
- Proficiency with source control and continuous integration tools (e.g., Git, Jenkins)
- Thorough knowledge of source code control best practices, including code review tools, git and repo branch management, and server configurations and diagnostics.
- Demonstrated experience troubleshooting problems and working with a team to resolve web-scale production issues
- Strong experience with configuration management, monitoring, and systems tools (e.g., Ansible, Salt, Nagios, Graphite, Fluentd). Ansible is preferred
- Good understanding of PostgreSQL databases
- Drive to build robust automated logging, monitoring, and alerting systems with tools such as CloudWatch, AppDynamics, or SumoLogic.
- Exposure to pub/sub messaging systems (e.g., MQTT, RabbitMQ, ActiveMQ, Kinesis, Kafka)
- Experience troubleshooting critical development systems (build failures, critical web services)
- Experience with Release Management processes and controls
- Ability to specify build server hardware configurations and best practices for distributed and configuration servers.
- Excellent understanding of IT datacenter operations.
- Experience managing hundreds or thousands of servers
- Self-starter; able to complete defined tasks independently.