Senior/Staff Site Reliability Engineer
Who We Are
At Platform Science, we’re working to connect everything that moves.
Founded in 2015, we are an open IoT platform that partners with innovative fleets, application developers, vehicle manufacturers, and equipment providers in the transportation industry to deliver revolutionary solutions to supply chain professionals across the globe.
Our employees are an engaging, diverse group of people who believe in the power of great ideas. We hire people with different experiences and perspectives to build a company culture that fuels growth through innovation.
We value thoughtful actions and empathy for others. We approach challenges with resiliency and creativity, while encouraging transparency because, no matter our backgrounds or responsibilities, we are one team.
About the Role
We are looking for a qualified Senior SRE to join our team in San Diego, CA (or remote). You will be hired to solve operational problems and provide support to development teams for critical business applications in production. Our focus is to ensure reliability in all production services and to enable dev teams to measure their reliability so they can make effective decisions.
The SRE team has the unique opportunity to work with all aspects of our platform. We run entirely in the cloud—AWS, Azure and GCP. Our applications and services are containerized and serverless. If you’re excited about learning and supporting new technologies and many different types of products (including mobile apps, hardware, websites, messaging queues, serverless pipelines, and more), and working with an incredibly talented team, then this is the position for you!
As a Senior SRE, you have a software development background or a systems background with strong coding skills. Ideal candidates want to deeply understand how our systems work, from the infrastructure level and their dependencies on other systems to the customer experience, and how to mitigate risk. You are comfortable giving and taking technical direction. You are a great communicator and self-starter who strives to make the company and our technologies better.
Responsibilities
- Collaborate with teams to architect, engineer, and optimize products for Kubernetes and the cloud
- Create and enhance Continuous Integration/Continuous Deployment (CI/CD) pipelines, release management processes, and tools
- Maintain observability tools and promote standardization and best practices for development teams
- Build tools, automation, and frameworks to improve system stability and reliability
- Lead the effort in promoting and prioritizing reliability, driving achievement of uptime goals and mentoring colleagues in SRE best practices
- Provide on-call support to development teams for critical business applications in production
- Play an active role in facilitating an SRE guild, contributing to its operation and ensuring the sharing of knowledge and collaboration among members
- Conduct comprehensive Production Readiness Reviews, working with teams to identify and establish Service Level Objectives (SLOs), and ensure high-quality and dependable services
- Write and contribute to project plans and engineering-level documentation, and develop operationally excellent standard operating procedures and runbooks with a focus on automation
Qualifications
- 5+ years of experience in an SRE or Platform Engineer role supporting a 24x7 production environment
- 3+ years of AWS or comparable cloud resource administration/support experience in a production environment
- Strong expertise in Kubernetes administration, containerization tools (e.g., Docker), and Helm, adhering to industry best practices such as GitOps
- Proficiency in scripting languages such as Python, Ruby, Bash, Node.js, and/or Go
- Experience with distributed tracing and proficiency with one or more of the following monitoring solutions: Prometheus, Elasticsearch, Datadog, and CloudWatch
- Demonstrated proficiency with current software development lifecycle (SDLC) concepts and best practices, CI/CD pipelines, and test-driven development
- Strong problem-solving and operational skills, and an advocate for automation
Platform Science Benefits Highlights
The company offers various benefits to regular, full-time employees including:
- Medical, dental, and vision insurance
- Short-term and long-term disability insurance
- AD&D and life insurance
- 401k plan
- Paid vacation, sick leave and holidays
- Six weeks of paid parental leave
For more information, please see the Benefits Highlights brochure for regular, full-time employees. You can access it by copying and pasting this link into your browser: https://www.platformscience.com/benefits.
This is an exempt role. Our job titles for each posting may span more than one job level. The estimated base salary for this role is between $145,292 and $227,680. The range displayed on each job posting reflects the minimum and maximum target range for new hire base salaries across all US locations. Compensation packages are based on many factors unique to each candidate, including but not limited to skill set, work experience, relevant training and certifications, business needs, market demands, and specific geographical location. The base pay range is subject to change and may be modified in the future. This role may also be eligible for bonus, equity, and benefits.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.
We are not accepting unsolicited resumes from employment agencies.
At this time, we only consider candidates in these states: AL, AR, AZ, CA, CO, FL, GA, ID, IL, KY, MA, MD, MI, MN, MO, NC, NH, NV, NY, OH, OK, OR, PA, SC, TN, TX, UT, VA, WA, and WI. In the future, we plan to add more states.