Technical Lead - SRE

Tetrate

IT
Posted on Mar 26, 2025

Don’t just follow the industry; define it. Tetrate, creators of Envoy Gateway and Envoy AI Gateway, and architects of industry-standard security practices (SPIFFE and NGAC), is building a world-class field engineering team. Are you ready to build applications that power the global economy, serve Fortune 150 companies, and protect national security? We’re looking for a Technical Lead – SRE who will apply cloud operations practices across our hybrid environments, improve customer outcomes, and own the operational roadmap.

Tetrate seeks an outcome-driven, technically adept Technical Lead – SRE to champion our enterprise customers, demonstrating how we solve Layer 7 challenges and security vulnerabilities.

Responsibilities:

  1. Operational Excellence & Incident Management
    • Improve MTTD and MTTR through enhanced monitoring, logging, and alerting.
    • Establish SRE practices, build operational dashboards, and maintain runbooks.
    • Enhance the customer experience by working with CRE team members to apply SRE best practices.
    • Use tools like Prometheus, Grafana, Datadog, OpenTelemetry, and Elastic Stack for observability.
    • Automate health checks and incident response with Terraform, Ansible, Helm, and Kubernetes.
  2. Customer Engagement & Architecture Review
    • Analyze customer architectures and operational practices.
    • Identify themes from escalations and map them to architectural gaps or operational improvements.
    • Provide tailored recommendations and help implement improvement plans for customers’ environments.
    • Develop standard operating procedures (SOPs) for deployment, maintenance, and incident handling in customer environments.
    • Provide proactive guidance on performance tuning, disaster recovery (DR) strategies, and scaling mechanisms.
    • Establish secure connectivity and seamless integration between the hosted management plane and customer environments.
    • Lead root cause analysis (RCA) and propose long-term solutions for recurring issues.
  3. Product & Hybrid Architecture Optimization
    • Apply cloud practices (CI/CD, GitOps) to hybrid and on-prem environments.
    • Apply cloud best practices (e.g., AWS Well-Architected Framework) to enhance both internal product development and customer environments.
    • Build custom plugins and automation scripts to meet customer needs and extend flagship product capabilities.
    • Collaborate with product teams to implement metrics improvements, UI enhancements, and alerts for hosted solutions.
  4. Ownership of Hosted Operations
    • Develop and execute an operational plan for hosted environments, including monitoring, alerts, and product improvements.
    • Own the adoption of hosted operational improvements by on-prem customers, ensuring alignment with hosted best practices.
  5. Collaboration and Leadership
    • Partner with developer, platform, and security teams to align operational goals with product roadmaps.
    • Mentor other engineers on cloud-native operations best practices, focusing on Zero Trust principles.
    • Drive continuous improvement through automation, Shift-Left initiatives, and SRE (Site Reliability Engineering) methodologies.

Required Skills:

  • 8+ years of experience in Cloud Operations, SRE, or DevOps roles.
  • Strong hands-on experience with Kubernetes, Istio, Envoy, Gateway, load balancers, and hybrid architectures.
  • Hands-on experience with cloud platforms such as AWS, GCP, or Azure and knowledge of hybrid/cloud-native architectures.
  • Strong analytical and troubleshooting skills with experience in Postgres, Elastic DB, and GraphQL queries.
  • Experience building CI/CD pipelines with tools like GitHub Actions or ArgoCD.
  • Familiarity with on-prem deployments and integration with public cloud hosted services.
  • Familiarity with LDAP, OIDC, and SAML authentication and related security configurations.
  • Ability to collaborate with customers and cross-functional teams to drive operational improvements.
  • Experience with CI/CD, GitOps practices, and networking concepts.
  • Prior exposure to multi-cloud deployments and hybrid architectures with VM and container-based workloads.
  • Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders.
  • Prior experience interacting directly with enterprise customers for operational troubleshooting and architecture reviews.
  • 5+ years of experience in Python and Golang (Go).
  • 3-5+ years of Bash / Shell Scripting.
  • 1-2 years of JavaScript or TypeScript.
  • 2-3 years of Infrastructure as Code tools like Terraform.
  • Good familiarity with YAML/JSON.

If you’re looking for a job, this isn’t it. If you’re ready to be part of something larger than yourself, connect with us.

Locations: We’re a fully distributed team with a global presence in 15 countries. While this role requires North American timezone coverage, we welcome exceptional talent from anywhere. Visa sponsorship (H-1B) is supported.