Global Service Reliability and Operations Manager
Clutter
This job is no longer accepting applications
ABOUT IRON MOUNTAIN
At Iron Mountain we protect what our customers value most, from the everyday to the extraordinary. We build customer value around the world with a passion for preserving the physical, transforming the digital, and respecting the environmental. We pioneered the industry for global records and information management and have established some of the best customer relationships in the industry with 95% of the Fortune 1000 companies among our 225,000 loyal customers. Here, you'll bring your expertise and creativity to a workplace that thrives on continuous improvement. Here, you'll be part of a global workforce that embraces the differences among us. And here, we'll encourage you to Climb Higher for the benefit of our customers and each other.
Qualifications
Minimum 10 years Application Service Delivery Mgmt and/or SRE Mgmt experience
Application Service Reliability Manager is the SRE Manager (Site Reliability Engineering) with a deep Service Delivery & Engineering mindset. This role demands leadership experience in supporting and managing the Cloud native Applications in Google Cloud Platform (GCP) and Data Center Applications running on Windows and Linux systems.
Contributing to the Application Reliability strategy for Iron Mountain Warehouse Applications in Cloud and Data Center.
Building the Instrumentation strategy and Automation capabilities for Iron Mountain's Observability & Reliability Programs.
Establishing the process governance for the security features as part of the Observability.
Provide Disaster Recovery Systems Planning and Management for critical warehouse systems and end users of IT applications and hardware (e.g. Cloud resources, Servers, Operating Systems).
Collaborates with network services, software systems engineering and/or application development in order to restore service and/or identify problems. Maintains a troubleshooting tracking log ensuring timely resolution of problems. Answers questions regarding system procedures, online transactions, systems status and downtime procedures.
Experience as an SRE Manager defining executive dashboards using log-based metrics, monitors, and thresholds for Error Budgets, Service Level Objectives (SLOs), and Service Level Indicators (SLIs), and creating event dashboards for cloud native Iron Mountain applications.
Manage pre-built procedures for infrastructure optimization, performance improvement, and workflow design (or redesign).
Improving the reliability, performance and availability of the applications. Responsible for leading the technical deployment process strategy for our underpinning infrastructure, alerting & monitoring and incident resolution to optimize the MTTR targets.
Adopt teaming and collaboration in line with the SRE ‘no blame’ culture.
Provides leadership to managers and professional staff.
Accountable for the performance and results of multiple applications, managers, related teams or a large team.
Billing support for Monthly, Quarterly and Annual Revenue recognition programs.
Interfaces with Global Account Management Team and Customer Care Teams
Works on escalated triage situations and high priority customer facing issues where analysis of situations or data requires an in-depth knowledge of organizational objectives and processes.
Experience in managing Datadog configuration, access roles, integration with ServiceNow and Datadog log ingestion management.
Experience in managing applications built on Google Cloud Functions, Workflow, Google Kubernetes Engine (GKE).
Experience in managing the delivery of the automated build, containerization, cloud repository management of the build image artifacts, branching strategy, automated testing and config delivery management for cloud native apps and hybrid applications
Required skills and Experience:
SDLC Management focused on Application Sustainment Management, Release Management & Global Operations per ITIL standards and Agile/SAFe methodology.
Managing the IT Strategy and Organizational Directives to improve the efficiencies and customer experience in all regions across the globe.
Managing the Observability & the Reliability of Cloud native Apps/Data Center Apps sitting with Scrum teams, developers and SRE teams to implement the service observability as an SRE. An opportunity to be part of a great culture and challenging work environment.
Managing the Implementation and Support of Applications built with Google Cloud Functions, Google Workflow, Google Firebase and leveraging the GCP framework for log management (Cloud Logging), Identity & Access mgmt, Cloud network and Projects.
As Application SRE Manager, build and support the Datadog Log mgmt and Alert mgmt features to define Log based metrics, alerts and dashboards.
Understanding of GitLab CI/CD: code repository mgmt, Roles, Projects, Groups, merge requests, container registry management, Auto DevOps, reporting DevOps metrics, analytics
Managing the Data Center Applications built on Linux/Windows, Apache/Tomcat & Java
Bachelor of Science in Computer Science & Engineering (4-year degree) and 10+ years working experience.
Scrum Master/PMP Certification / Agile SAFe certification
Preferred skills:
GitLab Agile - epics, features, stories, product management support, program management, boards, reporting metrics, KPI.
Google Kubernetes Engine (GKE) cluster design, implementation and application deployment, Prometheus, Grafana, Jaeger tracing.
Secret management platform Thycotic/Delinea Secret Server, Akeyless and/or Thycotic DevOps Secrets Vault
Past experience in Software AppDev in Golang and/or Java
GitLab Security - built-in scanners for SAST, DAST, Container Scan, Dependency Scan, License Scan, etc.
Category: Information Technology
Iron Mountain is a global leader in storage and information management services trusted by more than 225,000 organizations in 60 countries. We safeguard billions of our customers’ assets, including critical business information, highly sensitive data, and invaluable cultural and historic artifacts. Take a look at our history here.
Iron Mountain helps lower cost and risk, comply with regulations, recover from disaster, and enable digital and sustainable solutions, whether in information management, digital transformation, secure storage and destruction, data center operations, cloud services, or art storage and logistics. Please see our Values and Code of Ethics for a look at our principles and aspirations in elevating the power of our work together.
Requisition: J0076164