Senior System Engineer

Clutter

IT, Other Engineering
Bengaluru, Karnataka, India
Posted on Oct 7, 2024

Job Summary: As a DevOps Professional at Iron Mountain, you will play a crucial role in providing technical support for our computer applications and hardware, including PCs, servers, and mainframes. You will answer system-related queries and collaborate with the network services, software systems engineering, and application development teams to restore service and identify issues. Your role will also involve managing the observability strategy, application sustainment, and global service delivery to enhance the performance and reliability of our applications.

Key Responsibilities:

Technical Support & Troubleshooting:

  • Provide technical support for users regarding computer applications and hardware.

  • Answer questions related to system procedures, online transactions, systems status, and downtime procedures.

  • Maintain a troubleshooting tracking log to ensure timely resolution of problems.

Observability & Performance Management:

  • Manage the observability strategy with Engineering/Development and SRE teams to enhance application availability, performance, and reliability.

  • Define and manage log-based metrics, alerts, and dashboards using Datadog.

  • Support applications built on Google Cloud services, including Cloud Logging, Identity and Access Management (IAM), Cloud networking, and projects.

Application Sustainment & Global Service Delivery:

  • Handle escalations from customers, Customer Care, and Global Account Management, performing triage with technical teams.

  • Manage and qualify new work orders for Account Consolidation and Single Sign-On to improve sustainment services revenue.

SRE Management:

  • As an Application SRE, focus on exception handling and streamline log-based metric definitions.

  • Develop and manage procedures for Cloud/Web/DataCenter application systems optimization, performance improvement, and workflow design.

Infrastructure & Application Management:

  • Manage Data Center Applications built on Linux/Windows, Apache/Tomcat, and Java.

  • Support and optimize cloud-native applications in Google Cloud Platform (GCP), including automation services and instrumentation for observability.

Billing Cycles & Financial Management:

  • Support monthly/quarterly/annual billing cycles critical to the company's financial health.

Strategic Leadership:

  • Lead the technical strategy for the underlying infrastructure, alerting and monitoring, and incident resolution to meet MTTR targets.

  • Improve application reliability, performance, and availability, and be accountable for the performance and results of multiple applications.

Required Skills and Experience:

  • Education: Bachelor's degree in Computer Science, Engineering, or a related field (4-year degree).

  • Experience: Minimum of 5 years of experience as a Site Reliability Engineer (SRE).

  • Technical Skills:

      • Expertise in Records Center Applications and Data Management Applications.

      • Strong experience with Datadog for log management and alerting.

      • Proficiency in GitLab code repository management, including roles, projects, groups, merge requests, and container registry management.

      • Experience managing Data Center Applications on Linux/Windows, Apache/Tomcat, and Java.

      • Prior experience with Google Cloud Platform (GCP), including Google Cloud Functions, Workflows, and Google Kubernetes Engine (GKE).

      • Experience in managing and supporting billing cycles and financial processes.

  • Certifications: Scrum Master / PMP / Agile SAFe certification.

Preferred Skills:

  • Experience in Software Application Development using Golang, Java, or .Net.

  • Familiarity with secret management platforms such as Thycotic/Delinea Secret Server, Akeyless, and/or Thycotic DevOps Secret Vault.

  • Proficiency in GitLab Agile including epics, features, stories, product management support, program management, and reporting metrics.

  • Experience defining log-based metrics, monitors, and thresholds for error budgets, Service Level Objectives (SLOs), and Service Level Indicators (SLIs), and creating event dashboards.

Additional Requirements:

  • Strong problem-solving skills and ability to work under pressure.

  • Excellent communication and collaboration skills.

  • Ability to analyze complex data and situations to make informed decisions.

Category: Information Technology

Iron Mountain is a global leader in storage and information management services trusted by more than 225,000 organizations in 60 countries. We safeguard billions of our customers’ assets, including critical business information, highly sensitive data, and invaluable cultural and historic artifacts.

Iron Mountain helps lower cost and risk, comply with regulations, recover from disaster, and enable digital and sustainable solutions, whether in information management, digital transformation, secure storage and destruction, data center operations, cloud services, or art storage and logistics. Please see our Values and Code of Ethics for a look at our principles and aspirations in elevating the power of our work together.

Requisition: J0079953