Observability/Monitoring/Tools Engineer

Clutter

Bengaluru, Karnataka, India · India
Posted on Friday, August 30, 2024

Core experience/responsibilities

● Monitoring Platform Engineering

○ 5+ years of experience with platforms such as SolarWinds,

Datadog, HP OpenView, BMC, etc.

○ 5+ years of experience in network, application performance, and

synthetic monitoring.

○ Expertise in configuring alerts, creating dashboards, and

conducting data trend analysis.

○ Experience in automating the detection of missing assets and

configuring them into the monitoring ecosystem via REST

API/scripting.

○ Proficiency in monitoring various end devices including routers,

switches, firewalls, storage, virtual, Windows servers, Linux

servers, and UNIX servers.

○ 5+ years of experience automating infrastructure operations

using tools like Ansible and Python for event correlation.

○ Expertise in integrating monitoring data with other platforms such

as CMDB/ServiceNow.

○ Experience configuring monitors using SNMP, SSH, WinRM, WMI,

JMX, etc.

○ Ability to design and implement highly available continuous

monitoring platforms for 24x7 operations.

● Technical Solutions and Collaboration

○ Recommend baseline monitoring thresholds, KPIs, and SLAs.

○ Provide solutions to complex problems and drive process

improvements.

○ Experience with both on-premises and cloud environments.

○ Expertise in advanced troubleshooting and root cause analysis.

○ Proficiency with platforms like ServiceNow, Remedy, or Assyst.

○ Identify automation opportunities and implement proactive

monitoring solutions.

○ Work effectively with Enterprise Architects, OS engineers, and

operations support teams to provide training, develop guidelines,

and serve as a subject matter expert.

● Design and Implementation

○ Participate in technical design discussions, considering trade-offs

to support business value, scalability, and delivery timelines.

○ Ensure adherence to architectural governance and security

standards.

○ Contribute to the design and architecture of high-performance,

scalable systems, ensuring they meet business requirements and

are cost-effective.

○ Integrate security best practices into the design and

implementation of systems, ensuring robust protection against

threats.

● Process/Operational Experience

○ Plan and execute system and software installations, upgrades,

and changes across the organization.

○ Understand methodologies such as Agile and Scrum, and

manage project objectives, delivery approaches, and plans.

○ Identify and mitigate risks throughout projects and tasks,

addressing major design flaws.

○ Experience gathering and organizing large amounts of data for

instrumentation into an enterprise monitoring solution.

○ Share knowledge of monitoring best practices with system

owners and administrators to enhance overall monitoring and

alerting posture.

Operational requirements

● Available for on-call support outside of normal business hours to

address critical issues.

● Strong communication skills to relate technical details to non-

technical leaders and users.

● Promote a positive working environment, encourage teamwork, and

mentor rising talent.

● Excellent time management and organizational skills, with experience

establishing guidelines for others.

● Ability to notice differences and issues as they arise and escalate

them to management.

● Facilitate discussions and explore alternative approaches to resolve

conflicts.

● Take personal accountability for decision-making and collaborating

with cross-functional teams.

Nice to Have

● Working expertise in infrastructure/application log aggregation

ingested into a security

● Experience with log aggregation tools such as ELK, Logstash, Kibana,

Splunk, or QRadar.

● Proficiency in Ansible and Python, with the ability to create complex

SQL queries for reporting and correlation.

Category: Information Technology

Iron Mountain is a global leader in storage and information management services trusted by more than 225,000 organizations in 60 countries. We safeguard billions of our customers’ assets, including critical business information, highly sensitive data, and invaluable cultural and historic artifacts. Take a look at our history here.

Iron Mountain helps lower cost and risk, comply with regulations, recover from disaster, and enable digital and sustainable solutions, whether in information management, digital transformation, secure storage and destruction, data center operations, cloud services, or art storage and logistics. Please see our Values and Code of Ethics for a look at our principles and aspirations in elevating the power of our work together.

Requisition: J0077275