Halian
United Arab Emirates, Abu Dhabi

Job Details

System Observability Engineer

A technology-driven enterprise organization is seeking a System Observability Engineer to implement and manage comprehensive observability solutions across its platforms. The successful candidate will build and maintain monitoring, logging, and alerting frameworks that provide full visibility into system health, enabling proactive issue detection and informed operational decision-making.

Responsibilities:

  • Design, implement, and manage observability platforms covering metrics, logs, and distributed tracing.
  • Deploy and configure monitoring and alerting tools such as Grafana, Prometheus, Datadog, ELK Stack, or Dynatrace.
  • Define and implement SLIs, SLOs, and error budgets aligned to service reliability requirements.
  • Build dashboards and visualizations for operational, performance, and business-level metrics.
  • Tune alerting thresholds to reduce noise and ensure all alerts are actionable and meaningful.
  • Collaborate with DevOps, cloud, and application teams to instrument services and workloads for observability.
  • Support root-cause analysis and performance investigations using observability data and tooling.
  • Maintain and evolve the observability strategy as infrastructure and application landscapes grow.
  • Develop runbooks and documentation for observability tooling, monitoring standards, and on-call procedures.



Qualifications and Skills:

  • 4+ years of experience in systems monitoring, observability, or Site Reliability Engineering (SRE) roles.
  • Hands-on experience with observability tools such as Grafana, Prometheus, Datadog, Dynatrace, or ELK Stack.
  • Understanding of distributed tracing concepts and tools such as Jaeger, Zipkin, or OpenTelemetry.
  • Experience instrumenting applications and infrastructure components for monitoring and alerting.
  • Scripting ability in Python, Bash, or similar for automation and alerting customization.
  • Knowledge of cloud-native monitoring services including CloudWatch, Azure Monitor, or GCP Operations Suite.
  • Datadog or Dynatrace certification, or familiarity with SRE practices and reliability engineering principles, is advantageous.



Halian Group:

With over 28 years of experience, we have come to understand that innovation is the only way to provide agile, practical solutions that transform businesses and careers. Our resourcing and smart services help you to realize tomorrow's potential. Discover the amazing things possible when you bring the right people and the right technologies together.

At Halian, we recognize that diversity, equity, and inclusion (DEI) are essential to building high-performing teams for our clients. We are committed to connecting organizations with top talent from all backgrounds, ensuring that every individual feels valued, respected, and empowered to contribute their unique perspectives. We encourage applications from all qualified candidates, regardless of race, gender, disability, or any other characteristic that makes them unique. By fostering diverse and inclusive workplaces, we help our clients drive innovation, enhance collaboration, and better reflect the communities they serve.


