SRE - Site Reliability Engineer Opening For Abu Dhabi - Happiestminds

UAE

Warm Greetings from Happiestminds Technologies,


Please find the job description (JD) below. Kindly go through the company profile link in the signature, and apply if you are interested.


Site Reliability Engineer (SRE)

From designing fault-tolerant architectures to leading incident responses, you'll have the freedom to shape how we deliver stable, secure, and high-performance banking services.

About the Role

We're looking for a talented Site Reliability Engineer (SRE) to keep our systems running smoothly, reliably, and at scale. Through smart automation, deep observability, and a calm head in a crisis, you'll help us balance speed, compliance, and stability, working alongside DevOps, Cloud, Quality Engineering, and Product teams to drive continuous improvements in performance, security, and resilience.

You'll play a key role in enhancing reliability, accelerating delivery, and ensuring seamless digital experiences for ADCB customers.

This role reports directly to our Lead SRE / Tribe Executive Manager.

What You Will Be Doing

  • Define and implement SLIs / SLOs and error budgets for business-critical digital banking services.
  • Build actionable observability (metrics, logs, traces, dashboards, and alerts) using Dynatrace, Prometheus, Grafana, and ELK, while reducing alert fatigue.
  • Leverage AI-driven insights and anomaly detection (Dynatrace Davis AI or equivalent AIOps platform) to proactively predict and resolve reliability issues before impact.
  • Lead incident management from on-call triage and root-cause analysis to blameless postmortems with actionable follow-ups.
  • Improve deployment safety with robust rollout / rollback strategies, canary and blue-green deployments, and production readiness reviews.
  • Support and optimize microservices-based architectures, ensuring service reliability, scalability, and inter-service resilience.
  • Conduct capacity planning, performance tuning, and resilience testing, optimizing for both reliability and cost efficiency.
  • Automate operational toil — from runbooks and remediation scripts to proactive health checks and self-healing workflows.
  • Collaborate with DevOps to embed reliability gates and validations into CI / CD pipelines (GitHub Actions, Jenkins, GitLab CI / CD, or Azure DevOps).
  • Own and evolve the observability and AIOps stack, driving intelligent automation and predictive alerting capabilities.
  • Maintain high-quality documentation, playbooks, and operational standards across environments.
  • Ensure operational compliance and security alignment with internal controls and regulatory standards.
  • Analyze system performance, availability, and cost data to continually optimize operations.
  • Provide reliability support and escalation guidance for critical production systems during major incidents.


Skills

Experience and Qualifications

  • 5+ years of experience in SRE or DevOps roles, building and managing large-scale, high-availability systems across banking, fintech, e-commerce, or other data-intensive digital ecosystems.
  • Bachelor’s degree in Computer Science or equivalent technical experience.
  • Strong experience with Linux environments and performance troubleshooting.
  • Proven expertise in Terraform and Infrastructure as Code (IaC) methodologies.
  • Proficiency with Kubernetes and container orchestration in microservices environments.
  • Hands-on experience with AWS (preferred); exposure to Azure or GCP is an advantage.
  • Deep knowledge of Dynatrace (AIOps, Davis AI), Prometheus, Grafana, and the ELK stack.
  • Experience implementing AI / ML-driven reliability or automation solutions (AIOps, anomaly detection, predictive alerting).
  • Practical understanding of CI / CD pipelines (GitHub Actions, Jenkins, GitLab CI / CD, or Azure DevOps).
  • Experience with Kafka, RabbitMQ, Redis, Aurora, and RDS databases.
  • Strong scripting or programming skills in Python, Bash, or Go.

The Ideal Candidate

  • Organized, structured, and meticulous in approach.
  • Experienced in cross-functional collaboration and working with distributed teams.
  • Strong analytical mindset with excellent troubleshooting skills for complex production systems.
  • Calm and composed communicator under pressure, capable of leading during high-impact incidents.
  • Proactive problem-solver who anticipates issues and drives preventive improvements.
  • Passionate about AI-driven automation, observability, and reliability engineering.
  • Continuously learning, keeping up to date with cloud-native, microservices, and SRE best practices.
  • Collaborative and adaptable team player who thrives in a fast-paced, regulated environment and is passionate about building reliable, scalable systems that empower digital banking innovation.


Warm Regards

Devipriya Gunasekaran

Talent Acquisition Team

Bangalore

Happiest Minds Technologies 


Date posted: Today
Publisher: Bayt