Job description
About the Opportunity
We are hiring on behalf of a major government transformation initiative in Abu Dhabi that is building one of the world’s most ambitious applied AI programs.
This organisation is developing the infrastructure and platforms that enable AI-powered public services at scale.
The work spans sovereign model serving, vector infrastructure, retrieval systems, developer tooling, and deployment platforms that support critical government applications.
This is a rare opportunity to build foundational AI infrastructure with significant real-world impact, in an environment where performance, reliability, and security are mission-critical.
Role Overview
We are seeking a Staff AI Platform Engineer to own the platform infrastructure that enables engineering teams to build and operate AI systems efficiently and reliably.
This is a senior individual contributor role focused on model serving, vector infrastructure, data pipelines, observability, and deployment platforms.
Your success will be measured by the productivity and reliability gains you enable for other engineers.
You will combine deep platform engineering expertise with a strong understanding of AI workload requirements, including inference performance, retrieval systems, and operational complexity.
At the staff level, you will:
- Define and evolve the technical foundations of the AI platform
- Build systems that accelerate engineering teams across the organisation
- Set standards for reliability, security, and scalability
- Lead architecture decisions and evaluate emerging technologies
- Use AI coding tools such as Codex, Claude Code, or similar as part of your daily workflow
Key Responsibilities
AI Platform Infrastructure
- Design and operate model serving infrastructure using vLLM, TGI, TensorRT-LLM, or similar technologies
- Build and manage vector infrastructure, embedding pipelines, and retrieval systems
- Develop data pipelines for document ingestion, transformation, and storage
- Own deployment platforms, CI/CD pipelines, and infrastructure-as-code
Observability & Reliability
- Build observability across the AI stack, including latency, throughput, and model behavior monitoring
- Define SLOs, perform capacity planning, and lead reliability engineering initiatives
- Lead incident response, root-cause analysis, and postmortems
- Establish platform standards for operational excellence
Force Multiplication
- Build reusable internal tooling, SDKs, and deployment patterns
- Partner with engineering teams to translate infrastructure needs into scalable platform capabilities
- Mentor engineers and raise technical standards across the organisation
Basic Qualifications
- 10+ years of platform, infrastructure, or backend engineering experience
- Proven experience operating at staff or principal engineer level
- Deep expertise in Azure (AWS or GCP also valued)
- Strong Kubernetes and Docker experience
- Hands-on experience with Terraform, CI/CD pipelines, and infrastructure-as-code
- Experience building production data pipelines
- Proficiency in Python, Java/Kotlin, or Go
- Strong PostgreSQL knowledge at production scale
- Experience with distributed tracing, metrics, logging, and alerting
- Practical use of AI coding assistants such as Codex or Claude Code
- Excellent written and verbal communication skills
Preferred Qualifications
- Experience building RAG systems and retrieval infrastructure
- Experience with LangGraph, LangChain, Semantic Kernel, or similar frameworks
- Hands-on experience with GPU-based inference infrastructure
- Experience with vector databases such as pgvector, Qdrant, Pinecone, or Weaviate
- Familiarity with LLM inference optimization and evaluation frameworks
- Experience with speech and conversational AI systems
- Experience building internal developer platforms
- Security and compliance experience in highly regulated environments
Technology Stack
- Backend & Platform: Python, Java, Kotlin, Go, REST, gRPC
- AI Infrastructure: vLLM, TGI, TensorRT-LLM, GPU Serving
- AI & LLM: LangChain, LangGraph, Microsoft Agent Framework, vector databases
- Data: PostgreSQL, Redis, DocumentDB, Azure Blob Storage
- Infrastructure: Azure, Docker, Kubernetes, Terraform, CI/CD
- Observability: Langfuse, Grafana, Prometheus, OpenTelemetry
- Security: Entra ID, RBAC, Secrets Management, Zero Trust
- AI Development: Codex, Claude Code, OpenCode
What We’re Looking For
You are an engineer who:
- Thinks in systems and enables others to move faster
- Designs infrastructure that is reliable, scalable, and easy to operate
- Grounds technical decisions in evidence and operational data
- Raises engineering standards through architecture and mentorship
- Communicates clearly and transparently
- Embraces AI-native development practices
Why Apply?
This role offers the opportunity to shape the technical foundations of a world-class AI platform supporting transformative public services in Abu Dhabi.
You will tackle complex infrastructure challenges at the intersection of AI, cloud computing, and platform engineering while delivering meaningful impact at national scale.