ClearAI is seeking a Senior Machine Learning Engineer with deep experience in document AI, OCR, LLMs, RAG pipelines, and model experimentation. You will own the core extraction and classification engines powering the entire platform.
This role is critical: you will build the intelligence that reads invoices, packing lists, bills of lading (BOLs), certificates of origin (COOs), airway bills, and compliance documents, and converts them into structured data for the automation engine.
Responsibilities
Document Processing & OCR
- Train, fine-tune, and evaluate OCR/document models (e.g., Donut, LayoutLMv3, TrOCR)
- Build extraction pipelines for invoices, packing lists, BOLs, COOs, and fumigation certificates
- Develop pattern-recognition methods for document layouts and fields
LLM & AI Engineering
- Design and implement LLM-based extraction and validation flows
- Build RAG pipelines for customs compliance rules
- Develop reasoning frameworks for Harmonized System (HS) code classification
- Integrate LLMs with backend APIs
- Create evaluation benchmarks and metrics
Data & Quality
- Manage and supervise the junior ML engineer (Riwano)
- Define annotation standards
- Identify edge cases, improve data coverage, and reduce extraction errors
- Build large training datasets (synthetic + real)
Model Deployment
- Work with backend and DevOps to deploy models in production
- Optimize inference speed and accuracy
- Ensure models operate reliably at scale
Skills
Must-Have
- Strong Python skills
- Experience with PyTorch or TensorFlow
- Deep knowledge of modern Document AI models
- Experience with OCR and document-understanding tools (Tesseract, TrOCR, PaddleOCR, LayoutLM, Donut)
- Experience with LLMs, LangChain, and prompt engineering
- Experience building custom NLP/vision pipelines
- Strong understanding of supervised learning, evaluation, and fine-tuning
- Experience deploying ML models in production
Nice-to-Have
- Experience with customs documents (big bonus)
- Experience with vector databases (Pinecone, Weaviate, Chroma)
- Experience with cloud ML infrastructure
- Experience with anomaly detection (used to flag undervaluation and fraud)
- Experience working in high-compliance industries