Senior LLMOps Engineer

  • Steampunk HQ
  • McLean, VA, US
  • Full-time
  • On-site

Overview

We are looking for an experienced Senior LLMOps Engineer to design, implement, and maintain production-grade large language model (LLM) pipelines, deployment architectures, and monitoring systems across enterprise environments. The Senior LLMOps Engineer will play a critical role in operationalizing generative AI capabilities, ensuring that LLM-based applications are scalable, secure, reliable, and compliant with emerging AI risk and governance frameworks. This role spans model deployment, orchestration, evaluation, and optimization.

Responsibilities


  • Architect and maintain scalable LLM and RAG pipelines, including model hosting, inference optimization, retrieval layers, and context management frameworks. 
  • Lead the design and implementation of secure GenAI infrastructure across cloud environments, ensuring reliability, performance, and cost efficiency. 
  • Build and manage automated evaluation systems that assess LLM output quality, safety, latency, and adherence to AI governance requirements. 
  • Develop CI/CD workflows tailored for LLM- and GenAI-based applications, including dataset versioning, model lineage, and automated testing of prompt and model behaviors. 
  • Collaborate with AI Product Engineers and Data Scientists to productionize LLM-based prototypes into enterprise-grade, maintainable systems. 
  • Integrate vector databases, model gateways, content filters, and guardrail frameworks into end-to-end LLM solutions. 
  • Implement observability and monitoring solutions that track performance metrics, hallucination rates, cost profiles, and user interaction patterns. 
  • Lead troubleshooting and root-cause analysis for issues related to LLM deployment, inference performance, or pipeline reliability. 
  • Stay current with emerging LLM architectures, inference optimizations, fine-tuning techniques, and relevant MLSecOps patterns. 
  • Ensure compliance with data privacy, ethical AI, and AI-governance frameworks throughout pipeline design and operations. 
  • Mentor junior engineers and contribute to Steampunk’s AI engineering best practices, tooling, and reusable infrastructure patterns. 
  • Contribute to the growth of Steampunk’s AI & Data Exploitation Practice.

Qualifications


  • Ability to hold a position of public trust with the U.S. government. 
  • Bachelor’s degree and 8 years of relevant experience.
  • 5+ years of experience in software engineering, data engineering, MLOps, or cloud engineering, with 2+ years focusing specifically on LLM or GenAI operations. 
  • Strong experience deploying models using frameworks such as Hugging Face Transformers, vLLM, TensorRT-LLM, or similar. 
  • Proficiency in Python and operational tooling such as FastAPI, PyTorch, LangChain, LlamaIndex, and vector databases (FAISS, Milvus, Pinecone, or similar). 
  • Advanced knowledge of cloud platforms (AWS, Azure, GCP) including model hosting, distributed compute, and secure networking patterns. 
  • Hands-on experience building CI/CD pipelines, automated testing frameworks, and environment provisioning for AI/ML workloads. 
  • Experience with Docker, Kubernetes, and infrastructure-as-code (Terraform, CloudFormation). 
  • Familiarity with MLSecOps, AI governance, model hardening, prompt injection defenses, and content safety monitoring. 
  • Strong understanding of logging, observability, and performance profiling for high-throughput LLM inference systems. 
  • Excellent written and verbal communication skills, with the ability to explain trade-offs and architectural decisions to technical and non-technical stakeholders. 
  • Demonstrated ability to balance long-term platform thinking with hands-on operations and rapid problem solving. 
  • Experience working in agile teams and using modern project management tools.