Lead Applied Scientist (AI/ML) • Production AI & LLM Systems Engineer

Building production-grade AI and LLM systems that scale from research to real-world impact.

Hi, I am Anmol Gautam, a Lead Applied Scientist specializing in large-scale GenAI systems. I build, optimize, and deploy LLM pipelines, including fine-tuning, pruning, quantization, and GPU-backed inference, for enterprise Text-to-SQL, RAG, and agentic platforms.

Open-source LLM training & alignment
Production Text-to-SQL & RAG
GPU inference with vLLM & SGLang
Multi-agent platforms in production

Technologies I work with

Agentic AI

Multi-Agent Systems

RAG Systems

Text-to-SQL

Fine-Tuning (SFT, DPO, GRPO)

LoRA / PEFT

Model Pruning & Quantization

LLM Evaluation

Large Language Models (LLMs)

PyTorch

Hugging Face Transformers

NLP

Computer Vision

vLLM

SGLang

GPU Inference

Python

FastAPI

Node.js

REST APIs

SSE

WebSockets

PostgreSQL

MongoDB

Trino

ChromaDB

Milvus

pgvector

Docker

AWS

Azure

MLflow

About Me

Building the future, from applied research to production AI systems.

I am a Lead Applied Scientist with 4+ years of experience architecting and shipping production-grade AI systems, from multi-agent platforms and Text-to-SQL engines to enterprise RAG and real-time multimodal solutions.

My work sits at the intersection of applied research, large-scale system design, and real-world deployment. I have built and owned enterprise GenAI platforms including Text-to-SQL, RAG pipelines, GPU-backed inference systems, and multi-agent architectures running in production.

Across roles at 8bit.ai, SuperAGI, Oracle, and NVIDIA, I have taken systems end-to-end: from research and experimentation, through model fine-tuning and alignment (SFT, DPO, GRPO), to deployment on scalable infrastructure with inference optimization using vLLM and SGLang.

Alongside industry work, I have 6 published papers across IEEE, Springer, and arXiv spanning LLMs, multimodal learning, and computer vision. My master’s research at NIT Meghalaya, completed in collaboration with NVIDIA, received the Institute Best Thesis Award and achieved state-of-the-art results.

🎓

M.Tech CSE - NIT Meghalaya | Gold Medalist | 10.0 CGPA

Institute Best Master’s Thesis Award - Region of Interest Segmentation in Biomedical Images

“I care deeply about systems that actually ship, balancing performance, reliability, and cost, and about building AI products that move beyond demos into real impact.”

Production LLM Systems

Designing, optimizing, and deploying large-scale LLM pipelines with GPU-backed inference, pruning, and quantization.

Enterprise RAG & Text-to-SQL

Building reliable retrieval and structured-query systems used in real enterprise workflows.

Agentic Architectures

Developing multi-agent platforms with tool orchestration, planning, and execution in production environments.

Applied Research → Production

Translating research ideas into shipped systems, backed by evaluation, experimentation, and peer-reviewed work.

Career Journey

Experience that speaks volumes.

A timeline of my professional growth, from curious beginner to Lead Applied Scientist leading teams and building products at scale.

Oct 2024 - Present

Lead Applied Scientist (AI/ML)

8bit.ai

Current

Architected Neutrino, a multi-agent AI platform powering enterprise search, Text-to-SQL, and workflow automation with human-in-the-loop execution and multi-LLM orchestration, deployed across 5 major ISV partners.
Fine-tuned domain-specific LLMs using LoRA, DoRA, PEFT, and alignment techniques (DPO, GRPO).
Built a multi-schema Text-to-SQL engine using agentic ReAct workflows across PostgreSQL and Trino.
Designed end-to-end enterprise search spanning RAG ingestion, hybrid retrieval, knowledge-graph augmentation, PII tagging, and multi-tenant data discovery.
Optimized inference latency and cost using vLLM and SGLang, and delivered real-time multimodal solutions including voice and sign-language AI.

Multi-Agent Systems • Text-to-SQL • RAG • vLLM • SGLang • FastAPI • SSE • LoRA • DPO • GRPO • PostgreSQL • Trino • Azure

Nov 2023 - Oct 2024

Applied Scientist (AI/ML)

SuperAGI

Built Text-to-SQL and RAG-based conversational multi-agent systems for SuperSales.
Developed SuperCoder 2.0, an autonomous code navigation and issue-resolution system achieving 33% on SWE-Bench-Lite.
Architected a fully autonomous multi-agent platform using open-source and closed-source LLMs with ReAct-style execution and Planner-Orchestration patterns, taking projects from PoC to AWS production.
Developed SAM-7B, an instruction-tuned Mistral-7B model achieving GPT-3.5-comparable performance and outperforming Orca on GSM8K and ARC.

Multi-Agent Systems • RAG • Text-to-SQL • Open-Source LLMs • SFT • DPO • LoRA / PEFT • AWS

Aug 2022 - Oct 2023

Associate Consultant

Oracle

Built document AI and information extraction pipelines using OCI Document Understanding and EasyOCR, improving NER and key-value extraction by 7%.
Developed RAG-based question answering and computer vision systems using Falcon, Llama, ChromaDB, TensorFlow, and transfer learning.
Built a face recognition pipeline using transfer learning that improved performance by 37%.

RAG • OCR • NER • Falcon • LLaMA • ChromaDB • TensorFlow • OCI

May 2021 - Apr 2022

Research Intern

NVIDIA

Worked on NLP and computer vision systems using NVIDIA NeMo, HuggingFace, and transfer learning.
Built English-to-Hindi machine translation, object detection, and image segmentation pipelines.

NVIDIA NeMo • Machine Translation • Computer Vision • Transformers • PyTorch

Research

Research that informs real-world AI systems.

Alongside building production AI platforms, I actively publish and experiment on problems across LLMs, multimodal learning, and computer vision. My research focuses on ideas that translate into practical, deployable systems.

arXiv • 2024

SuperCoder 2.0: Exploring the Feasibility of LLMs as Autonomous Programmers

A technical study on multi-agent autonomous code generation and navigation systems, evaluated on SWE-Bench-Lite. Explores how LLM-driven agents can reason, plan, and execute code-level tasks.

arXiv • 2024

Veagle: Advancements in Multimodal Representation Learning

Research on multimodal representation learning, focusing on improved alignment across vision and language modalities.

IEEE • 2022

SAU-NET: Scale-Aware Polyp Segmentation using Encoder–Decoder Networks

Introduces a scale-aware segmentation architecture for biomedical imaging, achieving state-of-the-art performance and outperforming UNet and DeepLab variants.

Springer • 2022

ED-NET: Educational Teaching Video Classification Network

Proposes a deep learning architecture for classifying educational videos, combining CNN and RNN-based temporal modeling.

IEEE • 2022

Batch Image Encryption and Compression using Chaotic Map Infused Autoencoder Network

A novel approach combining chaotic map-based encryption with autoencoder networks for secure batch image encryption and compression.

IEEE • 2022

Li-SegPNet: Encoder-Decoder Mode Lightweight Segmentation Network for Colorectal Polyps Analysis

A lightweight encoder-decoder segmentation network designed for efficient colorectal polyp analysis in biomedical imaging.

Selected Systems & Research

A curated selection of applied research and production AI systems spanning autonomous agents, open-source LLMs, enterprise RAG, and large-scale inference infrastructure.

Dendrux: Open-Source Runtime for Real-World Agents

Dendrux is an open-source runtime built for real-world agents that need to do more than chat. It provides tool calling, persistence, observability, and a client-tool bridge that lets agents pause mid-execution, hand off to a human or external system, and resume exactly where they left off.

• Tool calling with pause/resume execution
• Built-in persistence & observability
• FastAPI/SSE hosting out of the box
• Client-tool bridge for human-in-the-loop

Open Source • Agentic AI • FastAPI • Tool Calling • SSE

GitHub

SuperCoder 2.0: Autonomous Multi-Agent Programming System

An autonomous multi-agent coding system that reasons, plans, navigates repositories, and executes code changes. SuperCoder 2.0 achieved 33% on SWE-Bench-Lite, outperforming several open coding agents and approaching proprietary systems.

• Multi-agent planning & execution loops
• Repository-level code navigation
• Custom RAG pipelines for code context
• Evaluated on SWE-Bench-Lite

Multi-Agent Systems • Autonomous Coding • RAG • LLM Evaluation

Research Paper (arXiv) • Technical Blog

SAM-7B: Small Agentic Model (Open-Source LLM)

SAM is an open-source 7B parameter agentic language model fine-tuned for reasoning and task execution using explanation-trace supervision. Designed for efficiency, controllability, and strong reasoning performance in agent workflows.

• LoRA-based fine-tuning & PEFT
• Explanation-trace dataset construction
• Agent-centric instruction tuning
• 3k+ HuggingFace downloads

Open-Source LLMs • LoRA / PEFT • Agentic Models • Alignment

HuggingFace Model • Dataset • Launch Blog

Other Production Systems

Enterprise Text-to-SQL & RAG Platform

A production conversational AI system translating natural-language queries into SQL and retrieval workflows over structured and unstructured enterprise data.

• Schema-aware query generation
• Latency vs correctness tradeoffs
• Evaluation & monitoring in production

LLM Inference & Optimization Platform

GPU-backed inference infrastructure built using vLLM and SGLang, with autoscaling and optimization pipelines for cost-efficient deployment.

• Quantization & pruning pipelines
• High-throughput GPU inference
• Benchmarking & cost optimization

Contact

Let’s connect and talk systems.

I’m always open to discussing AI systems, research, collaboration, and interesting opportunities.

anmolgautam2428@gmail.com

Location

Bengaluru, India

Open to collaboration & opportunities