IIT HYDERABAD '22 · BENGALURU, INDIA · 4+ YEARS SHIPPING

SANSKAR NANEGAONKAR

AI ENGINEER

I build production LLM, agentic, multimodal, and clinical ML systems: enterprise RAG for NTT DATA, voice agents across 1,000+ calls, and the clinical model behind an FDA Breakthrough Device application.

PyTorch · LLM Agents · RAG · Evals · Fine-tuning · Multimodal

01 CURRENTS

ABOUT

I work across the whole AI-engineering stack, turning frontier models into systems that survive contact with real users, real data, and real regulation.

Four-plus years shipping LLM, agentic, multimodal, and clinical ML in production: multi-tenant RAG for enterprise customers, agentic pipelines that convert messy human activity into structured output, and voice agents that have handled a thousand-plus conversations. I'm comfortable across the full loop, from data and training through evals, post-training, and deployment.

Before the LLM work I built clinical machine-learning systems: a multimodal depression-severity model that became the basis for an FDA Breakthrough Device application with Keio, with clinical validation across Northwell Health and Hamamatsu. I like problems where being right actually matters.

The interesting part isn't getting a model to answer. It's getting a system to be trustworthy across a thousand answers.

02 TRAJECTORY

EXPERIENCE

Four years at I'mbesideyou, from clinical ML into applied LLM systems.

  1. JAN 2025 – PRESENT · TOKYO · REMOTE

    Senior AI Engineer, Agentic AI & Production Systems

    I'mbesideyou Inc.

    • Architected and shipped a multi-tenant HR-policy RAG assistant for NTT DATA and two other enterprise customers, covering 1,000+ policies: FastAPI + Qdrant per-tenant retrieval, hybrid search with cross-encoder reranking, parent-document retrieval, OpenAI GPT synthesis, policy-conflict detection, and Langfuse audit traces.
    • Owned multimodal SOP-generation & gap-analysis modules turning desktop activity into procedures: a multi-provider pipeline (Gemini vision, GPT synthesis, Claude critic), LangGraph orchestration, Pydantic structured outputs, and write-review-rewrite loops; deployed across 3+ organizations with 100+ operators.
    • Built a multi-agent auto-prompt optimization system that collapsed a week of transcript review per agent into a few automated hours: planner / extractor / synthesizer chains with regression scoring against correctness thresholds.
    • Shipped voice AI agents on Vapi & LiveKit that matched human pickup and conversion rates across 1,000+ calls.
    • Replaced spot-check QSC audits with continuous, automated VLM-based compliance monitoring, piloted at 2 restaurant locations, and built a VLM-assisted auto-labeling pipeline that bootstrapped a proprietary dataset where no open-source data or model existed.

  2. AUG 2022 – DEC 2024 · TOKYO · REMOTE

    Data Scientist, Clinical AI & Multimodal ML

    I'mbesideyou Inc.

    • Built a multimodal depression-severity model in PyTorch, early-fusing DeepFace facial-emotion features with librosa speech-prosody features: 83% accuracy, extended to bipolar-vs-unipolar classification at 78% accuracy across 350+ patients.
    • Productionized the clinical ML pipeline on AWS; collaborated with Keio on an FDA Breakthrough Device Designation application and Pre-Sub filings, and validated depression-severity predictions with Northwell Health and Hamamatsu.
    • Built an LLM system converting therapy transcripts into Pydantic-validated SOAP notes; a clinician style-adaptation loop distilled edits into reusable per-clinician profiles and cut documentation time by 70%.
  3. MAY 2021 – JUL 2022 · INDIA

    Data Science Intern

    I'mbesideyou Inc.

    • Built video-analytics pipelines across classroom sessions, live-streaming, and sales-call recordings (Python, OpenCV, scikit-learn), distinguishing high from low performers across 10,000+ recordings and generating structured coaching recommendations.

03 SELECTED

PROJECTS

Things I've built to make AI systems measurable, aligned, and reproducible.

  • P-01

    LLM Evaluation Harness

    A pytest-style eval harness for LLM services: golden-set fixtures with versioned prompts, LLM-as-judge with confidence-aware aggregation, per-test cost/latency budgets, and a GitHub Actions gate that blocks regressions. Plugs into Langfuse traces.

    • PyTest
    • Pydantic
    • Langfuse
    • LLM-as-judge
    • GitHub Actions
  • P-02

    Preference Optimization & Alignment Stack

    Fine-tuned Llama 3 8B for medical Q&A (MedQA / PubMedQA): curated instruction and preference data, LoRA / QLoRA domain SFT followed by DPO preference optimization, and evaluation against the base model and a frontier API on task success, helpfulness, latency & cost before serving the quantized model on vLLM.

    • PyTorch
    • TRL
    • LoRA / QLoRA
    • DPO
    • vLLM

04 TOOLBOX

SKILLS

The stack I reach for, end to end.

LLM APPS
  • RAG
  • Agents · ReAct · MCP
  • LangGraph
  • Pydantic outputs
  • Claude / OpenAI / Gemini
  • Voice · Vapi · LiveKit
EVALS & OBSERVABILITY
  • LLM-as-judge
  • RAGAS
  • Golden sets
  • Regression scoring
  • Langfuse
  • LangSmith
AI / ML
  • PyTorch
  • Transformers · PEFT
  • LoRA / QLoRA
  • vLLM
  • GPTQ / AWQ
  • Multimodal fusion · VLMs
POST-TRAINING / ALIGNMENT
  • RLHF
  • Reward modeling
  • DPO
  • PPO
  • Preference datasets
  • Fine-tuning
BACKEND / INFRA
  • Python
  • FastAPI
  • PostgreSQL
  • Qdrant · FAISS
  • Docker · Kubernetes
  • AWS · HIPAA

EDUCATION

Indian Institute of Technology, Hyderabad

B.Tech, Engineering Physics · CGPA 8.48 / 10 · 2018–2022

PUBLICATIONS

Nanegaonkar & Ando. A Practical Approach to Predicting Depression: Verbal and Non-Verbal Insights with Machine Learning. HIIJ 13(2), May 2024.

Co-author, Chatbot Interventions for Early-Stage Depression and Anxiety. EHPS Conference, Aug 2025.

05 TRANSMIT

CONTACT

Based in India and open to relocating to the US (transferable H-1B). Drawn to hard problems in applied AI: LLM systems, agents, evals, multimodal. Always happy to talk shop.