Sanskar Nanegaonkar · AI Engineer | LLM Agents · RAG · Evals

01 CURRENTS

ABOUT

I work across the whole AI-engineering stack, turning frontier models into systems that survive contact with real users, real data, and real regulation.

Four-plus years shipping LLM, agentic, multimodal, and clinical ML in production: multi-tenant RAG for enterprise customers, agentic pipelines that convert messy human activity into structured output, and voice agents that have handled a thousand-plus conversations. I'm comfortable across the whole pipeline, from data and training through evals, post-training, and deployment.

Before the LLM work I built clinical machine-learning systems: a multimodal depression-severity model that became the basis for an FDA Breakthrough Device application with Keio, with clinical validation across Northwell Health and Hamamatsu. I like problems where being right actually matters.

The interesting part isn't getting a model to answer. It's getting a system to be trustworthy across a thousand answers.

02 TRAJECTORY

EXPERIENCE

Four years at I'mbesideyou, from clinical ML into applied LLM systems.

JAN 2025 – PRESENT · TOKYO · REMOTE

Senior AI Engineer, Agentic AI & Production Systems

I'mbesideyou Inc.
- Architected and shipped a multi-tenant HR-policy RAG assistant for NTT DATA and two enterprise customers over 1,000+ policies: FastAPI + Qdrant per-tenant retrieval, hybrid search with cross-encoder reranking, parent-document retrieval, OpenAI GPT synthesis, policy-conflict detection, and Langfuse audit traces.
- Owned multimodal SOP-generation & gap-analysis modules turning desktop activity into procedures: a multi-provider pipeline (Gemini vision, GPT synthesis, Claude critic), LangGraph orchestration, Pydantic structured outputs, and write-review-rewrite loops; deployed across 3+ organizations with 100+ operators.
- Built a multi-agent auto-prompt optimization system that collapsed a week of transcript review per agent into a few automated hours: planner / extractor / synthesizer chains with regression scoring against correctness thresholds.
- Shipped voice AI agents on Vapi & LiveKit that matched human pickup and conversion rates across 1,000+ calls.
- Replaced spot-check QSC audits with continuous automated VLM compliance coverage piloted across 2 restaurant locations, plus a VLM-assisted auto-labeling pipeline that bootstrapped a proprietary dataset where no open-source data or model existed.

AUG 2022 – DEC 2024 · TOKYO · REMOTE

Data Scientist, Clinical AI & Multimodal ML

I'mbesideyou Inc.
- Built a multimodal depression-severity model in PyTorch using DeepFace facial-emotion features and librosa speech-prosody features (early fusion): 83% accuracy, extended to bipolar-vs-unipolar at 78% across 350+ patients.
- Productionized the clinical ML pipeline on AWS; collaborated with Keio on an FDA Breakthrough Device Designation application and Pre-Sub filings, and validated depression-severity predictions with Northwell Health and Hamamatsu.
- Built an LLM system converting therapy transcripts into Pydantic-validated SOAP notes; a clinician style-adaptation loop distilled edits into reusable per-clinician profiles and cut documentation time 70%.

MAY 2021 – JUL 2022 · INDIA

Data Science Intern

I'mbesideyou Inc.
- Built video-analytics pipelines across classroom sessions, live-streaming, and sales-call recordings (Python, OpenCV, scikit-learn), distinguishing high from low performers across 10,000+ recordings and generating structured coaching recommendations.

03 SELECTED

PROJECTS

Things I've built to make AI systems measurable, aligned, and reproducible.

P-01
LLM Evaluation Harness

A pytest-style eval harness for LLM services: golden-set fixtures with versioned prompts, LLM-as-judge with confidence-aware aggregation, per-test cost/latency budgets, and a GitHub Actions gate that blocks regressions. Plugs into Langfuse traces.
- pytest
- Pydantic
- Langfuse
- LLM-as-judge
- GitHub Actions
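
To illustrate what "confidence-aware aggregation" and "per-test cost/latency budgets" can look like in practice, here is a minimal sketch. It is not the harness's actual code: `JudgeVote`, `aggregate`, `check_case`, and every threshold value are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeVote:
    """One LLM-as-judge verdict for a golden-set case (hypothetical shape)."""
    passed: bool
    confidence: float  # judge's self-reported confidence, assumed to lie in [0, 1]


def aggregate(votes: list[JudgeVote]) -> float:
    """Confidence-weighted pass rate: low-confidence votes count for less."""
    total = sum(v.confidence for v in votes)
    if total == 0:
        return 0.0  # no usable signal, so score the case as failing
    return sum(v.confidence for v in votes if v.passed) / total


def check_case(
    votes: list[JudgeVote],
    cost_usd: float,
    latency_s: float,
    *,
    min_score: float = 0.7,       # assumed quality threshold
    max_cost_usd: float = 0.05,   # assumed per-test cost budget
    max_latency_s: float = 5.0,   # assumed per-test latency budget
) -> list[str]:
    """Return the names of violated constraints; an empty list means the case passes."""
    failures = []
    if aggregate(votes) < min_score:
        failures.append("quality")
    if cost_usd > max_cost_usd:
        failures.append("cost")
    if latency_s > max_latency_s:
        failures.append("latency")
    return failures
```

In a CI gate, a non-empty result from `check_case` for any golden-set case would fail the run, which is one way a regression could be blocked before merge.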

P-02
Preference Optimization & Alignment Stack

Fine-tuned Llama 3 8B for medical Q&A (MedQA / PubMedQA): curated instruction and preference data, LoRA / QLoRA domain SFT followed by DPO preference optimization, evaluated against the base model and a frontier API on task success, helpfulness, latency & cost before quantized vLLM serving.
- PyTorch
- TRL
- LoRA / QLoRA
- DPO
- vLLM
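
For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines of plain Python. This is the standard published DPO loss written out for a single preference pair, not the project's training code (which, per the tags above, was built on TRL); the function name, argument names, and the default `beta` are assumptions made for the example.

```python
import math


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(x)) equals softplus(-x)
    return _softplus(-beta * margin)
```

When the policy and the reference model agree exactly, the margin is zero and the loss is log 2; the loss falls as the policy shifts probability toward the chosen response relative to the reference.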

04 TOOLBOX

SKILLS

The stack I reach for, end to end.

LLM APPS

RAG
Agents · ReAct · MCP
LangGraph
Pydantic outputs
Claude / OpenAI / Gemini
Voice · Vapi · LiveKit

EVALS & OBSERVABILITY

LLM-as-judge
RAGAS
Golden sets
Regression scoring
Langfuse
LangSmith

AI / ML

PyTorch
Transformers · PEFT
LoRA / QLoRA
vLLM
GPTQ / AWQ
Multimodal fusion · VLMs

POST-TRAINING / ALIGNMENT

RLHF
Reward modeling
DPO
PPO
Preference datasets
Fine-tuning

BACKEND / INFRA

Python
FastAPI
PostgreSQL
Qdrant · FAISS
Docker · Kubernetes
AWS · HIPAA

EDUCATION

Indian Institute of Technology, Hyderabad

B.Tech, Engineering Physics · CGPA 8.48 / 10 · 2018–2022

PUBLICATIONS

Nanegaonkar & Ando. A Practical Approach to Predicting Depression: Verbal and Non-Verbal Insights with Machine Learning. HIIJ 13(2), May 2024.

Co-author, Chatbot Interventions for Early-Stage Depression and Anxiety. EHPS Conference, Aug 2025.

05 TRANSMIT

CONTACT

Based in India, open to US relocation (transferable H-1B). Open to hard problems in applied AI: LLM systems, agents, evals, multimodal. Always happy to talk shop.

→ 01 EMAIL sanskarnanegaonkar@gmail.com
→ 02 LINKEDIN linkedin.com/in/sanskar-nanegaonkar
→ 03 GITHUB github.com/02san02
→ 04 GOOGLE SCHOLAR coming soon
