ABOUT
I work across the whole AI-engineering stack, turning frontier models into systems that survive contact with real users, real data, and real regulation.
Four-plus years shipping LLM, agentic, multimodal, and clinical ML in production: multi-tenant RAG for enterprise customers, agentic pipelines that convert messy human activity into structured output, and voice agents that have handled a thousand-plus conversations. I'm comfortable from data and training through evals, post-training, and deployment.
Before the LLM work I built clinical machine-learning systems, including a multimodal depression-severity model that became the basis for an FDA Breakthrough Device Designation application with Keio, with clinical validation across Northwell Health and Hamamatsu. I like problems where being right actually matters.
The interesting part isn't getting a model to answer. It's getting a system to be trustworthy across a thousand answers.
EXPERIENCE
Four-plus years at I'mbesideyou, moving from clinical ML into applied LLM systems.
-
JAN 2025 – PRESENT · TOKYO · REMOTE
Senior AI Engineer, Agentic AI & Production Systems
I'mbesideyou Inc.
- Architected and shipped a multi-tenant HR-policy RAG assistant for NTT DATA and two other enterprise customers, covering 1,000+ policies: FastAPI + Qdrant per-tenant retrieval, hybrid search with cross-encoder reranking, parent-document retrieval, OpenAI GPT synthesis, policy-conflict detection, and Langfuse audit traces.
- Owned multimodal SOP-generation & gap-analysis modules turning desktop activity into procedures: a multi-provider pipeline (Gemini vision, GPT synthesis, Claude critic), LangGraph orchestration, Pydantic structured outputs, and write-review-rewrite loops; deployed across 3+ organizations with 100+ operators.
- Built a multi-agent auto-prompt optimization system that collapsed a week of transcript review per agent into a few automated hours: planner / extractor / synthesizer chains with regression scoring against correctness thresholds.
- Shipped voice AI agents on Vapi & LiveKit that matched human pickup and conversion rates across 1,000+ calls.
- Replaced spot-check QSC (quality, service, cleanliness) audits with continuous, automated VLM compliance coverage, piloted across two restaurant locations, and built a VLM-assisted auto-labeling pipeline that bootstrapped a proprietary dataset where no open-source data or model existed.
-
AUG 2022 – DEC 2024 · TOKYO · REMOTE
Data Scientist, Clinical AI & Multimodal ML
I'mbesideyou Inc.
- Built a multimodal depression-severity model in PyTorch that combines DeepFace facial-emotion features and librosa speech-prosody features through early fusion, reaching 83% accuracy; extended it to bipolar-vs-unipolar classification at 78% across 350+ patients.
- Productionized the clinical ML pipeline on AWS; collaborated with Keio on an FDA Breakthrough Device Designation application and Pre-Sub filings, and validated depression-severity predictions with Northwell Health and Hamamatsu.
- Built an LLM system converting therapy transcripts into Pydantic-validated SOAP notes; a clinician style-adaptation loop distilled edits into reusable per-clinician profiles and cut documentation time by 70%.
-
MAY 2021 – JUL 2022 · INDIA
Data Science Intern
I'mbesideyou Inc.
- Built video-analytics pipelines across classroom sessions, live-streaming, and sales-call recordings (Python, OpenCV, scikit-learn), distinguishing high from low performers across 10,000+ recordings and generating structured coaching recommendations.
PROJECTS
Things I've built to make AI systems measurable, aligned, and reproducible.
-
P-01
LLM Evaluation Harness
A pytest-style eval harness for LLM services: golden-set fixtures with versioned prompts, LLM-as-judge with confidence-aware aggregation, per-test cost/latency budgets, and a GitHub Actions gate that blocks regressions. Plugs into Langfuse traces.
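As a rough illustration of the confidence-aware aggregation and budget checks described above, here is a minimal sketch. It is not the harness's actual code: the names `JudgeVerdict`, `Budget`, `aggregate`, and `check_case`, the confidence-weighted pass rate, and the 0.7 threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JudgeVerdict:
    """One LLM-as-judge call (hypothetical shape)."""
    passed: bool       # the judge's pass/fail decision
    confidence: float  # the judge's self-reported confidence in [0, 1]


@dataclass
class Budget:
    """Per-test limits (hypothetical shape)."""
    max_cost_usd: float
    max_latency_s: float


def aggregate(verdicts: List[JudgeVerdict]) -> float:
    """Confidence-weighted pass rate in [0, 1]; 0.0 when there is no weight."""
    total = sum(v.confidence for v in verdicts)
    if total <= 0:
        return 0.0
    return sum(v.confidence for v in verdicts if v.passed) / total


def check_case(
    verdicts: List[JudgeVerdict],
    cost_usd: float,
    latency_s: float,
    budget: Budget,
    threshold: float = 0.7,  # assumed pass threshold
) -> List[str]:
    """Return a list of failure messages; an empty list means the case passes."""
    failures = []
    score = aggregate(verdicts)
    if score < threshold:
        failures.append(f"judge score {score:.2f} below threshold {threshold:.2f}")
    if cost_usd > budget.max_cost_usd:
        failures.append(f"cost ${cost_usd:.4f} over budget ${budget.max_cost_usd:.4f}")
    if latency_s > budget.max_latency_s:
        failures.append(f"latency {latency_s:.2f}s over budget {budget.max_latency_s:.2f}s")
    return failures
```

In a pytest-style setup, each golden-set case would end with something like `assert not check_case(...)`, so a CI gate fails on quality, cost, or latency regressions alike.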
-
P-02
Preference Optimization & Alignment Stack
Fine-tuned Llama 3 8B for medical Q&A (MedQA / PubMedQA). Curated instruction and preference data, ran LoRA / QLoRA domain SFT followed by DPO, and evaluated the result against the base model and a frontier API on task success, helpfulness, latency, and cost before serving the quantized model with vLLM.
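For readers less familiar with the DPO step, the per-pair objective can be sketched in plain Python. This is the standard DPO loss written out for illustration, not the training code from this project; the function names, the `beta` default of 0.1, and the example log-probabilities are assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed strength of the pull toward the reference model
) -> float:
    """DPO loss for one (chosen, rejected) preference pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference (SFT) model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors the chosen response more than the
    # reference does, relative to the rejected response.
    return -log_sigmoid(beta * (chosen_shift - rejected_shift))


# A policy identical to the reference sits at log(2); moving probability
# toward the chosen response lowers the loss.
baseline = dpo_pair_loss(-10.0, -12.0, -10.0, -12.0)
improved = dpo_pair_loss(-8.0, -14.0, -10.0, -12.0)
```

In practice this is computed in batches on tensors by a training library; the scalar version only shows what the objective rewards.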
SKILLS
The stack I reach for, end to end.
- RAG
- Agents · ReAct · MCP
- LangGraph
- Pydantic outputs
- Claude / OpenAI / Gemini
- Voice · Vapi · LiveKit
- LLM-as-judge
- RAGAS
- Golden sets
- Regression scoring
- Langfuse
- LangSmith
- PyTorch
- Transformers · PEFT
- LoRA / QLoRA
- vLLM
- GPTQ / AWQ
- Multimodal fusion · VLMs
- RLHF
- Reward modeling
- DPO
- PPO
- Preference datasets
- Fine-tuning
- Python
- FastAPI
- PostgreSQL
- Qdrant · FAISS
- Docker · Kubernetes
- AWS · HIPAA
EDUCATION
Indian Institute of Technology, Hyderabad
B.Tech, Engineering Physics · CGPA 8.48 / 10 · 2018–2022
PUBLICATIONS
Nanegaonkar & Ando. A Practical Approach to Predicting Depression: Verbal and Non-Verbal Insights with Machine Learning. HIIJ 13(2), May 2024.
Co-author, Chatbot Interventions for Early-Stage Depression and Anxiety. EHPS Conference, Aug 2025.
CONTACT
Based in India, open to US relocation (transferable H-1B). Open to hard problems in applied AI: LLM systems, agents, evals, multimodal. Always happy to talk shop.
- → 01 EMAIL sanskarnanegaonkar@gmail.com
- → 02 LINKEDIN linkedin.com/in/sanskar-nanegaonkar
- → 03 GITHUB github.com/02san02
- → 04 GOOGLE SCHOLAR coming soon