Sarvesh Baskar

AI Engineer/Researcher

LLM/VLM Systems • Multimodal Agents
Planning • Embodied AI

About Me

I am an AI engineer/researcher working at the intersection of language models, vision-language models, planning, and multimodal agents. My work focuses on making AI systems more grounded and reliable as they move beyond text generation toward perception, reasoning, and action.

I have research experience at the University of Maryland, College Park, and at UMBC (in collaboration with IBM Research on automated planning), where I have worked on VLM hallucination mitigation, multimodal video reasoning, LLM planning, code reasoning, and persona-aware conversational agents. I also have applied engineering experience building production AI systems, including multimodal RAG pipelines, MCP-based workflow integrations, and clarification-driven chat systems.

My long-term goal is to build AI agents that can understand visual environments, reason under uncertainty, and execute reliable long-horizon actions, both in digital interfaces and, eventually, in embodied and robotic settings.

Research & Technical Interests

My technical interests center on multimodal intelligence: how models can combine language, vision, memory, and planning to make decisions that are grounded in the world rather than only fluent in text.

I am especially interested in vision-language models, multimodal reasoning, planning for agents, visual grounding, hallucination mitigation, and embodied AI. A recurring theme in my work is that intelligent systems should not simply generate answers; they should know when to ask, when to look, when to verify, and when to plan.

Going forward, I want to work on systems that connect VLMs, world models, and planning/control for agents that can operate across visual environments, GUIs, simulations, and robotics-inspired settings.

Research Interests: Multimodal AI, Vision-Language Models, LLM Agents, Reasoning & Planning, Visual Grounding, Embodied AI, Robotics, Computer Vision, Vision-Language-Action Models.

News

  • [Academic] March 2026: Research paper "Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings" accepted at ACL Findings 2026!
  • [Academic] March 2026: Paper "The Low-Frequency Trap: Why Scaling Doesn't Solve Simple Temporal Counting" accepted at the ICLR 2026 Workshop!
  • [Academic] February 2026: Co-authored paper "Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks" accepted at the WoRMA 2026 Workshop!
  • [Academic] May 2025: First-author paper "(CPER) From Guessing to Asking" accepted at NAACL SRW 2025!

CV & Resume