AI Engineer/Researcher
LLM/VLM Systems • Multimodal Agents
Planning • Embodied AI
I am an AI engineer/researcher working at the intersection of language models, vision-language models, planning, and multimodal agents. My work focuses on making AI systems more grounded and reliable as they move beyond text generation toward perception, reasoning, and action.
I have research experience with the University of Maryland, College Park and UMBC (collaborating with IBM Research on automated planning), where I have worked on VLM hallucination mitigation, multimodal video reasoning, LLM planning, code reasoning, and persona-aware conversational agents. I also have applied engineering experience building production AI systems, including multimodal RAG pipelines, MCP-based workflow integrations, and clarification-driven chat systems.
My long-term goal is to build AI agents that can understand visual environments, reason under uncertainty, and reliably execute long-horizon actions, both in digital interfaces and eventually in embodied/robotic settings.
My technical interests center on multimodal intelligence: how models can combine language, vision, memory, and planning to make decisions that are grounded in the world rather than only fluent in text.
I am especially interested in vision-language models, multimodal reasoning, planning for agents, visual grounding, hallucination mitigation, and embodied AI. A recurring theme in my work is that intelligent systems should not simply generate answers; they should know when to ask, when to look, when to verify, and when to plan.
Going forward, I want to work on systems that connect VLMs, world models, and planning/control for agents that can operate across visual environments, GUIs, simulations, and robotics-inspired settings.
Research Interests: Multimodal AI, Vision-Language Models, LLM Agents, Reasoning & Planning, Visual Grounding, Embodied AI, Robotics, Computer Vision, Vision-Language-Action Models.