My current research focuses on building executable virtual environments, inspired by generative agents, that simulate real-world interaction dynamics for dynamic evaluation, synthetic data generation, and RL-based post-training. I also work on agentic multimodal understanding, especially tool-augmented long-document and long-video agents, and on multimodal model merging for aligning perception and reasoning capabilities and enabling cross-modal knowledge transfer.
Executable generative-agent environments for dynamic evaluation, synthetic data generation, and agentic RL
Tool-augmented multimodal agents for long-document and long-video understanding
Multimodal model merging for perception-reasoning alignment and cross-modal knowledge transfer
News
[Dec. 2025 - Present] Research intern at Microsoft Research Asia (MSRA), working on executable multi-agent environments for realistic office workflows, agentic document understanding, agent evaluation, and data synthesis.
Pre-prints
PASA: Post-Merge Perception-Reasoning Asymmetry as a Self-Alignment Signal for MLLMs. Hongchen Wei, Zhenzhong Chen.
NeurIPS 2026 (under review)