Hongchen Wei

I am currently a final-year Ph.D. student at Wuhan University, under the supervision of Prof. Zhenzhong Chen.

I received my M.E. degree from Nanjing University of Science and Technology, China, in 2023.

I received my B.Sc. degree from Xi'an Shiyou University, China, in 2020.

Main Research Interests

My current research focuses on building executable virtual environments, inspired by generative agents, that simulate real-world interaction dynamics for dynamic evaluation, synthetic data generation, and RL-based post-training. I also work on agentic multimodal understanding, especially tool-augmented long-document and long-video agents, and on multimodal model merging for aligning perception and reasoning capabilities and enabling cross-modal knowledge transfer.

Executable generative-agent environments for dynamic evaluation, synthetic data generation, and agentic RL
Tool-augmented multimodal agents for long-document and long-video understanding
Multimodal model merging for perception-reasoning alignment and cross-modal knowledge transfer

News

[Dec. 2025 - Present] Research intern at Microsoft Research Asia (MSRA), working on executable multi-agent environments for realistic office workflows, agentic document understanding, agent evaluation, and data synthesis.

Pre-prints

PASA: Post-Merge Perception-Reasoning Asymmetry as a Self-Alignment Signal for MLLMs
Hongchen Wei, Zhenzhong Chen
NeurIPS 2026 (under review)

Training-Free Reasoning and Reflection in MLLMs
Hongchen Wei, Zhenzhong Chen
arXiv Preprint, 2025

LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models
Hongchen Wei, Zhihong Tan, Yaosi Hu, Chang Wen Chen, Zhenzhong Chen
arXiv Preprint, 2025

LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling
Zhihan Zhang, Xiang Pan, Hongchen Wei, Zhenzhong Chen
arXiv Preprint, 2025

RSFAKE-1M: A Large-Scale Dataset for Detecting Diffusion-Generated Remote Sensing Forgeries
Zhihong Tan, Jiayi Wang, Huiying Shi, Binyuan Huang, Hongchen Wei, Zhenzhong Chen
arXiv Preprint, 2025

TDSAgent: A Task-Driven Sampling Agent for Long Video Question Answering
Author list includes Hongchen Wei
Under Review, 2026

GTC: Game-Theoretic Token Compression for Video Large Language Models
Author list includes Hongchen Wei
Under Review, 2026

ETC: Extreme Token Compression via Task-aware Visual Information Distillation in VLMs
Author list includes Hongchen Wei
Under Review, 2026

Publications

See What We Cannot See: A Geo-guided Reasoning Benchmark for Object Counting under Adverse Earth Observation Conditions
Author list includes Hongchen Wei
CVPR, 2026

Visual Context Window Extension: A New Perspective for Long Video Understanding
Hongchen Wei, Zhenzhong Chen
ACM MM (CCF-A Conference), 2025
Project page

RealVG: Unleashing MLLMs for Training-Free Spatio-Temporal Video Grounding in the Wild
Hongchen Wei, Zhenzhong Chen
ACM MM (CCF-A Conference), 2025

Remote Sensing Semantic Segmentation Quality Assessment based on Vision Language Model
Huiying Shi, Zhihong Tan, Zhihan Zhang, Hongchen Wei, Yaosi Hu, Yingxue Zhang, Zhenzhong Chen
TGRS (CCF-B Journal), 2025

Improving Generalization of Image Captioning with Unsupervised Prompt Learning
Hongchen Wei, Zhenzhong Chen
TOMM (CCF-B Journal), 2024

Exploiting Cross-Modal Prediction and Relation Consistency for Semisupervised Image Captioning
Yang Yang, Hongchen Wei, Hengshu Zhu, Dianhai Yu, Hui Xiong, Jian Yang
TCYB (CCF-B Journal), 2022 (student first author)
Code

S2OSC: A Holistic Semi-Supervised Approach for Open Set Classification
Yang Yang, Hongchen Wei, Zhenqiang Sun, Guangyu Li, Yuanchun Zhou, Hui Xiong, Jian Yang
TKDD (CCF-B Journal), 2021 (student first author)

Activities

Reviewer: ICLR25/26, CVPR25/26, NeurIPS25/26, ICML26, TNNLS

Last updated in Jun. 2026.

Homepage credits: Jon Barron.