Jihan Yang, an emerging voice in multimodal artificial intelligence, is advancing the frontier of machine learning as a postdoctoral associate at New York University’s Courant Institute of Mathematical Sciences. Working with Professor Saining Xie, Yang focuses on developing next-generation Multimodal Large Language Models (MLLMs) capable of reasoning across vision, language, and time—an area widely seen as critical to the future of general-purpose AI systems. His recent work explores how AI can better understand spatial environments, long-form video, and real-world contexts, placing him at the center of efforts to build more grounded and intelligent models.
Yang’s research has rapidly gained traction within the academic community, with publications at top-tier conferences including CVPR, NeurIPS, ICML, ECCV, and ICLR. His projects, such as *Thinking in Space* and the *Cambrian* series, examine how models can construct, retain, and manipulate spatial representations—pushing beyond static image understanding toward dynamic, environment-aware intelligence. In parallel, his work on unified tokenization and post-training methods reflects a broader ambition: to create scalable architectures that integrate perception and reasoning into cohesive systems. This body of work places him among a new generation of researchers redefining the capabilities of foundation models.
Before joining NYU, Yang completed his PhD at the University of Hong Kong, where he built a strong track record in 3D vision and domain adaptation under Professor Xiaojuan Qi. His earlier contributions addressed key challenges in object detection, semantic segmentation, and sim-to-real transfer, forming the technical foundation for his current research. He also gained industry experience through research roles at Tencent, SenseTime, and YITU Technology, bridging academic innovation with real-world applications. As interest in multimodal AI accelerates globally, Yang’s work highlights the shift toward systems that can see, reason, and act—marking a significant step toward more capable and general AI.