Hi! I'm Yuxuan, an applied scientist working at Microsoft with a focus on speech and audio research. My work includes emotion recognition, speaker identification, speech translation, and building multimodal large language models. I'm passionate about bridging the gap between human communication and intelligent systems.
Previously, I earned my master's degree from Beijing University of Technology. I've contributed to several impactful projects, such as phi-4-multimodal-instruct, and my ASR model is ranked #1 on the Open ASR Leaderboard.