  • 👋 Hi, I’m Teo Wu (officially Haoning Wu), working on LMMs at Moonshot AI with Xinyu Zhou. Prior to this, I was a PhD candidate (preparing my thesis defense) at Nanyang Technological University 🇸🇬, supervised by Prof. Weisi Lin. I obtained my B.S. degree in computer science from Peking University (北京大学).

  • I am currently focusing on LMM pre-training, long-prefill, and long-decode extensions.

    • [General LMMs]: I co-lead Kimi-VL, an MoE LMM with strong agent, long-context, and long-thinking (<think>blabla</think>) abilities; previously, I worked on Aria-Chat, an MoE LMM optimized for everyday multimodal dialogue, matching GPT-4o on WildVision-Bench.
    • [Video LMMs]: I designed LongVideoBench, the first video benchmark on which LMMs demonstrably improve when given more input frames (>=256); I also led the video and long-context training of Aria (Model, Paper, GitHub), an excellent open-source native MoE LMM matching GPT-4o-mini/Gemini-1.5-Flash with only 3.9B activated parameters.
  • 🌱 I have also led the project Q-Future: Visual Evaluation with LMMs 📹, which has yielded 7 first-authored papers accepted at top conferences and journals, including ICML, ICLR, NeurIPS, TPAMI, CVPR, ECCV, and ACM MM. Its flagship scorer, OneAlign, has been downloaded more than 600K times on Hugging Face (as of April 2025).

  • 📫 Reach me by e-mail (realtimothyhwu@gmail.com / haoning001@e.ntu.edu.sg) or on Twitter.

  • Google Scholar

Pinned

  1. MoonshotAI/Kimi-VL

    Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities

    554 stars · 26 forks

  2. rhymes-ai/Aria

    Codebase for Aria - an Open Multimodal Native MoE

    Jupyter Notebook · 1k stars · 86 forks

  3. longvideobench/LongVideoBench

    [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.

    Python · 94 stars · 2 forks

  4. Q-Future/Q-Align

    ③ [ICML 2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can be efficiently fine-tuned to downstream datasets.

    Python · 407 stars · 24 forks