Hyun-Bin Oh

About Me

I am a Ph.D. student at POSTECH, advised by Prof. Tae-Hyun Oh. I am also a research scientist intern at Sony AI.

My research lies at the intersection of computer vision and machine learning, with a focus on multimodal signals that evolve over time, such as video and audio. I am especially interested in audio-visual understanding and generation.

News

2026.06 One paper has been accepted to TMLR 2026.
2026.06 One paper has been accepted to INTERSPEECH 2026.
2026.02 One paper has been accepted to CVPR 2026 as an Oral presentation.
2025.07 Started a research scientist internship at Sony AI, Tokyo.
2025.02 One paper has been accepted to CVPR 2025 as a Highlight.
2025.01 One paper has been accepted to ICLR 2025.
2024.07 One paper has been accepted to ECCV 2024.
2024.06 Two papers have been accepted to INTERSPEECH 2024.
2024.04 One paper has been accepted to RA-L 2024 and was presented at IROS 2024 as an oral presentation.

Publications

* denotes equal contribution. † denotes co-corresponding authors.

Spatio-Temporal Audio Language Modeling for Dynamic Sound Sources

Oh Hyun-Bin, Kazuki Shimada, Yuhta Takida, Kim Sung-Bin, Toshimitsu Uesaka, Takashi Shibuya, Kyeongyoon Lee, Tae-Hyun Oh†, Yuki Mitsufuji†

Preprint

We enable audio-language models to reason about what is sounding, where it is, and how it moves over time from spatial audio.

Revisiting Learning-based Video Motion Magnification for Real-time Processing

Hyunwoo Ha*, Oh Hyun-Bin*, Kim Jun-Seong, Kwon Byung-Ki, Kim Sung-Bin, Linh-Tam Tran, Ji-Yun Kim, Sung-Ho Bae, Tae-Hyun Oh

TMLR 2026

We magnify invisible small motions in real time.

Physics-Aware Deepfake Detection via Distance–Speech Consistency

Kyeongrae Kim, Kim Sung-Bin, Oh Hyun-Bin, Tae-Hyun Oh

INTERSPEECH 2026

We detect dynamic talking-head deepfakes by checking whether speech energy is physically consistent with speaker-to-camera distance, complementing lip-sync cues in in-the-wild videos.

PAVAS: Physics-Aware Video-to-Audio Synthesis

Oh Hyun-Bin, Yuhta Takida, Toshimitsu Uesaka, Tae-Hyun Oh, Yuki Mitsufuji

CVPR 2026 Oral presentation

We generate physically plausible audio from a video by explicitly integrating physics estimation into a latent diffusion-based model.

Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics

Lee Chae-Yeon*, Oh Hyun-Bin*, Han EunGi, Kim Sung-Bin, Suekyeong Nam, Tae-Hyun Oh

CVPR 2025 Highlight

We introduce new definitions, a speech-mesh representation space, and evaluation metrics for perceptually accurate 3D talking face generation.

AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models

Kim Sung-Bin*, Oh Hyun-Bin*, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh

ICLR 2025

We introduce a benchmark for evaluating the perception and comprehension capabilities of audio-visual large language models.

Learning-based Axial Video Motion Magnification

Kwon Byung-Ki, Oh Hyun-Bin, Kim Jun-Seong, Hyunwoo Ha, Tae-Hyun Oh

ECCV 2024

We magnify invisible small motions in user-specified directions.

Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert

Han EunGi*, Oh Hyun-Bin*, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, JangHoon Ju, Tae-Hyun Oh

INTERSPEECH 2024

We generate speech-synchronized lip movements in 3D facial animation with an audio-visual lip reading expert.

MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset

Kim Sung-Bin*, Lee Chae-Yeon*, Gihun Son*, Oh Hyun-Bin, JangHoon Ju, Suekyeong Nam, Tae-Hyun Oh

INTERSPEECH 2024

We generate accurate 3D talking heads from multilingual speech.

Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation

Kim Ji-Yeon, Oh Hyun-Bin, Kwon Byung-Ki, Dahun Kim, Yongjin Kwon, Tae-Hyun Oh

RA-L 2024 IROS Oral

We present a unified multi-task model for depth-aware video panoptic segmentation.

Experience

Research Scientist Intern

Sony AI

Jul 2025 - Present

Hosts (Deep Generative Modeling Team): Yuhta Takida, Toshimitsu Uesaka, and Yuki Mitsufuji
Hosts (Cinematic Technology Team): Kazuki Shimada, Takashi Shibuya, and Yuki Mitsufuji

Education

2022.03 - Present Integrated Ph.D. in Electrical Engineering, POSTECH, Pohang, South Korea
2017.03 - 2021.08 B.S. in Physics and B.E. in Electrical Engineering, Chung-Ang University, Seoul, South Korea (Summa Cum Laude)

Honors and Awards

2025 Best Paper Award, IPIU
2024 Best Paper Award, KRoC
2023 Best Paper Award, IPIU
2021 Summa Cum Laude, Chung-Ang University
2021 Department Honors Scholarship (ranked 1st in the department), Chung-Ang University
2018 Science Scholarship, Suwon Municipal Scholarship Foundation
2017 Department Honors Scholarship (ranked 1st in the department), Chung-Ang University

Academic Services

Reviewer: ACCV, CVPR, NeurIPS, SIGGRAPH Asia, IJCV, Pattern Recognition

Teaching Experiences

2023 NAVER Boostcamp AI Tech Computer Vision Track (5th, 6th)
2022 Introduction to Machine Learning (EECE454), POSTECH