Inside the mpv Engine: Why We Chose libmpv

When we started HorangPlayer, we built it entirely on Apple's AVFoundation framework. It seemed like the right choice — native, well-documented, hardware-accelerated. But we quickly hit walls.

The AVFoundation Problem

AVFoundation is great for MP4 and MOV files. But try to play an MKV file? Not supported. WebM? Nope. AVI with DivX? Forget it.

The format limitations are just the beginning:

| Feature | AVFoundation | mpv |
| --- | --- | --- |
| MKV support | ❌ | ✅ |
| WebM/VP9 | ❌ | ✅ |
| AVI/DivX | ❌ | ✅ |
| ASS subtitles | ❌ | ✅ (styled) |
| Audio codecs | AAC, MP3 | Opus, Vorbis, FLAC, DTS, and more |
| A/V sync modes | Automatic only | Advanced (audio/display-sync) |

We initially tried to work around AVFoundation's limitations with NV12 native output, Metal shaders, and render thread isolation. These optimizations helped — we eliminated ~1.9 GB/s of CPU bandwidth for 4K60 content. But the format support problem remained unsolvable.

Enter libmpv

mpv is the engine behind IINA, the most popular third-party video player on macOS. It's built on FFmpeg for demuxing and decoding, with VideoToolbox hardware acceleration on macOS.

We integrated libmpv — mpv's embeddable library — directly into HorangPlayer. Here's what that gives us:

Universal Format Support

mpv plays virtually everything: MKV, WebM, AVI, FLV, OGM, and TS on the container side; H.264, H.265, VP8, VP9, AV1, VC-1, and DivX for video; Opus, Vorbis, FLAC, DTS, and AC3 for audio.

Hardware Decoding

We run mpv with --hwdec=auto, which automatically detects and selects VideoToolbox hardware decoding on macOS. This means H.264, H.265, VP9, and AV1 are decoded in hardware wherever the silicon supports it, with automatic software fallback otherwise, keeping CPU usage minimal.
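Wiring this up through libmpv amounts to a couple of option calls before initialization. A minimal sketch (`mpv` is the handle returned by `mpv_create()`; error checking omitted):

```swift
// Sketch: enabling automatic hardware decoding via the libmpv C API.
// `mpv` is the OpaquePointer returned by mpv_create().
mpv_set_option_string(mpv, "hwdec", "auto")        // prefer VideoToolbox on macOS
mpv_set_option_string(mpv, "hwdec-codecs", "all")  // allow hwdec for any codec mpv supports
mpv_initialize(mpv)
```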

Built-in Subtitle Rendering

mpv includes libass for full ASS/SSA subtitle rendering. This means styled subtitles with fonts, colors, positioning, and animations — all rendered directly in the video output. No separate subtitle overlay needed.

The Integration

We use mpv's OpenGL Render API. mpv renders frames to an OpenGL framebuffer object (FBO), which is displayed through a CAOpenGLLayer. This follows IINA's proven approach.
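Creating that render context looks roughly like this. This is a sketch against the public libmpv render API: the `mpv_opengl_init_params` field layout varies slightly across libmpv versions, and error handling and pointer-lifetime care are omitted.

```swift
import Foundation
import OpenGL.GL

// libmpv resolves OpenGL symbols through this C callback; on macOS they
// come from the OpenGL framework bundle.
let glProcResolver: @convention(c) (UnsafeMutableRawPointer?, UnsafePointer<CChar>?) -> UnsafeMutableRawPointer? = { _, name in
    guard let name = name else { return nil }
    let symbol = CFStringCreateWithCString(kCFAllocatorDefault, name, kCFStringEncodingASCII)
    let bundle = CFBundleGetBundleWithIdentifier("com.apple.opengl" as CFString)
    return CFBundleGetFunctionPointerForName(bundle, symbol)
}

// `mpv` is the handle from mpv_create(); `renderContext` is kept for the
// lifetime of playback.
var renderContext: OpaquePointer?
var glInitParams = mpv_opengl_init_params(get_proc_address: glProcResolver,
                                          get_proc_address_ctx: nil)
withUnsafeMutablePointer(to: &glInitParams) { glPtr in
    var params: [mpv_render_param] = [
        mpv_render_param(type: MPV_RENDER_PARAM_API_TYPE,
                         data: UnsafeMutableRawPointer(mutating: (MPV_RENDER_API_TYPE_OPENGL as NSString).utf8String)),
        mpv_render_param(type: MPV_RENDER_PARAM_OPENGL_INIT_PARAMS, data: glPtr),
        mpv_render_param()  // zero-terminated parameter list
    ]
    mpv_render_context_create(&renderContext, mpv, &params)
}
```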

The key components:

  • MPVDecoder — Wraps the libmpv C API with a Swift interface conforming to our MediaDecoder protocol
  • MPVVideoLayer — A CAOpenGLLayer subclass that provides the OpenGL surface for mpv to render into
  • MPVVideoView — A SwiftUI NSViewRepresentable wrapper
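The SwiftUI bridge can be sketched along these lines. This is hypothetical — the real MPVVideoView surely carries more state — and `PlayerCore` is a stand-in name for whatever object owns the mpv handle and the layer:

```swift
import SwiftUI

// Hypothetical sketch of the NSViewRepresentable wrapper. MPVVideoLayer
// is the CAOpenGLLayer subclass described above; PlayerCore is a stand-in
// for the object that owns the mpv handle.
struct MPVVideoView: NSViewRepresentable {
    let player: PlayerCore

    func makeNSView(context: Context) -> NSView {
        let view = NSView()
        view.wantsLayer = true
        view.layer = player.videoLayer   // install the mpv-backed layer
        return view
    }

    func updateNSView(_ nsView: NSView, context: Context) {
        // Redraws are driven by mpv's update callback, not by SwiftUI.
    }
}
```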

The render loop follows IINA's battle-tested pattern:

  1. mpv signals a new frame via mpv_render_context_set_update_callback
  2. We dispatch to a dedicated render queue (mpvGLQueue, .userInteractive QoS)
  3. mpv_render_context_update() confirms the frame is ready
  4. We read the actual FBO and viewport from OpenGL state
  5. mpv_render_context_render() draws the frame
  6. mpv_render_context_report_swap() tells mpv the frame was displayed
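In Swift, the six steps above map onto libmpv's render API roughly as follows. A sketch, not production code: error handling is omitted, `videoLayer` and `renderContext` are assumed to exist, and the pointer passing in the parameter array needs more lifetime care in a real implementation.

```swift
import OpenGL.GL

// 1. Register the update callback. It must be cheap and thread-safe, so it
//    only hops onto the dedicated render queue (step 2).
mpv_render_context_set_update_callback(renderContext, { ctx in
    let layer = Unmanaged<MPVVideoLayer>.fromOpaque(ctx!).takeUnretainedValue()
    layer.queue.async { layer.drawFrame() }   // mpvGLQueue, .userInteractive QoS
}, UnsafeMutableRawPointer(Unmanaged.passUnretained(videoLayer).toOpaque()))

// Inside MPVVideoLayer:
func drawFrame() {
    // 3. Confirm a new frame is actually pending.
    let flags = mpv_render_context_update(renderContext)
    guard flags & UInt64(MPV_RENDER_UPDATE_FRAME.rawValue) != 0 else { return }

    // 4. Read the FBO and viewport that CAOpenGLLayer currently has bound.
    var fboID: GLint = 0
    var viewport = [GLint](repeating: 0, count: 4)
    glGetIntegerv(GLenum(GL_DRAW_FRAMEBUFFER_BINDING), &fboID)
    glGetIntegerv(GLenum(GL_VIEWPORT), &viewport)

    var fbo = mpv_opengl_fbo(fbo: Int32(fboID),
                             w: viewport[2], h: viewport[3],
                             internal_format: 0)
    var flip: CInt = 1   // OpenGL's origin is bottom-left; let mpv flip
    withUnsafeMutablePointer(to: &fbo) { fboPtr in
        withUnsafeMutablePointer(to: &flip) { flipPtr in
            var params: [mpv_render_param] = [
                mpv_render_param(type: MPV_RENDER_PARAM_OPENGL_FBO, data: fboPtr),
                mpv_render_param(type: MPV_RENDER_PARAM_FLIP_Y, data: flipPtr),
                mpv_render_param()
            ]
            mpv_render_context_render(renderContext, &params)  // 5. draw
        }
    }
    glFlush()
    mpv_render_context_report_swap(renderContext)              // 6. report swap
}
```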

Dual Backend Architecture

We didn't throw away AVFoundation. HorangPlayer uses a protocol-based dual backend:

MediaDecoder (protocol)
├── MPVDecoder (default) — mpv engine, plays everything
└── AVFoundationDecoder — Apple native, PiP support

mpv is the default backend. AVFoundation remains available for Apple-specific features like Picture-in-Picture that require an AVPlayer instance.
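The protocol layer can be sketched in a few lines of Swift. The type names follow the article, but the exact API surface here is an assumption for illustration:

```swift
// Hypothetical sketch of the dual-backend protocol described above.
// The real MediaDecoder protocol certainly exposes more (load, seek, etc.).
protocol MediaDecoder {
    var supportsPictureInPicture: Bool { get }
    func canPlay(fileExtension: String) -> Bool
}

struct MPVDecoder: MediaDecoder {
    let supportsPictureInPicture = false
    // mpv handles essentially every container the player cares about.
    func canPlay(fileExtension: String) -> Bool { true }
}

struct AVFoundationDecoder: MediaDecoder {
    let supportsPictureInPicture = true
    private let supported: Set<String> = ["mp4", "mov", "m4v"]
    func canPlay(fileExtension: String) -> Bool {
        supported.contains(fileExtension.lowercased())
    }
}

// mpv is the default; fall back to AVFoundation only when an
// Apple-specific feature such as PiP requires an AVPlayer.
func decoder(needsPiP: Bool) -> MediaDecoder {
    needsPiP ? AVFoundationDecoder() : MPVDecoder()
}
```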

The result? IINA-level format support and performance, with a modern SwiftUI interface on top.