Inside the mpv Engine: Why We Chose libmpv
When we started HorangPlayer, we built it entirely on Apple's AVFoundation framework. It seemed like the right choice — native, well-documented, hardware-accelerated. But we quickly hit walls.
The AVFoundation Problem
AVFoundation is great for MP4 and MOV files. But try to play an MKV file? Not supported. WebM? Nope. AVI with DivX? Forget it.
The format limitations are just the beginning:
| Feature | AVFoundation | mpv |
|---|---|---|
| MKV support | ❌ | ✅ |
| WebM/VP9 | ❌ | ✅ |
| AVI/DivX | ❌ | ✅ |
| ASS subtitles | ❌ | ✅ (styled) |
| Audio codecs | AAC, MP3 | Opus, Vorbis, FLAC, DTS, AC3, and more |
| A/V sync modes | Automatic only | Advanced (audio/display-sync) |
We initially tried to work around AVFoundation's limitations with NV12 native output, Metal shaders, and render thread isolation. These optimizations helped — we eliminated ~1.9 GB/s of CPU bandwidth for 4K60 content. But the format support problem remained unsolvable.
Enter libmpv
mpv is the engine behind IINA, the most popular third-party video player on macOS. It's built on FFmpeg for demuxing and decoding, with VideoToolbox hardware acceleration on macOS.
We integrated libmpv — mpv's embeddable library — directly into HorangPlayer. Here's what that gives us:
Universal Format Support
mpv plays virtually everything. MKV, WebM, AVI, FLV, OGM, TS — every container format. H.264, H.265, VP8, VP9, AV1, VC-1, DivX — every codec. Opus, Vorbis, FLAC, DTS, AC3 — every audio format.
Hardware Decoding
mpv uses --hwdec=auto to automatically detect and use VideoToolbox hardware decoding. This means H.264, H.265, VP9, and AV1 are all decoded by the GPU, keeping CPU usage minimal.
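In a standalone mpv install, the same behavior comes from one line of configuration (an embedded libmpv build sets the equivalent option programmatically before initialization):

```
# mpv.conf: try hardware decoding first, fall back to
# software decoding if the codec isn't hardware-supported
hwdec=auto
```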
Built-in Subtitle Rendering
mpv includes libass for full ASS/SSA subtitle rendering. This means styled subtitles with fonts, colors, positioning, and animations — all rendered directly in the video output. No separate subtitle overlay needed.
The Integration
We use mpv's OpenGL Render API. mpv renders frames to an OpenGL framebuffer object (FBO), which is displayed through a CAOpenGLLayer. This follows IINA's proven approach.
The key components:
- `MPVDecoder` — Wraps the libmpv C API with a Swift interface conforming to our `MediaDecoder` protocol
- `MPVVideoLayer` — A `CAOpenGLLayer` subclass that provides the OpenGL surface for mpv to render into
- `MPVVideoView` — A SwiftUI `NSViewRepresentable` wrapper
The render loop follows IINA's battle-tested pattern:
1. mpv signals a new frame via `mpv_render_context_set_update_callback`
2. We dispatch to a dedicated render queue (`mpvGLQueue`, `.userInteractive` QoS)
3. `mpv_render_context_update()` confirms the frame is ready
4. We read the actual FBO and viewport from OpenGL state
5. `mpv_render_context_render()` draws the frame
6. `mpv_render_context_report_swap()` tells mpv the frame was displayed
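The loop above can be sketched in Swift. The `mpv_*` functions here are local stand-ins with simplified signatures so the snippet compiles without libmpv; the real C API takes an `mpv_render_context` plus `mpv_render_param` arrays, and the callback is registered with `mpv_render_context_set_update_callback`.

```swift
import Dispatch

let MPV_RENDER_UPDATE_FRAME: UInt64 = 1  // real flag from libmpv's render.h
var renderCalls = 0                      // instrumentation for the stubs below

// --- stand-ins for libmpv (assumptions, not the real signatures) ---
func mpv_render_context_update() -> UInt64 { MPV_RENDER_UPDATE_FRAME }
func mpv_render_context_render(fbo: Int32, width: Int32, height: Int32) {
    renderCalls += 1
}
func mpv_render_context_report_swap() {}

// Dedicated serial render queue, as described above.
let mpvGLQueue = DispatchQueue(label: "mpvGLQueue", qos: .userInteractive)

// Invoked from mpv's update callback (step 1): hop to the render queue.
func onMPVUpdate() {
    mpvGLQueue.async {
        // Step 3: confirm a new frame is actually ready.
        guard mpv_render_context_update() & MPV_RENDER_UPDATE_FRAME != 0 else {
            return
        }
        // Step 4: the real code reads the bound FBO and viewport from
        // OpenGL state; fixed values stand in here.
        let (fbo, width, height): (Int32, Int32, Int32) = (0, 1920, 1080)
        // Step 5: draw the frame into the FBO.
        mpv_render_context_render(fbo: fbo, width: width, height: height)
        // Step 6: tell mpv the frame was displayed.
        mpv_render_context_report_swap()
    }
}
```

The hop to a dedicated queue matters: the update callback arrives on an mpv-internal thread, and all GL work must happen on one thread with the right context current.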
Dual Backend Architecture
We didn't throw away AVFoundation. HorangPlayer uses a protocol-based dual backend:
```
MediaDecoder (protocol)
├── MPVDecoder (default) — mpv engine, plays everything
└── AVFoundationDecoder — Apple native, PiP support
```

mpv is the default backend. AVFoundation remains available for Apple-specific features like Picture-in-Picture that require an AVPlayer instance.
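As a hypothetical sketch of that split: `MediaDecoder`, `MPVDecoder`, and `AVFoundationDecoder` are the names from this post, but the members and the selection logic below are illustrative assumptions, not HorangPlayer's actual API.

```swift
import Foundation

protocol MediaDecoder {
    var supportsPictureInPicture: Bool { get }
    func canPlay(_ url: URL) -> Bool
}

struct MPVDecoder: MediaDecoder {
    let supportsPictureInPicture = false
    func canPlay(_ url: URL) -> Bool { true }  // mpv plays everything
}

struct AVFoundationDecoder: MediaDecoder {
    let supportsPictureInPicture = true
    // AVFoundation only handles Apple-native containers.
    func canPlay(_ url: URL) -> Bool {
        ["mp4", "mov", "m4v"].contains(url.pathExtension.lowercased())
    }
}

// mpv is the default; switch to AVFoundation only when PiP is
// requested and the file is a format AVFoundation can actually play.
func selectDecoder(for url: URL, wantsPiP: Bool) -> MediaDecoder {
    let avf = AVFoundationDecoder()
    if wantsPiP && avf.canPlay(url) { return avf }
    return MPVDecoder()
}
```

Keeping the selection behind one function means the rest of the player never needs to know which engine is active.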
The result? IINA-level format support and performance, with a modern SwiftUI interface on top.