Inside the mpv Engine: Why We Chose libmpv

When we started HorangPlayer, we built it entirely on Apple's AVFoundation framework. It seemed like the right choice — native, well-documented, hardware-accelerated. But we quickly hit walls.

The AVFoundation Problem

AVFoundation is great for MP4 and MOV files. But try to play an MKV file? Not supported. WebM? Nope. AVI with DivX? Forget it.

The format limitations are just the beginning:

| Feature | AVFoundation | mpv |
| --- | --- | --- |
| MKV support | ❌ | ✅ |
| WebM/VP9 | ❌ | ✅ |
| AVI/DivX | ❌ | ✅ |
| ASS subtitles | ❌ | ✅ (styled) |
| Audio codecs | AAC, MP3 | Opus, Vorbis, FLAC, DTS, and more |
| A/V sync modes | Automatic only | Advanced (audio/display-sync) |

We initially tried to work around AVFoundation's limitations with NV12 native output, Metal shaders, and render thread isolation. These optimizations helped — we eliminated ~1.9 GB/s of CPU bandwidth for 4K60 content. But the format support problem remained unsolvable.

Enter libmpv

mpv is the engine behind IINA, the most popular third-party video player on macOS. It's built on FFmpeg for demuxing and decoding, with VideoToolbox hardware acceleration on macOS.

We integrated libmpv — mpv's embeddable library — directly into HorangPlayer. Here's what that gives us:

Universal Format Support

mpv plays virtually everything: MKV, WebM, AVI, FLV, OGM, and TS on the container side; H.264, H.265, VP8, VP9, AV1, VC-1, and DivX for video; Opus, Vorbis, FLAC, DTS, and AC3 for audio.

Hardware Decoding

We run mpv with --hwdec=auto, which automatically detects and selects VideoToolbox hardware decoding on macOS. This means H.264, H.265, VP9, and AV1 are decoded in hardware wherever the silicon supports it, with automatic software fallback otherwise, keeping CPU usage minimal.
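Wiring this up through libmpv amounts to a couple of option calls before initialization. A minimal sketch (`mpv` is the handle returned by `mpv_create()`; error checking omitted):

```swift
// Sketch: enabling automatic hardware decoding via the libmpv C API.
// `mpv` is the OpaquePointer returned by mpv_create().
mpv_set_option_string(mpv, "hwdec", "auto")        // prefer VideoToolbox on macOS
mpv_set_option_string(mpv, "hwdec-codecs", "all")  // allow hwdec for any codec mpv supports
mpv_initialize(mpv)
```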

Built-in Subtitle Rendering

mpv includes libass for full ASS/SSA subtitle rendering. This means styled subtitles with fonts, colors, positioning, and animations — all rendered directly in the video output. No separate subtitle overlay needed.

The Integration

We use mpv's OpenGL Render API. mpv renders frames to an OpenGL framebuffer object (FBO), which is displayed through a CAOpenGLLayer. This follows IINA's proven approach.
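Creating that render context looks roughly like this. This is a sketch against the public libmpv render API: the `mpv_opengl_init_params` field layout varies slightly across libmpv versions, and error handling and pointer-lifetime care are omitted.

```swift
import Foundation
import OpenGL.GL

// libmpv resolves OpenGL symbols through this C callback; on macOS they
// come from the OpenGL framework bundle.
let glProcResolver: @convention(c) (UnsafeMutableRawPointer?, UnsafePointer<CChar>?) -> UnsafeMutableRawPointer? = { _, name in
    guard let name = name else { return nil }
    let symbol = CFStringCreateWithCString(kCFAllocatorDefault, name, kCFStringEncodingASCII)
    let bundle = CFBundleGetBundleWithIdentifier("com.apple.opengl" as CFString)
    return CFBundleGetFunctionPointerForName(bundle, symbol)
}

// `mpv` is the handle from mpv_create(); `renderContext` is kept for the
// lifetime of playback.
var renderContext: OpaquePointer?
var glInitParams = mpv_opengl_init_params(get_proc_address: glProcResolver,
                                          get_proc_address_ctx: nil)
withUnsafeMutablePointer(to: &glInitParams) { glPtr in
    var params: [mpv_render_param] = [
        mpv_render_param(type: MPV_RENDER_PARAM_API_TYPE,
                         data: UnsafeMutableRawPointer(mutating: (MPV_RENDER_API_TYPE_OPENGL as NSString).utf8String)),
        mpv_render_param(type: MPV_RENDER_PARAM_OPENGL_INIT_PARAMS, data: glPtr),
        mpv_render_param()  // zero-terminated parameter list
    ]
    mpv_render_context_create(&renderContext, mpv, &params)
}
```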

The key components:

  • MPVDecoder — Wraps the libmpv C API with a Swift interface conforming to our MediaDecoder protocol
  • MPVVideoLayer — A CAOpenGLLayer subclass that provides the OpenGL surface for mpv to render into
  • MPVVideoView — A SwiftUI NSViewRepresentable wrapper
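The SwiftUI bridge can be sketched along these lines. This is hypothetical — the real MPVVideoView surely carries more state — and `PlayerCore` is a stand-in name for whatever object owns the mpv handle and the layer:

```swift
import SwiftUI

// Hypothetical sketch of the NSViewRepresentable wrapper. MPVVideoLayer
// is the CAOpenGLLayer subclass described above; PlayerCore is a stand-in
// for the object that owns the mpv handle.
struct MPVVideoView: NSViewRepresentable {
    let player: PlayerCore

    func makeNSView(context: Context) -> NSView {
        let view = NSView()
        view.wantsLayer = true
        view.layer = player.videoLayer   // install the mpv-backed layer
        return view
    }

    func updateNSView(_ nsView: NSView, context: Context) {
        // Redraws are driven by mpv's update callback, not by SwiftUI.
    }
}
```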

The render loop follows IINA's battle-tested pattern:

  1. mpv signals a new frame via mpv_render_context_set_update_callback
  2. We dispatch to a dedicated render queue (mpvGLQueue, .userInteractive QoS)
  3. mpv_render_context_update() confirms the frame is ready
  4. We read the actual FBO and viewport from OpenGL state
  5. mpv_render_context_render() draws the frame
  6. mpv_render_context_report_swap() tells mpv the frame was displayed
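In Swift, the six steps above map onto libmpv's render API roughly as follows. A sketch, not production code: error handling is omitted, `videoLayer` and `renderContext` are assumed to exist, and the pointer passing in the parameter array needs more lifetime care in a real implementation.

```swift
import OpenGL.GL

// 1. Register the update callback. It must be cheap and thread-safe, so it
//    only hops onto the dedicated render queue (step 2).
mpv_render_context_set_update_callback(renderContext, { ctx in
    let layer = Unmanaged<MPVVideoLayer>.fromOpaque(ctx!).takeUnretainedValue()
    layer.queue.async { layer.drawFrame() }   // mpvGLQueue, .userInteractive QoS
}, UnsafeMutableRawPointer(Unmanaged.passUnretained(videoLayer).toOpaque()))

// Inside MPVVideoLayer:
func drawFrame() {
    // 3. Confirm a new frame is actually pending.
    let flags = mpv_render_context_update(renderContext)
    guard flags & UInt64(MPV_RENDER_UPDATE_FRAME.rawValue) != 0 else { return }

    // 4. Read the FBO and viewport that CAOpenGLLayer currently has bound.
    var fboID: GLint = 0
    var viewport = [GLint](repeating: 0, count: 4)
    glGetIntegerv(GLenum(GL_DRAW_FRAMEBUFFER_BINDING), &fboID)
    glGetIntegerv(GLenum(GL_VIEWPORT), &viewport)

    var fbo = mpv_opengl_fbo(fbo: Int32(fboID),
                             w: viewport[2], h: viewport[3],
                             internal_format: 0)
    var flip: CInt = 1   // OpenGL's origin is bottom-left; let mpv flip
    withUnsafeMutablePointer(to: &fbo) { fboPtr in
        withUnsafeMutablePointer(to: &flip) { flipPtr in
            var params: [mpv_render_param] = [
                mpv_render_param(type: MPV_RENDER_PARAM_OPENGL_FBO, data: fboPtr),
                mpv_render_param(type: MPV_RENDER_PARAM_FLIP_Y, data: flipPtr),
                mpv_render_param()
            ]
            mpv_render_context_render(renderContext, &params)  // 5. draw
        }
    }
    glFlush()
    mpv_render_context_report_swap(renderContext)              // 6. report swap
}
```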

Dual Backend Architecture

We didn't throw away AVFoundation. HorangPlayer uses a protocol-based dual backend:

MediaDecoder (protocol)
├── MPVDecoder (default) — mpv engine, plays everything
└── AVFoundationDecoder — Apple native, PiP support

mpv is the default backend. AVFoundation remains available for Apple-specific features like Picture-in-Picture that require an AVPlayer instance.
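The protocol layer can be sketched in a few lines of Swift. The type names follow the article, but the exact API surface here is an assumption for illustration:

```swift
// Hypothetical sketch of the dual-backend protocol described above.
// The real MediaDecoder protocol certainly exposes more (load, seek, etc.).
protocol MediaDecoder {
    var supportsPictureInPicture: Bool { get }
    func canPlay(fileExtension: String) -> Bool
}

struct MPVDecoder: MediaDecoder {
    let supportsPictureInPicture = false
    // mpv handles essentially every container the player cares about.
    func canPlay(fileExtension: String) -> Bool { true }
}

struct AVFoundationDecoder: MediaDecoder {
    let supportsPictureInPicture = true
    private let supported: Set<String> = ["mp4", "mov", "m4v"]
    func canPlay(fileExtension: String) -> Bool {
        supported.contains(fileExtension.lowercased())
    }
}

// mpv is the default; fall back to AVFoundation only when an
// Apple-specific feature such as PiP requires an AVPlayer.
func decoder(needsPiP: Bool) -> MediaDecoder {
    needsPiP ? AVFoundationDecoder() : MPVDecoder()
}
```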

The result? IINA-level format support and performance, with a modern SwiftUI interface on top.