From Stutter to Smooth: Our Performance Journey

The same MP4 file played buttery smooth in IINA but stuttered in our player. Here's the story of how we diagnosed the problem, optimized our AVFoundation pipeline, and ultimately adopted mpv for the best of both worlds.

The Problem

We started with a standard AVFoundation pipeline:

AVPlayer → AVPlayerItemVideoOutput → copyPixelBuffer (BGRA)
→ CIImage → CIFilter chain → CIContext → Metal texture → CAMetalLayer
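For reference, the per-frame hot path looked roughly like this. Names and structure are illustrative, not our actual code, and the final Metal render is omitted:

```swift
import AVFoundation
import CoreImage

// Sketch of the original per-frame path: copy a pixel buffer out of
// AVFoundation, wrap it in a CIImage, and run the CIFilter chain.
func renderFrame(from output: AVPlayerItemVideoOutput,
                 at hostTime: CFTimeInterval,
                 filters: [CIFilter]) -> CIImage? {
    let itemTime = output.itemTime(forHostTime: hostTime)
    guard output.hasNewPixelBuffer(forItemTime: itemTime),
          let pixelBuffer = output.copyPixelBuffer(forItemTime: itemTime,
                                                   itemTimeForDisplay: nil)
    else { return nil }

    // Every frame goes through Core Image, even with an empty filter chain.
    var image = CIImage(cvPixelBuffer: pixelBuffer)
    for filter in filters {
        filter.setValue(image, forKey: kCIInputImageKey)
        image = filter.outputImage ?? image
    }
    return image  // then rendered via CIContext into a Metal texture
}
```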

It worked. But 4K content stuttered. 1080p at 60fps dropped frames. Meanwhile, IINA played the same files without breaking a sweat.

Diagnosing the Bottlenecks

We profiled everything and found three fixable bottlenecks, plus one fundamental limitation:

1. Forced BGRA Conversion (~1.9 GB/s wasted)

We requested kCVPixelFormatType_32BGRA from AVFoundation. But VideoToolbox decodes to NV12 natively. Every frame was being converted from YCbCr to BGRA on the CPU before we even touched it.

Fix: Switch to kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and handle YUV→RGB on the GPU.
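A minimal sketch of that fix: request the decoder's native bi-planar format when creating the video output, and flag the buffers as Metal-compatible so they can later be mapped to textures without a copy. The helper function is illustrative:

```swift
import AVFoundation

// Request NV12 (the decoder's native layout) instead of forcing BGRA.
func makeVideoOutput(for item: AVPlayerItem) -> AVPlayerItemVideoOutput {
    let attributes: [String: Any] = [
        kCVPixelBufferPixelFormatTypeKey as String:
            kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
        // Required later for zero-copy Metal texture creation.
        kCVPixelBufferMetalCompatibilityKey as String: true
    ]
    let output = AVPlayerItemVideoOutput(pixelBufferAttributes: attributes)
    item.add(output)
    return output
}
```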

2. CIContext on Every Frame

Even for simple playback without filters, we were running every frame through CIContext — creating a CIImage, applying an identity transform, compositing onto a black background, and rendering to Metal.

Fix: Bypass CIContext entirely when no filters are active. Use CVMetalTextureCache for zero-copy texture creation and a Metal shader for NV12→RGB conversion.
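The zero-copy half of that fix looks roughly like this: map each NV12 plane of the decoded CVPixelBuffer to a Metal texture through a CVMetalTextureCache, then let a fragment shader sample both planes and do the YUV→RGB math. A sketch, assuming the cache was created once at startup and the buffer is Metal-compatible:

```swift
import CoreVideo
import Metal

// Wrap both NV12 planes as Metal textures without copying pixel data.
func makeTextures(from buffer: CVPixelBuffer,
                  cache: CVMetalTextureCache) -> (luma: MTLTexture, chroma: MTLTexture)? {
    func texture(plane: Int, format: MTLPixelFormat) -> MTLTexture? {
        let width = CVPixelBufferGetWidthOfPlane(buffer, plane)
        let height = CVPixelBufferGetHeightOfPlane(buffer, plane)
        var ref: CVMetalTexture?
        let status = CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault, cache, buffer, nil,
            format, width, height, plane, &ref)
        guard status == kCVReturnSuccess, let ref = ref else { return nil }
        return CVMetalTextureGetTexture(ref)
    }
    guard let y = texture(plane: 0, format: .r8Unorm),    // Y plane
          let uv = texture(plane: 1, format: .rg8Unorm)   // interleaved CbCr plane
    else { return nil }
    return (y, uv)
}
```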

3. Main Thread Rendering

All Metal rendering was happening on the main thread. This meant every command buffer encode and GPU submission competed with UI updates, gesture handling, and animation.

Fix: Move rendering to a dedicated DispatchQueue with .userInteractive QoS. Use a semaphore to prevent frame queue backlog.
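The queue-plus-semaphore arrangement can be sketched as below. The semaphore count caps in-flight frames (classic triple buffering) so the CPU never races ahead of the GPU and builds a backlog; the class and closure shape are illustrative, not our exact code:

```swift
import Dispatch

// All GPU work goes through a dedicated high-priority queue.
final class RenderLoop {
    private let queue = DispatchQueue(label: "player.render", qos: .userInteractive)
    private let inflight = DispatchSemaphore(value: 3)  // at most 3 frames in flight

    // `encodeAndPresent` encodes a command buffer and must call the
    // passed-in completion (e.g. from addCompletedHandler) when the GPU
    // finishes, which frees one in-flight slot.
    func submit(_ encodeAndPresent: @escaping (@escaping () -> Void) -> Void) {
        inflight.wait()
        queue.async {
            encodeAndPresent { self.inflight.signal() }
        }
    }
}
```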

4. The Fundamental Limitation

After all optimizations, our AVFoundation pipeline was significantly better. But IINA was still smoother. Why?

Because IINA uses mpv, which has a completely different architecture:

  • FFmpeg demuxing with no intermediate copies
  • VideoToolbox decoding to NV12
  • GLSL shaders for YUV→RGB + scaling + filtering in a single pass
  • Direct FBO output — no CIContext, no CIImage, no Metal texture cache

mpv's renderer is purpose-built for video. CIContext is general-purpose.

The Optimization Results

Improvement                      Impact
NV12 native format               CPU bandwidth reduced 50-70%
CIContext bypass (no filters)    Rendering 30-40% faster
CIContext caching                Filter path 10-15% faster
Render thread isolation          UI responsive, stable frame timing

These optimizations transformed our AVFoundation backend from stuttery to smooth for most content.

The Final Move: mpv Integration

But "most content" wasn't good enough. We wanted IINA-level performance for all content, plus MKV/WebM support that AVFoundation simply cannot provide.

So we integrated libmpv — the same engine that powers IINA. The result:

  • All formats supported — MKV, WebM, AVI, and everything else
  • IINA-level smoothness — mpv's optimized render pipeline
  • Hardware decoding — VideoToolbox for H.264, H.265, VP9, AV1
  • Built-in subtitles — ASS/SSA with full styling support
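For flavor, the core of a libmpv setup looks roughly like this. This assumes libmpv's C API (client.h) is exposed to Swift via a module map or bridging header; the option names are real mpv options, the file path is a placeholder:

```swift
// Create and configure an mpv instance (sketch; error codes unchecked).
guard let mpv = mpv_create() else { fatalError("mpv_create failed") }

mpv_set_option_string(mpv, "hwdec", "videotoolbox")  // hardware decoding
mpv_set_option_string(mpv, "vo", "libmpv")           // render via the libmpv render API
mpv_set_option_string(mpv, "sub-ass", "yes")         // full ASS/SSA subtitle styling
mpv_initialize(mpv)

// Load a file through mpv's command interface.
mpv_command_string(mpv, "loadfile /path/to/movie.mkv")
```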

We kept our optimized AVFoundation backend as a fallback for PiP and other Apple-specific features. The MediaDecoder protocol makes switching between backends seamless.
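A hypothetical shape for that protocol, to show how the switch stays seamless; the real interface and the backend type names (AVFoundationDecoder, MPVDecoder) are illustrative:

```swift
import Foundation

// Hypothetical common surface both backends conform to.
protocol MediaDecoder: AnyObject {
    func open(url: URL) throws
    func play()
    func pause()
    func seek(to seconds: Double)
    var supportsPictureInPicture: Bool { get }
}

// Pick the AVFoundation backend only when Apple-specific features
// like PiP are needed; mpv handles everything else.
func makeDecoder(for url: URL, needsPiP: Bool) -> MediaDecoder {
    needsPiP ? AVFoundationDecoder() : MPVDecoder()
}
```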

Lessons Learned

  1. Profile first — We assumed the bottleneck was in Metal rendering. It was actually in pixel format conversion
  2. Accept native formats — Don't fight the hardware decoder's output format
  3. Minimize intermediate steps — Every copy, every conversion, every context switch adds latency
  4. Dedicated render threads — Never block the main thread with GPU work
  5. Know when to adopt — Sometimes the best optimization is using a purpose-built engine