I think there's a generic C fallback, which can also serve as a baseline. But for the big (targeted) architectures, there's one handwritten assembly version per arch.
On startup, ffmpeg runs cpuid and points each operation's function pointer at the best implementation for that CPU.
In addition to things like ‘supports avx’ or ‘supports sse4’, some operations even have more explicit checks like ‘is a fifth generation celeron’. In that case the tuning was around the cache architecture of that CPU, iirc.
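Roughly, the pattern looks like the sketch below. This is not ffmpeg's actual code: `add_bytes`, `dsp_init` and `add_bytes_avx2_standin` are names I made up, and the "AVX2" version is just a C stand-in where a handwritten assembly routine would go. It assumes GCC or Clang on x86 for the `__builtin_cpu_supports` check and falls back to plain C everywhere else.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every implementation of this (hypothetical) operation. */
typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Generic C fallback: works everywhere and doubles as the baseline. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
/* Stand-in for a handwritten AVX2 assembly routine (assumption: a real one
 * would live in a separate .asm file and be linked in). */
static void add_bytes_avx2_standin(uint8_t *dst, const uint8_t *src, size_t n)
{
    add_bytes_c(dst, src, n);
}
#endif

/* The pointer callers actually use; it defaults to the C version. */
static add_bytes_fn add_bytes = add_bytes_c;

/* Run once at startup: query the CPU and pick an implementation.
 * Returns the name of the variant that was selected. */
static const char *dsp_init(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();                 /* executes cpuid under the hood */
    if (__builtin_cpu_supports("avx2")) {
        add_bytes = add_bytes_avx2_standin;
        return "avx2";
    }
#endif
    add_bytes = add_bytes_c;
    return "c";
}
```

Call `dsp_init()` once, then go through `add_bytes(dst, src, n)` everywhere. A model-specific check like the Celeron one would just be another branch in `dsp_init()` ahead of the generic feature tests.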
Source: I did some dirty things with Chrome's Native Client and ffmpeg 10 years ago.
Yeah, sure. I was specifically referring to the tutorials. FFmpeg needs to run everywhere, although I believe they are more concerned about data center hardware than consumer hardware. So probably also stuff like PowerPC.
To a first approximation, the only architectures where people really care about ffmpeg performance (anymore) are x86_64 and arm64. Everything else is of minimal importance - the few assembly routines for other architectures were probably written more for fun than for practical reasons.