At least on x86 the CPUID instruction is part of the problem. Userspace can do f...

phh · on June 20, 2023

I don't really know how CPUID works, but I'm guessing it can be trapped by Linux. So I think a first "stupid" implementation would be for Linux to report in CPUID the intersection of the features of the CPUs on which the process is allowed to run. So if you want to run AVX512, you first need to pin that process to an AVX512 CPU. You would be able to find an AVX512 CPU by checking /proc/cpuinfo. (Even this "simple" variant is far from simple, because the cpuset can be changed dynamically in various ways; for example, Android moves a process from foreground CPUs to background CPUs using cgroups.)
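To make the "intersection" idea concrete, here is a rough userspace sketch of what that computation would look like, assuming the x86 /proc/cpuinfo layout with `processor` and `flags` lines. The helper names `parse_cpuinfo` and `common_flags` and the sample text are made up for illustration (the sample describes a hypothetical hybrid part); this is not what the kernel does today.

```python
def parse_cpuinfo(text):
    """Map each logical CPU number to its set of feature flags."""
    cpus = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU entries
        key = key.strip()
        if key == "processor":
            current = int(value)
        elif key == "flags" and current is not None:
            cpus[current] = set(value.split())
    return cpus


def common_flags(cpu_flags, allowed):
    """Flags supported by every CPU in `allowed` (empty set if none are known)."""
    sets = [cpu_flags[c] for c in allowed if c in cpu_flags]
    return set.intersection(*sets) if sets else set()


# Made-up sample for a hypothetical hybrid part; real input is /proc/cpuinfo.
SAMPLE = """\
processor\t: 0
flags\t\t: fpu sse2 avx2 avx512f

processor\t: 1
flags\t\t: fpu sse2 avx2
"""

flags = parse_cpuinfo(SAMPLE)
print(sorted(common_flags(flags, {0, 1})))    # CPU 1 lacks avx512f, so it drops out
print("avx512f" in common_flags(flags, {0}))  # allowed only on CPU 0
```

On a real machine you would pass `open("/proc/cpuinfo").read()` and `os.sched_getaffinity(0)` instead of the sample values. The result is only valid for as long as the allowed set does not change, which is exactly the cgroup caveat above.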

gpderetta · on June 20, 2023

Not sure if you can trap on cpuid, but the kernel does have control over which cpuid bits are exposed to your application. So requiring pinning to see all the bits could work, but then the issue is what happens if the affinity is changed. A static list of required capabilities in some ELF header would probably be better.

phh · on June 20, 2023

> A static list of required capabilities in some ELF header would probably be better.

I think I agree; the thing is that it's kind of a security issue. I suggested pinning because it requires CAP_SYS_NICE, which is a feature: if you allow apps to freely declare their usage, they will end up being scheduled unfairly, because the system will stick them to P cores.

That being said, you could indeed have an ELF header mentioning this, and then ignore it if the caller doesn't have CAP_SYS_NICE. I do feel using an ELF header for that is weird, but my knowledge of ELF is way too little to judge.

Another thing that could work is using file-system attributes or mode (like setuid), but I think FS support of attributes is at best spotty, and I doubt modes can be extended.

the8472 · on June 20, 2023

I don't think sched_setaffinity requires CAP_SYS_NICE unless you want to set it on a process you don't own.
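A quick way to check this on a Linux box is to run something like the following as a regular user. It is a minimal sketch using Python's `os.sched_getaffinity` and `os.sched_setaffinity` wrappers around the syscalls; the function name `narrow_own_affinity` is made up for illustration.

```python
import os


def narrow_own_affinity():
    """Pin the calling thread to one of its allowed CPUs, then restore."""
    original = os.sched_getaffinity(0)   # pid 0 means "the caller"
    target = {min(original)}
    os.sched_setaffinity(0, target)      # our own task, so no capability is needed
    pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, original)    # undo so the rest of the program is unaffected
    return original, pinned


original, pinned = narrow_own_affinity()
print(f"allowed before: {sorted(original)}, while pinned: {sorted(pinned)}")
```

Pointing the same call at a PID owned by a different user is the case that should fail with EPERM unless the caller has CAP_SYS_NICE.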

pmontra · on June 20, 2023

Maybe I'm dumb, and for sure I'm not an expert on this subject, but wouldn't we need an executable containing both an AVX512 code path and a plain alternative, plus a way to switch code paths according to the core the code is running on? The same memory page would run on a P core or on an E core. Inefficient because of the extra checks?

the8472 · on June 20, 2023

Userspace can be preempted at any instruction, so you have a TOCTOU problem.

janwas · on June 20, 2023

Sure, but one can first pin the thread to a core, or a "don't move me between core types" flag could be added to OSes.
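As a sketch of the pin-first ordering, assuming Linux and that the caller already knows which CPUs have the feature (for example from /proc/cpuinfo): confine the thread first, and only then take the fast path, so a migration cannot land between the check and the use. `pinned_to` and `run_with_feature` are made-up names, and the lambdas stand in for real native code paths.

```python
import os
from contextlib import contextmanager


@contextmanager
def pinned_to(cpus):
    """Restrict the calling thread to `cpus` for the duration of the block."""
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, cpus)
    try:
        yield
    finally:
        os.sched_setaffinity(0, original)  # restore even if the body raises


def run_with_feature(capable_cpus, fast_path, fallback):
    """Run fast_path only while confined to CPUs assumed to support it."""
    usable = set(capable_cpus) & os.sched_getaffinity(0)
    if not usable:
        return fallback()  # no allowed CPU has the feature
    with pinned_to(usable):
        return fast_path()


# Illustrative call: treat every currently allowed CPU as capable.
result = run_with_feature(os.sched_getaffinity(0),
                          lambda: "fast", lambda: "fallback")
print(result)
```

This pins to a set of cores rather than a single one, so the scheduler keeps some freedom. It does not protect against something else changing the thread's affinity or cpuset from outside while the fast path runs.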

pmontra · on June 20, 2023

Right, thanks.

codeflo · on June 20, 2023

Or maybe a new system call to allow a thread to temporarily enter a “performance mode” where it can only be scheduled on the powerful cores. Pinning sounds a bit too strict.

the8472 · on June 20, 2023

You can already pin to a set of cores instead of a single one. But anyway, my point is that currently userspace interacts directly with CPU features without intermediation from the kernel. So Intel would have to think about how to coordinate with userspace too, not just rely on the kernel to patch things up (or not).