
At least on x86 the CPUID instruction is part of the problem. Userspace can do feature-detection and then rely on that. But if it's inconsistent between cores then thread migration would cause illegal instruction faults.

If the kernel tried to fix that by moving such faulting threads to P-cores, then a single memcpy routine using AVX512 instructions would cause all threads to be moved off the E-cores.

So first Intel would have to introduce new CPUID semantics: one flag indicating that e.g. AVX512 is not supported by default, and a separate flag indicating that it is supported on this specific core. Userspace would then have to pin the thread if it wants to use those instructions, or stick to the default set if it wants to stay migratable.



I don't really know how CPUID works, but I'm guessing it can be trapped by Linux. So I think a first "stupid" implementation would be for Linux to report in CPUID the intersection of the features of the CPUs on which the process is allowed to run. So if you want to run AVX512, you first need to pin the process to an AVX512 CPU. You would be able to find an AVX512 CPU by checking /proc/cpuinfo. (Even this "simple" variant is far from trivial, because the cpuset can be changed dynamically in various ways; for example, Android moves a process from foreground CPUs to background CPUs using cgroups.)
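To make the "intersection" idea concrete, here is a rough sketch of the same computation done in userspace from /proc/cpuinfo text. The helper names (parse_cpuinfo, cpus_with, common_flags) are made up for illustration, and SAMPLE is an invented excerpt of what a machine with per-core differences might report; it is not output from real hardware.

```python
# Hypothetical /proc/cpuinfo excerpt: CPU 0 advertises avx512f, CPU 1 does not.
SAMPLE = """\
processor\t: 0
flags\t\t: fpu sse2 avx2 avx512f

processor\t: 1
flags\t\t: fpu sse2 avx2
"""


def parse_cpuinfo(text):
    """Map logical CPU number -> set of feature flags found in cpuinfo text."""
    flags_by_cpu = {}
    cpu = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU entries
        key = key.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "flags" and cpu is not None:
            flags_by_cpu[cpu] = set(value.split())
    return flags_by_cpu


def cpus_with(flags_by_cpu, flag):
    """Logical CPUs that advertise the given flag."""
    return {cpu for cpu, flags in flags_by_cpu.items() if flag in flags}


def common_flags(flags_by_cpu, allowed_cpus):
    """Flags present on every allowed CPU, i.e. what is safe while migratable."""
    sets = [flags_by_cpu[c] for c in allowed_cpus if c in flags_by_cpu]
    return set.intersection(*sets) if sets else set()
```

On a Linux box you would feed it open("/proc/cpuinfo").read() and os.sched_getaffinity(0). As noted above, the answer is only valid as long as nobody changes the allowed CPU set afterwards.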


Not sure if you can trap on CPUID, but the kernel does have control over which CPUID bits are exposed to your application. So requiring pinning to see all the bits could work, but then the issue is what happens if the affinity is changed. A static list of required capabilities in some ELF header would probably be better.


> A static list of required capabilities in some ELF header would probably be better.

I think I agree; the thing is that it's kind of a security issue. I suggested pinning because it requires CAP_SYS_NICE, which is a feature: if you allow apps to freely declare their usage, they will end up being scheduled unfairly, because the system will stick them to P-cores.

That being said, you could indeed have an ELF header mentioning this, and then ignore it if the caller doesn't have CAP_SYS_NICE. I do feel using an ELF header for that is weird, but my knowledge of ELF is way too limited to judge.

Another thing that could work is using file-system attributes or mode (like setuid), but I think FS support of attributes is at best spotty, and I doubt modes can be extended.


I don't think sched_setaffinity requires CAP_SYS_NICE unless you want to set it on a process you don't own.


Maybe I'm dumb, and for sure I'm not an expert on this subject, but wouldn't we need an executable containing both an AVX512 code path and an alternative plain one, plus a way to switch code paths according to the core the code is running on? The same memory page would run on a P-core or on an E-core. Would that be inefficient because of the extra checks?


Userspace can be preempted at any instruction, so you have a TOCTOU problem: the thread can be migrated between the feature check and the instruction that relies on it.


Sure, but one can first pin the thread to a core, or a "don't move me between core types" flag could be added to OSes.


Right, thanks.


Or maybe a new system call to allow a thread to temporarily enter a “performance mode” where it can only be scheduled on the powerful cores. Pinning sounds a bit too strict.


You can already pin to a set of cores instead of a single one. But anyway, my point is that currently userspace interacts directly with CPU features without intermediation from the kernel. So Intel would have to think about how to coordinate with userspace too, not just rely on the kernel to patch things up (or not).
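For what it's worth, a temporary "only these cores" mode can be approximated today with plain affinity calls, at least on Linux. A minimal sketch, assuming you already know which logical CPUs are the ones you want; restricted_to is a made-up helper name, not an existing API.

```python
import os
from contextlib import contextmanager


@contextmanager
def restricted_to(cpus):
    """Temporarily limit the caller to a set of CPUs, then restore the old mask."""
    previous = os.sched_getaffinity(0)  # pid 0 means the caller
    target = previous & set(cpus)       # never ask for CPUs we are not allowed on
    if not target:
        raise ValueError("none of the requested CPUs are currently allowed")
    os.sched_setaffinity(0, target)
    try:
        yield target
    finally:
        os.sched_setaffinity(0, previous)
```

Inside the with block the scheduler can still move the thread, but only among the chosen CPUs. This does nothing about an outside party (cgroups, an admin) changing the mask while the block is running, which is exactly the coordination problem described above.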



