- cross-posted to:
- osdev@programming.dev
The changes:
Intel® APX doubles the number of general-purpose registers (GPRs) from 16 to 32. This allows the compiler to keep more values in registers; as a result, APX-compiled code contains 10% fewer loads and more than 20% fewer stores than the same code compiled for an Intel® 64 baseline. Register accesses are not only faster, but they also consume significantly less dynamic power than complex load and store operations.
Intel® APX adds conditional forms of load, store, and compare/test instructions, and it also adds an option for the compiler to suppress the status flags writes of common instructions. These enhancements expand the applicability of if-conversion to much larger code regions, cutting down on the number of branches that may incur misprediction penalties. All these conditional ISA improvements are implemented via EVEX prefix extensions of existing legacy instructions.
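To make the if-conversion point concrete, here is a minimal C sketch (function names are made up for illustration, nothing here comes from Intel's material). The first function has a guarded load: a compiler normally cannot turn it into straight-line code, because loading through `p` unconditionally could fault when `p` is NULL. The second shows the workaround that is legal today, selecting an always-valid address first. As I understand it, a conditional load that suppresses the fault is what would let a compiler if-convert the first form directly; whether any given compiler actually emits branchless code for either version is up to that compiler.

```c
#include <stddef.h>

/* Guarded load: the load may only happen when p is non-NULL,
 * so this is usually compiled as a compare and a branch. */
int add_if_present(const int *p, int x) {
    if (p != NULL) {
        x += *p;
    }
    return x;
}

/* Hand-written branch-free variant: pick an address that is always
 * safe to read, then load unconditionally. The pointer selection can
 * be done with a conditional move on existing x86-64 CPUs. */
int add_if_present_select(const int *p, int x) {
    static const int zero = 0;
    const int *q = (p != NULL) ? p : &zero; /* always-valid address */
    return x + *q;
}
```

Both functions return the same result for every input; the difference is only in what the compiler is allowed to do with them.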
This is AMD64 - extensions to x86_64/AMD64 are created all the time, after a while they become expected by software distributors and compiled software relies on their existence. That’s why new games don’t work on old CPUs.
The biggest advantage is backwards compatibility with existing x86 software. You will lose that with a switch to ARM.
Apple and Microsoft are doing a pretty good job at translating x86 to ARM. It’s not perfect but I haven’t really faced any application that just failed to run (to be fair I use pretty popular applications so maybe there’s a bias there).
Aren’t most x86 executables being built now still favoring compatibility over performance? I think I’ve read that just targeting the current gen CPUs while compiling can bring up to 20% improvements.
For consumer software, yes, most is still being built with a baseline target instruction set from the early/mid-2000s. In 2019 there were reports of Apex Legends requiring SSE4.1, an instruction set from circa 2007. It will probably be close to a couple of decades before consumer software commonly requires these instructions.
However, in more specialized environments, such as scientific and high-performance computing, it’s much more common to use custom software designed for a specific task, and it’s normal to recompile that software when you get a new set of hardware. In those applications, these instructions can make a huge impact, as you know exactly which capabilities the hardware supports and can use everything available.
I believe there are also some (possibly limited) situations where a program can check what instructions a processor supports and use either the newer (higher-performance) version or the slower, more widely-supported version depending on that check. There may be limits on how often that can be done however.
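A minimal sketch of that check-and-dispatch pattern in C, assuming GCC or Clang on x86 (where `__builtin_cpu_init` and `__builtin_cpu_supports` are available). The function names and the choice of SSE4.1 as the feature to test are just for illustration, and `sum_fast` is a hypothetical stand-in: in a real program it would live in a file compiled with the newer instruction set enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline version: safe on any CPU the program supports. */
static uint64_t sum_baseline(const uint32_t *v, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += v[i];
    }
    return s;
}

/* Hypothetical stand-in for a version built with newer instructions.
 * Here it is plain C with two accumulators so the sketch stays portable. */
static uint64_t sum_fast(const uint32_t *v, size_t n) {
    uint64_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n) {
        s0 += v[i]; /* odd element left over */
    }
    return s0 + s1;
}

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Ask the CPU once which implementation it can run. */
static sum_fn pick_sum(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1")) {
        return sum_fast;
    }
#endif
    return sum_baseline;
}

/* Public entry point: resolves on first call, then reuses the choice.
 * Note: this lazy initialization is not synchronized across threads. */
uint64_t sum(const uint32_t *v, size_t n) {
    static sum_fn impl = NULL;
    assert(v != NULL || n == 0);
    if (impl == NULL) {
        impl = pick_sum();
    }
    return impl(v, n);
}
```

The check itself runs only once; the ongoing cost is an indirect call on every use, which is one reason this tends to be applied to a few hot functions rather than to a whole program.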
In 2019 there were reports of Apex Legends requiring SSE4.1, an instruction set from circa 2007.
It’s not just about when it was released; sometimes budget processors or, in this case, AMD processors don’t support an extension straight away, or ever.
@keenkoon I wonder if it would make sense to store the regular compiled code and the extension-specific code in one binary, and only load the extension code if the binary is executed on an architecture that supports it, otherwise staying compatible with older architectures.
This is why .NET code compiles to platform-independent binaries that get JIT translated to machine code and optimized for the target CPU. Developers don’t need to do anything (the applications don’t even need to be re-compiled), they will just get conditionally optimized when appropriate.
This is the only way really to move forward with ISA extensions.
Though, I think for this update we don’t need to be too concerned. Since it changes the code in such an extensive way, compiler writers will be strongly incentivised to produce this duplicate path themselves, instead of letting the burden of dispatching fall on the programmer like with AVX and friends.
For an extension like this - unlike most prior extensions - you’re best off with essentially an entirely separately compiled copy of the program/library. So IFUNC is a poor fit, even with peer optimization.