Intel held its virtual Architecture Day presentation, revealing details of the technology behind several upcoming consumer and data center products. While the exact specifications of CPUs and GPUs will have to wait until they actually hit the market, we now have a better idea of the building blocks Intel uses to put them together. Raja Koduri, Intel SVP and GM of the Accelerated Computing Systems and Graphics Group, led the presentation, which also featured several senior Intel engineers.
The 12th generation Core CPU line, code-named “Alder Lake”, is expected to hit the market in the next few months, starting with desktop models. These will be the first mainstream Intel CPUs to feature a mix of high-performance and low-power cores, an approach that is common in mobile SoCs today. This follows the experimental ‘Lakefield’ CPU, which has so far seen only a limited release. Alder Lake will use a more modular approach than before, with different combinations of logic blocks for different product segments.
Intel will use the terms Performance-Core and Efficient-Core, often abbreviated as P-core and E-core. In Alder Lake, the E-cores are based on the ‘Gracemont’ architecture, while the P-cores use the ‘Golden Cove’ design. For Gracemont, Intel optimized for physical silicon size and throughput efficiency in order to deliver multithreaded performance across a large number of individual cores. These cores run at low voltage and are mainly intended for simpler processes.
The Golden Cove-based P-cores are designed for speed and low latency. Intel calls this the most powerful core it has ever built. New to this generation is support for Advanced Matrix Extensions to accelerate deep learning training and inference.
Combined, this generation of P-cores and E-cores in the Alder Lake architecture will be highly scalable, from 9W to 125W, covering most of today’s mobile and desktop categories. Alder Lake is manufactured using the newly announced Intel 7 process, which is a renaming of the 10 nm “Enhanced SuperFin” process. Different implementations incorporate different combinations of DDR5, PCIe Gen5, Thunderbolt 4, and Wi-Fi 6E.
The desktop implementation uses a new LGA1700 socket with up to eight performance cores (two threads each), eight efficient cores (single-threaded) and 30 MB of last-level cache. The integrated GPU has up to 32 execution units for basic display output and graphics functions. It will not have integrated Thunderbolt or an image processing block, but will support 16 lanes of PCIe Gen5 and a further four lanes of PCIe Gen4. The accompanying platform controllers for motherboards will provide up to 12 additional PCIe Gen4 lanes and 16 PCIe Gen3 lanes.
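The desktop figures above add up as follows. This is only a quick arithmetic sketch: the constant and function names are made up for illustration, and the values are the "up to" maximums quoted above, not confirmed specifications of any shipping model.

```python
# Desktop Alder Lake maximums as quoted in the article ("up to" values).
P_CORES = 8                   # performance cores
THREADS_PER_P_CORE = 2        # two threads each
E_CORES = 8                   # efficient cores
THREADS_PER_E_CORE = 1        # single-threaded

CPU_PCIE_GEN5_LANES = 16      # directly from the CPU
CPU_PCIE_GEN4_LANES = 4       # directly from the CPU
CHIPSET_PCIE_GEN4_LANES = 12  # from the platform controller
CHIPSET_PCIE_GEN3_LANES = 16  # from the platform controller


def total_threads(p_cores: int, e_cores: int) -> int:
    """Hardware threads for a given mix of P-cores and E-cores."""
    return p_cores * THREADS_PER_P_CORE + e_cores * THREADS_PER_E_CORE


total_cores = P_CORES + E_CORES
threads = total_threads(P_CORES, E_CORES)
cpu_lanes = CPU_PCIE_GEN5_LANES + CPU_PCIE_GEN4_LANES
chipset_lanes = CHIPSET_PCIE_GEN4_LANES + CHIPSET_PCIE_GEN3_LANES

print(f"Cores: {total_cores}, threads: {threads}")
print(f"PCIe lanes: {cpu_lanes} from the CPU, up to {chipset_lanes} from the chipset")
```

In other words, a fully configured desktop part would expose 16 cores and 24 threads, with 20 PCIe lanes coming directly from the CPU.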
Two mobile versions of Alder Lake were also discussed – a mainstream die with six P-cores and eight E-cores and an ultra-compact die with two P-cores and eight E-cores. Both will have GPUs with 96 execution units, along with image processing units and built-in Thunderbolt controllers, and will be aimed at devices that do not have discrete GPUs.
All Alder Lake CPUs are made up of modular logic blocks – CPU cores, GPU, memory controllers, I/O and more. They will support up to DDR5-4800, LPDDR5-5200, DDR4-3200, and LPDDR4X-4266 RAM, and it is up to motherboard and laptop OEMs to decide what to implement. The modular blocks of each CPU are connected via three fabrics – Compute, Memory and I/O. Intel cites a compute fabric bandwidth of 100 GB/s per P-core or per cluster of four E-cores, i.e. a total of 1000 GB/s across 10 such units. The last-level cache can be dynamically switched between inclusive and exclusive operation depending on the load.
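The 1000 GB/s total can be reproduced from the per-unit figure, as in the short sketch below. The function names are illustrative, and it assumes that E-cores always come in complete clusters of four and that the 10 units correspond to the top desktop configuration of eight P-cores plus two E-core clusters.

```python
BANDWIDTH_PER_UNIT_GBPS = 100  # per P-core or per E-core cluster, as quoted
E_CORES_PER_CLUSTER = 4        # E-cores attach to the fabric in groups of four


def compute_fabric_units(p_cores: int, e_cores: int) -> int:
    """Count fabric attach points: one per P-core, one per E-core cluster."""
    if e_cores % E_CORES_PER_CLUSTER != 0:
        # Assumption for this sketch: only complete clusters exist.
        raise ValueError("E-cores are assumed to come in clusters of four")
    return p_cores + e_cores // E_CORES_PER_CLUSTER


def fabric_bandwidth_gbps(p_cores: int, e_cores: int) -> int:
    """Aggregate compute fabric bandwidth in GB/s for a core configuration."""
    return compute_fabric_units(p_cores, e_cores) * BANDWIDTH_PER_UNIT_GBPS


# Top desktop configuration: 8 P-cores + 8 E-cores (two clusters) = 10 units.
desktop_units = compute_fabric_units(8, 8)
desktop_bandwidth = fabric_bandwidth_gbps(8, 8)
print(f"{desktop_units} units x {BANDWIDTH_PER_UNIT_GBPS} GB/s = {desktop_bandwidth} GB/s")
```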
We now have some information on how workloads are distributed between P-cores and E-cores. Intel announced a new hardware scheduler called Thread Director, which will be completely transparent to software and will work with the OS scheduler to assign threads to different cores based on urgency and real-time conditions. Thread Director is designed to scale across mobile and desktop CPUs and can adapt to thermal and power conditions, migrating threads from one core type to another and managing multi-threading on the P-cores with “nanosecond precision”.
Thread Director requires Windows 11, so Alder Lake will work best on this upcoming operating system, although the CPUs will also run Windows 10, Linux, and other operating systems. With Thread Director's input, the OS scheduler can now understand which types of threads need which types of resources, and it can prioritize latency, energy savings or other parameters depending on the operating conditions.
Intel has been testing its first high-end gaming GPU for a while now and is adding to the hype with the recent announcement of a new Intel Arc brand for GPU hardware, software, and services. The first-generation product bears the code name “Alchemist” and will be launched in early 2022. It belongs to one tier of the Xe architecture product stack, known as Xe-HPG, or High Performance Gaming. Alchemist is manufactured by TSMC on its N6 node. It will support both hardware ray tracing and DirectX 12 Ultimate features like mesh shading and variable rate shading.
Each first-generation Xe-HPG core will have 16 vector engines and 16 matrix engines, as well as caches, to handle common GPU workloads as well as AI acceleration. Four such cores, together with four ray tracing units and other rendering hardware, form a “slice”. Each Alchemist GPU can have up to eight such slices.
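Multiplying these building blocks out gives a rough idea of the largest possible Alchemist configuration. The sketch below only uses the per-core and per-slice counts quoted above; the names are illustrative, and the totals are derived maximums rather than specifications Intel has announced for any particular product.

```python
# Xe-HPG building blocks as quoted in the article.
VECTOR_ENGINES_PER_CORE = 16
MATRIX_ENGINES_PER_CORE = 16
CORES_PER_SLICE = 4
RAY_TRACING_UNITS_PER_SLICE = 4
MAX_SLICES = 8


def gpu_totals(slices: int) -> dict:
    """Derive unit counts for a GPU built from the given number of slices."""
    if not 1 <= slices <= MAX_SLICES:
        raise ValueError(f"slices must be between 1 and {MAX_SLICES}")
    cores = slices * CORES_PER_SLICE
    return {
        "xe_cores": cores,
        "vector_engines": cores * VECTOR_ENGINES_PER_CORE,
        "matrix_engines": cores * MATRIX_ENGINES_PER_CORE,
        "ray_tracing_units": slices * RAY_TRACING_UNITS_PER_SLICE,
    }


largest = gpu_totals(MAX_SLICES)
for name, count in largest.items():
    print(f"{name}: {count}")
```

A full eight-slice GPU would therefore contain 32 Xe cores, 512 vector engines, 512 matrix engines and 32 ray tracing units.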
We now also know that Intel will launch its own AI upscaling technology, called XeSS (Xe Super Sampling), to take on Nvidia’s DLSS and AMD’s FSR. XeSS is an AI-based upscaling process that combines information from previous frames. Intel claims up to 2x better performance by rendering at lower resolutions and then upscaling to the target resolution. XeSS will even run on integrated Xe-LP GPUs, and several game developers are on board to support it.
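To see why rendering at a lower resolution saves work, it helps to compare pixel counts. The resolutions below (1080p upscaled to 4K) are assumptions chosen purely for illustration, not settings Intel has confirmed, and the pixel ratio is not the same thing as the performance gain, since the upscaling pass itself costs time.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in a frame of the given size."""
    return width * height


def shading_reduction(render_res: tuple, target_res: tuple) -> float:
    """How many times fewer pixels are rendered natively before upscaling."""
    return pixel_count(*target_res) / pixel_count(*render_res)


# Hypothetical example: render at 1920x1080, upscale to 3840x2160.
render_res = (1920, 1080)
target_res = (3840, 2160)
ratio = shading_reduction(render_res, target_res)

print(f"Rendered pixels per frame: {pixel_count(*render_res):,}")
print(f"Displayed pixels per frame: {pixel_count(*target_res):,}")
print(f"The GPU renders {ratio:.1f}x fewer pixels before upscaling")
```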
While we don’t have GPU specs yet, Intel said it has worked to deliver “leading” performance per watt. We’ll surely learn more as the launch gets closer.
Intel also made several announcements related to its server and data center business during Architecture Day, including a demonstration of the upcoming Ponte Vecchio architecture for big data, which will be the foundation of the exascale supercomputer Aurora. Further highlights were the modular Xeon Scalable platform “Sapphire Rapids”, the oneAPI software stack and an emerging product category – Infrastructure Processing Units (IPUs), which are designed to separate infrastructure overhead from customer data and processing workloads in cloud-centric data centers.