A processing unit (CPU, GPU or whatever) and RAM are typically separate things built on separate chips. But what if they were part of the same chip, all mixed together? That's exactly what Samsung did to create the world's first High Bandwidth Memory (HBM) with integrated AI processing hardware, called HBM-PIM (for processing-in-memory).
It took its HBM2 Aquabolt chips and added Programmable Computing Units (PCUs) between the memory banks. These are relatively simple units that operate on 16-bit floating point values with a limited instruction set – they can move data around and perform multiplications and additions.
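For a feel of what such a limited unit does, here is a minimal Python sketch of an FP16 multiply-add that operates directly on its memory bank. The class name and instruction set are our own simplification for illustration, not Samsung's actual design:

```python
import numpy as np

# Illustrative only: a PCU-like unit that can move, multiply and add
# FP16 values stored in its neighboring memory bank.
class ToyPCU:
    def __init__(self, bank: np.ndarray):
        self.bank = bank.astype(np.float16)  # the bank holds FP16 values

    def multiply_add(self, a_off, b_off, out_off, n):
        # out += a * b, computed in FP16 without the data leaving the bank
        a = self.bank[a_off:a_off + n]
        b = self.bank[b_off:b_off + n]
        self.bank[out_off:out_off + n] += a * b

bank = np.arange(12, dtype=np.float16)
pcu = ToyPCU(bank)
pcu.multiply_add(a_off=0, b_off=4, out_off=8, n=4)
print(pcu.bank[8:12])  # results land back in the same bank: [8. 14. 22. 32.]
```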
[Image captions: PCUs mixed in with the memory banks • The PCU is a very limited FP16 processor]
But there are many PCUs, and they sit right next to the data they're working on. Samsung managed to get the PCUs running at 300 MHz, which works out to 1.2 TFLOPS of processing power per chip. And it kept the power usage (per chip) the same while transferring data at 2.4 Gbps per pin.
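As a back-of-the-envelope check (our arithmetic, not a figure from Samsung), 1.2 TFLOPS at a 300 MHz clock means the PCUs on one chip collectively execute about 4,000 FP16 operations every cycle:

```python
flops_per_chip = 1.2e12  # 1.2 TFLOPS, per Samsung's figure
clock_hz = 300e6         # 300 MHz PCU clock
print(f"{flops_per_chip / clock_hz:.0f} FP16 ops per cycle across the chip")  # 4000
```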
Per-chip power usage may be the same, but overall system energy consumption drops by 71%. That's because a typical CPU would need to move data twice – read the input, then write the result. With HBM-PIM the data doesn't really go anywhere.
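A toy way to see the data-movement argument (a deliberate simplification to match the reasoning above, not a model of any real memory system) is to count trips across the memory bus for an operation over n values:

```python
# Conventional CPU: every value crosses the bus twice (read in, write back).
def cpu_bus_transfers(n: int) -> int:
    reads = n    # fetch the input values
    writes = n   # store the results back to memory
    return reads + writes

# PIM: operands and results stay inside the HBM stack.
def pim_bus_transfers(n: int) -> int:
    return 0

n = 1_000_000
print(f"CPU: {cpu_bus_transfers(n):,} transfers, PIM: {pim_bus_transfers(n):,}")
```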
It's not just about power savings, either – using PIM for machine learning and inference tasks, researchers saw system performance more than double. That's a win-win scenario.
The HBM-PIM design is backwards compatible with regular HBM2 chips, so no new hardware needs to be developed – the software just needs to tell the PIM system to switch from regular mode to in-memory processing mode.
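Samsung hasn't published the programming interface, but conceptually the software-side switch might look something like this (entirely hypothetical names and structure, purely to illustrate the idea of an opt-in mode):

```python
from enum import Enum

# Hypothetical sketch: Samsung's actual interface is not public.
class HBMMode(Enum):
    REGULAR = 0  # behaves as ordinary HBM2 memory
    PIM = 1      # PCUs intercept commands and compute in-memory

class HBMPIMStack:
    def __init__(self):
        self.mode = HBMMode.REGULAR  # backwards-compatible default

    def set_mode(self, mode: HBMMode):
        # In a real system this would presumably be a register write
        self.mode = mode

stack = HBMPIMStack()
stack.set_mode(HBMMode.PIM)  # software opts in to in-memory processing
```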
There is one issue with this: the PCUs take up space previously occupied by memory banks. This cuts the capacity in half – down to 4 gigabits per die. Samsung decided to split the difference and combine 4 gigabit PIM dies with 8 gigabit regular HBM2 dies. Using four of each, it created 6 gigabyte stacks.
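The capacity math checks out (our arithmetic, based on the figures above):

```python
# Four 4 Gbit PIM dies plus four 8 Gbit regular HBM2 dies per stack
total_gbit = 4 * 4 + 4 * 8     # 48 gigabits
total_gbyte = total_gbit // 8  # 6 gigabytes per stack
print(total_gbyte, "GB")
```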
There's some more bad news – it will be a while before HBM-PIM lands in consumer hardware. For now, Samsung has sent out chips to be tested by partners developing AI accelerators and expects the design to be validated by July.
HBM-PIM will be presented at the virtual International Solid-State Circuits Conference (ISSCC) this week, so we can expect more details then.