PCI Express 7.0 Specification Announced - Hitting 128 GT/s In 2025


  • #51
    Originally posted by s_j_newbury View Post
    Which is back to the 1970s microcomputer designs, including the original IBM PC where the motherboard/CPU card usually contained a small amount of memory, and additional memory was attached to the system expansion bus with the peripherals.
    Absolutely. What happened is that memory got pulled closer to the CPU to increase bandwidth and reduce latency. This has obvious tradeoffs vs. capacity, so system architecture with memory tiers has arisen to roughly mirror what we have with cache hierarchies, and that opens the door for disaggregation.

    Part of the story here is the rapid escalation in core counts, which is pushing memory requirements beyond what can be directly connected to CPUs. But another aspect of the current trend towards disaggregation is the rise of special-purpose compute accelerators and ever-faster networking & storage.

    BTW, I remember seeing ads for ISA/EISA cards with MBs of RAM that you could use via EMM or as RAM disks. Until the mid/late 2000s, RAM was connected to x86 CPUs via a "Northbridge" chip on the motherboard. Then the memory controller got merged into the CPU (bringing NUMA complexities into the realm of mainstream servers). Now, we're entering the era of in-package memory (eventually forcing servers to cope with memory tiers).
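    To make the tiering point concrete, here's a minimal sketch of placing an allocation on a "far" memory node by hand. This is purely illustrative: it assumes a Linux box with libnuma installed and more than one memory node, and the node number is made up. Steering cold data onto a CXL or other second-tier pool looks essentially like this:

    Code:
    /* Minimal sketch: place an allocation on a specific ("far") NUMA node.
     * Assumes Linux with libnuma installed; build with: gcc tier.c -lnuma
     * Node 1 is illustrative -- on a tiered system it might be a CXL or
     * other far-memory node with no CPUs attached. */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not available on this system\n");
            return 1;
        }

        int far_node = 1;                    /* illustrative node number */
        size_t len = 64UL * 1024 * 1024;     /* 64 MiB */

        /* Pages for this allocation are bound to the chosen node. */
        void *buf = numa_alloc_onnode(len, far_node);
        if (!buf) {
            fprintf(stderr, "numa_alloc_onnode failed\n");
            return 1;
        }

        memset(buf, 0xA5, len);              /* touch it so pages actually get faulted in */
        printf("64 MiB placed on node %d\n", far_node);

        numa_free(buf, len);
        return 0;
    }

    The kernel's tiering/demotion logic can shuffle pages between near and far memory on its own; explicit binding like this is just the by-hand version of the same idea.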

    IIRC, a similar thing happened with L2 cache - first was external, then moved in-package (anyone remember the Pentium Pro?), and finally on-die.
    Last edited by coder; 23 June 2022, 01:31 PM.



    • #52
      Originally posted by coder View Post

      IIRC, a similar thing happened with L2 cache - first was external, then moved in-package (anyone remember the Pentium Pro?), and finally on-die.
      Actually, that happened twice! As I recall:

      For x86 CPUs:
      L1 cache was introduced with the 80386; it was external, usually via SRAM chip sockets on the motherboard.

      The 80486 brought a single combined instruction and data L1 cache onto the die, while the motherboard cache introduced with the 80386 was relegated to L2 cache.

      As you said, the Pentium Pro moved the L2 cache into the package, and later parts (the late Pentium II/III generation) finally took it on-die.

      Meanwhile, the Super Socket 7 based AMD K6-II and later, which also sported on-die L2 cache (and is what I was using at the time), still supported the motherboard SRAM inherited from the 386 era as L3 cache!



      • #53
        Originally posted by coder View Post
        BTW, I remember seeing ads for ISA/EISA cards with MBs of RAM that you could use via EMM or as RAM disks. Until the mid/late 2000s, RAM was connected to x86 CPUs via a "Northbridge" chip on the motherboard. Then the memory controller got merged into the CPU (bringing NUMA complexities into the realm of mainstream servers). Now, we're entering the era of in-package memory (eventually forcing servers to cope with memory tiers).


        EMM boards were mostly ISA, and they were not RAM drives.

        RAM disks like the one at https://en.wikipedia.org/wiki/I-RAM belong more to the PCI time frame. Yes, that time frame also included RAM drives connected through normal hard-drive interfaces.

        Really, CXL memory is a redo of the idea the EMM ISA cards came from. EMM on ISA was itself a cut-down version of the memory sharing you found in historic mainframes. Yes, we eventually ended up emulating the EMM hardware in software.

        Basically, we have gone around a ~40 year circle here. Virtual memory backed by spinning rust also roughly lines up with the current-day use of NVMe for storage in this circle (note: to see this you have to look at the mainframe side of history, not the PC hardware). We are, in effect, repeating earlier memory solutions each time around the circle.



        • #54
          Originally posted by oiaohm View Post
          Basically, we have gone around a ~40 year circle here.
          Only if you ignore the development of fast, in-package RAM. The factors pulling RAM closer to the CPU haven't lessened. It's only because that's been so successful, and because we have cache-like tiering schemes, that we can afford to push some of that RAM away again.



          • #55
            Originally posted by coder View Post
            Only if you ignore the development of fast, in-package RAM. The factors pulling RAM closer to the CPU haven't lessened. It's only because that's been so successful, and because we have cache-like tiering schemes, that we can afford to push some of that RAM away again.
            No, this is you ignoring that mainframes historically had up to six layers of cache. RAM in the CPU has just been layers 1-3 moving into the CPU. EMM on ISA was in fact based on the mainframe L4-L5 between-node cache. Some of those layers in mainframes were even CRT tubes used as memory storage.

            Yes, those old mainframes with L4-L5 between-node caches also had custom processing nodes.

            Long-term storage being slow is not a new problem. Yes, the mainframes I am talking about here are the historical systems that took up complete rooms. CXL basically replicates that old setup inside a single box.

            Consumer hardware like the PC has always built its caching in whatever way was most cost-effective at the time. The 8086 could only address 1 MB of memory; EMM let you have more than 1 MB by bank-switching it into that address space. CXL memory likewise lets you have more memory than the CPU can directly attach. The old mainframe L4-L5 caches also allowed having more memory than the CPU itself could address.
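            To make that EMM mechanism concrete, here is a rough sketch of how a real-mode DOS program talked to the EMS driver through INT 67h to bank-switch a 16 KB page of expanded memory into the sub-1 MB page frame. It assumes a 16-bit DOS compiler such as Turbo C, where int86() and MK_FP() live in dos.h, and the EMS status checks are trimmed for brevity:

            Code:
            /* Rough EMS 3.x sketch for a 16-bit DOS compiler (e.g. Turbo C).
             * The EMM driver is reached through INT 67h; only the 64 KB page
             * frame below 1 MB is visible to the CPU at any one time. */
            #include <dos.h>
            #include <stdio.h>

            int main(void)
            {
                union REGS r;
                unsigned frame_seg, handle;
                unsigned char far *page;

                r.h.ah = 0x41;                /* Get Page Frame Address       */
                int86(0x67, &r, &r);
                frame_seg = r.x.bx;           /* segment of the 64 KB window  */

                r.h.ah = 0x43;                /* Allocate Pages               */
                r.x.bx = 4;                   /* four 16 KB pages = 64 KB     */
                int86(0x67, &r, &r);
                handle = r.x.dx;              /* EMM handle for this block    */

                r.h.ah = 0x44;                /* Map/Unmap Handle Page        */
                r.h.al = 0;                   /* physical page 0 of the frame */
                r.x.bx = 2;                   /* logical page 2 of our block  */
                r.x.dx = handle;
                int86(0x67, &r, &r);

                /* Logical page 2 (stored beyond the 1 MB limit) is now
                 * addressable through the page frame: */
                page = (unsigned char far *) MK_FP(frame_seg, 0);
                page[0] = 0x42;
                printf("mapped an EMS page at %04X:0000\n", frame_seg);

                r.h.ah = 0x45;                /* Deallocate Pages             */
                r.x.dx = handle;
                int86(0x67, &r, &r);
                return 0;
            }

            Only the 64 KB page frame is ever directly visible; everything else sits on the card until it is mapped in, which is exactly the "more memory than the CPU can address" trick.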

            The EMM ISA cards were based on that prior mainframe L4-L5 caching. A modern CXL system works very much like those old mainframes did.

            CXL memory and the old mainframe L4-L5 caches both address the multi-processing-node problem. Yes, you need to get data as close to the processing units as possible, but having a mid-point in the transport between the processing units does have some advantages.

            Cache tiering schemes are not a new thing. We have gone through a time frame where lots of systems were built purely from general-purpose processors, without a mix of custom processing nodes. In early mainframes it was nothing strange to have 20 to 30 different custom processing solutions in a single system. CXL could bring back this massive stack of custom processing nodes. Remember, those custom processing nodes were important because the general-purpose CPUs of the day were not that high performance.

            I have not ignored in-package RAM. The cost reductions from being able to put things in-package, the performance improvements in general-purpose CPUs, and the expanding address buses on CPUs got us away from needing the L4/L5 tiers and the custom processing nodes that went with them. So yes, we have gone around in a circle here.

            coder, it's not that we can afford to push some of the RAM away. The reason the old mainframes had RAM on a general bus rather than connected to the CPU was the problem of multi-processing across many custom nodes, plus CPUs of the time not being able to address enough memory. We fixed the memory-addressing problem for a while, and performance improvements in general-purpose CPUs removed the need for custom processing nodes for a while. That while is basically up: custom processing nodes for AI and so on are back, and needing more memory than can be connected to a CPU is also back. So welcome to the full circle; the same problems have returned.
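            For what it's worth, on current Linux this kind of bus-attached memory generally just shows up as an extra, CPU-less NUMA node. A small libnuma sketch (node layout and sizes are obviously machine-specific) that lists the nodes and flags the ones with memory but no CPUs:

            Code:
            /* Sketch: list NUMA nodes and flag CPU-less ones, which is how CXL
             * or other bus-attached expander memory typically appears on Linux.
             * Needs libnuma; build with: gcc nodes.c -lnuma */
            #include <numa.h>
            #include <stdio.h>

            int main(void)
            {
                if (numa_available() < 0) {
                    fprintf(stderr, "NUMA not available\n");
                    return 1;
                }

                int max = numa_max_node();
                struct bitmask *cpus = numa_allocate_cpumask();

                for (int node = 0; node <= max; node++) {
                    long long freep;
                    long long size = numa_node_size64(node, &freep);
                    if (size <= 0)
                        continue;                      /* node has no memory */

                    numa_node_to_cpus(node, cpus);     /* which CPUs live on this node? */
                    int ncpus = numa_bitmask_weight(cpus);

                    printf("node %d: %lld MiB%s\n", node, size >> 20,
                           ncpus == 0 ? "  (no CPUs -- far/expander memory)" : "");
                }

                numa_free_cpumask(cpus);
                return 0;
            }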

