PCI Express 7.0 Specification Announced - Hitting 128 GT/s In 2025


  • oiaohm
    replied
    Originally posted by coder View Post
    Only if you ignore the development of fast, in-package RAM. The factors pulling RAM closer to the CPU haven't lessened. It's only because that's been so successful, and we have cache-like tiering schemes, that we can afford to push some of that RAM away, again.
    No, this is you ignoring that mainframes historically had up to six cache levels. RAM moving into the CPU is just levels 1-3 moving on-package. EMM on ISA was in fact based on the mainframe L4-L5 between-nodes cache; in some mainframes those stores were literally CRT screens used as memory.

    Yes, those old mainframes with L4-L5 between-nodes caches also had custom processing nodes.

    Long-term storage being slow is not a new problem. Yes, the mainframes I am talking about here took up complete rooms. CXL basically replicates that old arrangement inside a box.

    Consumer hardware like the PC, caches included, has always been built in whatever way was most cost-effective at the time. The 8086 could only address 1 MB of memory; EMM let you have more than that. CXL memory likewise lets you have more memory than the CPU can address directly, and the old mainframe L4-L5 caches also allowed more memory than the CPU itself could address (a toy bank-switching sketch of that idea follows this post).

    The EMM ISA cards are based on that prior mainframe L4-L5 caching. A modern CXL system works very much like those old mainframes did.

    CXL memory and the old mainframe L4-L5 caches both tackle the multi-processing-node problem. Yes, you want data as close to the processing units as possible, but having a midpoint in the transport between processing units does have some advantages.

    Cache tiering schemes are not a new thing. We have just gone through a period where lots of systems were built from general-purpose processors without a mix of custom processing nodes. In early mainframes it was nothing strange to have 20 to 30 different custom processing solutions in a single system. CXL could bring back that massive stack of custom processing nodes. Remember, those custom processing nodes mattered because general-purpose CPUs and the like were not that fast.

    I have not ignored in-package RAM. Cost reductions from in-packaging, performance improvements in general-purpose CPUs, and wider CPU address buses got us away from needing the L4/L5 tiers and the custom processing nodes that went with them. So yes, we have gone around in a circle here.

    coder, it's not that we can now afford to push some of the RAM away. The reason the old mainframes had RAM hanging off a general bus rather than off the CPU was the many-custom-processing-nodes problem, plus CPUs of the time not being able to address enough memory. We fixed the addressing problem for a while, and performance improvements in general-purpose CPUs removed the need for custom processing nodes for a while. That while is basically up: custom processing nodes for AI and so on are back, and needing more memory than can be attached to the CPU is also back. So welcome to the full circle; the same problems are back again.
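    [Editor's note: not from either post, just a minimal sketch of the shared idea: a small fixed window bank-switched over a much larger backing store, roughly how LIM EMS presented more than 1 MB to an 8086 through a 64 KB page frame, and loosely analogous to reaching memory beyond what a CPU can attach directly. All sizes and names below are invented for illustration.]

    Code:
        /* Toy model: a 64 KB "page frame" of four 16 KB slots, bank-switched
         * over 8 MB of backing storage. The visible window stays small;
         * remapping a slot changes which part of the big store it shows. */
        #include <stdio.h>
        #include <string.h>

        #define PAGE_SIZE   (16 * 1024)   /* EMS-style 16 KB pages        */
        #define FRAME_PAGES 4             /* 64 KB page frame = 4 slots   */
        #define TOTAL_PAGES 512           /* 8 MB of "expanded" memory    */

        static unsigned char backing[TOTAL_PAGES][PAGE_SIZE]; /* big store */
        static unsigned char *frame[FRAME_PAGES];  /* what the CPU "sees"  */

        /* The "bank switch": map logical page `page` into frame slot `slot`. */
        static void map_page(int slot, int page)
        {
            frame[slot] = backing[page];
        }

        int main(void)
        {
            map_page(0, 300);                      /* page far past the frame */
            strcpy((char *)frame[0], "hello from page 300");

            map_page(0, 7);                        /* switch the bank...      */
            frame[0][0] = 0;

            map_page(0, 300);                      /* ...and switch back      */
            printf("%s\n", (char *)frame[0]);      /* data is still there     */
            return 0;
        }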



  • coder
    replied
    Originally posted by oiaohm View Post
    Basically, we have gone around a ~40 year circle here.
    Only if you ignore the development of fast, in-package RAM. The factors pulling RAM closer to the CPU haven't lessened. It's only because that's been so successful, and we have cache-like tiering schemes, that we can afford to push some of that RAM away, again.



  • oiaohm
    replied
    Originally posted by coder View Post
    BTW, I remember seeing ads for ISA/EISA cards with MBs of RAM, that you could use via EMM or as RAM disks. Until the mid/late 2000's, RAM was connected to x86 CPUs via a "Northbridge" chip, on the motherboard. Then, the memory controller got merged into the CPU (bringing NUMA complexities into the realm of mainstream servers). Now, we're entering the era of in-package memory (eventually forcing servers to cope with memory tiers).
    https://www.vogons.org/viewtopic.php?f=46&t=59018

    EMM boards were mostly ISA, and those were not RAM drives.

    https://en.wikipedia.org/wiki/I-RAM RAM disks like that are more from the PCI time frame. Yes, that time frame also included RAM drives connected through normal hard drive connections.

    Really, CXL memory is a redo of the idea the EMM ISA cards came from. EMM on ISA is itself a cut-down version of the memory sharing you found in historic mainframes. Yes, we ended up emulating EMM hardware in software.

    Basically, we have gone around a ~40 year circle here. Virtual memory on spinning rust also roughly lines up in time with the current-day use of NVMe for storage in this circle (note: to see this you have to look at what mainframes were doing, not at PC hardware). We keep repeating memory solutions each time around the circle.



  • s_j_newbury
    replied
    Originally posted by coder View Post

    IIRC, a similar thing happened with L2 cache - first was external, then moved in-package (anyone remember the Pentium Pro?), and finally on-die.
    Actually, that happened twice! As I recall:

    For x86 CPUs:
    L1 cache was introduced with the 80386; it was external, usually via SRAM chip sockets on the motherboard.

    The 80486 brought a single combined instruction and data L1 cache on-die, and the motherboard cache introduced with the 80386 was then relegated to L2.

    As you said, the Pentium Pro moved the L2 cache into the package, and it went fully on-die a little later (the Pentium II actually kept its L2 off-die, on the cartridge).

    Meanwhile, the Super Socket 7 based AMD K6-III and its successors, which also sported on-die L2 cache (and are what I was using at the time), still supported the motherboard SRAM inherited from the 386 era as L3 cache!



  • coder
    replied
    Originally posted by s_j_newbury View Post
    Which is back to the 1970s microcomputer designs, including the original IBM PC where the motherboard/CPU card usually contained a small amount of memory, and additional memory was attached to the system expansion bus with the peripherals.
    Absolutely. What happened is that memory got pulled closer to the CPU to increase bandwidth and reduce latency. This has obvious tradeoffs vs. capacity, so system architectures with memory tiers have arisen that roughly mirror what we have with cache hierarchies, and that opens the door for disaggregation.

    Part of the story here is the rapid escalation in core counts pushing memory demands beyond what can be directly connected to CPUs. But another aspect of the current trend towards disaggregation is the rise of special-purpose compute accelerators and ever-faster networking & storage.

    BTW, I remember seeing ads for ISA/EISA cards with MBs of RAM, that you could use via EMM or as RAM disks. Until the mid/late 2000's, RAM was connected to x86 CPUs via a "Northbridge" chip, on the motherboard. Then, the memory controller got merged into the CPU (bringing NUMA complexities into the realm of mainstream servers). Now, we're entering the era of in-package memory (eventually forcing servers to cope with memory tiers).

    IIRC, a similar thing happened with L2 cache - first was external, then moved in-package (anyone remember the Pentium Pro?), and finally on-die.
    Last edited by coder; 23 June 2022, 01:31 PM.



  • edwaleni
    replied
    Originally posted by torsionbar28 View Post
    This is what integrated graphics is for. It's built into the CPU, no PCIe slot required.


    10G is not a mainstream consumer technology. Heck, 2.5G is barely making a dent; 99% are still on 1G inside the home today.


    Consumer mobos already have a bunch of these built in. x1 expansion cards are readily available.


    Large numbers of SATA ports is not a consumer use case. New PCs use one or two M.2 drives primarily. And existing boards have four, six, or even more SATA ports already.
    Ah...you wanted to assess my choices, I see.

    Well, perhaps I want consumer boards that offer these choices for less money than a workstation or server would cost? That means use cases that are non-traditional in a consumer setting.

    If consumer boards provide a PCIe Gen 4 x1 slot and one has a need beyond what the board offers, one should be able to meet it with that slot; otherwise the slot might as well be removed from consumer boards.



  • s_j_newbury
    replied
    Originally posted by coder View Post
    First, who knows if PCIe 7.0 will ever trickle down to mainstream PCs. Ethernet >= 100 Gbps probably never will, given how long the transition to 10 Gigabit has taken.

    Second, CXL.Mem is probably the best use case & one of the key drivers behind this stuff. We seem to be headed for a world of tiered memory, where you have some GBs of RAM in-package with the CPU, and those systems requiring further expandability can put more memory out on the bus. This scales better and is more flexible, because system bandwidth is no longer pre-partitioned between memory & devices. It also provides the benefits of a unified, coherent memory model for all devices in the system - not so heavily biased towards privileging the CPU (not that it has much relevance to mainstream PC use cases).
    Which is back to the 1970s microcomputer designs, including the original IBM PC where the motherboard/CPU card usually contained a small amount of memory, and additional memory was attached to the system expansion bus with the peripherals.



  • coder
    replied
    Originally posted by torsionbar28 View Post
    Consumer mobos already have a bunch of these built in. x1 expansion cards are readily available.
    Right. So, for USB 4, then.

    Originally posted by torsionbar28 View Post
    Large numbers of SATA ports is not a consumer use case. New PCs use one or two M.2 drives primarily. And existing boards have four, six, or even more SATA ports already.
    A lot of motherboards are dropping to just 2-4 SATA connectors. You mostly see > 4 on server/workstation and some higher-end gaming boards.

    That's reasonable, but those wanting to run a fileserver on a low-cost board might need to add an HBA card for more SATA connectivity. I still do some backups to BD-R and use another drive for occasionally ripping CDs, which chews up 2 more SATA ports. So, the fileserver I'm planning to replace currently has 8 SATA ports in use: 5 HDDs, one SSD for the OS, and 2 for optical drives. The motherboard had 6 chipset-integrated ports + 1, so I had to use a PCIe card for the 8th.



  • coder
    replied
    Originally posted by s_j_newbury View Post
    This article is about PCIe 7.0, so are you suggesting consumer 100GbE in a 1x slot? Or Terabit Ethernet? 64GB/s m.2 NVME SSDs? There are already interconnects for extreme high performance, but PCI Express is a mainstream standard for "PCs". As coder mentioned above, you're going to have to really improve memory bandwidth before you can even begin to make any use of this technology. Is it really for PCs?
    First, who knows if PCIe 7.0 will ever trickle down to mainstream PCs. Ethernet >= 100 Gbps probably never will, given how long the transition to 10 Gigabit has taken.

    Second, CXL.Mem is probably the best use case & one of the key drivers behind this stuff. We seem to be headed for a world of tiered memory, where you have some GBs of RAM in-package with the CPU, and those systems requiring further expandability can put more memory out on the bus. This scales better and is more flexible, because system bandwidth is no longer pre-partitioned between memory & devices. It also provides the benefits of a unified, coherent memory model for all devices in the system - not so heavily biased towards privileging the CPU (not that it has much relevance to mainstream PC use cases).
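    [Editor's note: not part of the post, but for context: on Linux, CXL-attached memory typically shows up as a CPU-less NUMA node, so a "far" memory tier can already be targeted explicitly with libnuma. A minimal sketch follows, assuming the highest-numbered node is the far/CXL tier, which is only a guess about the topology; check numactl --hardware first and build with -lnuma.]

    Code:
        /* Sketch: allocate a buffer from a specific NUMA node, the way a
         * CXL "far memory" tier could be used explicitly. The node choice
         * below is an assumption, not a rule. */
        #include <stdio.h>
        #include <numa.h>

        int main(void)
        {
            if (numa_available() < 0) {
                fprintf(stderr, "NUMA not available on this system\n");
                return 1;
            }

            int far_node = numa_max_node();     /* assume highest node = far tier */
            size_t size  = 64UL * 1024 * 1024;  /* 64 MB                          */

            /* Pages of this buffer will be satisfied from far_node's memory. */
            void *buf = numa_alloc_onnode(size, far_node);
            if (!buf) {
                perror("numa_alloc_onnode");
                return 1;
            }

            printf("allocated %zu bytes on node %d\n", size, far_node);
            numa_free(buf, size);
            return 0;
        }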



  • coder
    replied
    Originally posted by Teggs View Post
    It does make me expect that various parties will start skipping over PCIe versions eventually.
    So far, each version of PCIe has been more expensive to implement than the previous one, and each has leaned heavily on the experience gained from the last. Up through 5.0, every generation roughly doubled per-lane bandwidth, mostly by raising the signalling rate, with 4.0 and 5.0 typically requiring more/better PCB layers and retimers to make it work. In 6.0, they enlarged the symbol space (PAM4) and changed to using flits, which adds complexity to the PHY and transceiver. 7.0 is set to reuse that, but who knows where they'll end up (rough per-lane numbers are in the sketch after this post).

    Originally posted by Teggs View Post
    If PCIe isn't replaced by that time.
    Who knows, but there's such momentum behind it that it's probably easier for the PCIe standard to bridge to photonics than to get similar momentum behind a fresh standard. That said, PCIe seems to be incorporating an increasing amount of complexity focused on managing the issues related to signaling over copper.

    The whole transition to PAM4 seemed to suggest copper had basically run out of frequency headroom. That they plan to stick with PAM4 is therefore very interesting. It gives me the feeling that 7.0 might be the end of the road, for copper. We'll see.
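    [Editor's note: not part of the post, just a back-of-the-envelope check of those doublings: raw per-lane, per-direction bandwidth by generation, applying the 8b/10b and 128b/130b line-code overheads and ignoring the flit/FEC overhead of 6.0/7.0, so the last two rows are slightly optimistic.]

    Code:
        /* Rough PCIe per-lane bandwidth table; flit/FEC overhead of 6.0/7.0
         * is ignored, so those figures come out a little high. */
        #include <stdio.h>

        int main(void)
        {
            struct { const char *gen; double gts; double eff; } pcie[] = {
                { "1.0",   2.5,   8.0 / 10.0  },  /* 8b/10b encoding    */
                { "2.0",   5.0,   8.0 / 10.0  },
                { "3.0",   8.0, 128.0 / 130.0 },  /* 128b/130b encoding */
                { "4.0",  16.0, 128.0 / 130.0 },
                { "5.0",  32.0, 128.0 / 130.0 },
                { "6.0",  64.0,   1.0         },  /* PAM4 + flits       */
                { "7.0", 128.0,   1.0         },
            };

            for (size_t i = 0; i < sizeof pcie / sizeof pcie[0]; i++) {
                double lane = pcie[i].gts * pcie[i].eff / 8.0;   /* GB/s  */
                printf("PCIe %s: %6.2f GB/s per lane, ~%5.1f GB/s per x16 direction\n",
                       pcie[i].gen, lane, lane * 16.0);
            }
            return 0;
        }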

