Those Itanium-based SGI Altix servers are not SMP systems. They are ccNUMA systems, i.e. basically a cluster. That is why we will never see them used as 4096-core database servers, and why they are not doing SMP work.
I have seen numbers of >50x slower RAM latency on NUMA systems. One can view NUMA as a very tightly coupled form of cluster computing.
When you take a cluster, where some memory cells are very far away from a node, and present it as one single system image via NUMA, you get very slow RAM latency in the worst case. In some cases RAM latency is good (when the memory cell happens to be close to the CPU), and in other cases it is very bad (when the memory cell sits in another PC on the network). This can wreak havoc with performance.
On the other hand, the Oracle M9000 Solaris server, with 64 CPUs, has a latency penalty of only approximately 1.3x, which is very good. Thus, NUMA is not needed on these M9000 SMP servers. You don't need to mask far-away memory cells via NUMA. Uniform-memory servers do not have the 50x worse latency that clusters have, though they probably have some worst-case latency scenarios too.
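To put the 50x and 1.3x figures side by side, here is a rough back-of-the-envelope sketch in Python. It assumes a simple weighted-average model and a made-up local latency of 100 ns; neither the model nor that number comes from any vendor data, and the function name is just for illustration.

```python
def effective_latency(local_ns, remote_factor, remote_fraction):
    """Average memory latency when some fraction of accesses go to far memory.

    local_ns        -- latency of a near (local) access, in nanoseconds (assumed)
    remote_factor   -- how many times slower a far access is (e.g. 1.3 or 50)
    remote_fraction -- share of accesses that hit far memory, between 0 and 1
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    remote_ns = local_ns * remote_factor
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    LOCAL_NS = 100.0  # assumption for illustration only, not a measured value
    for factor in (1.3, 50.0):
        for fraction in (0.0, 0.1, 0.5):
            avg = effective_latency(LOCAL_NS, factor, fraction)
            print(f"far = {factor}x, {fraction:.0%} far accesses: {avg:.0f} ns average")
```

Under this simple model, even 10% far accesses at 50x makes the average several times worse than local memory, while at 1.3x the average barely moves. Real machines have caches and prefetching, so take this only as an illustration of the trend.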
He also claims that as recently as 2007, Linux had very poor NUMA API support. NUMA was hardly implemented at all in Linux:
"The NUMA API for Linux is very rudimentary compared to the boutique features in legacy NUMA systems like Sequent DYNIX/ptx and SGI IRIX, but it does support memory and process placement".
But SGI sold their Altix systems in 2003. How bad was the NUMA API back then? NUMA could not have worked well for Altix Itanium at that time. The Altix cluster would have worked only for those workloads that could use the beta-phase NUMA API.
I do wonder what would happen if we compiled Linux on an Oracle M9000. How well would Linux perform? Apparently it is difficult to build these MPNI servers:
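For a sense of what "process placement" looks like on a present-day Linux box, here is a minimal Python sketch. It assumes the kernel exposes each node's CPUs in /sys/devices/system/node/nodeN/cpulist, and it uses os.sched_setaffinity, which is Linux-only. The helper names parse_cpulist and pin_to_node are made up for this example. It only pins the process to CPUs; explicit memory placement needs something like numactl or libnuma, which is not shown.

```python
import os


def parse_cpulist(text):
    """Parse a kernel cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty string means the node has no CPUs listed
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node):
    """Hypothetical helper: restrict the calling process to one NUMA node's CPUs."""
    path = f"/sys/devices/system/node/node{node}/cpulist"  # assumed sysfs layout
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process; Linux only
    return cpus
```

Calling pin_to_node(0) on a NUMA machine would keep the process on node 0's CPUs, so that its memory accesses have a better chance of staying close.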
There is an empty niche to earn some millions, but no one succeeds. It seems that the reason no one does this is that it is technically difficult, just as I suspected. With Advanced Micro Devices not building any chipsets that go beyond four Opteron processor sockets in a single system image – and no one else interested in doing chipsets, either – there is an opportunity, it would seem, for someone to make big wonking Opteron boxes to compete against RISC and Itanium machines.
Many have tried. Newisys, Liquid Computing, Fabric7, 3Leaf Systems, and NUMAscale all took very serious runs at it, and thus far, four out of five of them have gone the way of all flesh. It is no coincidence that these companies fail.