Libre RISC-V Open-Source Effort Now Looking At POWER Instead Of RISC-V
-
Originally posted by DMJC View Post
I just wish we could get the code to SGI IRIX opened up. Now THAT would make MIPS a much more interesting platform. I'm a bit surprised that they (Libre) haven't chosen MIPS for a GPU since it has a very long/proven track record of usage in graphics.
yes, for due diligence, we really do need to look at MIPS. sigh. so much to do
-
Originally posted by lkcl View Post
last time i tried to contact the MIPS open foundation the website had been taken down (or the page they referred to was 404). basically they haven't the infrastructure in place, just a change in "licensing" arrangement. the Open Power Foundation by contrast has been established for years.
yes, for due diligence, we really do need to look at MIPS. sigh. so much to do
-
One thing I've often wondered: why haven't x86 CPUs let assembly and compilers issue micro-op instructions directly? If all x86 CPUs just convert to micro-ops anyway, couldn't we cut out the middleman and let people write straight micro-op code? Sure, the decoders would be useless, but assuming you could do it, that extra unused (for micro-op code) silicon would just help with the high heat density we run into now.
-
Originally posted by madscientist159 View Post
I'm certain we could free up a few POWER machines to the development team here, though we'd like a bit more focus on potential 4k / PCIe support as that would eliminate one of the last remaining binary blobs in a typically built desktop / workstation POWER system (namely the GPU)...
4k / PCIe will (unless we get sponsors / customers with USD $2m+ budgets) be on the table for a Revision 3. the critical first milestone is to prove the architecture on a minimum budget, so that iterations can be done cheaply. this is why the NLNet Grants last month went in for a *180nm* ASIC (USD $600 per sq.mm, we'll need around 20 sq.mm for a single-core chip) because it's peanuts, and the RTL doesn't care if it's running in 180nm or 14nm. we could do a hundred test revisions at 180nm for the cost of a single 14nm ASIC.
_then_ we ramp up through the geometries, _then_ we ramp up with high-end peripherals and high-end performance. reduce risk, get something done. the LIP6.fr team is doing a 360nm tape-out (early next year), all using alliance / coriolis2.
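As a rough check of the arithmetic here, a small Python sketch: the only figures used are the ones quoted in the post above ($600/sq.mm and ~20 sq.mm at 180nm); the 14nm line is just the "hundred test revisions" comparison run backwards, not a quoted price.

```python
# Back-of-envelope check of the 180nm test-chip numbers quoted above.
# The per-sq.mm price and die area come from the post; the 14nm figure
# is only what the "hundred test revisions" comparison would imply.

COST_PER_SQMM_180NM = 600   # USD per sq.mm (from the post)
DIE_AREA_SQMM = 20          # single-core test chip (from the post)

cost_per_180nm_revision = COST_PER_SQMM_180NM * DIE_AREA_SQMM
print(f"one 180nm revision: ${cost_per_180nm_revision:,}")        # $12,000

# "a hundred test revisions at 180nm for the cost of a single 14nm ASIC"
implied_14nm_cost = 100 * cost_per_180nm_revision
print(f"implied 14nm ASIC cost: ${implied_14nm_cost:,}")           # $1,200,000
```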
-
Originally posted by tului View Post
One thing I've often wondered: why haven't x86 CPUs let assembly and compilers issue micro-op instructions directly? If all x86 CPUs just convert to micro-ops anyway, couldn't we cut out the middleman and let people write straight micro-op code? Sure, the decoders would be useless, but assuming you could do it, that extra unused (for micro-op code) silicon would just help with the high heat density we run into now.
The instruction decoder is a tiny part of the chip. With something like POWER that doesn't have these concerns, it might even be possible to make a CPU that allows custom development-only instructions via open microcode (with those instructions then being submitted for inclusion in the actual ISA over time). But as far as x86 being able to do this, let alone being a legal choice for new development outside of Intel or AMD, the answer is a hard no.
-
Originally posted by lkcl View Post
appreciated: do bear in mind that as we're doing this pretty much from-scratch (not entirely, you know what i mean), if when we talk to any potential RTL licensees they say "oh you'll need a proprietary blob for that" we'll just put the phone down on them and find something else.
Originally posted by lkcl View Post
4k / PCIe will (unless we get sponsors / customers with USD $2m+ budgets) be on the table for a Revision 3. the critical first milestone is to prove the architecture on a minimum budget, so that iterations can be done cheaply. this is why the NLNet Grants last month went in for a *180nm* ASIC (USD $600 per sq.mm, we'll need around 20 sq.mm for a single-core chip) because it's peanuts, and the RTL doesn't care if it's running in 180nm or 14nm. we could do a hundred test revisions at 180nm for the cost of a single 14nm ASIC.
For reference, our current display output for blob-free systems is a 1080p single digital output with zero 3D capability aside from LLVMPipe on the host CPU. A few 4k unaccelerated framebuffers, especially on a wide bus like CAPI or multi-lane PCIe, would be a big step up from where we are right now -- and any 3D / accelerator capability would be a significant bonus even if it's fundamentally mismatched in raw performance with the 4k framebuffers.
CAPI does neatly sidestep the PCIe issues, since the RTL etc. is open. Would be humorous on some level to have Rev 1 support CAPI but not support PCIe.
Last edited by madscientist159; 20 October 2019, 07:31 PM.
-
Originally posted by madscientist159 View Post
So do we!
Let me talk some with folks on my side. If we got you a blob-free PCIe core, somehow, even if it was just PCIe 2.0 or 3.0, would that be possible to include in Rev. 1?
if however you can get hold of a PCIe PHY, then yes, we can put it in. however not for the test chip, because it's 180nm.
what *would* work would be to use a Lattice ECP5G as a gateway (communicating using some form of parallel bus e.g. xSPI or SDRAM). the ECP5G already has the balanced differential PHY drivers needed to do PCIe, and someone is actually working on it: https://github.com/enjoy-digital/litepcie/issues/20
Originally posted by madscientist159 View Post
Also, when I say 4k, I just mean the display side for the first generation -- i.e. 4k raster across a few outputs, not necessarily a 3D engine that would actually be rendering decent FPS to a canvas of that size.
For reference, our current display output for blob-free systems is a 1080p single digital output with zero 3D capability aside from LLVMPipe on the host CPU. A few 4k unaccelerated framebuffers, especially on a wide bus like CAPI or multi-lane PCIe, would be a big step up from where we are right now -- and any 3D / accelerator capability would be a significant bonus even if it's fundamentally mismatched in raw performance with the 4k framebuffers.
CAPI does neatly sidestep the PCIe issues, since the RTL etc. is open. Would be humorous on some level to have Rev 1 support CAPI but not support PCIe.
the problem is that the framebuffer has to be a Bus Master (yes, really, the processor is *not* the Bus Master here). this is because you absolutely cannot have the scan lines of the video pause at any time.
when you compute the data transfer rate generated by 4k, it's 8.3 million pixels per frame. let's say 30 fps: that's 250 million pixels per second. let's say 16 bpp (2 bytes per pixel): that's 500 mbytes/sec, just for the pixel data.
DDR3 @ 800mhz is a nice low-cost RAM rate, 32 bits wide, with a power budget of around 300mW with DDR3L. each transfer is 4 bytes, so that's 3200 mbytes/sec. FIFTEEN PERCENT of that data bandwidth is taken up by a 4k framebuffer @ 30 fps, 16bpp!
if you went to 60fps, it would be 30%. if you went to 60fps 32bpp, it would be a whopping SIXTY PERCENT of the data bandwidth taken up just feeding the framebuffer, at 2000 mbytes/sec.
at least with 1080p60@32bpp the data rate is 4x less so it's back down to (only!!) 15% of the total bandwidth.
this is why most (power-hungry) systems now have 2x 32-bit DRAM channels @ minimum 1666mhz DDR3/4 rates, and, unfortunately, those are looking at a 3 to 5 watt power budget, just for the DRAM.
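The same arithmetic as a small Python sketch, assuming "4k" means 3840x2160; the DDR3 figure is the raw peak rate from the post, so real sustained bandwidth would be somewhat lower and the percentages somewhat higher.

```python
# Framebuffer scan-out bandwidth vs. a 32-bit DDR3-800 channel, using the
# figures in the post. Peak numbers only: no refresh/turnaround overhead
# or other memory traffic is accounted for.

def scanout_mb_per_s(width, height, fps, bytes_per_pixel):
    """Pixel data rate needed just to feed the display, in megabytes/sec."""
    return width * height * fps * bytes_per_pixel / 1e6

DDR3_800_32BIT_PEAK_MB = 800e6 * 4 / 1e6   # 800 MT/s x 4 bytes = 3200 MB/s

for label, w, h, fps, bpp in [
    ("4k @ 30 fps, 16bpp",    3840, 2160, 30, 2),
    ("4k @ 60 fps, 16bpp",    3840, 2160, 60, 2),
    ("4k @ 60 fps, 32bpp",    3840, 2160, 60, 4),
    ("1080p @ 60 fps, 32bpp", 1920, 1080, 60, 4),
]:
    mb = scanout_mb_per_s(w, h, fps, bpp)
    share = 100 * mb / DDR3_800_32BIT_PEAK_MB
    print(f"{label}: {mb:4.0f} MB/s = {share:2.0f}% of the channel")
```

Running this gives roughly 500 / 1000 / 2000 / 500 MB/s, i.e. about 15%, 30%, 60% and 15% of the channel, matching the rounded figures above.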
honestly it would be better, at this early stage, to use an FPGA as a gateway IC to provide PCIe (something that enjoy-digital already supports), and to have a conversion bus (a parallel bus) in something dreadful but very simple: multiple xSPI, overclocked SDRAM, or overclocked IDE/AT.
-
Originally posted by lkcl View Post
this is why most (power-hungry) systems now have 2x 32-bit DRAM channels @ minimum 1666mhz DDR3/4 rates, and, unfortunately, those are looking at a 3 to 5 watt power budget, just for the DRAM.
Originally posted by lkcl View Post
honestly it would be better, at this early stage, to use an FPGA as a gateway IC to provide PCIe (something that enjoy-digital already supports), and to have a conversion bus (a parallel bus) in something dreadful but very simple: multiple xSPI, overclocked SDRAM, or overclocked IDE/AT.
-
Originally posted by Qaridarium
I think 14nm will be very cheap in 2020, because by then IBM POWER10 will be on 7nm and Intel will also have 7nm for all products, so all the 14nm fabs will be free for low-cost manufacturing.
You might have more of a point with e.g. 32nm, which isn't a bad place to be honestly, but you're not going to compete with the big boys (AMD/Intel/IBM/ARM) on node size until your volumes are high enough that you can recover your manufacturing costs at the smaller node sizes.
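A minimal sketch of that volume argument, with entirely assumed NRE figures (the point is how amortisation behaves, not the exact dollar amounts, which are not taken from this thread):

```python
# Illustration of why volume matters when choosing a node: one-time NRE
# (masks, tooling) amortised per chip. All dollar figures below are
# made-up ballpark assumptions for the sake of the example.

ASSUMED_NRE_USD = {"180nm": 50_000, "32nm": 500_000, "14nm": 5_000_000}

for volume in (1_000, 100_000, 10_000_000):
    per_chip = ", ".join(
        f"{node}: ${nre / volume:,.2f}" for node, nre in ASSUMED_NRE_USD.items()
    )
    print(f"{volume:>10,} units -> NRE per chip: {per_chip}")
```

At small volumes the per-chip NRE of an advanced node dwarfs the silicon cost itself, which is why an older, cheaper node makes sense until volumes rise.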