Libre RISC-V Snags 50k EUR Grant To Work On Its RISC-V 3D GPU Chip


  • starshipeleven
    replied
    Originally posted by uid313 View Post
    Besides primitive operations, will it support hardware-accelerated cryptography and encoding? Such as AES-256 or AV1 decoding?
    Besides a basic instruction set, will it support advanced instructions like those in SSE4, FMA and AVX-512?
    Will it support virtualization?
    That's a GPU. All the stuff you list is either hardware accelerators (independent of the GPU cores) or CPU instructions that make no sense on a GPU.



  • the_scx
    replied
    Originally posted by kpedersen View Post
    The original article mentioned "for mobile devices". Sod that, if this project is successful, I feel it could be a fantastic asset for all types of devices, even those that are not locked down pieces of consumer / gamer shite!

    Sure, there will be some backlash and bad press saying that "it isn't as fast as a Geforce from 5 years ago" but I would still exclusively buy libre GPUs based on this technology and never look back to the dark old days ever again!
    This GPU (or rather the "Vulkan accelerator") does not even have the performance of NVIDIA graphics chips from 15 years ago.
    Please just look at the spec:
    - 720p@25FPS - A lot of smartphones have HiDPI displays; something around 1080p is currently the bare minimum for a modern smartphone, and 25 FPS is not impressive either.
    - 100 Mpixels/sec - The NVIDIA Riva 128 from 1997 achieved the same fill rate... (see the quick fill-rate check below).
    - 30 Mtriangles/sec - Comparable to the first Xbox (2001), which used the NV2A, a derivative of the GeForce 3. According to the spec sheet, "the RSX in the PS3 (2006) has something like 250 million triangles per second". Anyway, "today pipelines are too complex to be measured by such unit of measurement".
    - 5-6 GFLOPS - Comparable to the PlayStation 2 (2000) with its 147 MHz Graphics Synthesizer. The Adreno 640 does 898.56 GFLOPS in FP16 and 449.28 GFLOPS in FP32! Maybe 10 years ago 5-6 GFLOPS wouldn't have looked so bad on the smartphone market, but today it only makes people laugh.
    As you can see, it is definitely not suitable for any kind of modern smartphone, even a low-end one (~100 USD). And it is not even finished, and won't be before 2020!
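
    A quick back-of-envelope check (plain Python, using only the numbers quoted above and assuming a single full-screen redraw per frame with no overdraw, which is already generous for a real 3D workload) shows how far the 100 Mpixels/sec fill rate is from even a basic 1080p@60 target:

    ```python
    # Back-of-envelope fill-rate check, based on the announced 100 Mpixels/sec
    # figure. Assumes each pixel is touched exactly once per frame (no overdraw).

    QUOTED_FILL_RATE = 100e6  # pixels/sec, from the announced spec

    def required_fill_rate(width, height, fps):
        """Pixels per second needed to redraw every pixel once per frame."""
        return width * height * fps

    targets = {
        "720p @ 25 (announced target)": (1280, 720, 25),
        "1080p @ 60 (modern phone)":    (1920, 1080, 60),
        "1440p @ 60 (flagship phone)":  (2560, 1440, 60),
    }

    for name, (w, h, fps) in targets.items():
        need = required_fill_rate(w, h, fps)
        print(f"{name}: {need / 1e6:.0f} Mpixels/s needed "
              f"({need / QUOTED_FILL_RATE:.1f}x the quoted fill rate)")
    ```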

    Maybe this performance would be enough for cheap smartwatches. However, power consumption (2.5 W) is definitely too high. The 38mm variant of the Apple Watch has a 3.8 V 0.78 W·h (205 mA·h) battery which is able to power the device for many hours! For the same reason, this RISC-V solution is unsuitable for digital photo frames, weather forecast stations or similar devices.

    [Image: RISC-V smartwatch] That's what a modern RISC-V based smartwatch could look like. 😉

    [Image: RISC-V battery cells] Additional battery cells to power RISC-V based mobile devices. 😉

    We have one more problem here - there is no mature mobile software platform on RISC-V at all. All current and planned Linux solutions are tied to x86 and ARM CPUs. This includes Tizen, Sailfish OS, Ubuntu Touch, KaiOS and PureOS. What is worse, a port to RISC-V wouldn't be so easy. For example, the PureOS Store is supposed to be based around flatpaks. However, the Freedesktop runtime (as well as its derivatives: GNOME and KDE) doesn't support RISC-V. It is available only on ARM (ARMv7 and AArch64) and x86 (x86-32 and x86-64).

    I am not saying that this RISC-V solution is completely useless. I believe that there are some applications where it could fit. In my opinion, this initiative makes more sense than the OGP (Open Graphics Project). However, do not expect mass adoption in commercial devices. It just won't happen.
    Last edited by the_scx; 05 June 2019, 02:45 PM.



  • oiaohm
    replied
    Originally posted by discordian View Post
    Then we disagree on the terms used; as a studied mathematician, I don't want to be involved in anything practical or related to the real world =)
    "Mathematically sound" in applied security is broader than most people who have done coursework in mathematics realise. So you are fairly alone in thinking that doing maths is not practical or related to the real world.

    Mathematically verified silicon is what you are starting to see come out of CSIRO Data61-related projects. This is where the design is put through full formal mathematical proofs of its function.

    Sorry, the separation between the practical and your maths is basically gone on the security side. If you say something is secure, you need to put up a mathematical proof covering all the possible problems and proving they don't exist. CSIRO and others employ a lot of mathematicians to do exactly these insanely complex proofs.

    Yes, of course the physics people lay out the basic structures that the mathematicians have to base the proofs around.



  • oiaohm
    replied
    Originally posted by lkcl View Post
    i have been talking with jean-paul from LIP6.fr (alliance / coriolis2) - and the answer would appear to be, amazingly, "not a lot". the reason is because the design layout completely scales linearly.

    so as long as you can still get 7nm "cells" (as they are called) that fit exactly with the 28nm version that you did, you have pretty much zero layout changes needed.
    I did write 14nm or 7nm for a reason. You were asked about going straight down to 7nm.

    14nm has tracks thick enough to basically work fairly well with 28nm voltages and currents. I suspect this is part of the reason why AMD is doing their chiplet model/SiP the way they are: 14nm to provide the power feeds out of the system-in-package (SiP), and 7nm and smaller for the high-performance parts inside the SiP.

    Basically, there is a problem when you go under 14nm and try to run external memory controllers and the like: power handling. You simply don't have enough silicon to handle the power switching. It is the same reason high-voltage MOSFETs use something like a 120nm process or larger.

    Scaling a design down to 7nm is simple; having it work with only 7nm parts is another problem, and maybe impossible. For performance you will want your high-performance parts done on as small a node as possible, but external-signal parts seem to hit a wall at 10-14nm.

    We are basically starting to see different silicon production walls come up, where different sections of the design can only go down to a certain node and no smaller. And this is happening well before the absolute production wall.



  • discordian
    replied
    Originally posted by oiaohm View Post
    "Mathematically sound" goes all the way down to power analysis, EM and so on. The same things that leak information can also allow outside interference.
    Then we disagree on the terms used; as a studied mathematician, I don't want to be involved in anything practical or related to the real world =)



  • discordian
    replied
    lkcl: side channels will be with us for a very long time. I was blown away by Spectre et al., and had a talk with a student researching communication via side channels over networks. There's a lot of scary stuff possible as well.

    I guess you are in an even worse spot with crypto, since you are trying to be independent of the semiconductor fabs and likely use Verilog or something similar. Maybe a halfway measure would be to add something similar to AES-NI, at which point it's at least no worse than using pure software.



  • oiaohm
    replied
    Originally posted by discordian View Post
    That's an algorithm's issue, not about bringing this to hardware.

    I would guess that the issue is rather to make sure that you can't infer from differing timings or EM whether your encryption hits some "weak spots" leaking back information about a key.
    TPM chips go even further, in that they have a physical layout where it's really hard to get back an internal key, even if someone has considerable resources to scratch away chip layers and measure the stored bits (probably only theoretical for now).
    "Mathematically sound" goes all the way down to power analysis, EM and so on. The same things that leak information can also allow outside interference.

    Scratching away chip layers and measuring the stored bits is not theoretical: it requires an electron microscope, was proven possible in 2016, and works at least all the way down to 3nm production / 5nm real structures. It is more a question of whether your attacker has those resources.

    When adding a crypto design that is already sound, with the right balance of gates and everything else, inserting it into a larger design just means making sure you don't modify it and screw it up.



  • lkcl
    replied
    Originally posted by discordian View Post
    That's an algorithm's issue, not about bringing this to hardware.

    I would guess that the issue is rather to make sure that you can't infer from differing timings or EM whether your encryption hits some "weak spots" leaking back information about a key.
    i was stunned to learn, from a power analysis talk given at the Chennai RISC-V conference, that even just *having* a Floating-Point extension was enough to leak side-channel information sufficient to be exploitable and work out an internal key on a symmetric cipher. i cannot recall the exact details, it was something to do i think with the instruction decode phase: just the *presence* of the FP extension was enough to trigger transistors in that region, even though no actual FP operations were carried out. that leaked information about what instructions *were* being carried out, and at what time.

    amazing work, i was blown away at how effective the statistical inference techniques were that the presenter used, given not a lot of data.
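
    to give a flavour of that kind of statistical inference, here's a toy correlation power analysis (CPA) sketch in python/numpy. to be clear: this is purely illustrative, *not* the technique from the talk. the simulated traces, the hamming-weight leakage model and the single key byte are all assumptions made up for the example; a real attack would target a non-linear step such as an S-box output.

    ```python
    # Toy correlation power analysis (CPA) sketch -- illustrative only.
    # Simulates traces that leak the Hamming weight of (plaintext ^ key)
    # plus Gaussian noise, then recovers the key byte by correlation.
    import numpy as np

    rng = np.random.default_rng(0)
    SECRET_KEY_BYTE = 0x4B          # the value the "attacker" does not know
    N_TRACES = 500                  # number of simulated power measurements

    # Hamming weight lookup table for all byte values.
    hamming_weight = np.array([bin(v).count("1") for v in range(256)])

    # One leaky sample per trace: HW(plaintext ^ key) plus measurement noise.
    plaintexts = rng.integers(0, 256, N_TRACES)
    traces = (hamming_weight[plaintexts ^ SECRET_KEY_BYTE]
              + rng.normal(0.0, 1.0, N_TRACES))

    # For every possible key byte, correlate predicted leakage with the traces.
    correlations = np.empty(256)
    for guess in range(256):
        predicted = hamming_weight[plaintexts ^ guess]
        correlations[guess] = np.corrcoef(predicted, traces)[0, 1]

    best = int(np.argmax(np.abs(correlations)))
    print(f"best key guess: 0x{best:02X}  (true key byte: 0x{SECRET_KEY_BYTE:02X})")
    ```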

    from people that i met at Barcelona i learned that high security ASICs start by only providing the GDS files to the Foundries, that the top and bottom layers are pure metal as a Faraday Cage, and it progresses from there.

    caches are out, obviously. as in: L1 *and* L2 caches, because those leak timing information as well as power. if you are doing power analysis, and are going "true paranoid", you need *differential pairs* on every single track, but, worse than that, you need one diff-pair to signal a 0 and one diff pair to signal a 1! this gives you at least 4x the gate count however you are guaranteed to have the same power consumption under all state-transition switching conditions (full power).
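
    here's a rough nmigen sketch of that 1-of-2 ("dual-rail") data encoding, purely to illustrate the idea: this is *not* code from our tree, the DualRailBit name and the precharge signal are made up for the example, and it only covers the logical encoding. it says nothing about the differential-pair wiring, routing or a real DPA-resistant cell library, which is where the actual pain (and the 4x gate count) lives.

    ```python
    # Illustrative sketch only: 1-of-2 (dual-rail) encoding of a single bit
    # with a precharge phase. During precharge both rails are low; during
    # evaluation exactly one rail goes high, so the number of transitions
    # per cycle is data-independent (to first order).
    from nmigen import Elaboratable, Module, Signal

    class DualRailBit(Elaboratable):
        def __init__(self):
            self.d         = Signal()  # plain data bit in
            self.precharge = Signal()  # high = precharge phase, both rails low
            self.rail_t    = Signal()  # "true" rail: fires when d == 1
            self.rail_f    = Signal()  # "false" rail: fires when d == 0

        def elaborate(self, platform):
            m = Module()
            m.d.comb += [
                self.rail_t.eq(~self.precharge &  self.d),
                self.rail_f.eq(~self.precharge & ~self.d),
            ]
            return m
    ```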

    properly secure crypto hardware design is *awful* and your customers then complain it's not fast enough! it's no wonder "secure" RFID chips use piss-poor RSA keys: the banks won't pay for something that takes too long for a customer to pay for their goods at a store just because the RSA key-signing takes 20 seconds instead of 0.5! sigh, can we not talk about properly secure crypto, please?



  • lkcl
    replied
    Originally posted by oiaohm View Post
    It's more than the timing guarantees; your encryption engine has to be 100 percent mathematically sound.
    i've worked with cryptography (designed a parallel block cipher algorithm, used NIST.gov's STS to check hundreds of gigabytes of output), and was a cryptographic FAE for Aspex Semiconductors back in 2004, so i know what you're referring to. you thus also know a bit more about why i'm warier of including an arbitrary crypto engine than most.

    On crypto, if I were you, I would take a "watch this space" approach. I guess you are using chisel https://chisel.eecs.berkeley.edu/.
    ah no, definitely not. we went through quite a comprehensive (long) evaluation process: chisel was very specifically excluded for several reasons. the first is that it relies on java. the second is that it is human-unreadable (the code, not its auto-generated output). the third is that its auto-generated output is also unreadable (it generates a state machine that removes names and makes it near-impossible to verify that the verilog output is correct). i'd continue but i'm sure you get the picture.

    myhdl proved to be too awkward: you are restricted to a *subset* of python. i love the concept, however the subset is so painful to work with it just had to be scrubbed from the list.

    verilog: we knew in advance that, because it was designed as a unit test suite system back in the 1980s (or so), it has all the hallmarks of procedural programming languages (PASCAL, BASIC) from around that era. our design is so complex that not having OO would make developing in verilog costly in both time and resources, as well as very risky. only "unauthorised" extensions of the verilog standard, such as those developed by Cadence, support structs that may be passed into verilog modules as the types of variables! everyone else using tools that stick to the standard (iverilog) has to pass in potentially hundreds of individual signals! (a short nmigen sketch further down illustrates the difference)

    pyrtl did not have anything like the level of adoption that would give us a community that we could ask for help if there were any difficulties (myhdl on the other hand does, but its subsetting rules it out).

    bluespec was considered for its high performance simulation speed (10-20x faster than iverilog compiled to c), as well as the 100% guarantee that a correctly compiled BSV program *will* synthesise (this because bluespec is written in haskell). however, with no libre tools, it had to be ruled out. we cannot make a libre processor if it relies on proprietary tools. it would be pure hypocrisy.

    that left migen / nmigen. we did discuss the idea of writing a much faster HDL that would allow us to simulate designs much more rapidly, as well as give us strong typing. however i pointed out that without a community based around it (like there is with migen - see enjoydigital "litex", and many more projects) we would be completely on our own if difficulties cropped up.

    so, nmigen it is, and we're up to 15,000 lines of code already, in about 6 months.
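
    for a flavour of the signal-grouping point above, here's a minimal illustrative nmigen sketch (again, not code from our tree: the PassThrough module and OPERAND_LAYOUT are made up for the example). a Record bundles related signals so a module takes one object per port instead of hundreds of individual signals:

    ```python
    # Illustrative sketch only: grouping signals with an nmigen Record so a
    # module's port is one object rather than many loose signals.
    from nmigen import Elaboratable, Module
    from nmigen.hdl.rec import Record

    # a toy "operand port" layout, declared once and reused everywhere
    OPERAND_LAYOUT = [("data", 64), ("valid", 1), ("ready", 1)]

    class PassThrough(Elaboratable):
        def __init__(self):
            # each port is a single Record, not dozens of loose signals
            self.i = Record(OPERAND_LAYOUT)
            self.o = Record(OPERAND_LAYOUT)

        def elaborate(self, platform):
            m = Module()
            # Record fields are ordinary Signals, forwarded here by name
            m.d.comb += [
                self.o.data.eq(self.i.data),
                self.o.valid.eq(self.i.valid),
                self.i.ready.eq(self.o.ready),
            ]
            return m
    ```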

    https://www.youtube.com/watch?v=dcW6a7SO2zE and https://github.com/scarv/xcrypto are what a few others are working on for risc-v crypto support.

    that's very interesting to me: bristol is where... no, i can't talk about it

    Now, once a method is truly decided on and is part of the risc-v ISA, you will most likely be able to pick up solid, well-defined reference parts made in chisel to integrate. But until the ISA arguments are sorted out you could be wasting a lot of time on a crypto path that is going to be scrapped. Yes, once it is sorted out there will be reference implementations for risc-v to work from.
    precisely. it's not ready yet. we have to be quite pathological about this, cutting out things that could, if added on an already extremely low budget, throw us way off track. NLNet's Grant, as mentioned above, is based on completion of milestones.

    In time, I don't see crypto taking a large wad of cash, because you should not have to be completely reinventing the wheel if you wait until the right time to implement it. Basically, the development cost of doing crypto on a risc-v chip should keep on going down.
    yes, that makes sense.

    if there happens to be a core available in time that is libre-licensed, and its inclusion is within budget, we can look at it. otherwise, we go with software-based crypto, and a hard macro version will have to go into the next chip.



  • discordian
    replied
    Originally posted by oiaohm View Post
    It's more than the timing guarantees; your encryption engine has to be 100 percent mathematically sound.
    That's an algorithm's issue, not about bringing this to hardware.

    I would guess that the issue is rather to make sure that you can't infer from differing timings or EM whether your encryption hits some "weak spots" leaking back information about a key.
    TPM chips go even further, in that they have a physical layout where it's really hard to get back an internal key, even if someone has considerable resources to scratch away chip layers and measure the stored bits (probably only theoretical for now).

