Announcement

**lkcl** · 05 June 2019, 04:28 AM

as always, many thanks to michael for continuing to report on this project. we're going after a lower threshold as it meets a client's ultra-low-power requirements for an embedded application. however there is a secondary reason, which i outlined here https://slashdot.org/comments.pl?sid...96210#comments (won't copy it in full).

in a nutshell we "prove the design" with a much simpler (much lower) target - there happens to be a customer who wants it which is a bonus - and *after* that initial product is done and successful, *then* we up the ante. the core design is already taking that into account. with Mitch Alsup's help, the only reason why we're not doing 6 or higher issue and octa-core (or higher) is because it would blow the power budget for the early adopter / customer.

that, and it would be such a large chip that it would easily cost USD $1m even on the MVP Programme to have just 50 samples made. by sticking to the lower target, it will only be $100k. if we make the same design error on both chips, one of them will cost USD $1m to rectify, the other will cost $100k to rectify. which do you think is the most sensible chip to design first - the big one or the small one?

**lkcl** · 05 June 2019, 04:40 AM

Originally posted by uid313

Is it possible to "refactor" it from 28 nm to 7 nm?
How much work is that?

i have been talking with jean-paul from LIP6.fr (alliance / coriolis2) - and the answer would appear to be, amazingly, "not a lot". the reason is because the design layout completely scales linearly.

so as long as you can still get 7nm "cells" (as they are called) that fit exactly with the 28nm version that you did, you have pretty much zero layout changes needed.

we still have a year to go before we even get seriously to that point: i am looking forward to seeing the results of the LIP6.fr team's efforts to lay out a (small) RISC-V core using an adapted version of FreePDK45 to the alliance / coriolis2 tools.

if that is successful, our task just got a whole lot easier.

Besides primitive operations, will it support hardware-accelerated cryptography and encoding? Such as AES-256 or AV1 decoding?

most crypto processors need specialist design expertise to create a hard macro that has very specific timing guarantees, and so on. i am... reluctant to commit resources to such an effort. if on the other hand a customer comes forward and says, "we need crypto! here's a wad of cash!", i will be deeeelighted to investigate properly

Besides a basic instruction set, will it support advanced instructions like those in SSE4, FMA and AVX-512?

ha, the SIMD trap. do read this, it will change your perspective on SIMD forevaaaah

https://www.sigarch.org/simd-instruc...dered-harmful/

we're in the rather odd position of needing to do our own Vectorisation Extension (see article by michael, and my followup comment https://www.phoronix.com/scan.php?pa...ple-V-Detailed)

we need to do some custom 3D opcodes (the latest one being considered is "texturisation" - see http://bugs.libre-riscv.org/show_bug.cgi?id=91) and that needs to be parallel.

the parallelism is added in the form of a "hardware for-loop" if that makes any sense. see the comment at the end of the article above, for details.
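
as a rough illustration of the "hardware for-loop" idea, here is a minimal python sketch. it assumes a simplified model in which a vector length turns one scalar add into that many element-wise adds over consecutive registers. the names (`regs`, `vl`, `scalar_add`, `vector_add`) are invented for this example and are not taken from the actual Vectorisation Extension.

```python
def scalar_add(regs, rd, rs1, rs2):
    """One ordinary scalar add: regs[rd] = regs[rs1] + regs[rs2]."""
    regs[rd] = regs[rs1] + regs[rs2]


def vector_add(regs, rd, rs1, rs2, vl):
    """The same add opcode, repeated by a (modelled) hardware for-loop.

    Assumption for this sketch: with vector length vl, element i uses
    registers rd+i, rs1+i and rs2+i.
    """
    for i in range(vl):
        scalar_add(regs, rd + i, rs1 + i, rs2 + i)


regs = [0] * 32
regs[8:12] = [1, 2, 3, 4]       # first operand vector in r8..r11
regs[16:20] = [10, 20, 30, 40]  # second operand vector in r16..r19
vector_add(regs, rd=24, rs1=8, rs2=16, vl=4)
print(regs[24:28])  # [11, 22, 33, 44]
```

the point of the sketch is that no new "vector add" opcode is needed: the existing scalar operation is simply issued once per element.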

Will it support virtualization?

we just raised a bugreport to look at hypervisor mode: http://bugs.libre-riscv.org/show_bug.cgi?id=90. it appears to be sufficiently straightforward, and the code recently added to OpenSBI appears to be stable, so it is worthwhile investigating. follow the link-chain in bug#90 to OpenSBI's mailing list if you're interested to know more.

**lkcl** · 05 June 2019, 04:54 AM

Originally posted by brent

Unfortunately that's peanuts. Designing a capable GPU is a year-long effort for a team of skilled engineers. 50K only pays a single engineer for a few months.

from the West, yes. i'm living in Taiwan, now: my costs for myself and my family are a mere USD $1500 a month. Jacob needs USD $1k a month. therefore, EUR 50k covers both of us for well over a year, and is just the first Grant.

And I still don't trust any efforts led by lkcl after what has happened in the past.

oh? you're aware that several people have lied both publicly and privately about conversations that they've had with me, causing the EOMA68 project to be set back by at least three years, due to the harm that they caused? those people are directly responsible for the ongoing environmental damage that EOMA68 *would* have reduced if it had been possible to complete earlier, because more volunteers would have helped out to get it to a crucial threshold point.

as it is, there is barely enough money left - just enough to complete the manufacturing production run - and that's with me, personally, shovelling every penny of my income that i can spare into it to bolster it, and with various other extremely welcome small donations from people across the world who believe in the project.

bottom line: *be careful* before you pass judgement on me, ok?

Let's hope the money will be put to good use,

the NLNet Foundation has strict auditing requirements. also, as Jacob mentioned in a later comment, they only pay out on 100% demonstrable completion of agreed milestones. no completion, no payment. and it's libre source code, available *right now*, so even you - or anyone - can check *right now* if the money is producing something of value.

but I'm very skeptical about it.

skepticism is healthy, it means that you ask pertinent questions and raise issues that need proper investigation and answers, which would potentially otherwise be missed. do keep it up. however, please, do look up "The Mandela Effect", and once you have, contact me and we can have a chat, ok?

**kpedersen** · 05 June 2019, 05:03 AM

The original article mentioned "for mobile devices". Sod that, if this project is successful, I feel it could be a fantastic asset for all types of devices, even those that are not locked down pieces of consumer / gamer shite!

Sure, there will be some backlash and bad press saying that "it isn't as fast as a Geforce from 5 years ago" but I would still exclusively buy libre GPUs based on this technology and never look back to the dark old days ever again!

**oiaohm** · 05 June 2019, 05:20 AM

Originally posted by lkcl

most crypto processors need specialist design expertise to create a hard macro that has very specific timing guarantees, and so on. i am... reluctant to commit resources to such an effort. if on the other hand a customer comes forward and says, "we need crypto! here's a wad of cash!", i will be deeeelighted to investigate properly

It's more than the timing guarantees: your encryption engine has to be 100 percent mathematically sound too.

If I were you, I would take a "watch this space" approach on crypto. I guess you are using chisel https://chisel.eecs.berkeley.edu/.

https://www.youtube.com/watch?v=dcW6a7SO2zE and https://github.com/scarv/xcrypto are a few of the things others are working on for risc-v crypto support.

Now, once a method is truly decided and part of the risc-v ISA, you will most likely be able to pick up solid, defined reference parts made in chisel to integrate. But until the ISA arguments are sorted out, you could be wasting a lot of time on a crypto path that is going to be scrapped. Yes, once it is sorted out there will be reference implementations to go from with risc-v.

I don't see that, in time, crypto is going to take a large wad of cash, because you should not have to reinvent the wheel if you wait until the right time to implement it. Basically the development cost of doing crypto on a risc-v chip should keep on reducing.

**discordian** · 05 June 2019, 05:58 AM

Originally posted by oiaohm

It's more than the timing guarantees: your encryption engine has to be 100 percent mathematically sound too.

That's an algorithm's issue, not about bringing this to hardware.

I would guess that the issue is rather to make sure that you can't infer from differing timings or EM whether your encryption hits some "weak spots" leaking back information about a key.
TPM chips go even further, in that they have a physical layout where it's really hard to get back an internal key, even if someone has considerable resources to scratch away chip layers and measure the stored bits (probably only theoretical for now).
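
To make the timing point concrete, here is a minimal software sketch in Python (an analogy only, not a hardware design). `leaky_equal` is a hypothetical early-exit comparison whose running time depends on where the first mismatching byte is, which is the kind of difference an attacker can measure. `constant_time_equal` wraps the standard library's `hmac.compare_digest`, which is designed to avoid that dependence. The byte strings are made-up example values.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on where the first mismatch is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the mismatch occurs
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison designed so timing does not reveal the mismatch position."""
    return hmac.compare_digest(a, b)


secret = bytes.fromhex("00112233445566778899aabbccddeeff")
guess = bytes.fromhex("00112233445566778899aabbccddee00")
print(leaky_equal(secret, guess), constant_time_equal(secret, guess))  # False False
```

Both functions give the same answer; only the amount of information leaked through timing differs.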

**lkcl** · 05 June 2019, 06:32 AM

Originally posted by oiaohm

It's more than the timing guarantees: your encryption engine has to be 100 percent mathematically sound too.

i've worked with cryptography (designed a parallel block cipher algorithm, used NIST.gov's STS to check hundreds of gigabytes of output), and was a cryptographic FAE for Aspex Semiconductors back in 2004, so know what you're referring to. so you thus also know a bit more about why i'm warier of including an arbitrary crypto engine than most

If I were you, I would take a "watch this space" approach on crypto. I guess you are using chisel https://chisel.eecs.berkeley.edu/.

ah no, definitely not. we went through quite a comprehensive (long) evaluation process: chisel was very specifically excluded for several reasons. the first is that it relies on java. the second that it is human-unreadable (the code, not its auto-generated output). the third is that its auto-generated output is also unreadable (it generates a state machine that removes names and makes it near-impossible to verify that the verilog output is correct). i'd continue but i'm sure you get the picture.

myhdl proved to be too awkward: you are restricted to a *subset* of python. i love the concept, however the subset is so painful to work with it just had to be scrubbed from the list.

with verilog, we knew in advance that, because it was designed as a unit test suite system back in the 1980s (or so), it has all the hallmarks of procedural programming languages (PASCAL, BASIC) from around that era. our design is so complex that not having OO would make developing in verilog both costly in time and resources as well as very risky. only "unauthorised" versions of the verilog standard such as those developed by Cadence support structs that may be passed into verilog modules as the types of variables! everyone else who uses tools (iverilog) that stick to the standard has to pass in potentially hundreds of signals!
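
a small plain-python sketch of why passing structured types matters. this is not nmigen or verilog code: `AluOperands`, `flat_port_names` and `pipeline_stage` are hypothetical names made up for the illustration. the bundle travels as one object, whereas a flat port list needs one separately named signal per field.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class AluOperands:
    """Hypothetical bundle of related signals, passed around as one object."""
    opcode: int = 0
    src1: int = 0
    src2: int = 0
    dest: int = 0
    valid: bool = False


def flat_port_names(prefix: str, bundle) -> list:
    """What a flat port list would need: one separately named signal per field."""
    return [f"{prefix}_{f.name}" for f in fields(bundle)]


def pipeline_stage(ops: AluOperands) -> AluOperands:
    """A stage takes and returns the whole bundle; its signature never lists fields."""
    return replace(ops)  # copy of the bundle, unchanged


ops = AluOperands(opcode=1, src1=3, src2=4, dest=5, valid=True)
print(flat_port_names("alu", ops))
print(pipeline_stage(ops) == ops)
```

adding a sixth field to the bundle changes only the class definition. with a flat port list, every module on the path would need its ports edited by hand.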

pyrtl did not have anything like the level of adoption that would give us a community that we could ask for help if there were any difficulties (myhdl on the other hand does, but its subsetting rules it out).

bluespec was considered for its high performance simulation speed (10-20x faster than iverilog compiled to c), as well as the 100% guarantee that a correctly compiled BSV program *will* synthesise (this because bluespec is written in haskell). however, with no libre tools, it had to be ruled out. we cannot make a libre processor if it relies on proprietary tools. it would be pure hypocrisy.

that left migen / nmigen. we did discuss the idea of writing a much faster HDL that would allow us to simulate designs much more rapidly, as well as give us strong typing. however i pointed out that without a community based around it (like there is with migen - see enjoydigital "litex", and many more projects) we would be completely on our own if difficulties cropped up.

so, nmigen it is, and we're up to 15,000 lines of code already, in about 6 months.

https://www.youtube.com/watch?v=dcW6a7SO2zE and https://github.com/scarv/xcrypto are a few of the things others are working on for risc-v crypto support.

that's very interesting to me: bristol is where... no, i can't talk about it

Now, once a method is truly decided and part of the risc-v ISA, you will most likely be able to pick up solid, defined reference parts made in chisel to integrate. But until the ISA arguments are sorted out, you could be wasting a lot of time on a crypto path that is going to be scrapped. Yes, once it is sorted out there will be reference implementations to go from with risc-v.

precisely. it's not ready yet. we have to be quite pathological about this, cutting out things that could, if added on an already extremely low budget, throw us way off track. NLNet's Grant, as mentioned above, is based on completion of milestones.

I don't see that, in time, crypto is going to take a large wad of cash, because you should not have to reinvent the wheel if you wait until the right time to implement it. Basically the development cost of doing crypto on a risc-v chip should keep on reducing.

yes, that makes sense.

if there happens to be a core available in time that is libre-licensed, and its inclusion is within budget, we can look at it. otherwise, we go with software-based crypto, and a hard macro version will have to go into the next chip.

**lkcl** · 05 June 2019, 06:44 AM

Originally posted by discordian

That's an algorithm's issue, not about bringing this to hardware.

I would guess that the issue is rather to make sure that you can't infer from differing timings or EM whether your encryption hits some "weak spots" leaking back information about a key.

i was stunned to learn, from a power analysis talk given at the Chennai RISC-V conference, that even just *having* a Floating-Point extension was enough to leak side-channel information sufficient to be exploitable and work out an internal key on a symmetric cipher. i cannot recall the exact details, it was something to do i think with the instruction decode phase: just the *presence* of the FP extension was enough to trigger transistors in that region, even though no actual FP operations were carried out. that leaked information about what instructions *were* being carried out, and at what time.

amazing work, i was blown away at how effective the statistical inference techniques were that the presenter used, given not a lot of data.
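
to give a feel for how statistical inference pulls a key out of noisy power measurements, here is a toy python sketch. it is not the technique from the talk (i do not have those details). it assumes a textbook leakage model where measured power is proportional to the Hamming weight of `plaintext XOR key` plus gaussian noise, and it uses simulated traces. all names and parameters (`simulate_traces`, `recover_key`, the noise level, the key 0x5A) are invented for the example.

```python
import random


def hamming_weight(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_traces(key: int, n: int, noise: float, rng: random.Random):
    """Toy leakage model (an assumption): power = HW(plaintext XOR key) + noise."""
    plaintexts = [rng.randrange(256) for _ in range(n)]
    power = [hamming_weight(p ^ key) + rng.gauss(0.0, noise) for p in plaintexts]
    return plaintexts, power


def recover_key(plaintexts, power) -> int:
    """Pick the key guess whose predicted leakage correlates best with the traces."""
    def score(guess: int) -> float:
        predicted = [hamming_weight(p ^ guess) for p in plaintexts]
        return pearson(predicted, power)
    return max(range(256), key=score)


rng = random.Random(1)
plaintexts, power = simulate_traces(key=0x5A, n=2000, noise=1.0, rng=rng)
print(hex(recover_key(plaintexts, power)))
```

the attacker never sees the key, only plaintexts and a noisy power figure per operation, yet the correct guess stands out because it is the one whose predictions line up best with the measurements.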

from people that i met at Barcelona i learned that high security ASICs start by only providing the GDS files to the Foundries, that the top and bottom layers are pure metal as a Faraday Cage, and it progresses from there.

caches are out, obviously. as in: L1 *and* L2 caches, because those leak timing information as well as power. if you are doing power analysis, and are going "true paranoid", you need *differential pairs* on every single track, but, worse than that, you need one diff-pair to signal a 0 and one diff pair to signal a 1! this gives you at least 4x the gate count however you are guaranteed to have the same power consumption under all state-transition switching conditions (full power).
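
a minimal python sketch of the constant-switching idea. note that it models a related, simplified scheme (dual-rail encoding with an all-zero precharge phase between values), not the exact differential-pair arrangement described above. the function names and the 8-bit width are assumptions for the example.

```python
def single_rail_toggles(prev: int, new: int) -> int:
    """Wires that switch on a plain bus: the count depends on the data."""
    return bin(prev ^ new).count("1")


def dual_rail(value: int, width: int = 8) -> list:
    """Encode each bit on two wires (true_rail, false_rail); exactly one is high."""
    wires = []
    for i in range(width):
        bit = (value >> i) & 1
        wires += [bit, 1 - bit]
    return wires


def dual_rail_toggles(prev: int, new: int, width: int = 8) -> int:
    """Toggles for prev -> all-zero precharge -> new: always 2 * width."""
    precharge = [0] * (2 * width)

    def toggles(a, b):
        return sum(x != y for x, y in zip(a, b))

    return (toggles(dual_rail(prev, width), precharge)
            + toggles(precharge, dual_rail(new, width)))


for prev, new in [(0x00, 0x00), (0x00, 0xFF), (0xA5, 0x5A), (0x12, 0x13)]:
    print(single_rail_toggles(prev, new), dual_rail_toggles(prev, new))
# single-rail toggles: 0, 8, 8, 1; dual-rail toggles: 16 every time
```

the plain bus switches a data-dependent number of wires, which is what power analysis exploits. the dual-rail version switches the same number every time, at the price of twice the wires and the maximum switching activity on every transfer.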

properly secure crypto hardware design is *awful*

and your customers then complain it's not fast enough! it's no wonder "secure" RFID chips use piss-poor RSA keys: the banks won't pay for something that takes too long for a customer to pay for their goods at a store just because the RSA key-signing takes 20 seconds instead of 0.5! sigh, can we not talk about properly secure crypto, please?

**oiaohm** · 05 June 2019, 07:01 AM

Originally posted by discordian

That's an algorithm's issue, not about bringing this to hardware.

I would guess that the issue is rather to make sure that you can't infer from differing timings or EM whether your encryption hits some "weak spots" leaking back information about a key.
TPM chips go even further, in that they have a physical layout where it's really hard to get back an internal key, even if someone has considerable resources to scratch away chip layers and measure the stored bits (probably only theoretical for now).

Mathematical soundness extends down to power analysis, EM and so on. The same things that leak information can also allow outside interference.

https://www.cl.cam.ac.uk/~sps32/cardis2016_sem.pdf

Scratching away chip layers and measuring the stored bits is not theoretical: it requires an electron microscope, was proven possible in 2016, and is functional all the way down to 3nm production/5nm real structures at least. It is more a question of whether your attacker will have these resources.

When adding a crypto design that is already sound, using the right balance in gates and everything else, and inserting it into a design, you just have to make sure you don't modify it and screw it up.

**discordian** · 05 June 2019, 07:11 AM

lkcl : sidechannels will be with us for a very long time. I was blown away by Spectre et al., and had a talk with a student researching communicating via sidechannel over networks. There's a lot of scary stuff possible as well.

I guess you are in an even worse spot with crypto since you are trying to be independent of the semiconductor fabs and likely use Verilog or something similar. Maybe some halfway measure would be to add something similar to AES-NI, at which point it's at least not worse than using pure software.

Libre RISC-V Snags $50k EUR Grant To Work On Its RISC-V 3D GPU Chip
