Intel Releases ControlFlag 1.0 For AI-Driven Detection Of Bugs Within C Code


  • #1

    Phoronix: Intel Releases ControlFlag 1.0 For AI-Driven Detection Of Bugs Within C Code

    Intel last month open-sourced "ControlFlag", which finds bugs in source code using AI trained on a reported one billion-plus lines of code. Intel says it has been using the tool successfully within its own software, from applications down to firmware. The new milestone today is the release of ControlFlag 1.0...
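
    ControlFlag looks for anomalies in control structures. A minimal C sketch (an illustration of the classic pattern such a checker flags, not code from the ControlFlag repository):

    #include <stdio.h>

    int main(void)
    {
        int x = 0;

        /* Anomalous: "=" assigns 7 to x, so the branch is always taken.
           Having seen "if (x == 7)" far more often in its training
           corpus, a pattern-based checker flags this as a likely typo. */
        if (x = 7)
            printf("x is seven\n");

        return 0;
    }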


  • #2
    That's pretty cool. I'd imagine machine learning would be great at this sort of task: processing huge amounts of data and finding anomalies.

  • #3
    ... And next comes the GPLv4, denying the use of source code for the purpose of training an AI unless the AI itself is open-sourced under a GPL license.

    Luckily, it is under the MIT License.

  • #4
    So humans have to define what is "good code".

    Hopefully the definition will be "does exactly what it needs to, doesn't do anything it doesn't need to, with zero bugs, unexpected behaviours or vulnerabilities".

    How much code fits that definition?

    print("Hello world!")

    And even that is doubtful depending on the language chosen.
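
    (A hedged sketch of that point in C: even hello world has failure modes a literal "zero bugs" reading would have to handle.)

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* printf() can fail, e.g. when stdout is closed or a
           redirect target runs out of disk space; strictly
           bug-free code checks for that. */
        if (printf("Hello world!\n") < 0)
            return EXIT_FAILURE;

        /* Flushing can fail too; fflush() returns EOF on error. */
        if (fflush(stdout) == EOF)
            return EXIT_FAILURE;

        return EXIT_SUCCESS;
    }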

  • #5
    Originally posted by kvuj View Post
    That's pretty cool. I'd imagine machine learning would be great at this sort of task: processing huge amounts of data and finding anomalies.

    No such tool is perfect. But perfect is the enemy of good enough, and the various code analysis tools have identified code areas that are, at least, sufficiently questionable to be reviewed. I will take and review such advice each and every time.

  • #6
    Originally posted by Paradigm Shifter View Post
    So humans have to define what is "good code".

    I'd start with SPARK Ada as a requirement and then work from there ...

  • #7
    Originally posted by sdack View Post
    ... And next comes the GPLv4, denying the use of source code for the purpose of training an AI unless the AI itself is open-sourced under a GPL license.

    Luckily, it is under the MIT License.

    There are AI-driven snippet generators, but the question arises: what license does the generated code follow? It could stem from GPL code.

  • #8
    Originally posted by oleid View Post
    There are AI-driven snippet generators, but the question arises: what license does the generated code follow? It could stem from GPL code.

    Well, replace 'AI' with 'human', and copyright law applies, with the attendant problems of defining what a 'fair use'-sized snippet is. My view is that if any GPL code were used to train the AI, then either all the AI's output must be GPL-licensed, or generated code should be accompanied by a decision-tree record (or equivalent) showing that all generated code was derived from sources with licences compatible with the licence proposed for the code output by the AI.

    The problem is quite simple: don't train the AI on GPL-licensed code unless you are happy for the AI's output to be GPL-licensed. The same applies to BSD-n licensed code, or indeed any licence other than 'public domain'. Essentially, AIs should be trained on public domain code (if it exists) together with code under licences compatible with the licence you expect the AI's code to be used under.

  • #9
    Originally posted by Old Grouch View Post
    Well, replace 'AI' with 'human', and copyright law applies, with the attendant problems of defining what a 'fair use'-sized snippet is. My view is that if any GPL code were used to train the AI, then either all the AI's output must be GPL-licensed, or generated code should be accompanied by a decision-tree record (or equivalent) showing that all generated code was derived from sources with licences compatible with the licence proposed for the code output by the AI.

    The problem is quite simple: don't train the AI on GPL-licensed code unless you are happy for the AI's output to be GPL-licensed. The same applies to BSD-n licensed code, or indeed any licence other than 'public domain'. Essentially, AIs should be trained on public domain code (if it exists) together with code under licences compatible with the licence you expect the AI's code to be used under.

    That makes so much sense it'll never work.

    I'm getting back to the eclipse. 31°F, so I'm freezing my ass off.

  • #10
    Originally posted by Old Grouch View Post
    The problem is quite simple: don't train the AI on GPL-licensed code unless you are happy for the AI's output to be GPL-licensed. The same applies to BSD-n licensed code, or indeed any licence other than 'public domain'. Essentially, AIs should be trained on public domain code (if it exists) together with code under licences compatible with the licence you expect the AI's code to be used under.

    That is the obvious answer, I think. Those who disagree are jumping through hoops because they want the best of both worlds: freely usable output code, trained on code under any license.

    Note that 'public domain' isn't the only answer: the training set can include proprietary code if its license allows training with it. Microsoft could have trained on their own proprietary code too. Did they? I'm not sure they did, for "obvious" reasons. Double standards...

    Now, of course, MIT- and BSD-licensed code can in most cases be combined with proprietary code... GitHub should have restricted their training set.
