Intel Releases ControlFlag 1.0 For AI-Driven Detection Of Bugs Within C Code

    oiaohm
    Senior Member

  • oiaohm
    replied
    Originally posted by Old Grouch View Post
    Well, replace 'AI' with 'human', and copyright law applies, with the attendant problems of defining what is a 'fair use' sized snippet. My view is that if any GPL code were used to train the AI, then either all the AI's output must be GPL licenced or generated code should be accompanied by a decision tree record (or equivalent) showing all generated code was derived from sources with licences compatible with the licence proposed for the code output by the AI.

    The problem is quite simple: don't train the AI on GPL licensed code unless you are happy for the AI's output to be GPL licensed. The same would apply for BSD-n licensed code, or indeed any other licence other than 'public domain'. Essentially, AIs should be trained on public domain code (if it exists) together with code licensed with licences compatible with the licence you expect the AI's code to be used under.
    Legally there is no 'fair use' snippet size under copyright.
    https://info.legalzoom.com/article/h...ght-permission
    • The purpose and character of the use. What use are you making of the copyrighted work? Nonprofit, noncommercial use is more likely to be considered fair than if you are looking to profit.
    • The nature of the copyrighted work. Is the work used more creative and thus more closely related to copyright law's purpose of protecting creative expression? Or is it more factual and technical and thereby less susceptible to a variety of forms of expression?
    • The amount of the original work that was used. Was only a small portion of the copyrighted work used? Note that this is only one factor considered, and there is no specific rule about how much use is fair use.
    • The effect of the use on the original work's value or market. Is your use likely to harm or undercut the market for the copyright holder's work?
    Say someone has made up a word and you copy it. Yes, a single unique made-up word that you have copied can be copyright infringement, because it is a creative expression. Points 3 and 4 of the copyright assessment are more about calculating the cost than about whether you have breached copyright law. Point 3 is why people have the mistaken idea that there is a minimum amount.

    There is no AI I know of that you can trust to answer those 4 questions. Also remember that taking snippets may breach more than copyright law: trademark law and patent protection apply as well. Yes, you can take a snippet from one GPLv3 work into another GPLv3 work and end up in court because that snippet has resulted in misuse of a trademark. You can also copy a snippet from one MIT-licensed work to another MIT-licensed work and end up with a patent breach, because the patent was only licensed for use with a particular work.

    Originally posted by M@yeulC View Post
    That is the obvious answer I think. Those who disagree are jumping through hoops as they want the best of both worlds (freely usable output code, trained on any license).

    Note that 'public domain' isn't the only answer. Microsoft could have trained it on their proprietary code too. Did they? I'm not sure they did, for "obvious" reasons. Double standards... It can be proprietary code if the license allows training with it.

    Now, of course MIT and BSD can in most cases be combined with proprietary code... Github should have restricted their training set.
    No. With what GitHub did, where the AI provides snippets, when you look at the law there is no way to do it legally with AI without risk. Heck, it is not possible for a human to take snippets from projects safely without legal review. This is different from Intel ControlFlag, which trains on source code and then highlights sections in your own source code that are questionable. ControlFlag is designed not to transfer code from one project to the next.

    ControlFlag in theory could train on any source code. The way it works is not going to undermine the value of the original work or transfer a fragment of the original work into another work.

    An AI trained on source code that guides developers to the sections of their code needing review leaves the code alterations in the project developers' hands.
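
As a rough illustration of that distinction, here is a toy sketch. It is not ControlFlag's actual implementation: it simply counts how often abstracted `if` condition shapes appear in a training corpus and flags conditions in your own code whose shape is rare. The function names, the regex-based abstraction, and the `min_count` threshold are all assumptions made for illustration. Note that it only reports locations in your code and never emits anything from the corpus.

```python
import re
from collections import Counter

# Assumption: identifiers and numbers are abstracted to placeholders so that
# "x == 0" and "count == 10" map to the same structural shape.
_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+")
_IF = re.compile(r"\bif\s*\(")

def shape(condition):
    """Reduce a condition to its shape, e.g. 'x = 0' becomes 'ID=NUM'."""
    def repl(match):
        return "NUM" if match.group(0)[0].isdigit() else "ID"
    return _TOKEN.sub(repl, condition.replace(" ", ""))

def conditions(source):
    """Yield (line_number, condition_text) for single-line C-style ifs."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _IF.search(line)
        if not match:
            continue
        depth, start = 1, match.end()
        # Walk forward to the parenthesis that closes the if condition.
        for i in range(start, len(line)):
            if line[i] == "(":
                depth += 1
            elif line[i] == ")":
                depth -= 1
                if depth == 0:
                    yield lineno, line[start:i]
                    break

def train(corpus):
    """Count how often each condition shape occurs across the corpus."""
    counts = Counter()
    for source in corpus:
        for _, cond in conditions(source):
            counts[shape(cond)] += 1
    return counts

def flag(source, counts, min_count=2):
    """Return conditions in `source` whose shape was rarely seen in training."""
    return [(lineno, cond) for lineno, cond in conditions(source)
            if counts[shape(cond)] < min_count]

corpus = [
    "if (a == 0) {\n}\nif (b == 1) {\n}",
    "if (len == 10) return;\nif (p != q) x();\nif (r != s) y();",
]
mine = "int f(int n) {\n  if (n = 5) return 1;\n  if (n == 7) return 2;\n  return 0;\n}"
print(flag(mine, train(corpus)))
```

The output is a list of line numbers and condition texts from your own file, so any fix is still written by the project's developer.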

    Now, if ControlFlag started, like GitHub, providing examples of how to fix what it has detected, then you would run into infringement problems with copyright, trademarks, trade secrets and finally patents at different times. Yes, trade secrets you could avoid by controlling the input source code, but the other three are on the table. The problem is that the other three could have you stuck in court for 10 years to get a final ruling, and if it goes against you it is going to cost a lot, possibly everything you have.

    When even companies like IBM, Google and Oracle, with their huge legal teams, are at times not sure where the law stands on snippets of code carrying copyright, trademarks and patents being transferred into another project, there is no hope of an AI solving this correctly. Yes, the different court rulings in the USA involving IBM, Google and Oracle on these points at the high court don't all align with each other. So this is a "do you feel lucky about how the high court will rule on the day" problem, which also means legal opinion is not settled on what the rulings really mean.

    Some areas of the law are better off avoided. People do not think they need to check code snippets for trademarks and patent declarations until they work in a large company, have to do a legal review, and get told off by the legal team because not checking that stuff is dangerous.

    GitHub is pushing the legal limit with Copilot, and the problem is that if this ends up being ruled against, there are going to be a lot of harmed parties. One developer proved that you could get Copilot to provide another project's about-box text, trademark and all. Now if that slipped into your production product, ouch, because it is trademark infringement without question. Copyright is complex as hell when working out what is legal. Trademarks and patents are even worse, because the fair use defence is even more limited with those, in many cases being non-existent.

  • microcode
    Senior Member

  • microcode
    replied
    We should start aggressively tagging CVEs with the patches that introduced them, when there is a patch to point to. There might be some patterns to be learned from the difference between those and all the other patches.
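
One way to picture this suggestion, as a minimal sketch with made-up data: given a mapping from CVE IDs to the commit that introduced each flaw, label every patch with the CVEs it is known to have introduced, which yields the two groups whose differences could then be studied. The `Patch` type, the `label_patches` helper, and the sample commit hashes and CVE IDs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patch:
    commit: str  # commit hash (placeholder values in the example below)
    diff: str    # patch text

def label_patches(patches, introduced_by):
    """Pair each patch with the sorted list of CVE IDs it introduced.

    `introduced_by` maps a CVE ID to the hash of the introducing commit.
    An empty list means no CVE is currently attributed to that patch.
    """
    cves_for_commit = {}
    for cve_id, commit in introduced_by.items():
        cves_for_commit.setdefault(commit, []).append(cve_id)
    return [(p, sorted(cves_for_commit.get(p.commit, []))) for p in patches]

patches = [Patch("a1b2c3", "- old line\n+ new line"),
           Patch("d4e5f6", "- check(x)\n+ x")]
introduced_by = {"CVE-0000-0001": "d4e5f6"}  # placeholder CVE ID

for patch, cves in label_patches(patches, introduced_by):
    print(patch.commit, "introducing" if cves else "no known CVE", cves)
```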

  • FlaSheridn
    Junior Member

  • FlaSheridn
    replied
    FWIW, it didn’t find any significant bugs in our code; a couple of commercial tools did, and CppCheck found one (and reported it hundreds of times).

  • uid313
    Senior Member

  • uid313
    replied
    This could be really useful for unsafe languages such as Python, Ruby and JavaScript. Less so for safer languages such as Rust, Ada and SPARK.

  • sdack
    Senior Member

  • sdack
    replied
    Originally posted by oleid View Post
    There are AI-driven snippet generators, but the question arises: what license does the generated code follow? It could stem from GPL code.
    Of course, hence my sarcastic comment. This is going to get fun the more we see AI being used on source code and learning from it to then produce something new.

    Some people believe it is OK to use 15 seconds of a copyrighted song, but that is not right either.

    I am only waiting for AIs to replace lawyers. It will get wild.


  • M@yeulC
    replied
    Anyway, I wanted to ask the following: has it been tested on the Linux kernel?


  • M@yeulC
    replied
    Originally posted by Old Grouch View Post
    The problem is quite simple: don't train the AI on GPL licensed code unless you are happy for the AI's output to be GPL licensed. The same would apply for BSD-n licensed code, or indeed any other licence other than 'public domain'. Essentially, AIs should be trained on public domain code (if it exists) together with code licensed with licences compatible with the licence you expect the AI's code to be used under.
    That is the obvious answer I think. Those who disagree are jumping through hoops as they want the best of both worlds (freely usable output code, trained on any license).

    Note that 'public domain' isn't the only answer. Microsoft could have trained it on their proprietary code too. Did they? I'm not sure they did, for "obvious" reasons. Double standards... It can be proprietary code if the license allows training with it.

    Now, of course MIT and BSD can in most cases be combined with proprietary code... Github should have restricted their training set.

  • skeevy420
    Senior Member

  • skeevy420
    replied
    Originally posted by Old Grouch View Post

    Well, replace 'AI' with 'human', and copyright law applies, with the attendant problems of defining what is a 'fair use' sized snippet. My view is that if any GPL code were used to train the AI, then either all the AI's output must be GPL licenced or generated code should be accompanied by a decision tree record (or equivalent) showing all generated code was derived from sources with licences compatible with the licence proposed for the code output by the AI.

    The problem is quite simple: don't train the AI on GPL licensed code unless you are happy for the AI's output to be GPL licensed. The same would apply for BSD-n licensed code, or indeed any other licence other than 'public domain'. Essentially, AIs should be trained on public domain code (if it exists) together with code licensed with licences compatible with the licence you expect the AI's code to be used under.
    That makes so much sense it'll never work.

    I'm getting back to the eclipse. 31F so I'm freezing my ass off.

  • Old Grouch
    Phoronix Member

  • Old Grouch
    replied
    Originally posted by oleid View Post

    There are AI-driven snippet generators, but the question arises: what license does the generated code follow? It could stem from GPL code.
    Well, replace 'AI' with 'human', and copyright law applies, with the attendant problems of defining what is a 'fair use' sized snippet. My view is that if any GPL code were used to train the AI, then either all the AI's output must be GPL licenced or generated code should be accompanied by a decision tree record (or equivalent) showing all generated code was derived from sources with licences compatible with the licence proposed for the code output by the AI.

    The problem is quite simple: don't train the AI on GPL licensed code unless you are happy for the AI's output to be GPL licensed. The same would apply for BSD-n licensed code, or indeed any other licence other than 'public domain'. Essentially, AIs should be trained on public domain code (if it exists) together with code licensed with licences compatible with the licence you expect the AI's code to be used under.

  • oleid
    Senior Member

  • oleid
    replied
    Originally posted by sdack View Post
    ... And next comes the GPLv4, denying the use of source code for the purpose of training an AI unless the AI itself is open-sourced under a GPL license.

    Luckily it is under the MIT License.
    There are AI-driven snippet generators, but the question arises: what license does the generated code follow? It could stem from GPL code.
