Shotcut 24.10 Open-Source Video Editor Adds Initial AI Feature

  • Quackdoc
    replied
    Originally posted by ehansin

    Thanks for the heads up and some details on what you like about it. I have heard of it, and when reminded here I remembered the 0.2 rewrite and talk of it possibly being a really good option with some unique features. I just installed it via Brew on the macOS laptop I'm on right now. Looks like a pretty old version (says April 2019!), but I will at least try it, and I see the Olive site has a version for 2024, so maybe that is a better option.
    Yeah, that would be the 0.1 version. 0.2 has a lot of changes, but some folks are still on 0.1 for various reasons.


  • sophisticles
    replied
    Originally posted by stormcrow
    Caution. While the Whisper code and "model weights" are licensed under MIT, I don't see any way of auditing the source material those language models were trained on. This could still be a legal landmine given the state copyright is in today.
    I'm not too worried about that; I assume that they have done their due diligence.
    Last edited by sophisticles; 31 October 2024, 12:53 PM.


  • pong
    replied
    Originally posted by stormcrow

    Caution. While the Whisper code and "model weights" are licensed under MIT, I don't see any way of auditing the source material those language models were trained on. This could still be a legal landmine given the state copyright is in today.
    Very abstractly, I see your point about ML models "in general". But at the coarsest "pragmatic" level of risk to the end user, my initial reaction is that this would "matter" more with generative ML models, where users create content they expect to be free of IP infringement but which might infringe if the model regurgitates substantial parts of its training material (possibly not licensed for non-transformative reuse) nearly literally as output.

    So in that case, generating images, code, or music might be areas of greater concern.

    But Whisper, AFAIK, basically generates individual tokens, and those are more or less syllables, individual words, or coarse descriptions of non-speech sounds. AFAIK it's not likely to generate the poetry of Shakespeare, the lyrics of an opera song, or anything else of sufficient length or uniqueness to be subject to copyright / IP, unless the input audio corresponds to it ~95% literally.

    At worst, in any ordinary circumstance that comes to mind, I suppose it MIGHT be able to generate text that describes something using some trademarked descriptive text like "Growl like the Hulk" based on some input sound. But what you're in theory going to get out is a ~90% accurate word-by-word "transcription" of your input audio, so if you've got IP concerns about the output, you probably have the same concerns about the input.

    The more abstract case is that merely having or using the model, regardless of its output, could be something a troll complains about through a patent or "made-using-my-IP" infringement claim, affecting many users or users whose workflows someone is targeting. But that's very different, and in many cases it transcends having to worry about whether the output is "clean/licit".


  • ehansin
    replied
    Originally posted by Quackdoc
    Olive video editor...
    Thanks for the heads up and some details on what you like about it. I have heard of it, and when reminded here I remembered the 0.2 rewrite and talk of it possibly being a really good option with some unique features. I just installed it via Brew on the macOS laptop I'm on right now. Looks like a pretty old version (says April 2019!), but I will at least try it, and I see the Olive site has a version for 2024, so maybe that is a better option.


  • caligula
    replied
    Originally posted by eszlari

    AFAIK it's much faster than Kdenlive, since it supports GPU effects. The maintainer of Shotcut is also the maintainer of MLT.
    Kdenlive is pretty slow considering it's written in C++. Just opening and closing a project takes ages. You can see from the debug messages in the log that something is totally fucked up there.


  • Quackdoc
    replied
    Originally posted by TheDcoder

    What's the first one?
    Olive video editor. The compositing is done completely on the GPU with full OCIO color management. It properly handles associated alpha, and with custom OCIO configs you can do HDR and color manage footage from different cameras. The node-based editing is basic but very powerful. It can be great for simple things, but more complicated effects are not beginner friendly. I currently use a PR which allows me to use custom GLSL shaders for effects when I need something more complex.

    But for me it's more or less perfect. It's simple to use in most cases, powerful when you need it to be, and unlike pretty much every other FOSS video editor, it has proper color management support, which is crucial for me. And perhaps most importantly, working with it is bloody fast. I can edit 4K video on my Ryzen 2600 + Arc A380 without much issue at all, as it has a powerful cache (it uses EXR), and render resolution and render depth can be changed (f32, f16, u16, u8; rendering in u8 changes the cache format from EXR to JPEG).

    Development on 0.2 is stalled because the dev is working on porting the UI from Qt to Godot. Qt has proved too troublesome, and the Godot work will make Vulkan more viable.

    EDIT: It has a lot of downsides. Working with audio, for instance, kinda sucks; supplying your own OCIO config is bugged if you do it via the UI, so you need to start Olive using the OCIO env var instead; and it's missing a lot of effects (the GLSL PR has accompanying files that add a lot of them back), etc. But even with the MANY downsides of Olive, its benefits make it just too good for me to use anything else.


  • ehansin
    replied
    Originally posted by TheDcoder
    What's the first one?
    First thought I had as well. Obviously an opinion, but I come here for opinions, investigate, then form my own opinions!


  • stormcrow
    replied
    Originally posted by sophisticles
    This is the best open source NLE and works equally well on Windows and Linux; in fact, I would say it works a bit better on Windows due to better hardware encoding support.

    While this new AI feature is nice, I would like to see AI upscaling or video enhancement that is NPU accelerated.

    Still, this is legally free and works well, so I can't really complain.
    Caution. While the Whisper code and "model weights" are licensed under MIT, I don't see any way of auditing the source material those language models were trained on. This could still be a legal landmine given the state copyright is in today.


  • TheDcoder
    replied
    Originally posted by Quackdoc
    While I'm not the biggest fan of Shotcut, I do find it the second best free video editor.
    What's the first one?


  • Quackdoc
    replied
    While I'm not the biggest fan of Shotcut, I do find it the second best free video editor. This is a really nice feature: "By making use of OpenAI's Whisper and the Whisper.cpp open-source project, there is now AI-driven speech-to-text support integrated in this video editor." This is dope, though I don't see myself using it, as I already have a pipeline to generate subtitles using Whisper.

    I use Whisper all the time. It's phenomenal, easily the best speech-to-text I have ever used. Glad to see Shotcut get this feature.
