Firefox 80 To Support VA-API Acceleration On X11


  • Veto
    replied
    Originally posted by bug77 View Post
    That still works out to 15*1920*1080 or ~31MFLOPS every frame (~16ms). A CPU shouldn't even notice that, especially with SIMD.
    Of course, it's not just the transformation, so I'd accept a 5-10% CPU overhead. Anything on top of that, just screams of sloppy programming somewhere in the stack.
    Uhm, no... this is just plain wrong. FLOPS means FLoating-point Operations Per Second, and the burden does not become less just because you look at a smaller time interval (hint: the CPU also only has 16 ms of processing time per frame).

    Yes, SIMD and other tricks help - but it doesn't change the fact that video processing is brutally expensive, and dedicated HW processing can be much more efficient than the CPU.
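    For concreteness, a back-of-the-envelope sketch of the arithmetic both posters are doing (the 15 ops/pixel figure is taken from the quoted post; resolutions and frame rate are the ones discussed in the thread):

    ```c
    /* Back-of-the-envelope FLOP budget for a pure-CPU YUV->RGB pass,
     * assuming the ~15 floating-point ops per pixel quoted above. */
    #include <stdio.h>

    int main(void) {
        const double ops_per_pixel = 15.0;
        const double fps = 60.0;
        const double res[][2] = { {1920.0, 1080.0}, {3840.0, 2160.0} };

        for (int i = 0; i < 2; i++) {
            double per_frame  = ops_per_pixel * res[i][0] * res[i][1];
            double per_second = per_frame * fps;
            printf("%.0fx%.0f: %.1f MFLOP per frame, %.2f GFLOP/s sustained\n",
                   res[i][0], res[i][1], per_frame / 1e6, per_second / 1e9);
        }
        return 0;
    }
    ```

    This prints ~31.1 MFLOP per 1080p frame and ~1.87 GFLOP/s at 60 fps (~7.46 GFLOP/s at 4K): both posters' figures drop out of the same multiplication, and the disagreement is only over the unit (operations per frame vs. operations per second).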



  • Vistaus
    replied
    Originally posted by curfew View Post
    They switched to Chrome because Firefox was shit. Now Firefox is getting better and the same users will switch back to Firefox unless Chrome catches up. Simple.

    Firefox has taken huge steps forward since Fx 75, and I can finally feel personally satisfied.
    I'm one of the "same users" and I'm not going to switch back to Firefox, as Vivaldi fulfills all of my needs (and more, once M3 lands, which could be any day now according to Jon).

    So no, not *every* former Firefox user will switch back.



  • pal666
    replied
    Originally posted by caligula View Post
    Apparently there's a language barrier here. What I meant is, a 7nm CPU is more power efficient than, say, a 40nm CPU (the original RPi process node).
    what i'm trying to say is that rpi does not do video decoding on cpu. that's why comparing their process nodes is silly. compare the process nodes of your cpu and some video card and think about why you can't play a modern game without the video card
    Originally posted by caligula View Post
    In a similar way, 7nm GPUs are more power efficient than 40nm GPUs, and 7nm DSPs are more power efficient than 40nm DSPs. So the advances in process node technology can benefit all types of video decoding.
    sure. new rpi can have faster decoding than old rpi. new intel cpu can have faster decoding than old intel cpu. and new intel cpu can have worse decoding than old rpi because cpus suck at video decoding
    Originally posted by caligula View Post
    Irrelevant. I wasn't claiming anything like that. My claim was, if an 8-year-old $25 computer could decode H.264, a modern $1000 computer should easily be able to decode the same video efficiently, thanks to multiple improvements in hardware technology.
    my claim is you are an imbecile who can't understand that specialized circuits exist exactly because they are faster than modern $1000 computers
    Originally posted by caligula View Post
    Intel's latest desktop arch (Comet Lake) is still at 14nm.
    exactly, intel is selling you 6 year old shit
    Originally posted by caligula View Post
    Not really - modern notebooks are so powerful you can do everything the original RPi does without any kind of hardware acceleration.
    moron, your notebook can't both be faster than the rpi at video decoding and use 70% of the cpu for 720p



  • pal666
    replied
    Originally posted by curfew View Post
    They switched to Chrome because Firefox was shit. Now Firefox is getting better and the same users will switch back to Firefox unless Chrome catches up. Simple.
    everything works in powerpoint. wake me up when firefox regains market share in reality



  • bug77
    replied
    Originally posted by treba View Post

    That's assuming all the data is in L1 cache. But I think you are somewhat right - the YUV-to-RGB translation apparently is not the main showstopper. It just adds to it, and doing it on the GPU is more efficient. BTW, this motivated the whole DMABUF implementation in the first place, see https://bugzilla.mozilla.org/show_bug.cgi?id=1580169
    Yes, of course, specialized hardware exists for a reason. Yet, as others have pointed out, we have much weaker hardware (RPi) decoding video without issues, while a PC with an order of magnitude faster hardware will choke on a few streams in the absence of hardware decoding.
    Decoding video is far from being my strong point, but it's pretty obvious something's amiss here.
    And don't get me started on Windows, where a fairly powerful laptop cannot output smooth video, no matter the amount of hardware decoding, because of DPC woes...



  • caligula
    replied
    Originally posted by pal666 View Post
    you are being silly. process node advancements apply to progress from an 8-year-old intel cpu to a 14nm intel cpu.
    Apparently there's a language barrier here. What I meant is, a 7nm CPU is more power efficient than, say, a 40nm CPU (the original RPi process node). In a similar way, 7nm GPUs are more power efficient than 40nm GPUs, and 7nm DSPs are more power efficient than 40nm DSPs. So the advances in process node technology can benefit all types of video decoding.
    Originally posted by pal666 View Post
    but 8 year old intel cpu wasn't able to play video.
    Irrelevant. I wasn't claiming anything like that. My claim was, if an 8-year-old $25 computer could decode H.264, a modern $1000 computer should easily be able to decode the same video efficiently, thanks to multiple improvements in hardware technology.
    Originally posted by pal666 View Post
    (and btw intel's 14nm is 6 years old)
    Intel's latest desktop arch (Comet Lake) is still at 14nm.
    Originally posted by pal666 View Post
    it is better in many ways, but it is worse in the hardware video decode way (especially when the hardware video decoding parts of your laptop aren't used)
    Not really - modern notebooks are so powerful you can do everything the original RPi does without any kind of hardware acceleration. RPi might be able to do some low-level real-time bit-banging faster than Intel, but then again its GPIO isn't that fast.
    Last edited by caligula; 05 July 2020, 10:03 AM.



  • treba
    replied
    Originally posted by bug77 View Post
    Freeing the CPU from some tasks usually yields a smoother experience. Also, when playing back several streams, without hardware acceleration even a modern CPU will choke. Fast.


    That still works out to 15*1920*1080 or ~31MFLOPS every frame (~16ms). A CPU shouldn't even notice that, especially with SIMD.
    Of course, it's not just the transformation, so I'd accept a 5-10% CPU overhead. Anything on top of that, just screams of sloppy programming somewhere in the stack.
    That's assuming all the data is in L1 cache. But I think you are somewhat right - the YUV-to-RGB translation apparently is not the main showstopper. It just adds to it, and doing it on the GPU is more efficient. BTW, this motivated the whole DMABUF implementation in the first place, see https://bugzilla.mozilla.org/show_bug.cgi?id=1580169
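    A rough sketch of why the L1-cache point matters: at 1080p a software YUV->RGB pass has to stream roughly 11 MB through the cache hierarchy per frame (read an NV12 frame, write an RGBA frame), orders of magnitude more than a 32-64 KB L1 can hold. Illustrative numbers only; real pipelines differ:

    ```c
    /* Illustrative per-frame memory traffic for a software YUV->RGB
     * pass at 1080p60: read a 4:2:0 NV12 frame, write an RGBA frame.
     * Ignores tiling, overlap and any cache reuse. */
    #include <stdio.h>

    int main(void) {
        const double w = 1920.0, h = 1080.0, fps = 60.0;
        double nv12_bytes = w * h * 12.0 / 8.0;  /* 12 bits per pixel */
        double rgba_bytes = w * h * 4.0;         /* 32 bits per pixel */
        double per_frame  = nv12_bytes + rgba_bytes;

        printf("%.1f MB per frame, %.2f GB/s at %.0f fps\n",
               per_frame / 1e6, per_frame * fps / 1e9, fps);
        return 0;
    }
    ```

    That works out to ~11.4 MB per frame (~0.68 GB/s at 60 fps), which is also why handing the decoder's buffer straight to the GPU via DMABUF, as in the linked bug, avoids the round trip entirely.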



  • curfew
    replied
    Originally posted by pal666 View Post
    it's the other way around. market share is a result of user choice. i.e. everyone already switched to chrome and improvements in firefox will not affect the majority of users
    They switched to Chrome because Firefox was shit. Now Firefox is getting better and the same users will switch back to Firefox unless Chrome catches up. Simple.

    Firefox has taken huge steps forward since Fx 75, and I can finally feel personally satisfied.



  • bug77
    replied
    Originally posted by horizonbrave View Post
    Sorry I missed the memo and I'm dumb as fuck. What does this bring to the table? Just a bit of power efficiency for laptop users??
    Thanks
    Freeing the CPU from some tasks usually yields a smoother experience. Also, when playing back several streams, without hardware acceleration even a modern CPU will choke. Fast.

    Originally posted by Veto View Post

    Well, let's have a look at your assertion: that is 15 floating-point operations per pixel you show there. So you need 15*1920*1080*60 = 1,866,240,000, or approximately 2 GFLOPS, just to do a simple YUV conversion on your CPU. For 4K video that will be 7½ GFLOPS...

    Of course a real implementation will apply some tricks, but still... There is a reason why specialized hardware is a win when doing video conversions!
    That still works out to 15*1920*1080 or ~31MFLOPS every frame (~16ms). A CPU shouldn't even notice that, especially with SIMD.
    Of course, it's not just the transformation, so I'd accept a 5-10% CPU overhead. Anything on top of that, just screams of sloppy programming somewhere in the stack.
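    For reference, a minimal scalar sketch of the per-pixel conversion being costed in this exchange (BT.601 full-range coefficients; a real decoder would vectorize this over whole rows and work in fixed point):

    ```c
    #include <stdio.h>

    /* Scalar BT.601 full-range YUV->RGB for one pixel: roughly the
     * dozen-or-so floating-point ops per pixel debated above. */
    static void yuv_to_rgb(float y, float u, float v,
                           float *r, float *g, float *b) {
        u -= 128.0f;                                /* center chroma */
        v -= 128.0f;
        *r = y + 1.402f    * v;
        *g = y - 0.344136f * u - 0.714136f * v;
        *b = y + 1.772f    * u;
    }

    int main(void) {
        float r, g, b;
        yuv_to_rgb(128.0f, 128.0f, 128.0f, &r, &g, &b);  /* mid grey */
        printf("R=%.1f G=%.1f B=%.1f\n", r, g, b);       /* 128 128 128 */
        return 0;
    }
    ```

    Clamping to [0, 255] and the final pack to bytes are omitted; they add a few more ops per pixel.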



  • pal666
    replied
    Originally posted by remenic View Post
    KDE user here, but not sure what you're talking about.
    he is talking about kde's broken wayland support (that's why poor kde users need the subject of this article)
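    For anyone wanting to try the feature the article covers: in Firefox 80 the X11 VA-API path is off by default and has to be switched on by hand. A minimal sketch, assuming a working VA-API driver on the system:

    ```sh
    # Verify the VA-API driver works at all (vainfo ships with libva-utils).
    vainfo

    # Launch Firefox with the EGL backend, which the X11 VA-API path requires.
    MOZ_X11_EGL=1 firefox

    # Then, in about:config, set:
    #   media.ffmpeg.vaapi.enabled = true
    ```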

