dav1d 0.7.1 AV1 Decoder Boosts 32-bit Arm Performance By ~28%

  • schwarzygesetzlos
    replied
    Originally posted by brad0
    dav1d has a decent amount of SSSE3 codepaths but older AMD processors before Zen did not support SSSE3.
    Not true. My Opteron tells me it has SSSE3. AFAIK SSSE3 has been in since Bulldozer.
    Code:
     # lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    Address sizes: 48 bits physical, 48 bits virtual
    CPU(s): 16
    On-line CPU(s) list: 0-15
    Thread(s) per core: 2
    Core(s) per socket: 8
    Socket(s): 1
    NUMA node(s): 2
    Vendor ID: AuthenticAMD
    CPU family: 21
    Model: 2
    Model name: AMD Opteron(tm) Processor 6386 SE
    Stepping: 0
    Frequency boost: enabled
    CPU MHz: 1398.765
    CPU max MHz: 2800.0000
    CPU min MHz: 1400.0000
    BogoMIPS: 5602.20
    Virtualization: AMD-V
    L1d cache: 128 KiB
    L1i cache: 512 KiB
    L2 cache: 16 MiB
    L3 cache: 12 MiB
    NUMA node0 CPU(s): 0-7
    NUMA node1 CPU(s): 8-15
    Vulnerability Itlb multihit: Not affected
    Vulnerability L1tf: Not affected
    Vulnerability Mds: Not affected
    Vulnerability Meltdown: Not affected
    Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
    Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
    Vulnerability Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
    Vulnerability Srbds: Not affected
    Vulnerability Tsx async abort: Not affected
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
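A quick way to check a listing like the one above without reading the whole flags line by eye is to parse it. This is a minimal sketch, assuming `lscpu`-style or `/proc/cpuinfo`-style text where the flags sit on a line whose key is `Flags` or `flags`; the function names `parse_cpu_flags` and `missing_flags` are made up for illustration.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'Flags:'/'flags:' line.

    Works on text shaped like `lscpu` output or /proc/cpuinfo on x86 Linux.
    Returns an empty set if no such line is present.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted):
    """Return the entries of `wanted` that the CPU does not report, in order."""
    have = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in have]


if __name__ == "__main__":
    # On x86 Linux, /proc/cpuinfo carries the same flag names lscpu prints.
    with open("/proc/cpuinfo") as f:
        print(missing_flags(f.read(), ["sse2", "ssse3", "sse4_1", "avx", "avx2"]))
```

Run against the Opteron 6386 SE output above, `ssse3`, `sse4_1` and `avx` are present in the flags line while `avx2` is not.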


  • schmidtbag
    replied
    Originally posted by brad0
    What about? Already mentioned a number of times. 32-bit ARM is far from dead. 32-bit X86 will die off sooner.
    I said dying, not dead.


  • uid0
    replied
    Originally posted by brad0
    The "newest" of CPUs from AMD that would utilize an SSE2 codepath is almost 9 years old. Who the hell is going to have a system that old still running even for testing purposes?
    What do you mean? I'm sure there's lots of such systems.

    Until less than a month ago, my main home workstation was based on a water-cooled Core 2 Duo E8500. No AVX. Recently I have had to work from home a lot more, and 8 GB of RAM wasn't enough, so I borrowed a somewhat newer machine. Until the end of last year, my old folks ran a similar system.

    My main storage and media (etc.) server at home runs on a Xeon E3120 (similar to the Core 2 Duo E8500). No AVX. Been in service for >11 years, still does its job, is super stable and reliable.

    I have another machine, barely 6.5 years old, which has a Pentium G3220. No AVX, no nuthin' (curse you, Intel!).

    I have some computers at work with non-AVX CPUs that do a variety of tasks; my main office workstation (besides the newer 2950X-based one, which I mostly `ssh -XY` to) does not have AVX2. There are folks who still use older servers, notebooks (for diagnostics, for students) and even desktops (mostly in task-specific roles) with older CPUs.

    So I wouldn't say SSE is dead. Besides, in some cases SSE can have lower latency than AVX.


  • brad0
    replied
    Originally posted by Orphis
    Using AV1 doesn't mean it uses SVC though. But I'll trust you if you say that libaom 2.0.0 supports SVC as I don't work at all with it directly.
    I should have better separated the two ideas. libaom 2.0.0 -> WebRTC -> Google Duo.


  • Orphis
    replied
    Originally posted by brad0
    libaom 2.0.0 has support for SVC. AFAIK one of the big pushes for 2.0.0 was for WebRTC. Google Duo is using AV1.
    Using AV1 doesn't mean it uses SVC though. But I'll trust you if you say that libaom 2.0.0 supports SVC as I don't work at all with it directly.


  • brad0
    replied
    Originally posted by nuetzel
    Who the hell is talking about AMD?
    Most readers know my (Mesa) development system.
    There isn't as much interest in older SSE on the Intel side of things as their older processors support SSSE3. Duh.


  • nuetzel
    replied
    Originally posted by brad0
    The "newest" of CPUs from AMD that would utilize an SSE2 codepath is almost 9 years old. Who the hell is going to have a system that old still running even for testing purposes?
    Who the hell is talking about AMD?
    Most readers know my (Mesa) development system.


  • brad0
    replied
    Originally posted by Orphis
    libvpx can do SVC (we use it for WebRTC). libaom can probably do SVC too. No idea about containers though as I work with RTP.
    x264 cannot do SVC.
    libaom 2.0.0 has support for SVC. AFAIK one of the big pushes for 2.0.0 was for WebRTC. Google Duo is using AV1.


  • brad0
    replied
    Originally posted by schwarzygesetzlos
    True. I obviously must have found out by reading the link I posted myself. Just thought you were referring specifically to AVX.
    No, SSE2.

    dav1d has a decent amount of SSSE3 codepaths but older AMD processors before Zen did not support SSSE3. So that leaves the little bits of SSE4.1 and SSE2 for the older processors. I'm sure over time people will contribute more SSE4.1 / SSE2 for older processors, but not with the same level of dedication or pace.
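The "best available codepath, else fall back" idea described above can be sketched roughly as follows. This is only an illustration, not dav1d's actual dispatch code: the preference order, the `PREFERENCE` list and the `pick_codepath` name are assumptions, and the flag names follow Linux `/proc/cpuinfo` spelling.

```python
# Hypothetical preference order, widest/newest instruction set first.
PREFERENCE = ["avx2", "sse4_1", "ssse3", "sse2"]


def pick_codepath(cpu_flags):
    """Return the first instruction set in PREFERENCE that the CPU reports.

    `cpu_flags` is any collection of flag names (e.g. a set parsed from
    /proc/cpuinfo). Falls back to "c" (plain C, no SIMD) if none match.
    """
    for level in PREFERENCE:
        if level in cpu_flags:
            return level
    return "c"


if __name__ == "__main__":
    # A CPU reporting SSE2 and SSSE3 but nothing newer gets the SSSE3 path.
    print(pick_codepath({"sse2", "ssse3"}))
    # A CPU reporting only SSE2 is left with whatever SSE2 code exists.
    print(pick_codepath({"sse2"}))
```

Under a scheme like this, a CPU that lacks SSSE3 drops straight to SSE2 or plain C for any function that only has an SSSE3 version, which is why the amount of code written for each level matters.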


  • brent
    replied
    What about x86_64? Already well optimized.

    And what about ARM64? Same.
