PulseAudio 0.9.20 Arrives With Fixes


  • blackshard
    replied
    Originally posted by misiu_mp View Post
    You can't write programs optimized for any particular hardware (be it old or new or Phenom or C2D). This would make them not portable. You write it using hardware-neutral algorithms and general principles of doing so efficiently having the general architecture of the target computing platform in mind (such as what is random-access and what is not). Then you leave it to the compiler to fix up the details (optimize).

    [...cut...]

    If you want to have a chance for portability you should rely on the compiler for architecture-specific optimizations. In cases when compilers are not good enough (many uses of parallelisation) you need to maintain the optimized version and the generic version or be damned.
    When developing and test running on new hardware it is easy to not notice the performance bottlenecks that would be apparent on older hardware.
    I absolutely agree.
    All the simple DSPs I wrote in the past have at least a generic x86/x87 assembly/C version, and then come in MMX/3DNow!/SSE flavours.

    Anyway, I tried the newly released openSUSE 11.2 on three completely different machines (a desktop, a laptop and a netbook) with three different audio chips, and on all of them pulseaudio was reported as not functioning.


  • misiu_mp
    replied
    To add to the confusion, I just took a look at pulse performance on my netbook. It's also running Fedora 11, but the CPU is an Atom.
    Pegged at 800MHz, the Atom works at 4-5% for pulseaudio and 6-10% for vlc (ogg) to play a song.
    Can anybody take a shot at explaining that (spin-locks or timer-based busy-wait loops come to mind)?


  • misiu_mp
    replied
    Originally posted by BlackStar View Post
    Actually, new hardware is vastly different from older hardware. My 850MHz Duron guzzles ~90-110W on idle (including motherboard and memory, excluding hard disk and GPU). My 1300MHz Atom board needs <20W (including motherboard, memory and GPU, excluding hard disk), while performing better than the Duron. A Phenom 2 might require 70W on idle, but it performs an order of magnitude better than the Duron.

    Hardware changes and programs adapt to take advantage of those changes. This is not sloppy programming, this is evolution.
    You can't write programs optimized for any particular hardware (be it old or new or Phenom or C2D). This would make them not portable. You write it using hardware-neutral algorithms and general principles of doing so efficiently having the general architecture of the target computing platform in mind (such as what is random-access and what is not). Then you leave it to the compiler to fix up the details (optimize).
    Hardware changes, that's for sure, but fundamentally both the Duron and the C2D are the same kind of super-scalar, out-of-order, speculative-execution design as the P6.
    (The Atom is rather a beefed-up P5, being in-order with 2 execution units.) The new hardware is more efficient because it's made on a finer process, has huge power-management logic (its transistor count is comparable to whole older CPUs), can execute more instructions in parallel and has more cache. None of these can be taken into account when writing software, unless, for example, you write a program using excessive buffers, hoping they will fit into the huge modern caches, which is an example of sloppy programming. If you write a half-as-efficient but simpler algorithm and count on multiple hardware cores (or SIMD) to make up for the difference, that's sloppy programming. If a programmer writes for SIMD but neglects the generic version, that's sloppy programming too.
    The CPU instructions are essentially the same, they just execute faster. The performance increase is rather linear. Even if you knew that an instruction executes much faster on one architecture, you wouldn't be able to take advantage of it because you do not write in assembly. That's the job of compilers. And if a compiler has lost its ability to optimize for older hardware, that's sloppy programming on the compiler's part.

    In other words everything generic that is supposed to be efficient on modern hardware will be efficient on old hardware.
    If there is an older mp3 decoder implementation that is efficient on a PIII, the modern implementation of an mp3 decoder should also be efficient on a PIII 1GHz or it is simply sloppy. If you want to make it extra fast on a Core6Opto 6GHz by breaking PIII efficiency (using explicit SSE7 instructions), then you are most likely wasting your time by over-optimizing.
    If you want to have a chance for portability you should rely on the compiler for architecture-specific optimizations. In cases when compilers are not good enough (many uses of parallelisation) you need to maintain the optimized version and the generic version or be damned.
    When developing and test running on new hardware it is easy to not notice the performance bottlenecks that would be apparent on older hardware.
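
A minimal sketch of the "maintain the optimized version and the generic version" idea, in Python rather than C for brevity. The function names (`mix_generic`, `mix_fast`, `mix`) and the choice of 16-bit mixing as the workload are assumptions for illustration only; nothing here is taken from PulseAudio. The generic version is the portable reference, and the optional fast path (using NumPy if it happens to be installed) has to return exactly the same result.

```python
import array


def mix_generic(a, b):
    """Portable reference: mix two equal-length int16 sequences with clipping."""
    out = array.array("h")
    for x, y in zip(a, b):
        s = x + y
        # Clip to the signed 16-bit range instead of wrapping around.
        out.append(max(-32768, min(32767, s)))
    return out


try:
    import numpy as np  # optional accelerated path; may be absent

    def mix_fast(a, b):
        """Vectorized variant; must match mix_generic sample for sample."""
        s = np.asarray(a, dtype=np.int32) + np.asarray(b, dtype=np.int32)
        clipped = np.clip(s, -32768, 32767).astype(np.int16)
        return array.array("h", clipped.tolist())
except ImportError:
    mix_fast = None


def mix(a, b):
    """Dispatch: use the fast path when available, else the generic one."""
    impl = mix_fast if mix_fast is not None else mix_generic
    return impl(a, b)
```

The generic function doubles as the test oracle for the fast one, which is the practical reason to keep both around.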


  • DeepDayze
    replied
    Originally posted by cruiseoveride View Post
    PulseAudio hasn't died yet? Damn.
    This... PA hates my system, period.


  • blackshard
    replied
    Originally posted by BlackStar View Post
    Yes.
    Then you are wrong. I have programmed lots of audio software, including DSPs, media players, realtime rendering apps (such as modplayers) and some wrappers for VST and Winamp plugins.
    That's the reason why I'm saying 3-4% of CPU usage on a 2.5 GHz CPU for just an audio engine that supports routing, mixing, amplifying and plain resampling is too much in my view. Especially when you're not using any advanced feature of this piece of software (or, at least, you think you're not, since misiu_mp said he was just listening to an mp3).

    Originally posted by BlackStar View Post
    Realtek chips only support one hardware voice, so you have to pick a rate and resample everything else to match. Mixing must be done in software.
    Thank you for the basics, but I already know them.

    Originally posted by BlackStar View Post
    You reported CPU usage of 3-4%. You also reported that decoding a 80Kbps ogg file requires 2-3% of the CPU (virtualized, so let's imagine for a moment that we're running on raw hardware but decoding a 192Kbps file instead - results should be close enough). This puts an upper limit to pulse CPU usage of about 0-1%.

    What order of magnitude does 0-1% give you?
    No wait, I meant that realtime Vorbis decoding requires 2-3% of CPU time to decode an 80kbps stream on a virtualized OS (Windows 98, to be precise). I consider decoding a Vorbis stream a relatively heavy audio task, so I just can't imagine what pulseaudio is doing to consume 3-4% of CPU cycles on a 2.5 GHz Core 2 Duo machine.

    BTW, CPU consumption is just the tip of the iceberg when dealing with PA.

    And note that I'm not saying PA is sh*t and should not be used, or should be killed. I just said that it isn't mature enough yet to be included in a regular distro.


  • BlackStar
    replied
    Originally posted by misiu_mp View Post
    New hardware doesn't differ that much from the old one. Since the developers most likely don't do proc-specific assembly and most likely compile it with gcc set to optimize for 686 at best, the only way I see they can 'optimize' for new hardware is by sloppy programming that relies on GHz and operations per cycle for high performance. I really hope this is rather for the sake of code maintainability.
    Actually, new hardware is vastly different from older hardware. My 850MHz Duron guzzles ~90-110W on idle (including motherboard and memory, excluding hard disk and GPU). My 1300MHz Atom board needs <20W (including motherboard, memory and GPU, excluding hard disk), while performing better than the Duron. A Phenom 2 might require 70W on idle, but it performs an order of magnitude better than the Duron.

    Hardware changes and programs adapt to take advantage of those changes. This is not sloppy programming, this is evolution.

    In the case of pulseaudio, you can reduce CPU usage by using a faster resampler: run "pulseaudio --dump-resample-methods" for a list of algorithms. The default is speex-float-3, which has good audio quality but is probably a bit too hard on the Duron.
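
To illustrate why the choice of resampler matters for CPU load, here is a rough sketch of about the cheapest approach there is, plain linear interpolation. This is not PulseAudio's code and `resample_linear` is a name made up for this example; higher-quality resamplers do considerably more arithmetic per output sample, which is where the extra CPU time goes.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence by linear interpolation (cheap, lower quality)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis.
        pos = i * src_rate / dst_rate
        j = int(pos)
        frac = pos - j
        # Repeat the last input sample at the right edge.
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out
```

Each output sample costs two multiplications and one addition here. As far as I know, the method pulse actually uses is set with the `resample-method` option in daemon.conf, but check your distro's documentation before relying on that.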

    Finally, a word of warning: distro configuration plays a very big role in pulse performance. Ubuntu has historically shipped braindead configurations for pulse (Karmic is no exception, but this will likely change in Lucid). Other distros also fail to configure pulse correctly, so it might be a good idea to compare your configuration against the "perfect setup" page on the pulse wiki and also check your distro's bug tracker for issues regarding pulse configuration.

    Originally posted by misiu_mp View Post
    It is the pulseaudio process that uses 3-4%. As far as I understand pulse does not do the actual audio decoding, but just handles the decoded stream. The decoding is done by vlc in my case. I play a wmv (streaming internet radio) at 1 - 1.3% used by vlc. So in total it's ~5% CPU usage.
    My mistake then, I apologize.

    I read a configuration guide on pulseaudio.org today, which mentioned ways to reduce CPU usage for media applications. Interestingly, this CPU usage would be registered as pulseaudio usage, even if it was caused by misprogramming in the media player (e.g. by requesting an unnecessarily low latency value). It might be worth trying a few different media players and checking whether they all behave the same (and maybe filing bug reports against the media player or the distro).
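
To put rough numbers on the latency remark, here is a small sketch under a deliberately simplified model: the sound server wakes up once per buffer, so halving the requested latency doubles the number of wakeups. The function names, the 44100 Hz rate and the example latencies are illustrative assumptions, not values from the pulse documentation.

```python
def wakeups_per_second(latency_ms):
    """Simplified model: one wakeup per buffer of `latency_ms` milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def frames_per_wakeup(latency_ms, sample_rate=44100):
    """Whole audio frames handled on each wakeup at the given sample rate."""
    return sample_rate * latency_ms // 1000


# Hypothetical requests: a player asking for 5 ms versus one asking for 2 s.
for latency in (5, 2000):
    print(latency, wakeups_per_second(latency), frames_per_wakeup(latency))
```

In this model a player asking for 5 ms makes the server wake 200 times a second to move 220 frames each time, while a 2 s request needs one wakeup every two seconds. Real behaviour depends on the driver and timer setup, so treat this only as an order-of-magnitude picture.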


  • misiu_mp
    replied
    Originally posted by BlackStar View Post
    Yes.


    You reported CPU usage of 3-4%. You also reported that decoding a 80Kbps ogg file requires 2-3% of the CPU (virtualized, so let's imagine for a moment that we're running on raw hardware but decoding a 192Kbps file instead - results should be close enough). This puts an upper limit to pulse CPU usage of about 0-1%.

    What order of magnitude does 0-1% give you?
    It is the pulseaudio process that uses 3-4%. As far as I understand pulse does not do the actual audio decoding, but just handles the decoded stream. The decoding is done by vlc in my case. I play a wmv (streaming internet radio) at 1 - 1.3% used by vlc. So in total it's ~5% CPU usage.
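
The total quoted above is just the two per-process ranges added end to end. A tiny sketch, with `add_ranges` as a made-up helper name:

```python
def add_ranges(a, b):
    """Add two (low, high) percentage ranges end to end."""
    return (a[0] + b[0], a[1] + b[1])


pulse = (3.0, 4.0)  # pulseaudio process, percent CPU, from the post
vlc = (1.0, 1.3)    # vlc decoding the stream, percent CPU, from the post
low, high = add_ranges(pulse, vlc)
print(f"{low:.1f}-{high:.1f}% total")
```

That gives roughly 4 to 5.3%, consistent with the ~5% figure.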


  • kraftman
    replied
    Originally posted by BlackStar View Post
    Most distros ship with configurations optimized for current hardware, the Duron is obsolete three times over by now (great chip btw, I have one too. Watt-guzzler, though).

    Anyway, this is easy to fix if you spend a couple of minutes reading the documentation - pulse is very configurable. Just make sure you are using the latest version (pulse has progressively gotten faster).
    I used it about a year ago, so PulseAudio was about one year younger. Thanks, if I'm able to configure it to minimize CPU usage, that will be great.


  • misiu_mp
    replied
    Originally posted by BlackStar View Post
    Most distros ship with configurations optimized for current hardware, the Duron is obsolete three times over by now (great chip btw, I have one too. Watt-guzzler, though).
    New hardware doesn't differ that much from the old one. Since the developers most likely don't do proc-specific assembly and most likely compile it with gcc set to optimize for 686 at best, the only way I see they can 'optimize' for new hardware is by sloppy programming that relies on GHz and operations per cycle for high performance. I really hope this is rather for the sake of code maintainability.

    Things that were mandatory years ago to achieve acceptable performance may be overlooked now. One such thing that comes to mind might be the increasing number of abstraction layers. They are very good for the speed of development, compatibility and eventually could provide a single point for performance optimization, but also a single point for performance degradation.

    Hardware obsolescence was what m$ relied upon with Vista. Linux did not make this assumption and could benefit from the (unexpected) wave of low-power mobile devices that followed. So it doesn't matter that the Duron is obsolete. There are new user-grade chips that have similar performance but much improved efficiency. They need an OS too.


  • benmoran
    replied
    Originally posted by misiu_mp View Post
    Your numbers are a really good sign.

    I am using a standard Fedora 11 setup with version 0.9.15 of PA. Recently Phoronix reported that 0.9.20 was released (Fedora 12 will ship with it). Much could have changed. Maybe Fedora 12 brings the same performance improvements as Karmic Koala (or maybe Karmic does not resample and Fedora does). What version of PA do you have?

    You think the sound card hardware matters? Can pulse use better hardware to accelerate some operations?
    I'm using 0.9.19 currently. As was stated, the configuration can play a part, but I'm not sure whether hardware has any influence on CPU %. Maybe someone who knows can chime in.
