PulseAudio 0.9.20 Arrives With Fixes


  • #71
    Originally posted by BlackStar View Post
    Sorry, but you have no idea what you are talking about.
    Are you so sure about that?

    Originally posted by BlackStar View Post
    1. Resampling can be more CPU-intensive than decoding ogg, if you use a high quality algorithm. Pulse's default is a good trade-off between quality and speed that works great for most applications. If you are not happy, modify that.

    2. Software mixing (hence resampling) is absolutely necessary on 99.99% of all computers out there. How else are you supposed to listen to your 44KHz ogg using your 48KHz realtek chip?
    Sure, of course resampling can be cpu intensive. You can use polyphase filters, spline interpolation or other nice resampling filters. But do we really need it?
    AFAIK, if a card is capable of 48 kHz, it is also able to convert a 44 kHz stream, or at least it is possible to program the chip to accept such a stream. If the chip has a fixed sampling rate, then it will take care of the resampling itself...

    Also, resampling 1 (one) stream in software, even with a nice resampling filter (spline for me is already nice, fast and of sufficiently high quality), will never require 3-4% of CPU time on a 2.5 GHz machine (unless you wrote it in an interpreted Basic dialect), as misiu_mp reported.
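    Just to give an idea of the work involved (a hypothetical sketch of mine, not code from PA or any real player), a naive linear-interpolation resampler for one mono stream boils down to roughly this per output sample:

        /* Hypothetical illustration: naive linear-interpolation resampling of a
         * mono 44100 Hz float stream to 48000 Hz.  Per output sample this is one
         * multiply-add plus index bookkeeping. */
        #include <stddef.h>

        static size_t resample_44k1_to_48k(const float *in, size_t in_len, float *out)
        {
            const double step = 44100.0 / 48000.0;  /* input samples per output sample */
            double pos = 0.0;
            size_t n = 0;

            while ((size_t)pos + 1 < in_len) {
                size_t i = (size_t)pos;
                double frac = pos - (double)i;
                out[n++] = (float)((1.0 - frac) * in[i] + frac * in[i + 1]);
                pos += step;
            }
            return n;  /* number of output samples written */
        }

    A higher-quality filter (spline, polyphase) multiplies that by the filter length, but the order of magnitude stays small for a single stream.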

    It could be a configuration problem, I don't exclude that, but then the default configuration is probably wrong.

    Originally posted by BlackStar View Post
    3. How did you come up with this 3-4% number? By playing back an ogg file? If so, do you understand why this is completely invalid?
    Sure, it doesn't prove anything, but it gives an order of magnitude for the computational time required.

    Comment


    • #72
      Originally posted by blackshard View Post
      Are you so sure about that?
      Yes.

      AFAIK, if a card is capable of 48 kHz, it is also able to convert a 44 kHz stream, or at least it is possible to program the chip to accept such a stream. If the chip has a fixed sampling rate, then it will take care of the resampling itself...
      Realtek chips only support one hardware voice, so you have to pick a rate and resample everything else to match. Mixing must be done in software.
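      In other words (my own rough sketch of the idea, not PulseAudio's actual mixer code): once every client stream has been resampled to the one rate the chip accepts, the server has to sum them in software and clamp the result, roughly like this:

          /* Hypothetical sketch of software mixing: sum several already-resampled
           * float streams into one buffer and clamp to [-1, 1] before handing the
           * result to the single hardware voice. */
          #include <stddef.h>

          static void mix_streams(const float *const *streams, int n_streams,
                                  float *out, size_t n_samples)
          {
              for (size_t s = 0; s < n_samples; s++) {
                  float acc = 0.0f;
                  for (int i = 0; i < n_streams; i++)
                      acc += streams[i][s];
                  if (acc > 1.0f)  acc = 1.0f;   /* naive clipping */
                  if (acc < -1.0f) acc = -1.0f;
                  out[s] = acc;
              }
          }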

      Sure, it doesn't prove anything, but it gives an order of magnitude for the computational time required.
      You reported CPU usage of 3-4%. You also reported that decoding an 80Kbps ogg file requires 2-3% of the CPU (virtualized, so let's imagine for a moment that we're running on raw hardware but decoding a 192Kbps file instead - results should be close enough). This puts an upper limit on pulse's CPU usage of about 0-1%.

      What order of magnitude does 0-1% give you?

      Comment


      • #73
        Originally posted by misiu_mp View Post
        Your numbers are a really good sign.

        I am using a standard Fedora 11 setup with version 0.9.15 of PA. Recently Phoronix reported that 0.9.20 was released (Fedora 12 will ship with it). Much could have changed. Maybe Fedora 12 brings the same performance improvements as Karmic Koala (or maybe Karmic does not resample and Fedora does). What version of PA do you have?

        You think the sound card hardware matters? Can pulse use better hardware to accelerate some operations?
        I'm using 0.9.19 currently. As was stated, configuration can play a part, but I'm not sure whether hardware has any influence on CPU %. Maybe someone who knows can chime in.

        Comment


        • #74
          Originally posted by BlackStar View Post
          Most distros ship with configurations optimized for current hardware, the Duron is obsolete three times over by now (great chip btw, I have one too. Watt-guzzler, though).
          New hardware doesn't differ that much from the old. Since the developers most likely don't do processor-specific assembly and most likely compile with gcc set to optimize for i686 at best, the only way I see they can 'optimize' for new hardware is by sloppy programming that relies on GHz and operations per cycle for high performance. I really hope this is rather done for the sake of code maintainability.

          Things that were mandatory years ago to achieve acceptable performance may be overlooked now. One such thing that comes to mind might be the increasing number of abstraction layers. They are very good for the speed of development, compatibility and eventually could provide a single point for performance optimization, but also a single point for performance degradation.

          The obsolescence of hardware was what m$ relied upon with Vista. Linux did not make this assumption and could benefit from the (unexpected) wave of low-power mobile devices that followed. So it doesn't matter that the Duron is obsolete. There are new user-grade chips with similar performance but much improved efficiency. They need an OS too.

          Comment


          • #75
            Originally posted by BlackStar View Post
            Most distros ship with configurations optimized for current hardware, the Duron is obsolete three times over by now (great chip btw, I have one too. Watt-guzzler, though).

            Anyway, this is easy to fix if you spend a couple of minutes reading the documentation - pulse is very configurable. Just make sure you are using the latest version (pulse has progressively gotten faster).
            I used it about a year ago, so PulseAudio was about one year younger. Thanks - if I am able to configure it to minimize CPU usage, that will be great.

            Comment


            • #76
              Originally posted by BlackStar View Post
              Yes.


              You reported CPU usage of 3-4%. You also reported that decoding an 80Kbps ogg file requires 2-3% of the CPU (virtualized, so let's imagine for a moment that we're running on raw hardware but decoding a 192Kbps file instead - results should be close enough). This puts an upper limit on pulse's CPU usage of about 0-1%.

              What order of magnitude does 0-1% give you?
              It is the pulseaudio process that uses 3-4%. As far as I understand, pulse does not do the actual audio decoding, but just handles the decoded stream. The decoding is done by vlc in my case. I play a wmv (streaming internet radio) with 1-1.3% used by vlc, so in total it's ~5% CPU usage.

              Comment


              • #77
                New hardware doesn't differ that much from the old. Since the developers most likely don't do processor-specific assembly and most likely compile with gcc set to optimize for i686 at best, the only way I see they can 'optimize' for new hardware is by sloppy programming that relies on GHz and operations per cycle for high performance. I really hope this is rather done for the sake of code maintainability.
                Actually, new hardware is vastly different from older hardware. My 850MHz Duron guzzles ~90-110W on idle (including motherboard and memory, excluding hard disk and GPU). My 1300MHz Atom board needs <20W (including motherboard, memory and GPU, excluding hard disk), while performing better than the Duron. A Phenom 2 might require 70W on idle, but it performs an order of magnitude better than the Duron.

                Hardware changes and programs adapt to take advantage of those changes. This is not sloppy programming, this is evolution.

                In the case of pulseaudio, you can reduce CPU usage by using a faster resampler: run "pulseaudio --dump-resample-methods" for a list of algorithms. The default is speex-float-3, which has good audio quality but is probably a bit too hard on the Duron.
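                For example (a sketch of the relevant knobs; speex-fixed-1 is just one of the cheaper methods the dump lists, pick whatever suits your hardware), the resampler is selected in daemon.conf:

                    # list the available algorithms
                    $ pulseaudio --dump-resample-methods

                    # in ~/.pulse/daemon.conf (or /etc/pulse/daemon.conf)
                    resample-method = speex-fixed-1

                    # restart the daemon so the setting takes effect
                    $ pulseaudio -k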

                Finally, a word of warning: distro configuration plays a very big role in pulse performance. Ubuntu has historically shipped braindead configurations for pulse (Karmic is no exception, but this will likely change in Lucid). Other distros also fail to configure pulse correctly, so it might be a good idea to compare your configuration against the "perfect setup" page on the pulse wiki and also check your distro's bug tracker for issues regarding pulse configuration.

                It is the pulseaudio process that uses 3-4%. As far as I understand, pulse does not do the actual audio decoding, but just handles the decoded stream. The decoding is done by vlc in my case. I play a wmv (streaming internet radio) with 1-1.3% used by vlc, so in total it's ~5% CPU usage.
                My mistake then, I apologize.

                I read a configuration guide on pulseaudio.org today, which mentioned ways to reduce CPU usage for media applications. Interestingly, this CPU usage would be registered as pulseaudio usage, even if it was caused by misprogramming in the media player (e.g. by requesting an unnecessarily low latency value). It might be worth trying a few different media players and checking whether they all behave the same (and maybe filing bug reports against the media player or the distro).
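                For what it's worth, here is a rough client-side sketch (hypothetical code, not taken from any particular player) of where that latency request happens with libpulse; the smaller the requested tlength, the more often the daemon has to wake up, and that extra work shows up under the pulseaudio process:

                    /* Hypothetical libpulse playback setup showing where a media player
                     * requests a latency.  An unnecessarily small value forces frequent
                     * wakeups, and the cost is billed to the pulseaudio process. */
                    #include <pulse/pulseaudio.h>

                    static void connect_playback(pa_context *ctx, pa_usec_t latency_usec)
                    {
                        pa_sample_spec ss = { PA_SAMPLE_S16LE, 44100, 2 };
                        pa_buffer_attr attr;

                        attr.maxlength = (uint32_t) -1;                        /* let pulse pick */
                        attr.tlength   = pa_usec_to_bytes(latency_usec, &ss);  /* the latency request */
                        attr.prebuf    = (uint32_t) -1;
                        attr.minreq    = (uint32_t) -1;
                        attr.fragsize  = (uint32_t) -1;

                        pa_stream *s = pa_stream_new(ctx, "playback", &ss, NULL);
                        pa_stream_connect_playback(s, NULL, &attr,
                                                   PA_STREAM_ADJUST_LATENCY, NULL, NULL);
                    }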

                Comment


                • #78
                  Originally posted by BlackStar View Post
                  Yes.
                  Then you are wrong. I have programmed lots of audio software, including DSPs, media players, realtime rendering apps (such as mod players) and some wrappers for VST and Winamp plugins.
                  That's the reason why I'm saying that 3-4% CPU usage on a 2.5 GHz CPU, for just an audio engine that does routing, mixing, amplifying and plain resampling, is too much in my opinion. Especially when you're not using any advanced feature of this piece of software (or at least you think you're not, since misiu_mp said he was just listening to an mp3).

                  Originally posted by BlackStar View Post
                  Realtek chips only support one hardware voice, so you have to pick a rate and resample everything else to match. Mixing must be done in software.
                  Thank you for the basics, but I already know about that.

                  Originally posted by BlackStar View Post
                  You reported CPU usage of 3-4%. You also reported that decoding an 80Kbps ogg file requires 2-3% of the CPU (virtualized, so let's imagine for a moment that we're running on raw hardware but decoding a 192Kbps file instead - results should be close enough). This puts an upper limit on pulse's CPU usage of about 0-1%.

                  What order of magnitude does 0-1% give you?
                  No wait, I meant that realtime vorbis decoding requires 2-3% of CPU time to decode an 80kbps stream on a virtualized OS (Windows 98, to be precise). I consider decoding a vorbis stream a relatively heavy audio task, so I just can't imagine what pulseaudio is doing to consume 3-4% of CPU cycles on a 2.5 GHz Core 2 Duo machine.

                  BTW, CPU consumption is just the tip of the iceberg when dealing with PA.

                  And note that I'm not saying PA is sh*t and should not be used, or should be killed. I just said that it isn't yet mature enough to be included in a regular distro.

                  Comment


                  • #79
                    Originally posted by cruiseoveride View Post
                    PulseAudio hasn't died yet? Damn.
                    This... PA hates my system, period.

                    Comment


                    • #80
                      Originally posted by BlackStar View Post
                      Actually, new hardware is vastly different from older hardware. My 850MHz Duron guzzles ~90-110W on idle (including motherboard and memory, excluding hard disk and GPU). My 1300MHz Atom board needs <20W (including motherboard, memory and GPU, excluding hard disk), while performing better than the Duron. A Phenom 2 might require 70W on idle, but it performs an order of magnitude better than the Duron.

                      Hardware changes and programs adapt to take advantage of those changes. This is not sloppy programming, this is evolution.
                      You can't write programs optimized for any particular piece of hardware (be it old or new, Phenom or C2D); that would make them non-portable. You write them using hardware-neutral algorithms and the general principles of doing so efficiently, keeping the general architecture of the target computing platform in mind (such as what is random access and what is not). Then you leave it to the compiler to fix up the details (optimize).
                      Hardware changes, that's for sure, but fundamentally both the Duron and the C2D are the same super-scalar, out-of-order, speculative-execution P6 design.
                      (Atom is rather a beefed-up P5, being in-order with 2 execution units.) The new hardware is more efficient because it is made on a finer process, has huge power-management logic (its transistor count is comparable to whole older CPUs), can execute more instructions in parallel and has more cache. None of this can you take into account when writing software. Unless, for example, you write a program using excessive buffers, hoping they will fit into the huge modern caches, which is an example of sloppy programming. If you write a half-as-efficient but simpler algorithm and count on multiple hardware cores (or SIMD) to make up for the difference, that's sloppy programming. If a programmer writes for SIMD but neglects the generic version, that's sloppy programming too.
                      The CPU instructions are essentially the same, they just execute faster. The performance increase is rather linear. Even if you knew that an instruction executes much faster on one architecture, you wouldn't be able to take advantage of it because you do not write in assembly. That's the job of compilers. And if a compiler has lost its ability to optimize for older hardware, that's sloppy programming on the compiler's part.

                      In other words, everything generic that is supposed to be efficient on modern hardware will also be efficient on old hardware.
                      If there is an older mp3 decoder implementation that is efficient on a PIII, a modern mp3 decoder implementation should also be efficient on a 1GHz PIII, or it is simply sloppy. If you want to make it extra fast on a Core6Opto 6GHz by breaking PIII efficiency (using explicit SSE7 instructions), then you are most likely wasting your time by over-optimizing.
                      If you want to have a chance at portability you should rely on the compiler for architecture-specific optimizations. In cases where compilers are not good enough (many uses of parallelisation) you need to maintain both the optimized version and the generic version, or be damned.
                      When developing and test-running on new hardware it is easy not to notice performance bottlenecks that would be apparent on older hardware.
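                      To make the "generic code, let the compiler tune it" point concrete, here is a hypothetical example: a plain scalar loop like this is what you write once; gcc may auto-vectorize it on a new CPU and emits ordinary scalar code on a PIII, and it is correct and reasonably efficient on both.

                          /* Hypothetical illustration of a hardware-neutral audio loop:
                           * nothing assumes SSE, cache sizes or core counts.  A modern
                           * compiler may auto-vectorize it; an old one emits scalar code. */
                          #include <stddef.h>

                          static void gain_and_mix(const float *a, const float *b,
                                                   float gain_a, float gain_b,
                                                   float *out, size_t n)
                          {
                              for (size_t i = 0; i < n; i++)
                                  out[i] = gain_a * a[i] + gain_b * b[i];
                          }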

                      Comment
