Let me say up front that I'm likely to be rather confrontational in my early responses here. You are someone who has played no visible role in Linux audio over the last decade, you've said a bunch of stuff about digital audio that isn't true, and I've also watched your talk on desktop linux from last year in which you said a bunch of misleading or outright false stuff about that area as well. This doesn't mean that you're a bad or stupid person, but for me personally, it creates a barrier. I apologize for that.
unfortunately, DLLs have a better track record. PLLs make sense when you are trying to track phase, which is not the case here. on the other hand, had you actually been in the linux audio community at any point in the last 10 years, you'd probably be familiar with: http://kokkinizita.linuxaudio.org/papers/usingdll.pdf
Originally posted by datenwolf
Also our sample rates in physics are several orders of magnitude higher than those found in audio.
which you should know is precisely why a DLL is better suited for this purpose. but anyway, it's all a bit academic, because it's far more important what constants you use with either a PLL or a DLL than precisely which variant it is.
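For the curious, the core of the DLL described in that paper fits in a few lines. This is a simulation sketch with illustrative constants (bandwidth, periods, jitter are my choices for demonstration), not production code:

```python
import math
import random

def simulate_dll(bandwidth_hz=1.0, nominal_period=0.010,
                 true_period=0.0101, jitter=0.0002, n=2000):
    """Second-order delay-locked loop tracking a jittery periodic event
    (e.g. audio interface interrupts), in the spirit of usingdll.pdf."""
    w = 2.0 * math.pi * bandwidth_hz * nominal_period
    b, c = math.sqrt(2.0) * w, w * w   # loop filter coefficients
    t_est = 0.0                        # predicted time of next event
    per = nominal_period               # current period estimate
    rng = random.Random(42)
    for k in range(1, n + 1):
        t_meas = k * true_period + rng.uniform(-jitter, jitter)
        e = t_meas - t_est             # prediction error at this event
        t_est += per + b * e           # correct the next prediction
        per += c * e                   # correct the period estimate
    return per

per = simulate_dll()                   # converges close to true_period
```

Note how the filtered period estimate ends up far more accurate than any single jittery measurement; picking the bandwidth (and hence b and c) is exactly the "what constants you use" question above.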
Originally posted by datenwolf
In the kernel of course, because for one, sample format conversion is a no brainer,

oh really? see, this is another example of why i feel that you know just enough about digital audio to be dangerous, but not enough to get it right. something you think is a "no brainer" is actually the source of significant disagreement: http://blog.bjornroche.com/2009/12/i...out-there.html
your comments on floating point vs. fixed point are about a decade out of date, and simply wrong. i'm thinking back to your post at ardour.org in which you said:
"The main reason to use floating point numbers in audio is for space efficiency when storing large dynamic range audio."
this just isn't true. the main use of non-integer formats is to avoid clipping due to overflow during summing. it has nothing to do with "large dynamic range".
Originally posted by datenwolf
A lot of people, especially those coming from PC programming, have a rather low regard for fixed point stuff, for some reason I can't fathom. But if you look around in the high performance, high precision DSP business, most signal processing is done in fixed point there. And that's for good reasons; avoiding loss of precision is one of them.
the choice between floating point and fixed point comes down to implementation availability and speed. on general purpose computers, which overwhelmingly dominate the systems on which linux runs, floating point is nearly always faster, except on systems which don't provide it at all. there is no widely accepted fixed point math library (people differ on the best ways to implement certain aspects).
it's very very far from the most important reason: at any point in time, fixed point DSP has historically offered a better $/computation ratio than floating point hardware, and thus people who need to sell gear used fixed point. whenever they do so, they can rightly say "it's faster, and cheaper". the problem is that this state of affairs lasts approximately 10-15 months, at which point the then-available general purpose systems have floating point that is as fast or faster than what the customer bought. this is why all serious audio software for windows, os x, linux and other systems uses floating point for audio, not fixed point.
Originally posted by datenwolf
You can do floating point in the Linux kernel, it's just frowned upon. kernel_fpu_begin and kernel_fpu_end allow for FPU register save, restore. But KLANG doesn't need it, because it does everything fixed point.
so you plan to replace the cost of some context switches with the cost of converting floating point (which will continue to be the overwhelmingly dominant format in user space, unless you manage to figure out how to get cross-platform developers to switch too) to and from fixed point. you cannot be serious. remember, the hardware itself will still be integer, and in some cases floating point.
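To make that conversion cost concrete, here is a minimal sketch of what every write into, and read out of, a fixed-point kernel mixer would have to do per sample if user space stays floating point. The Q-format and bit counts are my illustrative choices, not anything KLANG actually specifies:

```python
# Illustrative Q-format: 24 fractional bits in a 32-bit word,
# leaving 8 bits of headroom above full scale.
FRAC_BITS = 24
SCALE = 1 << FRAC_BITS

def float_to_fixed(samples):
    # One multiply + round (plus clamping, omitted here) per sample,
    # on every write() crossing into the kernel.
    return [int(round(s * SCALE)) for s in samples]

def fixed_to_float(samples):
    # And one divide (or multiply by the reciprocal) per sample,
    # on every read() coming back out.
    return [s / SCALE for s in samples]

block = [0.5, -0.25, 0.999969]
roundtrip = fixed_to_float(float_to_fixed(block))
```

Per-sample arithmetic like this, applied to every stream on every boundary crossing, is exactly the work that a shared floating point format avoids.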
i'll continue on with some points from your post at ardour.org ...
Originally posted by datenwolf
KLANG's internal stream format gives at least 8 bits of foot- and headroom for all samples in it. Gain/attenuation is applied by factoring the multiplicator to the closest radix 2 and remainder. Then a bitshift is applied followed by multiplication with the remainder.
goodbye SSE/SSE2/SSE3. sigh.
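To spell out what that scheme looks like, here is a hypothetical sketch of the shift-plus-remainder gain described above (the Q-format, function names and constants are mine, purely for illustration). Note how scalar and branchy it is compared to a single vectorizable float multiply per sample:

```python
import math

FRAC_BITS = 24  # illustrative Q-format, not KLANG's actual one

def split_gain(gain):
    # Factor gain into 2**e * rem with rem in [1, 2), per the quoted
    # "closest radix 2 and remainder" description.
    e = math.floor(math.log2(gain))
    rem = gain / (2.0 ** e)
    return e, rem

def apply_gain_fixed(x, gain):
    # x is a fixed-point sample. A shift handles the power of two,
    # one fixed-point multiply handles the remainder.
    e, rem = split_gain(gain)
    shifted = x << e if e >= 0 else x >> -e
    rem_q = int(round(rem * (1 << FRAC_BITS)))
    return (shifted * rem_q) >> FRAC_BITS

half = 1 << (FRAC_BITS - 1)            # 0.5 in this Q-format
result = apply_gain_fixed(half, 0.75)  # 0.5 * 0.75 = 0.375
```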
(from me, earlier) "This makes it very inefficient to implement inter-application audio, since everything has to make extra transfers across the kernel/user space boundary."

Originally posted by datenwolf
Sorry, but this is just FUD. How do you think data is exchanged between user space processes? You cross the userspace-kernel boundary twice doing so. Any sort of IPC always involves system calls. Even if it goes over shared memory. Because shared memory is actually shared address space and all sorts of "kernel-hell" breaks loose, touching it.
You cannot be serious. It's hard to take anyone seriously who would claim such a thing. You think that two processes both touching an mmap'ed region in user space causes some kind of "kernel hell"? this is just ridiculous - you make it appear that you don't know how shared memory works at all! both address spaces have the region mapped. when each process touches part of the region NOTHING HAPPENS in the kernel - it's just a memory access.
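For reference, here is a minimal POSIX-only sketch (fork-based, purely illustrative) showing that once a shared mapping exists, the accesses themselves are ordinary loads and stores:

```python
import mmap
import os
import struct

# An anonymous shared mapping: on Linux, mmap(-1, ...) is
# MAP_SHARED | MAP_ANONYMOUS, so parent and child see the same pages.
buf = mmap.mmap(-1, 4096)

pid = os.fork()
if pid == 0:
    # Child: writing the region is a plain memory store.
    # No system call is made here; the kernel is not involved.
    buf[0:4] = struct.pack("<i", 42)
    os._exit(0)

os.waitpid(pid, 0)
# Parent: reading it back is likewise just a memory load.
value = struct.unpack("<i", buf[0:4])[0]
```

Real shared-memory audio systems (JACK among them) do use a syscall-based primitive (a FIFO or similar) to wake the other process up - but that is for scheduling, not for moving the audio data itself.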
Originally posted by datenwolf
But then Lennart Poettering (re-)discovered a rather old method - this is one of the few cases where I think he did something good - how you could get low latency even when operating with large buffer sizes. This might sound impossible, but only as long as you assume a filled buffer is untouchable. If you accept that one may actually perform updates on an already submitted buffer, just slightly ahead of where it's currently read from (for example in a DMA transfer), you can get latency down, even with larger buffers. Lennart implemented this in PA when they were approaching very long buffers (256ms and longer) on mobile devices, but still needed low latency for audio events.

Lennart's design (which he called glitch-free) is more or less identical to coreaudio's. but it doesn't do ANYTHING to deal with your concerns about power consumption under low latency conditions, and doesn't substantively alter my point about low latency audio being the reason why power consumption goes up. Whether it's the audio interface interrupt that wakes up the CPU and/or the user space process that wants to write every 8 samples, something is keeping the CPU busy. end of story.
Moving on to API design. There are two basic models for streaming media on any system. One of them is often called "push" and the other is called "pull". In the push model, applications are free to write/read what data they want, when they want to (they may behave in an isochronous manner, if they are smart, or they might read/write in semi-random bursts): either way, they get to make the choices about when and how much data to receive from/deliver to an endpoint. In the pull model, it is the endpoint (typically an actual audio interface) that determines how much data is to be processed, and when it needs to be done.
Now, it is entirely possible to write code that uses the push model and actually behaves, over time, very similarly to what happens under a pull model. A few people do this. But it's not a coincidence that every serious audio API for low latency work (ASIO, CoreAudio, WaveRT, JACK and more) uses a pull model: the system decides how much data needs processing and when, and the apps just do what they are told.
Why does this matter? It matters because although it's very easy to add a push API on top of a pull API (Just Add Buffering (TM): you're done), it is more or less impossible to add a pull API on top of a push API and simultaneously offer low latency. Several APIs on different platforms over the years have tried this, and they've all been absolutely useless (I'm looking at you, OpenAL).
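The "Just Add Buffering" direction really is that simple. Here is a hypothetical sketch (names and API are mine, for illustration only): a push-style write() feeding a FIFO that a pull-style engine callback drains on its own schedule, at the cost of the buffered latency:

```python
from collections import deque

class PushAdapter:
    """Sketch of a push-style API layered on a pull-style callback."""

    def __init__(self):
        self.fifo = deque()

    def write(self, samples):
        # Push side: the application writes whenever it likes.
        self.fifo.extend(samples)

    def process(self, nframes):
        # Pull side: the engine demands exactly nframes, on its schedule.
        # If the app hasn't kept up, we underrun and emit silence.
        return [self.fifo.popleft() if self.fifo else 0.0
                for _ in range(nframes)]

a = PushAdapter()
a.write([0.1, 0.2, 0.3])
block = a.process(4)   # the 4th frame underruns and is zero-filled
```

Going the other way - conjuring a deadline-driven callback out of an API whose clients write whenever they feel like it - has no such trivial construction, which is the point above.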
So, coming back to your beloved unix system calls: open/read/write/ioctl do not permit the creation of a pull API without increasing latency by at least one "period" (to use ALSA terminology). worse, they encourage developers to treat audio (or video) i/o as if it were file i/o - a process without time deadlines. read when you want, write when you want. now as i said, the developer doesn't have to do it that way - they can block sensibly and more or less model a poll/select loop - but here's the reality: the overwhelming majority do not. most developers don't have a clue about realtime programming and they need serious nudging to even go halfway in the right direction. this is no fault of their own - they shouldn't really have to know too much. but the unix system call API has absolutely no notion of timeliness and it does nothing to encourage people to design their code around a pull model.

That's why JACK (and CoreAudio, and ASIO, and WaveRT) do not allow developers to write in a push style unless they use an additional library that does buffering for them (there are at least two of these for JACK). You will note that one of the reasons why CoreAudio has been so successful is that it forces EVERY application on OS X, even a pathetic little email notification applet that wants to make a beep, to use the same pull-based, threaded audio API that Logic, Digital Performer and Cubase/Nuendo have to use. An API that looks absolutely nothing like open/read/write/ioctl.
But stepping back again, here's the meta problem with your entire KLANG proposal/idea. I've been preaching this stuff online and at conferences for nearly a decade now. I've discussed this stuff many times with many people on the linux audio developers mailing list, on the JACK mailing list, and back in the day, even on the ALSA development mailing list. I've been on forums explaining to people who know even less about digital audio than you how things work, and why things have converged on common solutions on all major OS platforms, and what the benefits and disadvantages of those common solutions are. And then one morning, someone on IRC points at your page, which contains a bunch of handwaving half-truths that appear to have formed completely in a vacuum, without any interaction with any of the people who really know about and understand digital audio (on linux, or any other platform for that matter).
You could have shown up on LAD or the JACK ML, or a half dozen other places, and proposed some new ideas. People would have argued with you and you would have changed their minds or they would have changed yours. At the end of the day, we'd either have an idea for a better new "grand design", or we'd be back to wondering how/if/when/why to take the next step forward.
Instead, you've designed KLANG in a bubble, misunderstanding so many of the small details that are ultimately why it tends to take years to get one of these systems right, and you've handed the morons on reddit and slashdot who just want to froth at the mouth about how bad linux audio is another shiny new diamond. This is not helpful. Schemes that move policy into the kernel, that require rewriting every single audio device driver, that replace cheap context switches with (relatively costly) format conversions (and I didn't even get into the fact that the number of context switches is typically only 2 greater than without JACK anyway), that remove interposition by reverting to the unix system call API that was designed before realtime streaming was even a twinkle in dennis ritchie's eye ... i'm sorry, but this is so far from helpful that it's just offensive.