BruteFIR

News

2020-02-18
Changed service provider and email address after 20+ years. New email address is found in the footer of this page.

2016-11-15
Re-released 1.0o with corrected version number in the output (it incorrectly said 1.0m before).

2016-08-09
Maintenance release 1.0o. Second this day, I was a bit trigger happy on the first. Here I've put in some minor bugfixes received from the Debian package maintainer.

2016-08-09
Maintenance release 1.0n, no functional change.

2013-11-29
There was still a typo in the last uploaded 1.0m affecting SSE2. Uploaded fix.

2013-11-28
There was a typo in the last uploaded 1.0m release causing the S24_LE/S24_3LE formats to break for input. So if you downloaded 1.0m yesterday please do it again.

2013-11-27
BruteFIR v1.0m. Fixed an SSE2 bug introduced in 1.0l. Added 'safety_limit' feature which can be used to protect your expensive speakers (and sensitive ears). Also fixed a rare race condition bug and further synchronized sample formats with ALSA, so now S24_4LE means low 24 bits of 32 bit word. Thus if you used S24_4LE before you should use S32_LE now to get the old behavior.

2013-10-06
BruteFIR v1.0l. Refreshed code to compile well on x86-64, dropped 3Dnow support and replaced the hand-coded SSE with SSE C code, and refreshed JACK and ALSA I/O modules to catch up with changes in the APIs. Also fixed a filter indexing bug in the cffa CLI command.

2009-03-31
BruteFIR v1.0k. Refreshed JACK and ALSA I/O modules to catch up with changes in the APIs.

2009-03-05
BruteFIR v1.0j. Fixed a memory leak in the CLI.

As you may have noted, I do not any longer actively develop BruteFIR further. From my point of view the software is "complete". I have had plans to start a next generation BruteFIR from scratch with a more modern design (using threads instead of forked processes etc), but priorities in life change, and I do not any longer have much time to write code so it is not likely to happen.

However, I'm happy to see that there are many BruteFIR users out there. Do continue to report bugs as my intention is to keep the code working and fix any bugs that arise.

2006-10-08
BruteFIR v1.0i. Minor fixes in CLI. Sub-sample delay now works also with negative delays.

There's also some interesting patent news, actually this is old news, but I did not know about it until now - the patent EP0649578 has been revoked after an opposition. It was argued that from the existing prior art (of which most is referenced here), the non-uniform partitioned part of the patent lacks inventive step. Additionally, there is a testimony that claims that the "invention" was actually exposed in advance - an academic person at a Danish university explained the idea to the people that then went home and filed the patent which has plagued the industry and open-source world for so long. A theft and lockdown of ideas which we so often before have seen in the world of patents. Anyway, based on this opposition, the EPO has revoked the patent.

For BruteFIR this does not mean anything since it employs uniform partitioned convolution. However, the non-uniform partitioned convolution algorithm is probably free to use in open-source software. There are still patents on this in other countries (such as the US), but with the corresponding patent revoked in Europe, they will be hard to defend. Note that I'm not a patent lawyer, so if you really are going to implement non-uniform partitioned convolution I recommend consulting a professional first, because there is no 100% clear prior art as in the uniform partitioned convolution case.

In the future, there might be a non-uniform version of BruteFIR, but not likely in the near future since it will require new design from the ground and up. The convolution principle is the same, but implementation is much different with non-uniform partitions, since you have to perform different size FFTs in parallel. The simplest idea would be to simply run the same convolution engine in several "layers" with different partition sizes and mixing together the result in the end. This way the implementation difference is small and could be realized quite easily in BruteFIR, but efficiency will suffer. I probably will rather spend time to do it from the ground and up. But not this year.

2006-07-12
BruteFIR v1.0h. Added a sub-sample delay function (that is delays smaller than one sample can be specified), support for text format in the file I/O module, and support for naming ports in the JACK I/O module.

2006-03-30
BruteFIR v1.0g. Fixed input mixer and delay setting bugs.

2005-08-11
BruteFIR v1.0f. Fixed a filter parse bug.

2005-06-28
BruteFIR v1.0e. Fix to work with GCC 4.0.

2005-06-12
Released a minor maintenance release, 1.0d. Contains some minor adjustments to the JACK I/O module, and a fatal bug fix concerning multiple inputs/outputs, which was introduced in 1.0b.

2005-01-04
BruteFIR v1.0c. Mistake in 1.0b caused the CLI module not accepting return characters, causing problems for telnet operation. Fixed that.

2004-11-21
BruteFIR v1.0b. Updated the JACK I/O module, it is now possible to run several BruteFIR instances using JACK at the same time, and it is not necessary to connect to external ports at startup. The CLI can now take commands from a serial line. Additionally, a couple of remaining bugs in the equalizer module have been fixed.

2004-08-07
BruteFIR v1.0a. Minor update, removed the coefficient set limit and updated the example configuration files in the package.

2004-04-21
BruteFIR v1.0. I felt it was time to release 1.0 now. I have fixed up the code so it compiles on FreeBSD and Solaris again, and I added an OSS module, which makes BruteFIR truly usable on FreeBSD platforms.

2004-02-22
BruteFIR v0.99n. As suspected, the merge function was not good enough, and has now been removed. However, instead, a cross-fade algorithm has been added, which indeed is a bit costly in terms of CPU time, but makes coefficient changes truly seamless.

2004-01-17
BruteFIR v0.99m. Fixed a few bugs, and updated the ALSA code to support the 1.0 version of the API. Now it is also possible to make more time-precise CLI scripting. I'm suspecting that the merge function is not good enough to be very useful, I may remove it or replace it with a better sounding (but inefficient) cross-fade algorithm.

2003-10-26
BruteFIR v0.99l. Added a function to hide discontinuities that may occur when filter coefficients are changed in runtime (function is called "merge"). Also added a skip option to the coefficient loader, to skip a given number of bytes in the beginning of a file.

2003-08-10
BruteFIR v0.99k. This is a maintenance release, which fixes a few bugs, including a severe powersave bug which could cause unexpected and very loud noise to come out. It also adds an option to run in daemon mode.

2003-07-11
BruteFIR v0.99j. Now we are getting near a 1.0 release. This release contains quite many new features, and bug fixes. Some feature highlights: BruteFIR now employs FFTW3, there is support for 32 and 64 bits in the same binary and buffer over/underflows can be ignored. Among important bug fixes are that FFTW wisdom is now stored properly, so it can be re-used more often, and the equalizer module now sets the magnitude properly at the edges.

2003-02-11
BruteFIR v0.99i. I released the h-version a bit too early, lots of small but significant mistakes followed. This version fixes those (hopefully).

2003-02-09
BruteFIR v0.99h. A couple of bug fixes associated to the new callback I/O. It also adds support for native endian and auto sample formats, and a simple automatic load balancer for multi-processor machines.

2003-02-02
BruteFIR v0.99g. This release adds support for callback I/O. One callback I/O module is available, supporting JACK. This support means that the program has gone through quite radical reorganizations, so something might be broken. If you discover any problems, please let me know.

2003-01-05
BruteFIR v0.99f. Minor peak meter adjustment and bug fix.

2002-12-25
BruteFIR v0.99e. Lots of tuning has been done to work better with sound card I/O. It should now be more reliable in low latency configurations. The release also includes various minor improvements and bug fixes.

For those that find the default configuration file unnecessary and just in the way, there is now the -nodefault command line option, which will cause BruteFIR to skip the default configuration file.

2002-11-28
BruteFIR v0.99d. Fixes yet another bug in the ALSA code, which caused the software not to work with hardware with odd period sizes, such as some (all?) ice1712-based cards. The real-time index has also been much simplified and improved in terms of reliability, and a power-save feature was added.

Sometime soon, there will be 1.0...

2002-10-10
BruteFIR v0.99c, is an important bug fix release. Among other fixes, it fixes the slightly embarrassing bug of incorrect reading of 3 byte 24 bit formats. Apart from many bug fixes, it adds double buffer support to the equalizer module, and a simple script function to the CLI. The risk of buffer underflow at startup has also been strongly reduced.

2002-09-12
BruteFIR v0.99b, fixed a serious bug in the ALSA code, which caused buffer underflow when the software buffer size was larger than the hardware buffer size.

2002-08-25
BruteFIR v0.99a, a couple of minor bug fixes, discovered during the development of AlmusVCU.

2002-08-04
This new release (v0.99) contains a first version of an equalizer module, which allows equalization to be changed in runtime. Now the I/O delay is fixed, always exactly twice the filter block length (if the sound card hardware is properly designed). Good for synchronization with other audio processors, or clustering. There is also a slight change in configuration file format, so you know why it will complain when run with an old configuration file.

2002-07-26
Added a minor feature that proved necessary for some applications, such as Ambisonics. This feature makes it possible to multiply inputs/outputs in mixing with negative values, not just positive. The new version is BruteFIR 0.98e. An invalid version was available for a few hours during this day (forgot to include some CLI patches), so if you downloaded your v0.98e on this date, download it again.

2002-07-21
Two bug fixes in this new release, BruteFIR 0.98d. The first concerns scaling of coefficient parameters, where PCM coefficients were incorrectly scaled. The other fix is in the ALSA I/O module, which could on some occasions fail to set the sample rate.

2002-06-14
BruteFIR 0.98c, another small step towards 1.0. This contains an important bugfix. Earlier versions could mix up the mix buffers which caused looping sound with some filter configurations, this is now fixed. The common mistake (at least for me) to link a 32 bit BruteFIR with a 64 bit FFTW or the other way around is now taken care of.

2002-05-05
BruteFIR 0.98b. The sample rate monitoring added in 0.98a is now optional, through the option monitor_rate. Also support for SSE2 for Pentium 4 processors is implemented (only used when compiled with double precision). It is also possible to compile and run on Solaris with Sparc processors.

2002-04-16
Yet another of the usual minor updates: BruteFIR 0.98a. This fixes a minor bug which could cause stray processes to be left after exit. It also improves the real-time index calculation so it works properly on SMP, and the program now exits with an error when the sample rate is changed in runtime. There are now interpretable exit codes from the program as well, so one can know why it exited.

2002-03-25
BruteFIR 0.98: This new release supports virtual inputs and outputs, which can be used to control delay of individual outputs even if they are mixed to the same physical output.

2001-12-20
Another bugfix release, 0.97d. Also added a -quiet command line parameter to suppress title, warnings and informational messages at startup.

2001-12-17
Due to popular demand, the ALSA I/O module has got support for accessing the software modes of the ALSA library. The new release is 0.97c.

2001-12-16
Ooops. The new sample format handling was not as good as I initially thought. Now that has been fixed. Oh, clipping for 32 bit formats works again. I hope I did not burst anyone's ears (other than mine). The release version is 0.97b.

2001-12-15
Some major bugs were introduced in 0.97, hopefully most of them have been squashed in this new release, 0.97a.

2001-12-09
BruteFIR 0.97: a new release with lots of major changes. The software is now much more modular. It uses modules for input and output, ALSA and file I/O being the first modules available. It also supports logic modules, the old BruteFIR CLI being the first example. The logic modules can be used to achieve adaptive filtering. The new module architecture will probably need some time to stabilize, and due to the large amount of changes to the code, there is a great risk that this new version is less stable than the last. A few details in the configuration file format have changed as well, for which the documentation has been updated. The documentation for how to program a BruteFIR module is not yet available though.

2001-11-04
Added a todo list. Any suggestions are welcome of course.

2001-10-27
Added some quick and dirty benchmarks, and added some new documentation. I made a low latency benchmark due to popular demand, and the interesting result is that it is possible to get as low as three milliseconds I/O delay, which is much lower than what I expected.

2001-09-27
New release, BruteFIR 0.96a. Some minor bugfixes, and at last processor capability detection code has been included, so BruteFIR will detect SSE or 3DNow, and use the optimized code accordingly.

2001-08-26
Updated documentation to cover all the new features of BruteFIR 0.96.

2001-08-20
BruteFIR 0.96 has been released, with a few important bugfixes, but also many new features, which have not yet been documented here. It is now possible to make filter networks, and to have different lengths on different filters.

2001-07-18
A new release, BruteFIR 0.95b, which contains an important bugfix is available for download. It fixes a block bounds violation error when converting from 32 bit integers to floating point. It also contains some tuning of realtime priorities.

2001-06-10
Some minor updates to the documentation.

2001-06-03
A bugfix release, BruteFIR 0.95a, is available for download. It fixes a bug which caused the program to crash when long filters in raw format were read.

The documentation is now up to date again.

2001-05-26
New release, BruteFIR 0.95. This includes some new features, for example support for changing delay in runtime and support for non-interleaved sound cards. An important bug fix has also been applied, when mixing files and sound cards for inputs/outputs trouble could occur, but that should be fixed now.

Again, the documentation on this page is not entirely up to date with the software itself.

2001-04-11
BruteFIR 0.94a released, which is a bugfix release. A severe bug in the ALSA support code caused the error "Hardware does not support enough fragments." with common sound cards. Now it is gone. Still there is some work to do on the ALSA support code, like adding support for cards with non-interleaved buffer layout (like the RME9652).

2001-04-08
Major changes and cleanups of this page have been made, and the source code has been re-released. The new version is 0.94, and contains a new improved convolution algorithm with hand-coded assembler optimizations for Intel's SSE and AMD's 3Dnow. With this, BruteFIR is now capable of even higher throughput.

Note on the documentation's age

Note that the core of this documentation was written between 1999 and 2001 and is thus old. It's up to date regarding how to configure BruteFIR, but there are many references to old kernel versions and old CPUs embedded in here.

What is it?

BruteFIR is a software convolution engine, a program for applying long FIR filters to multi-channel digital audio, either offline or in realtime. Its basic operation is specified through a configuration file, and filters, attenuation and delay can be changed in runtime through a simple command line interface. The FIR filter algorithm used is an optimized frequency domain algorithm, partly implemented in hand-coded assembler, thus throughput is extremely high. In realtime, a standard computer can typically run more than 10 channels with more than 60000 filter taps each.

Through its highly modular design, things like adaptive filtering, signal generators and sample I/O are easily added, extended and modified, without the need to alter the program itself.

BruteFIR is free and open-source. It is licensed through the GNU General Public License [6].

The preferred operating system platform for the program is Linux [11], but it is easily ported to other Unixes as well, and supports for example FreeBSD out of the box. BruteFIR uses the high-performance FFTW library [7] for the Fast Fourier Transform (FFT, [5]) calculations, and ALSA, the Advanced Linux Sound Architecture [2], is the preferred way of interfacing sound cards, although OSS, Open Sound System [25], is supported as well. The main features are:

What is it good for?

A few examples of applications where BruteFIR could be a central component:

Among these, room equalization and auralization need the longest FIR filters in the common case. Many applications can do with quite short filters actually, but the thing is that you will probably not need to compromise on the filter lengths when you use BruteFIR, even when sample rates go up. However, BruteFIR is pretty useless by itself, since it is only a FIR filter engine. It does not provide any filter coefficients, thus it is not a filter design program. Also, due to its relatively high I/O-delay, BruteFIR is most suited for applications where the input signal is not live.

If you are interested in room equalization, my old NWFIIR project [18] might be of interest. It's a bit dated though. A better program for room equalization is Denis Sbragion's DRC [22].

BruteFIR convolution

The main design goal of BruteFIR is to achieve as high throughput as possible when filters are long (longer than 10000 taps). This means that the filter algorithm must be very fast, since it will be consuming almost all processor time of the whole program. BruteFIR's convolution algorithm is an example of a situation where a theoretically less efficient algorithm is faster in practice, because it is easily optimized and hides performance problems of more complex components.

Frequency domain algorithms for convolution are much faster than the straightforward time domain one when filters are long. The well-known overlap-save algorithm is used as the base in BruteFIR's convolution. However, there are practical problems with this algorithm, as we will see.
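For readers who have not met overlap-save before, the block structure can be sketched in a few lines of C. This is only an illustration, not BruteFIR's code: the circular convolution is computed directly in the time domain here, where a real implementation would use forward FFTs, a bin-by-bin multiplication and an inverse FFT. The function names and the MAX_BLOCK limit are made up for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BLOCK 64   /* hypothetical limit, keeps the sketch free of malloc */

/* Circular convolution of two length-n sequences, computed directly.
 * This stands in for the FFT / multiply / inverse FFT step. */
static void circular_convolve(const double *x, const double *h,
                              double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t k = 0; k < n; k++)
            acc += h[k] * x[(i + n - k) % n];
        y[i] = acc;
    }
}

/* Overlap-save: filter `in` (len samples, a multiple of block) with the
 * block-tap filter h and write len samples to out.
 * Returns 0 on success, -1 on unsupported sizes. */
int overlap_save(const double *in, double *out, size_t len,
                 const double *h, size_t block)
{
    double seg[2 * MAX_BLOCK] = {0};
    double hpad[2 * MAX_BLOCK] = {0};
    double y[2 * MAX_BLOCK];

    if (block == 0 || block > MAX_BLOCK || len % block != 0)
        return -1;

    /* The filter is zero-padded to twice the block length. */
    memcpy(hpad, h, block * sizeof *h);

    for (size_t pos = 0; pos < len; pos += block) {
        /* Keep the previous input block and append the new one. */
        memmove(seg, seg + block, block * sizeof *seg);
        memcpy(seg + block, in + pos, block * sizeof *seg);

        circular_convolve(seg, hpad, y, 2 * block);

        /* The first half is corrupted by wrap-around and is discarded;
         * the second half equals the linear convolution. */
        memcpy(out + pos, y + block, block * sizeof *y);
    }
    return 0;
}
```

Each pass consumes one block of new input and produces one block of output, and the transform length is twice the block length. With a single block covering the whole filter, that transform becomes very long, which is where the practical problems start.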

The problem of complexity

Efficient convolution is done in the frequency domain and therefore an FFT algorithm is needed. The FFT calculations typically occupy more than 90% of all processing time when plain overlap-save is employed. Unfortunately, FFT is not easy to implement. There exist numerous implementations which vary greatly in performance, which is one proof of the complexity. Since it takes up almost all processing time, we must optimize it in order to make the convolution faster. This leaves us with a quite hard optimization problem.

One way to optimize is to code assembler by hand and try to be better than the compiler. Modern processors for personal computers like Intel's Pentium III [10] or AMD's Athlon [1] have custom SIMD instructions (Single Instruction Multiple Data), which allow a single instruction to operate on more than one data element at a time. For example, a single instruction may add together four or eight floating point numbers. Typically, one can improve the performance of an algorithm four times when using these instructions. They are not used by common compilers like GCC (GNU Compiler Collection [9]), meaning that we have a good opportunity to write assembler code that will outperform code generated by the compiler by a wide margin. Most FFT libraries are written in C, and thus do not use these efficient SIMD instructions. So, theoretically, we could implement an FFT algorithm using SIMD instructions and beat the ones already available. However, we are going for a simpler approach, as we shall see. Since one of the design goals of BruteFIR is to be fairly portable, we want to make any assembler implementation small and simple, so it can easily be ported to other processor architectures. Maybe 'small', but certainly not 'simple', would be applicable to an assembler implementation of FFT. In conclusion, we find optimization with assembler an attractive method to increase performance of existing algorithms. However, the algorithm we need to optimize, FFT, is quite complex and thus not an attractive target for optimization.

One of the fastest FFT libraries available is FFTW [7], [8], which is used by BruteFIR. There are more efficient FFT libraries out there (?), but they are often limited to short lengths (typically less than 8192), or are not free software nor open-source, which is a requirement of the BruteFIR project.

Problems with long FFTs

Many of the fastest FFT implementations support only shorter filter lengths (djbfft [3] being one example), and those that support long lengths may behave poorly on some architectures. One example is FFTW which on my 900 MHz AMD Athlon test system gets a large performance dip when FFT lengths become larger than 32768 (real-valued transforms). On the test system, a 262144 point FFT is 30 times slower than a 32768 point, which theoretically should be only 10 times. Although the behavior is more stable on my 550 MHz Pentium III test system, performance drops more than O(n * log2(n)) which is the complexity of the FFT algorithm. Note that these tests were performed using FFTW2.
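The "only 10 times" figure follows directly from the n * log2(n) cost model. A small sketch of the arithmetic, with hypothetical function names and assuming both lengths are powers of two greater than one:

```c
/* Integer log2, exact for powers of two. */
static unsigned ilog2(unsigned long n)
{
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Expected slowdown when going from an n_small-point FFT to an
 * n_large-point FFT if the cost grows as n * log2(n).
 * Both lengths are assumed to be powers of two larger than 1. */
double fft_cost_ratio(unsigned long n_large, unsigned long n_small)
{
    return ((double)n_large * ilog2(n_large)) /
           ((double)n_small * ilog2(n_small));
}
```

For 262144 and 32768 points this gives (262144 * 18) / (32768 * 15) = 9.6, so the measured factor of 30 is roughly three times worse than the model predicts.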

These performance problems are of course due to memory accesses, and poor cooperation between the hardware caching architecture and the software. When the data of the algorithm exceeds the cache size, the problem becomes obvious.

Both Pentium and Athlon architectures allow for giving the cache hints from the software to reduce problems in these situations, but this must be done in assembler, and is therefore seldom used.

Apart from performance problems, long FFTs include more multiplications and scalings, which induce a larger quantization error. This is however a minor problem (?).

Partitioned convolution

We have seen that the central algorithm of fast convolution, the Fast Fourier Transform, is complex to implement and optimize. We have also seen that the need for long FFTs reduces the choice of available implementations and that the existing ones can behave poorly on some hardware architectures. A modified fast convolution algorithm that uses shorter FFTs, and where most time is spent in code which is small and easily optimized, would be ideal.

Many have worked on improving the standard frequency domain convolution algorithms for different purposes. The central idea found in many of these improvements, is that the impulse response, that is the filter, is partitioned into several smaller parts. When each part is filtered with the input, the results delayed suitably and finally added together, one gets the same result as when processing the whole filter at once. As far as I know, the earliest user of this simple but powerful concept is T.G. Stockham [16], who published his results only one year after the famous Cooley and Tukey FFT paper [5]. The concept can be used to solve several problems. Stockham used it for saving memory, but in later work made in the eighties and early nineties, at the time when realtime DSP became feasible for the first time, it was stated that it can also be used to reduce quantization errors, reduce I/O-delay, and adapt to optimal FFT lengths of a specific implementation. All these improvements are described by J.S. Soo and K.K. Pang [14], [15]. Other realtime partitioned convolution pioneers are B.D. Kulp [17], P.C.W. Sommen [12], [13] and J.M.P. Borrallo and M. G. Otero [4]. Their work is a good place to start reading for the one interested in getting a more detailed description of partitioned convolution. The convolution algorithm in BruteFIR is conceptually exactly the same as the one found in these papers.
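The central claim, that filtering with each part, delaying and adding gives the same result as filtering with the whole impulse response, can be checked with a small time domain sketch. It uses plain direct convolution rather than FFTs, so it only shows the partitioning idea, not BruteFIR's implementation. The function names are hypothetical and part_len is assumed to be greater than zero.

```c
#include <stddef.h>

/* Direct convolution of `in` with the `taps`-tap filter h, delayed by
 * `delay` samples. The result is added to out, which the caller zeroes. */
static void convolve_add(const double *in, size_t len,
                         const double *h, size_t taps,
                         size_t delay, double *out)
{
    for (size_t i = delay; i < len; i++)
        for (size_t k = 0; k < taps && k + delay <= i; k++)
            out[i] += h[k] * in[i - delay - k];
}

/* Filter with the whole impulse response at once. */
void filter_whole(const double *in, size_t len,
                  const double *h, size_t taps, double *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = 0.0;
    convolve_add(in, len, h, taps, 0, out);
}

/* Split h into parts of part_len taps, filter the input with each part,
 * delay the result of the part starting at tap `start` by `start`
 * samples, and add everything together. */
void filter_partitioned(const double *in, size_t len,
                        const double *h, size_t taps,
                        size_t part_len, double *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = 0.0;
    for (size_t start = 0; start < taps; start += part_len) {
        size_t n = taps - start < part_len ? taps - start : part_len;
        convolve_add(in, len, h + start, n, start, out);
    }
}
```

Both functions produce the same output for any part_len. In the frequency domain version each part is handled with an FFT no longer than twice the partition length, which is what makes the short transforms possible.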

When partitioned convolution is used, something interesting happens in the processing time distribution of the algorithm. The major part of processing is moved from the FFT algorithm, to the trivial operation of convolution in the frequency domain which is simply multiplication. The more parts we split the impulse response into, the more convolution and less FFT is done. Naturally the FFTs get shorter, and thus we get rid of the problems associated with long FFTs. We now realize that partitioned convolution is the answer to our wishes: we do not need long FFTs and it becomes less important to optimize the FFT algorithm.

Optimizing where it counts

We notice that we will earn most from optimizing the operation where a segment of input converted to the frequency domain is multiplied with the corresponding part of the filter, also in the frequency domain. The result is then added to the output. When the data format is half-complex, a format used by most real-valued FFTs, the straightforward implementation looks like this when programmed in C:

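    /* bin 0 is real only */
    d[0] += b[0] * c[0];
    for (n = 1; n < n_fft / 2; n++) {
        /* real part is stored at n, imaginary part at n_fft - n */
        d[n] += b[n] * c[n] - b[n_fft - n] * c[n_fft - n];
        d[n_fft - n] += b[n] * c[n_fft - n] + b[n_fft - n] * c[n];
    }
    /* n == n_fft / 2 here, this bin is real only as well */
    d[n] += b[n] * c[n];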

b is the input, c is the filter coefficients, and d is the output. As we see, this is a very short and simple algorithm, which is easy to implement in assembler. There are a couple of problems though. The data in each array is accessed from the tail and the front at the same time. It would be better for the cache to localize the accesses, and move from front to end only. It is also a problem that the data is accessed both in forward and reverse order (both 0,1,2,3 and 3,2,1,0), since we want to use SIMD instructions. To solve the problem, we need to reorder the data. This will only be necessary to do once with the filter coefficients, so it is free. For the input however, we need to do this once after each forward transform, and for the output we need to restore the half-complex order prior to each inverse transform. In BruteFIR the input reordering is put into the mixing and scaling step, and the output reordering in the quantization step, so the cost is next to nothing. Below is a C implementation of the previous algorithm, when data has been reordered to better fit SIMD instructions and to improve the memory access pattern:

    /* slots 0 and 4 are plain real products, compute them outside the loop */
    d1s = d[0] + b[0] * c[0];
    d2s = d[4] + b[4] * c[4];
    for (n = 0; n < n_fft; n += 8) {
        /* real parts of four bins */
        d[n+0] += b[n+0] * c[n+0] - b[n+4] * c[n+4];
        d[n+1] += b[n+1] * c[n+1] - b[n+5] * c[n+5];
        d[n+2] += b[n+2] * c[n+2] - b[n+6] * c[n+6];
        d[n+3] += b[n+3] * c[n+3] - b[n+7] * c[n+7];

        /* imaginary parts of the same four bins */
        d[n+4] += b[n+0] * c[n+4] + b[n+4] * c[n+0];
        d[n+5] += b[n+1] * c[n+5] + b[n+5] * c[n+1];
        d[n+6] += b[n+2] * c[n+6] + b[n+6] * c[n+2];
        d[n+7] += b[n+3] * c[n+7] + b[n+7] * c[n+3];
    }
    /* overwrite what the loop wrote to the two real-only slots */
    d[0] = d1s;
    d[4] = d2s;

The above function is easily converted into assembler using Intel's SSE instructions, or AMD's 3Dnow instructions, with cache hint instructions. The key loop (which is unrolled to further improve performance) becomes less than 50 lines long.
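The reordering itself is not spelled out above, so the sketch below uses one layout that is consistent with the reordered loop; it is an assumption, not necessarily the exact layout BruteFIR uses. Data is stored in groups of eight values: four real parts followed by the four matching imaginary parts. Bin 0 has no imaginary part, so its slot (index 4) is assumed to carry the real-only bin n_fft / 2. The function names are hypothetical, and n_fft is assumed to be a multiple of 8.

```c
#include <stddef.h>

/* Index of the real part of `bin` in the assumed reordered layout.
 * The matching imaginary part is four slots further on. */
static size_t re_index(size_t bin)
{
    return (bin / 4) * 8 + bin % 4;
}

/* Half-complex order -> reordered layout. */
void reorder(const float *hc, float *out, size_t n_fft)
{
    out[0] = hc[0];            /* bin 0, real only */
    out[4] = hc[n_fft / 2];    /* bin n_fft / 2, real only */
    for (size_t bin = 1; bin < n_fft / 2; bin++) {
        out[re_index(bin)] = hc[bin];
        out[re_index(bin) + 4] = hc[n_fft - bin];
    }
}

/* Reordered layout -> half-complex order. */
void restore(const float *in, float *hc, size_t n_fft)
{
    hc[0] = in[0];
    hc[n_fft / 2] = in[4];
    for (size_t bin = 1; bin < n_fft / 2; bin++) {
        hc[bin] = in[re_index(bin)];
        hc[n_fft - bin] = in[re_index(bin) + 4];
    }
}

/* Multiply-accumulate on half-complex data (the first loop). */
void mac_halfcomplex(const float *b, const float *c, float *d, size_t n_fft)
{
    size_t n;
    d[0] += b[0] * c[0];
    for (n = 1; n < n_fft / 2; n++) {
        d[n] += b[n] * c[n] - b[n_fft - n] * c[n_fft - n];
        d[n_fft - n] += b[n] * c[n_fft - n] + b[n_fft - n] * c[n];
    }
    d[n] += b[n] * c[n];
}

/* Multiply-accumulate on reordered data (the second loop, with the
 * four unrolled lanes written as an inner loop). */
void mac_reordered(const float *b, const float *c, float *d, size_t n_fft)
{
    float d1s = d[0] + b[0] * c[0];
    float d2s = d[4] + b[4] * c[4];
    for (size_t n = 0; n < n_fft; n += 8) {
        for (size_t i = 0; i < 4; i++) {
            size_t re = n + i, im = n + 4 + i;
            float add_re = b[re] * c[re] - b[im] * c[im];
            float add_im = b[re] * c[im] + b[im] * c[re];
            d[re] += add_re;
            d[im] += add_im;
        }
    }
    d[0] = d1s;
    d[4] = d2s;
}
```

Reordering b, c and d, running mac_reordered and then calling restore on the result gives the same values as running mac_halfcomplex on the original arrays. All accesses in the reordered loop move forward through memory in runs of four, which is the shape a four-wide SIMD instruction wants.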

It is interesting that partitioned convolution makes much more memory references than ordinary overlap-save. In the simplest algorithm analysis, only the number of mathematical operations (like multiplications and additions) is considered when evaluating performance. Better analysis also counts the number of memory references, but unfortunately that is not enough considering the modern computer architecture; it is also of profound importance to take into consideration how the accesses are done. One bad reference can be worse in terms of performance than ten good ones on a modern computer.

Conclusion

By implementing partitioned convolution we have avoided the need of using long FFTs, and moved the major part of the processing time from the FFT to a simple multiplication loop. By reordering data after the forward transform and restoring it prior to inverse transform, the multiplication loop can be easily realized with SIMD instructions, and thus become very efficient. On the 900 MHz AMD Athlon test system, filtering of a 131072 tap long filter is twice as fast when 16 partitions of 8192 taps each are used instead of a single partition (note: this test case is exceptional, the performance improvement is less in the common case). This is despite the new algorithm using more memory references and more mathematical operations.

Apart from the improvement in throughput, we also get lower I/O-delay (equals about twice the partition length), lower memory consumption, and more flexible filter length options. A 140000 tap filter would require a 262144 tap filter if ordinary overlap-save was used, but with partitioned convolution we can use 18 partitions of 8192 taps, and then get a gross performance improvement, coupled with delay reduction.
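The numbers in the 140000 tap example can be reproduced with two small helpers. The names are hypothetical, and the single-partition case assumes the filter length has to be rounded up to a power of two, as the 262144 figure implies.

```c
/* Number of partitions of part_len taps needed to hold `taps` taps. */
unsigned long partitions_needed(unsigned long taps, unsigned long part_len)
{
    return (taps + part_len - 1) / part_len;
}

/* Smallest power of two that is >= taps: the length a single
 * power-of-two partition would have to be padded to. */
unsigned long next_pow2(unsigned long taps)
{
    unsigned long n = 1;
    while (n < taps)
        n <<= 1;
    return n;
}
```

partitions_needed(140000, 8192) is 18, covering 18 * 8192 = 147456 taps, while next_pow2(140000) is 262144. The partitioned filter therefore carries about 7000 taps of padding instead of more than 120000.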

Still, one must not over-estimate partitioned convolution. If there really is an optimal FFT algorithm available, ordinary overlap-save will certainly outperform the partitioned algorithm. An example of an assembler-optimized FFT algorithm can be found in the non-free and non-portable Intel Native signaling processing library [19].

Where can I get it?

You are free to download version 1.0o.

The package contains the source code. You will need a supported platform to run it on (Linux is recommended; FreeBSD or Solaris should work out of the box too, although they are not as closely maintained). Apart from the basic stuff you must also have FFTW3 installed (note that FFTW2, as used by old versions of BruteFIR, won't work). FFTW3 must be compiled for both double and single precision.

If you want sound card support, it is recommended to use ALSA on Linux platforms, and when that is not available, OSS can be used.

If you want to use the JACK support, you need an up to date version of JACK installed.

Be sure that you use an official GCC compiler when compiling BruteFIR. One user reported bad sound quality (noise artifacts in the BruteFIR output), and it was shown that he had used GCC 2.96 (not an official version), that caused errors in the floating point calculations of BruteFIR.

The package does not yet contain configure scripts or other nice things to make compiling easier. However, with some luck it should work simply by typing 'make'. You can also view the Makefile to see what compile options there are. If you have any questions, just mail me (address in the footer).

How fast is it?

BruteFIR's main feature is that it is fast. It's brutally fast. The key component making BruteFIR fast is the convolution algorithm described above.

Note: the test descriptions here are a bit dated, made using an old version of BruteFIR. However, the results should provide a rough idea of what BruteFIR can do in terms of throughput. The example configuration files have been updated to work with the current version.

How high throughput can I get?

With a massive convolution configuration file setting up BruteFIR to run 26 filters, each 131072 taps long, each connected to its own input and output (that is 26 inputs and outputs), meaning a total of 3407872 filter taps, a 1 GHz AMD Athlon with 266 MHz DDR RAM gets about 90% processor load, and can successfully run it in real time. The sample rate was 44.1 kHz, BruteFIR was compiled with 32 bit floating point precision, and the I/O delay was set to 375 ms. The sound card used was an RME Audio Hammerfall.

How low I/O delay can I get?

BruteFIR is mainly designed for high throughput, not low delay. However, there is an interest in using BruteFIR for low delay convolution anyway, so here are some benchmarks so you know what to expect. Partitioned convolution can indeed allow for quite low delay, very low if the processing power is available and the filters are not too long.

Below is an example of a simple cross-talk cancellation application running on a 1 GHz AMD Athlon with 266 MHz DDR RAM and an RME Audio Hammerfall sound card. You can download the cross-talk cancellation configuration file that was used if you want to test yourself. There are only four filters and their lengths are no more than 8192 taps (note: the example files included in the package are only 4096 taps long, as seen in the updated example configuration file), so it is indeed a very light application. That is a requirement if you want very low delay, since partitioned convolution does not scale very well with low delays (meaning a large number of partitions). The sample rate in these tests is 44.1 kHz, and BruteFIR was running with 32 bit floating point precision.

delay    processor load    partition size    number of partitions
3 ms     60%               64 samples        128
6 ms     30%               128 samples       64
12 ms    16%               256 samples       32
24 ms    11%               512 samples       16
47 ms    8%                1024 samples      8

As seen in the table, BruteFIR allows for a delay as low as 3 milliseconds. That is the limit of the sound card used, which cannot have partitions shorter than 64 samples.
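The figures in the table can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes that the I/O delay is about two partition lengths (an inference from the table values, not something stated in this section) and that the filters are 8192 taps long, as in the test. The function names are made up for the example.

```python
import math

SAMPLE_RATE = 44100    # Hz, as in the tests above
FILTER_LENGTH = 8192   # taps, as in the tests above


def io_delay_ms(partition_size, sample_rate=SAMPLE_RATE):
    """Estimated I/O delay in milliseconds for a given partition size."""
    # Assumption inferred from the table: delay is about two partition lengths.
    return 2 * partition_size / sample_rate * 1000.0


def partition_count(partition_size, filter_length=FILTER_LENGTH):
    """Number of partitions needed to cover the full filter length."""
    return filter_length // partition_size


for size in (64, 128, 256, 512, 1024):
    # Print partition size, number of partitions and delay rounded up to whole ms.
    print(size, partition_count(size), math.ceil(io_delay_ms(size)))
```

Under these assumptions the delays, rounded up to whole milliseconds, agree with the table rows.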

If you want to run BruteFIR to achieve high throughput, you should expect to have a delay of at least 100 ms though (and using no more than 16 partitions or so).

If you try to run BruteFIR with a shorter delay than the computer can handle, or with filters that are too long, the program will exit with a broken pipe signal. If you get broken pipe only after a while, this is probably because you have not applied a good low latency patch to the kernel (there are bad ones as well), or because cron jobs or other software are competing for the processor. For reasonably low latency, a low latency kernel can handle other processes running, but for delays as low as 3 milliseconds, as in this example, you should have a dedicated clean system for running BruteFIR.

Hardware considerations

Note: the hardware referenced here is a bit dated (the text was written a long time ago), but apart from that, the text is up to date.

What is important for BruteFIR is that the machine has fast memory and a fast processor. A Pentium 4 with its RDRAM is probably the best choice today. However, an Athlon with DDR RAM is not bad either, and significantly cheaper. A fast processor on a computer with slow memory is what most often causes disappointment. For example, a dual Pentium III at 1 GHz with good use of both processors was found to be slower than a single processor 1 GHz AMD Athlon with DDR RAM. The problem was that the Pentium III had poor memory performance. The stream benchmark [20] is a good program to use to verify the memory bandwidth if you think you get poor BruteFIR performance.

If you use SDRAM you will never get exceptional memory bandwidth. However, some tuning of timer settings in the BIOS, or overclocking of the memory bus, can give you quite decent performance.

When it comes to sound hardware, you should be able to use any card that is compatible with ALSA [2]. However, it is not very likely that the sound card code of BruteFIR will work for all sound cards supported by ALSA, although that is the goal. If you get problems with your sound card, please send me a mail, and I will do my best to get it to work, or even better, try to get it to work yourself and send me a patch.

The best sound cards are those which support partition sizes that are powers of two. If that is not the case, BruteFIR must run in input poll mode, which is not necessarily less reliable, but will consume a part of the spare processor time.

The worst possible sound card is one which does not support partition sizes with a power of two, and can only transfer large sample blocks at a time. Then BruteFIR will run unreliably or not at all.

If you want to avoid problems I recommend RME Audio [21] Hammerfall (Light) (RME9652 and RME9636) and also cards from the RME Audio Digi96 series (RME96), since those are the cards I use myself. The Hammerfall cards support up to 26 inputs and 26 outputs, the Digi96 cards support up to 8 channels. They are not the cheapest cards out there, but these are clean professional cards, fully digital with ADAT and S/PDIF inputs and outputs, which means you can have high-quality DACs and ADCs outside the computer to get the best sonic performance possible.

The Hammerfall cards allow for shorter delay (minimum partition size is 64 samples) than the Digi96 series (minimum size 1024 samples).

Configuring and running

When BruteFIR is run for the first time (without parameters), it will generate a default configuration file (~/.brutefir_defaults), unless the -nodefault option is used, and then complain that it cannot find .brutefir_config in the home directory, which is the default location. The default configuration file contains default settings, which are extended and/or overridden in the main configuration file. A setting that is specified in the default configuration file does not need to be listed in the main configuration file.

BruteFIR takes only four parameters, namely the filename of the main configuration file, and optionally -quiet to suppress title, warnings and informational messages at startup, and -nodefault if BruteFIR should read all settings from the main configuration file, and finally -daemon if it should run as a daemon.

If no parameters are given, the filename given in the default configuration file is used. If the filename is "stdin", BruteFIR will expect the configuration file to be available on the standard input.

The (default) default configuration file looks like this:

## DEFAULT GENERAL SETTINGS ##
 
float_bits: 32;             # internal floating point precision
sampling_rate: 44100;       # sampling rate in Hz of audio interfaces
filter_length: 65536;       # length of filters
config_file: "~/.brutefir_config"; # standard location of main config file
overflow_warnings: true;    # echo warnings to stderr if overflow occurs
show_progress: true;        # echo filtering progress to stderr
max_dither_table_size: 0;   # maximum size in bytes of precalculated dither
allow_poll_mode: false;     # allow use of input poll mode
modules_path: ".";          # extra path where to find BruteFIR modules
powersave: false;           # pause filtering when input is zero
monitor_rate: false;        # monitor sample rate
lock_memory: true;          # try to lock memory if realtime prio is set
sdf_length: -1;             # subsample filter half length in samples
convolver_config: "~/.brutefir_convolver"; # location of convolver config file
 
## COEFF DEFAULTS ##
 
coeff {
        format: "text";     # file format
        attenuation: 0.0;   # attenuation in dB
	blocks: -1;         # how long in blocks
	skip: 0;            # how many bytes to skip
	shared_mem: false;  # allocate in shared memory
};
 
## INPUT DEFAULTS ##
 
input {
        device: "file" {};  # module and parameters to get audio
        sample: "S16_LE";   # sample format
        channels: 2/0,1;    # number of open channels / which to use
        delay: 0,0;         # delay in samples for each channel
	maxdelay: -1;	    # max delay for variable delays
	mute: false, false; # mute active on startup for each channel
};
 
## OUTPUT DEFAULTS ##
 
output {
        device: "file" {};  # module and parameters to put audio
        sample: "S16_LE";   # sample format
        channels: 2/0,1;    # number of open channels / which to use
        delay: 0,0;         # delay in samples for each channel
	maxdelay: -1;	    # max delay for variable delays
	mute: false, false; # mute active on startup for each channel
        dither: false;      # apply dither
	merge: false;       # merge discontinuities at coeff change
};
 
## FILTER DEFAULTS ##
 
filter {
        process: -1;        # process index to run in (-1 means auto)
	delay: 0;           # predelay, in blocks
	crossfade: false;   # crossfade when coefficient is changed
};

The syntax of the main configuration file is very similar, as we will see. As shown above, there are five sections in the configuration: general settings, coeff, input, output and filter.

The general syntax rules for the configuration files are easily grasped from the default configuration file. The semicolons are important: they, not line breaks, mark the end of a setting, so you may have several settings on one line if you like. All characters on a line after a # are ignored. There are three data types: strings, numbers and booleans. Strings are text between quotes, a number is written with or without a decimal dot, and a boolean is either 'true' or 'false'.

Note that everything is case sensitive, so setting names must be written in lowercase letters. Although the configuration file examples shown here are nicely ordered in sections, it is perfectly alright to mix settings in any order you like.

The general settings section in the main configuration file has the same syntax as in the default configuration file. The difference is that coeff, input, output and filter structures can exist in multiples, and are given names and more parameters.

General settings

Default values of all general settings (except logic) must be given in the default configuration file. Any of these settings may be overridden in the main configuration file (except config_file). These settings are:

float_bits: <NUMBER: internal floating point resolution, either 32 or 64>;
sampling_rate: <NUMBER: sampling rate in Hz>;
filter_length: <NUMBER: length in samples of the (sub)filters>[,<NUMBER: number of subfilters per filter>];
config_file: <STRING: default location of main configuration file>;
overflow_warnings: <BOOLEAN: echo overflow warnings to stderr>;
show_progress: <BOOLEAN: echo progress to stderr>;
max_dither_table_size: <NUMBER: maximum size in bytes of pre-calculated dither>;
allow_poll_mode: <BOOLEAN: allow input poll mode>;
modules_path: <STRING: extra path where to find BruteFIR modules>;
logic: <STRING: logic module name> { <logic module parameters> }[, ...];
powersave: <BOOLEAN or NUMBER: pause filtering when input is zero>;
monitor_rate: <BOOLEAN: monitor sample rate, and abort if it changes>;
lock_memory: <BOOLEAN: try to lock memory if realtime prio is set>;
sdf_length: <NUMBER: sub-sample delay filter half length in samples>[, <NUMBER: kaiser window beta>];
convolver_config: <STRING: file to store FFTW wisdom in>;
benchmark: <BOOLEAN: start in benchmark mode (can only be used in main config file)>;
safety_limit: <NUMBER: if non-zero max dB in output before aborting>;

The filter_length setting specifies how long the filters should be. This can be done in two ways. The first is to specify the length as a single number, which must be a power of two; the convolution will then be done on the whole filter length. The second is to specify the partition length followed by the number of partitions, separated by a comma. For example, to partition a 65536 tap filter in 16 parts, you write filter_length: 4096,16. Partitioned filters can be used to improve performance and reduce I/O-delay.
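As a quick sanity check of a filter_length value, the total number of taps is simply the partition length times the number of partitions. The helper below is a hypothetical sketch, not part of BruteFIR; it assumes that the partition length must be a power of two also in the partitioned case.

```python
def total_filter_length(partition_length, partitions=1):
    """Return the total filter length in taps for a filter_length setting."""
    # Assumption: the power-of-two rule applies to the partition length as well.
    if partition_length < 1 or partition_length & (partition_length - 1):
        raise ValueError("partition length must be a power of two")
    if partitions < 1:
        raise ValueError("there must be at least one partition")
    return partition_length * partitions


print(total_filter_length(65536))     # corresponds to filter_length: 65536;
print(total_filter_length(4096, 16))  # corresponds to filter_length: 4096,16;
```

Both calls describe a filter of the same total length; only the partitioning differs.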

The convolver_config setting specifies where FFTW wisdom should be stored, that is optimization information for the FFT calculations.

If overflow_warnings is set to true, information about overflows will be printed to the screen when they occur. Note that overflowed samples are always set to the maximum output value of the output device, so there is no actual overflow on the output (unless the actual floating point value is overflowed). If overflow occurs, it means that the filter is amplifying too much, either through its coefficients or through input and output attenuation. Overflow is not checked for if the output values are floating point.

If dither is applied to any output, a dither table will be calculated when the program is started. It contains uncorrelated random values that are used to generate the dither. The more channels that apply dither, the larger the table needs to be in order to keep the dither uncorrelated between channels. This table can get quite large memory-wise. If you want to limit its size, set max_dither_table_size to a value. It should preferably not be less than one megabyte though. If it is set to zero or negative, the program will itself choose a size.

BruteFIR uses external modules to provide sample I/O, and optionally add new logic. It will search a few default directories to find any modules that should be loaded, as specified in the configuration. The setting modules_path will add an extra directory, which is searched first. The value in the created default configuration file will be ".", that is the current working directory.

If any logic modules should be loaded, these are listed in the logic field, in pairs of module name / module parameters, separated with commas. Which logic modules are available and what functionality they provide can be found in the Logic modules section.

If there is any sound card used for input or output (or any other sample-clock dependent device), BruteFIR will automatically set its delay-sensitive processes to realtime priority, thus you will typically need to run the program as root. To maintain realtime performance, it is important that there is no memory belonging to the program in the swapfile, thus all memory must be locked to RAM. This is done if lock_memory is set to true. Note that the memory is never locked when realtime priority is not set (that is when there are only files used for input and output). Warning: there seems to be a bug in the Linux kernel which causes the shared memory to be locked once for each process, meaning that when lock_memory is set to true, BruteFIR will seem to consume a lot more memory than it should. Also, it of course makes no sense to lock memory if your system does not have swap activated. Due to this issue, the best thing to do is to have a system with no swap and avoid locking the memory.

The powersave feature, if activated, will monitor the inputs, and if an input channel provides zero samples, the associated filters will not do any processing, since with zero on the input, BruteFIR knows in advance that there will be zero on the output. BruteFIR will continue to run as normal, and filters with non-zero inputs will continue to process normally. As soon as there is non-zero input on a suspended filter, it starts processing again. This powersave feature is transparent: there will be no convolution errors if it is activated. The reason for having it optional is that one may want to make performance tests without the need to feed a meaningful signal to BruteFIR.

If analog inputs are used, the input will never be exactly zero, and thus the powersave feature will not be triggered. However, if a value is specified instead of the boolean (for example powersave: -80;), that value is interpreted as the lowest level in dB the input signal can be, before BruteFIR will consider the input as zero, and trigger powersave. Thus, a noise floor can be specified, and then powersave can work together with analog inputs.
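To get a feel for what a level such as -80 dB means, the sketch below converts it to a linear amplitude and tests a block of samples against it. This is only an illustration of the idea, not how BruteFIR implements it: it assumes samples scaled to -1.0 to +1.0, that 0 dB corresponds to full scale, and that the peak value of a block is what is compared. The function names are hypothetical.

```python
def db_to_amplitude(level_db):
    """Convert a level in dB (0 dB = full scale 1.0) to a linear amplitude."""
    return 10.0 ** (level_db / 20.0)


def is_silent(block, threshold_db=None):
    """Decide whether a block of samples would count as zero input.

    threshold_db=None mimics 'powersave: true;' (exact zeros only), while a
    number mimics for example 'powersave: -80;' (noise floor in dB).
    """
    peak = max((abs(s) for s in block), default=0.0)
    if threshold_db is None:
        return peak == 0.0
    return peak < db_to_amplitude(threshold_db)


noise = [0.00005, -0.00002, 0.00003]  # low-level noise, as from an analog input
print(is_silent(noise))               # treated as signal without a noise floor
print(is_silent(noise, -80))          # treated as zero with a -80 dB noise floor
```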

If benchmark mode is activated (can only be done in the main configuration file), performance statistics will be printed on screen. Note that due to complex caching effects of modern computers, the displayed processing times can look strange, a step that requires much more arithmetic operations than another may in certain circumstances still be considerably faster, if it has better luck with the cache. Since benchmarking measures elapsed time, the computer must not be loaded with any other tasks in order to get reliable results.

If a sound card which is used for input cannot be configured to have a period size (interrupt interval) equal to or smaller than the configured filter (partition) length, or if it cannot be a power of two, BruteFIR must be run in input poll mode. This means that the sound card is polled for data, and sound card interrupts are not used. BruteFIR will run just as reliably (as long as the sound card allows for small transfers) but will consume more of the spare processor time. Thus it will look like BruteFIR uses more processor than it actually needs to. If more processor time is used for filtering, less will be used for polling, so input poll mode does not prevent you from using filters as long as in normal mode. However, for some applications (for example when the spare processor time is used by another vital program), input poll mode is not suitable, and by setting allow_poll_mode to false, BruteFIR will exit with an error if input poll mode is required.

If it should be possible to set sub-sample delays, the sdf_length setting must be larger than zero. It specifies the half length of a sub-sample delay filter. A sub-sample delay filter is simply a sinc sampled with a sub-sample offset. Thus, when a signal is convolved with the filter it is delayed by the corresponding offset. Since a sinc signal is infinitely long, it must be windowed. A Kaiser window is used. The default beta is 9.0, but a custom value can be specified by adding it after a comma (example: sdf_length: 31, 8.5;), although there is little reason to use anything other than the default. The distortion caused by the windowing is a soft rolloff at higher frequencies, the shape of which depends on the beta value. There is no phase distortion. Since the sub-sample filters are linear phase, they will add a pre-response (in practice I/O-delay), which is their half filter length, that is the value given in the sdf_length setting. If sub-sample delays are used only on inputs or only on outputs, the added pre-response is the same as the sdf_length; if used on both (usually not necessary), it will be twice the length. To activate sub-sample delay, a valid subdelay must also be specified in at least one of the input/output structures. The valid range is -99 to 99.
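The description above can be made concrete with a small sketch that builds such a filter: a sinc sampled with a sub-sample offset and multiplied by a Kaiser window. This is a generic textbook construction written for illustration, not BruteFIR's actual code. In particular, centring the window on the middle tap and the helper names are assumptions.

```python
import math


def bessel_i0(x):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-12 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def subsample_delay_filter(sdf_length, subdelay, beta=9.0):
    """Return 2 * sdf_length + 1 taps delaying a signal by subdelay/100 samples."""
    offset = subdelay / 100.0
    taps = []
    for n in range(-sdf_length, sdf_length + 1):
        x = n - offset
        # Sinc sampled with a sub-sample offset.
        sinc = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
        # Kaiser window, here centred on the middle tap (an assumption).
        ratio = n / sdf_length
        window = bessel_i0(beta * math.sqrt(1.0 - ratio * ratio)) / bessel_i0(beta)
        taps.append(sinc * window)
    return taps


taps = subsample_delay_filter(31, 50)  # half a sample of delay, 63 taps
print(len(taps), sum(taps))            # the sum of the taps is the gain at DC
```

With a subdelay of zero the sketch degenerates to a single unit tap in the middle, that is a pure delay of sdf_length samples, which matches the pre-response described above.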

The advantage of a long sub-sample filter length is that the rolloff in the high frequencies starts later and gets sharper, that is less high frequency information is lost. The disadvantage of long sub-sample filters is that the required CPU time increases, and the added I/O-delay increases. Sub-sample filters are processed separately in the frequency domain using FFT, and therefore it is recommended to keep sdf_length at a power of two minus one (the actual filter length is twice sdf_length plus one), which means that as much as possible of the FFT block is used (an sdf_length of 16 requires as much CPU time as an sdf_length of 31, since the same block length is required). With an sdf_length of 31 and the default beta of 9.0, and a sample rate of 44100 Hz, the response is flat up to 19 kHz, and then a soft rolloff begins which reaches -0.20 dB at 20 kHz, which is good enough for most needs. The next natural step, 63, keeps a flat response up to about 20500 Hz, with -0.20 dB at 21 kHz.
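The recommendation to use a power of two minus one can be illustrated with a few lines of arithmetic. The sketch assumes that the FFT block is the smallest power of two that holds the whole filter; the exact block sizing inside BruteFIR may differ, and the function names are hypothetical.

```python
def sdf_filter_length(sdf_length):
    """Actual filter length: twice the half length plus the centre tap."""
    return 2 * sdf_length + 1


def fft_block_length(sdf_length):
    """Smallest power of two that holds the whole sub-sample delay filter."""
    # Assumption: the block is the next power of two at or above the length.
    block = 1
    while block < sdf_filter_length(sdf_length):
        block *= 2
    return block


for half_length in (16, 31, 32, 63):
    print(half_length, sdf_filter_length(half_length), fft_block_length(half_length))
```

Under this assumption an sdf_length of 16 and one of 31 land in the same block size, while 32 already requires the next larger block.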

The purpose of the safety_limit setting is to protect your ears and expensive speakers. It is active if set to a non-zero value. Every output sample is checked, and if it exceeds this value (in dB) BruteFIR will immediately exit with an error message, before any sound is sent to the output.

General structure syntax

<structure type name> <STRING: name (list for some) | NUMBER: index> {
	<field name 1>: <setting 1>;
	[...]
};

Names of structures (given after the type name) are not given in the default configuration file, but must be provided in the main configuration file. The name is either a custom string, or an index number, which must then be the same as the order of the structure in the file, that is the first structure must be indexed 0, the second 1 and so on. If a string name is given, the index number is given automatically (the opposite also applies), and when referring to the structure, either the string name or the index number can be used. Some structures, namely input and output, may have a comma-separated list of names, since the names apply to the channels defined in the structure.

After the name, or the structure type name if in the default configuration file, there is a left brace ({), and then structure fields and their settings, each field/setting pair ending with a semicolon (;). As for the general settings, field names always end with a colon (:). The order of the fields is not important. The structure is closed with a right brace (}) and ended with a semicolon.

Coeff structure

coeff <STRING: name | NUMBER: index> {
	filename: <STRING: filename>; | <NUMBER: shmid>/<NUMBER: offset>/<NUMBER: blocks>[,...];
	format: <STRING: sample format string | "text" | "processed">;
	attenuation: <NUMBER: attenuation in dB>;
	blocks: <NUMBER: length in blocks>;
	skip: <NUMBER: bytes to skip in beginning of file>;
	shared_mem: <BOOLEAN: allocate in shared mem>;
};

In the default configuration file, the filename field is not set, so it must be present in the main configuration file.

The coeff structure defines a set of filter coefficients, which becomes a FIR filter. There are several different file formats, selected with the format field: a sample format string, "text" or "processed".

Note that BruteFIR currently does not provide any way to convert other formats to the "processed" format (well actually it does, but only through its module API).

The coefficients can be scaled, by setting the attenuation to non-zero.

Instead of a filename, comma-separated number groups can be given. The first number is a shared memory ID (man shmat) where the data is found, the second number is the offset in bytes into the shared memory area where the program starts to read, and the third is how many blocks should be read. A block is a filter segment, that is if filter_length is 4096,16, one block is 4096 coefficients, and there can be no more than 16 blocks per coefficient set. If not all blocks are covered by the first group, there must be following number groups to provide the full length. When a shared memory segment is given, it is required that the format is "processed".

In some cases, when one wants to test the performance of a certain BruteFIR configuration but does not feel like generating coefficients, one can set the filename to "dirac pulse". Then BruteFIR will generate a dirac pulse filter internally and use it as any other filter, and thus it will cost as much in processing as any other filter of the same length. However, if you need a dirac pulse in the real case, it makes no sense to use this feature, since simply setting the coeff field in the filter structure to -1 gives the same effect and uses very little processor power (and memory).

The blocks field says how long in filter blocks the coefficient set should be. If it is set to -1, the full length is assumed. Note that custom lengths are only possible if partitioned convolution is employed (quite naturally, since else there will only be one filter block covering the full length).

The skip field if given specifies how many bytes in the beginning of the file that should be skipped. This can be used to skip headers in a file or similar. The field will be ignored if the coefficients are not read from file.

The shared_mem field indicates if the coefficient should be stored in shared memory. Some modules may require that, such as the equalization module.

Input and output structure

input <STRING: name | NUMBER: index>[, ...] {
        device: <STRING: I/O module name> { <I/O module settings> };
        sample: <STRING: sample format>;
        channels: <NUMBER: open channels>[/<NUMBER: channel index>[, ...]];
	delay: <NUMBER: delay in samples>[, ...];
	subdelay: <NUMBER: additional delay in 1/100th samples (valid range -99 - 99)>[, ...];
	maxdelay: <NUMBER: maximum delay for dynamic changes>;
	individual_maxdelay: <NUMBER: maximum delay for dynamic changes>[, ...];
	mute: <BOOLEAN: mute channel>[, ...];
	mapping: <NUMBER: channel index>[, ...];
};

output <STRING: name | NUMBER: index>[, ...] {
        device: <same syntax as for the input structure>;
        sample: <same syntax as for the input structure>;
        channels: <same syntax as for the input structure>;
	delay: <same syntax as for the input structure>;
	subdelay: <NUMBER: additional delay in 1/100th samples (valid range -99 - 99)>[, ...];
	maxdelay: <same syntax as for the input structure>;
	individual_maxdelay: <same syntax as for the input structure>;
	mute: <same syntax as for the input structure>;
	mapping: <same syntax as for the input structure>;
	dither: <BOOLEAN: apply dither>;
	merge: <BOOLEAN: merge discontinuities at coeff change>;
};

All fields for the input and output structures except mapping, delay and mute must be set in the default configuration file.

The device field specifies the source/destination of the digital audio. This is always an I/O module. First the name of the module is stated, followed by its configuration within {}. If the audio is read/written from/to a module which does not continue forever (for example reading from a file), BruteFIR will finish when the first I/O module comes to an end (hopefully an input module; write failure of an output module is considered an error).

The sample format should be one of the following strings:

The common format 16 bit signed little endian, found in for example 16 bit wav-files, is thus "S16_LE". The floating point formats can be in any range. However, all integer formats will be scaled to -1.0 to +1.0 internally, so to match an integer format, the range should be -1.0 to +1.0. There is no overflow checking for floating point formats (that is, values larger than +1.0 or less than -1.0 are not truncated).

The channels field specifies the number of open and used channels of the device. If the number of open channels exceeds the number of used channels, a slash (/) followed by a comma-separated list of channel indexes of used channels must be appended. If we for example have an eight channel ADAT sound card, but we only want to use the first two, we write 8/0,1 as the channels setting. As you see, the lowest channel index is zero, not one.
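How a channels value breaks down can be shown with a small parser. This is a hypothetical helper written for illustration, not BruteFIR code, and it assumes that when no list is given all open channels are used in order.

```python
def parse_channels(setting):
    """Parse a channels value such as '8/0,1' into (open_channels, used_indexes)."""
    head, _, tail = setting.partition("/")
    open_channels = int(head)
    if tail:
        used = [int(index) for index in tail.split(",")]
    else:
        # Assumption: without a list, all open channels are used in order.
        used = list(range(open_channels))
    for index in used:
        if not 0 <= index < open_channels:
            raise ValueError("channel index %d is out of range" % index)
    return open_channels, used


print(parse_channels("8/0,1"))  # eight open channels, the first two used
print(parse_channels("2"))      # two open channels, both used
```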

The length of the list of names (given after the structure type name) must match or exceed the number of used channels. If there are more channels in the head (the logical, or virtual channels) than there are available through the device, the specified channels must be mapped onto the physical device channels. This is done with the mapping field, which is simply a list of indexes stating which physical device channel each channel in the head is mapped to. Here is a simplified example:

output 14,15,16 {
        ...
        channels: 8/5,4;
	mapping: 0,1,0;
};

In this example, two channels from the eight channel device are used, the channels with index 5 and 4. The order of the channel indexes matters: physical channel 5 will now be considered the first (index 0) of the available physical channels, and 4 the second (index 1). The mapping field tells how to map the channels called 14, 15 and 16 in the header to those two physical channels. The mapping is in the same order as the channels in the header, that is 14 is mapped to physical channel index 0 (which is channel 5 on the eight channel device), 15 to index 1 (channel 4 on the device), and 16 to index 0, that is the logical channels 14 and 16 will mix into the same output on the device. In the standard case, where the number of logical channels is the same as the number of channels made available through the channels field, a mapping specification is not needed. Then the first logical channel is mapped to the first listed device channel and so on.
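The two-step lookup in the example (name to mapping index, mapping index to device channel) can be written out as follows. The function is a hypothetical illustration of the rule described above, not BruteFIR code.

```python
def resolve_mapping(names, used_channels, mapping=None):
    """Return a dict from logical channel name to physical device channel."""
    if mapping is None:
        # Standard case: the first name goes to the first listed channel, and so on.
        mapping = list(range(len(names)))
    if len(mapping) != len(names):
        raise ValueError("mapping must have one entry per logical channel")
    return {name: used_channels[index] for name, index in zip(names, mapping)}


# output 14,15,16 { channels: 8/5,4; mapping: 0,1,0; };
print(resolve_mapping([14, 15, 16], [5, 4], [0, 1, 0]))
```

Logical channels 14 and 16 end up on the same device channel, which is where they are mixed.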

The list of delays specifies how many samples a channel should be delayed. This could be used to compensate for speaker positions that are either too close or too far away. It could also be used to compensate for acausal filters. Delay can be changed at runtime, if maxdelay is not set to a negative value. It defines the upper bound of delay in samples. When the program is started, delay buffers matching maxdelay are allocated for all channels. If it is negative, only the precise amount specified by the delay array is allocated.

The setting individual_maxdelay was added later, and works the same as maxdelay with the difference that it is specified per channel. It is useful to save memory when there are many channels, and only some of them need dynamic delay (or considerably larger buffer than the others).

If the general setting sdf_length is larger than zero, the subdelay setting will take effect. It specifies the sub-sample delay per channel in 1/100th of samples (valid range is -99 to 99). This delay can be changed at runtime. To disable sub-sample delay on a channel, set its sub-delay to a negative value outside the valid range. Since sub-sample delay consumes CPU time, it is recommended to only activate it where necessary. Sub-delay filters add pre-response, and therefore all channels with sub-delay disabled will be automatically compensated with an I/O delay to make them aligned.

The mute list of booleans specifies, in order, which channels should be muted from the beginning. The muted channels can later be unmuted from the CLI.

If the dither flag is set to true, dither is applied on all used channels. Dither is a method to add carefully devised noise to improve the resolution. Although most modern recordings contain dither, they need to be re-dithered after they have been filtered for best resolution. Dither should be applied when the resolution is reduced, for example from 24 bits on the input to 16 bits on the output. However, one can claim that dither should always be applied, since the internal resolution is always higher than the output. When BruteFIR is compiled with single precision, it is not possible to apply dither to 24 bit output, since the internal resolution is not high enough. BruteFIR's dither algorithm is the highly efficient HP TPDF dither algorithm (High Pass Triangular Probability Distribution Function).
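For readers unfamiliar with the technique, here is a minimal sketch of high-pass TPDF dither in its common textbook form: the dither value is the difference between two successive uniform random numbers, which gives a triangular distribution and a high-pass spectrum. It is not BruteFIR's implementation (which uses a precalculated dither table), and the function name, seeding and clipping are assumptions made for the example.

```python
import math
import random


def hp_tpdf_dither_quantize(samples, bits=16, seed=0):
    """Quantize samples in -1.0..+1.0 to signed integers with HP TPDF dither."""
    rng = random.Random(seed)
    scale = 2 ** (bits - 1)
    lowest, highest = -scale, scale - 1
    previous = rng.random()
    quantized = []
    for sample in samples:
        current = rng.random()
        # Difference of successive uniform values: triangular PDF within
        # -1..+1 LSB, with the noise energy pushed towards high frequencies.
        dither = current - previous
        previous = current
        value = math.floor(sample * scale + dither + 0.5)
        quantized.append(max(lowest, min(highest, value)))  # clip to the range
    return quantized


print(hp_tpdf_dither_quantize([0.0, 0.25, -0.5, 0.999]))
```

Each output value stays within 1.5 LSB of the undithered, unrounded sample, since the dither spans at most one LSB in either direction and rounding adds at most half an LSB.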

If the merge flag is set to true, discontinuities that may occur when coefficients are changed at runtime are smoothed out with a simple merge algorithm. This avoids "clicks" that may occur in the sound when coefficients are changed. Note that discontinuities also occur when volume is changed, but those are not merged, since they are generally not audible or are masked by the volume change itself. If someone does not agree with that, let me know, and I will make it apply the merger at volume changes too.

Filter structure

filter <STRING: name | NUMBER: index> {
        from_inputs: <STRING: name | NUMBER: index>[/<NUMBER:attenuation in dB>][/<NUMBER:multiplier>][, ...];
        from_filters: <same syntax as from_inputs field>;
        to_outputs: <same syntax as from_inputs field>;
        to_filters: <STRING: name | NUMBER: index>[, ...];
        process: <NUMBER: process index>;
	coeff: <STRING: name | NUMBER: index>;
	delay: <NUMBER: pre-delay in blocks>;
	crossfade: <BOOLEAN: cross-fade when coefficient is changed>;
};

Only the process field should be given in the default configuration file.

The filter structure defines where a filter is placed and what its parameters are. The following is done in a filter:

  1. Possible attenuation is applied to the inputs, where-after they are mixed together.
  2. The mixed-together inputs are filtered.
  3. The filter output is copied to the output channels, possibly with individual attenuation. Attenuation is however not applicable to outputs going to other filters.

If an output channel exists in several filter structures, the filter outputs will be mixed into that channel. Thus, a set of filter structures defines how inputs and outputs should be copied, mixed and filtered.

With the help of the from_filters and to_filters fields, filters can be connected to each other. The only real constraint is that there must be no loops. BruteFIR will detect and point out errors if such exist in a given filter network. Note that, if possible, coefficients should be pre-convolved rather than put as filters in series, since a 2N length filter computes much faster than two cascaded N length filters.
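Pre-convolving two coefficient sets is an ordinary linear convolution, which can be done offline before the coefficients are given to BruteFIR. The sketch below uses plain Python lists and the direct method, which is fine for a one-time calculation. How the coefficient files are read and written is left out, and the function name is made up for the example.

```python
def convolve(a, b):
    """Linear convolution of two coefficient lists (direct method)."""
    result = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            result[i + j] += x * y
    return result


first = [1.0, 2.0]
second = [1.0, 3.0]
combined = convolve(first, second)
print(len(combined), combined)
```

Two N length sets give 2N - 1 coefficients, so the combined filter fits within a 2N length filter.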

The from_inputs, from_filters and to_outputs fields have the same syntax. One channel/filter is given as the string name or index number, and if attenuation should be applied, it is followed by a slash (/) and the attenuation in dB. Instead of, or combined with, attenuation in dB, a multiplier can be given: a number by which all samples are multiplied. The notation "channel 1"/6/-1 means that channel 1 is attenuated 6 dB and the polarity is inverted (multiplication by -1). It is also possible to write "channel 1"//-0.5, which is approximately equivalent to the first example, since 6 dB of attenuation corresponds to a factor of about 0.5.
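The relation between the dB attenuation and the multiplier can be sketched in a few lines of Python. This is only an illustration, not code from BruteFIR: the function name linear_scale is made up here, and it assumes the usual amplitude convention where a positive attenuation of A dB scales samples by 10^(-A/20).

```python
def linear_scale(attenuation_db=0.0, multiplier=1.0):
    """Combine an attenuation in dB with a plain multiplier into one factor.

    Assumption: attenuation is positive for a level reduction and follows
    the amplitude convention 10 ** (-dB / 20). The name is hypothetical.
    """
    return multiplier * 10.0 ** (-attenuation_db / 20.0)


# "channel 1"/6/-1 : 6 dB attenuation combined with a polarity inversion
print(linear_scale(6.0, -1.0))
# "channel 1"//-0.5 : no dB attenuation, multiplier only
print(linear_scale(0.0, -0.5))
```

The two printed factors differ slightly, because 6 dB is not exactly a halving.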

If more than one channel should be included, they are separated with commas. The to_filters field has the same syntax with the exception that attenuation is not allowed.

The process field specifies in which Unix process the filter should be run. All filters with the same process index will run in the same process. Process index 0 must exist, and if there are more processes they should be numbered in series: 0, 1, 2, 3 and so on. This field is important if BruteFIR runs on a multi-processor machine. The optimal situation is that there is one process per processor, and that each process requires the same processor time. Then you will get the most out of your multi-processor computer. There is one limitation on how filters can be distributed between processes: mixing to an output channel or a filter input must be done within the same process.
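As a rough aid when distributing filters by hand, the limitation above can be checked before starting BruteFIR. The Python sketch below is not part of BruteFIR; the function name and the data layout are invented for the illustration, and it only covers mixing to output channels (filter inputs would need the same kind of check).

```python
from collections import defaultdict


def find_split_outputs(filters):
    """Return outputs that are fed by filters in different processes.

    `filters` maps a filter name to a (process_index, [output names]) pair.
    This layout is a hypothetical stand-in for a parsed configuration.
    """
    processes_by_output = defaultdict(set)
    for process, outputs in filters.values():
        for output in outputs:
            processes_by_output[output].add(process)
    return {
        output: sorted(processes)
        for output, processes in processes_by_output.items()
        if len(processes) > 1
    }


layout = {
    "left speaker direct path": (0, ["stereo dipole left"]),
    "left speaker cross path": (0, ["stereo dipole left"]),
    "right speaker direct path": (1, ["stereo dipole right"]),
    "right speaker cross path": (1, ["stereo dipole right"]),
}
# An empty result means every output is mixed within a single process.
print(find_split_outputs(layout))
```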

If the process field is set to -1, an automatic but naive load balancing will take place, which may or may not be as good as a hand-made load balancing.

The coeff field defines which coefficient set should be used for the filter. It can be given as the string name of the set, or as its index number. If the index number is set to minus one (-1), there will be no filtering in the filter; it will just mix and copy inputs/outputs as specified. Note that the length of the coefficient set determines how processor intensive the filter will be.

The delay field specifies how many filter blocks pre-delay there should be. Zero or negative means no delay. The maximum allowed delay is one block less than full length. Thus, with unpartitioned filtering there can be no delay at all. The delay cost is zero both in terms of memory and processing.

If the crossfade setting is set to true, there will be a cross-fade when the coefficient is changed at runtime, making the coefficient change totally seamless. When the coefficient is changed (using the CLI for example), the filter convolves one block with the old coefficient, fades it out, and mixes it with a faded-in block convolved with the new coefficient. At the time of the coefficient change, that filter therefore needs roughly twice the normal amount of processing. This processing spike can of course cause buffer underflow if running with a sound card and the CPU load is already heavy in the normal case. If, for example, there are 10 filters in a configuration (all with crossfade active), and all coefficients are changed at the same time, the normal CPU load should not exceed 50%, since the spike will require roughly twice the load. However, if the coefficients are changed only one filter at a time, only 10% extra processing is required compared to the normal case in the example.
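The arithmetic in the example can be written out as a small Python sketch. It is a simplification and not taken from BruteFIR: the function name is made up, and it assumes that all filters cost the same and that a cross-fading filter costs exactly twice as much for one block.

```python
def peak_load(normal_load, filters_total, filters_changed):
    """Estimate the CPU load during a cross-faded coefficient change.

    Assumptions: every filter has the same cost, and each filter that is
    cross-fading costs double for the duration of one block.
    """
    per_filter = normal_load / filters_total
    return normal_load + per_filter * filters_changed


# 10 filters at 50% normal load, all coefficients changed at once
print(peak_load(0.5, 10, 10))
# Same configuration, but only one filter changed at a time
print(peak_load(0.5, 10, 1))
```

Changing one filter at a time keeps the spike small, which is why staggering the changes is the safer choice on a heavily loaded machine.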

Configuration file example

Here follows an example of a main configuration file, showing some of BruteFIR's possibilities. It implements a cross talk cancellation filter for a stereo dipole. The filters are placed in two processes to get the most out of a dual processor machine. A computer with a single processor should if possible keep all filters within the same process for best performance. Note that the configuration uses the default settings extensively. For example, no general settings have been specified apart from the addition of the CLI logic module, and in the coeff structures, only the filename field is used.

logic: "cli" { port: 3000; };

coeff "direct path" {
        filename: "direct_path.txt";
};

coeff "cross path" {
        filename: "cross_path.txt";
};

input "left", "right" {
        device: "file" { path: "/disk0/tmp/music.raw"; };
        sample: "S16_LE";
        channels: 2;
};

output "stereo dipole left", "stereo dipole right" {
        device: "file" { path: "output01.raw"; };
        sample: "S16_LE";
        channels: 2;
};

filter "left speaker direct path" {
        inputs: 0/6.0;
        outputs: 0;
        process: 0;
        coeff: "direct path";
};

filter "left speaker cross path" {
        inputs: "right"/6.0;
        outputs: "stereo dipole left";
        process: 0;
        coeff: "cross path";
};

filter "right speaker direct path" {
        inputs: "right"/6.0;
        outputs: "stereo dipole right";
        process: 1;
        coeff: "direct path";
};

filter "right speaker cross path" {
        inputs: "left"/6.0;
        outputs: "stereo dipole right";
        process: 1;
        coeff: 1;
};

I/O modules

I/O modules are used to provide sample input and output for the BruteFIR convolution engine. It is entirely up to the I/O module how to produce input samples or store output samples. It could for example read input from a sound card, a file, or simply generate noise from a formula.

In the BruteFIR configuration file, an I/O module is specified in each input and output structure.

The purpose of having I/O modules instead of building all functionality directly into BruteFIR is that it should be easy to extend with new functionality, without compromising the core convolution engine.

All I/O modules have the extension ".bfio".

ALSA sound card I/O (alsa)

The ALSA I/O module (named "alsa") is used to read and write samples from/to sound cards. It supports all BruteFIR sample formats also supported by the referenced sound device. The basic configuration is simple: only one field, called device, needs to be set, where the associated value is a string which is passed without modification to ALSA's device open function. Examples: "alsa" { device: "hw"; } or "alsa" { device: "hw:1"; }.

In the above examples, the hardware is accessed directly (the "hw" prefix), but you can also use ALSA's software modes. That is however not recommended, since some functions of BruteFIR, for example overflow protection, expect to be at the very last output stage, and not before another software layer which may perform for example mixing or volume control.

In theory it should also be possible to access files (for example wav-files) through ALSA, "alsa" { device: "file:test.wav"; }, but this does not seem to work currently, and is not recommended, since the module assumes that all devices are driven by a sample clock (that is, that they are sound cards).

If the ALSA I/O module is used in several input/output structures, all referenced sound cards will be linked together using the ALSA API. This makes starting and stopping of the sound cards synchronized if the hardware and driver support it; if not, the ALSA subsystem makes starting and stopping as synchronized as it can. However, when many ALSA devices are used, this linking can cause the computer to lock up; at least it has happened in the past. This is probably due to a problem in ALSA, and may have been resolved when you read this. However, should you bump into problems, you can disable linking by setting link to false (example: "alsa" { device: "hw:1"; link: false; }).

By default, when reading fails due to an overflow, or writing fails due to an underflow, BruteFIR will abort. If your computer is heavily loaded, and/or partitions are short, and/or other services are running on the computer, over/underflow can occur occasionally. In those cases, one might prefer occasional clicks in the sound to a total stop. The ALSA I/O module can hide over/underflow from BruteFIR, so that it will not abort when that occurs. Just set the ignore_xrun parameter to true (example: "alsa" { device: "hw:1"; ignore_xrun: true; }).

OSS sound card I/O (oss)

The OSS I/O module (named "oss") provides sound card I/O through the OSS API. It has only one parameter, device, which points out the device to open. Example: "oss" { device: "/dev/dsp"; }.

The I/O module supports OSS multi-channel and full duplex modes.

JACK audio server I/O (jack)

The JACK I/O module (named "jack") provides BruteFIR with support for the low-latency JACK audio server [23]. JACK is an audio server under development, and the goal for the JACK I/O module is that it should be compatible with the current CVS version.

To avoid putting I/O-delay into the JACK graph, the JACK buffer size should be set to the same as the BruteFIR partition size. It is however possible to set the JACK buffer size to a smaller value. The I/O-delay in number of JACK buffers as seen by following JACK clients will be:

2 * <BruteFIR partition size> / <JACK buffer size> - 2

Note that both the JACK buffer size and the BruteFIR partition size are always powers of two.
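The formula is easy to evaluate by hand, but for completeness here is a small Python sketch of it. The function name is made up for the illustration, and the divisibility check simply relies on both sizes being powers of two with the JACK buffer no larger than the partition.

```python
def jack_io_delay_buffers(partition_size, jack_buffer_size):
    """I/O-delay in JACK buffers: 2 * partition size / JACK buffer size - 2."""
    if partition_size % jack_buffer_size != 0:
        raise ValueError("partition size must be a multiple of the JACK buffer size")
    return 2 * partition_size // jack_buffer_size - 2


# Equal sizes add no I/O-delay to the JACK graph
print(jack_io_delay_buffers(1024, 1024))
# A smaller JACK buffer adds delay as seen by following JACK clients
print(jack_io_delay_buffers(1024, 256))
```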

Currently, the JACK I/O module assumes that jackd is run with the -R parameter, at its default client realtime priority which is 9.

The names of the BruteFIR ports will be "brutefir:input-X" for the inputs and "brutefir:output-X" for the outputs, where X is the channel index. The JACK client name, which is "brutefir" by default, can be changed by setting clientname (example: clientname: "brutefir-A";). It is a global setting, and if used it must be set in the first JACK device clause (the first from the top in the configuration file). The clientname will change the port name prefix as well (the prefix is the client name). If multiple BruteFIR instances should be run, they must have different client names, or else the port names will collide.

If the local ports should be connected to other JACK ports at startup, the setting ports is used, where the associated string values are the names of the ports to connect to. Examples: "jack" { ports: "alsa_pcm:capture_1", "alsa_pcm:capture_2"; } for input, and "jack" { ports: "alsa_pcm:playback_1", "alsa_pcm:playback_2"; } for output. The port listing must contain as many entries as the device has channels. However, an empty string can be used if a specific channel index should not be connected, for example: "jack" { ports: "", "alsa_pcm:capture_2"; } will only connect the second port.

It is also possible to give the local ports names other than the defaults, by adding a slash and a name after the port to connect to, like this: "jack" { ports: "alsa_pcm:capture_1"/"in-A"; }. The given name replaces the default "input-X" for inputs and "output-X" for outputs. If a port should be named but not connected, the first string is left empty, like this: "jack" { ports: ""/"in-A"; }.

If no ports should be connected, and the client name is left to the default, the JACK device clause is empty ("jack" { };).
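When configurations are generated by a script, the ports syntax described above can be assembled mechanically. The following Python sketch only mirrors the examples given in this section; the function name and the pair layout are invented for the illustration, and the result is the clause that goes after device: in an input or output structure.

```python
def jack_device_clause(ports):
    """Build a "jack" device clause from per-channel port settings.

    `ports` holds one (port_to_connect, local_name) pair per channel.
    Use "" to leave a channel unconnected and None to keep the default
    local port name. Both the name and the layout are hypothetical.
    """
    entries = []
    for remote, local in ports:
        entry = '"%s"' % remote
        if local is not None:
            entry += '/"%s"' % local
        entries.append(entry)
    if not entries:
        return '"jack" { }'
    return '"jack" { ports: %s; }' % ", ".join(entries)


# Connect only the second channel, keep default local names
print(jack_device_clause([("", None), ("alsa_pcm:capture_2", None)]))
# Name a port without connecting it
print(jack_device_clause([("", "in-A")]))
```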

The sample format for the JACK device should be set to AUTO, which will be the JACK sample format (floating point).

Raw PCM file I/O (file)

The raw PCM file I/O module (named "file") is used to read and write samples from/to files. It supports all BruteFIR sample formats and reads/writes them directly in raw, interleaved form. In the simplest case, only the filename is given. Example: "file" { path: "test.pcm"; }. One can also specify how many bytes to skip at the beginning of input files, and whether to append to output files. Examples: "file" { path: "test.pcm"; skip: 44; } and "file" { path: "test.pcm"; append: true; }.
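To make the raw interleaved layout concrete, here is a Python sketch that produces a two-channel S16_LE test file which could then be referenced from an input structure. It is not part of BruteFIR; the function names, the 440 Hz tone and the file name are arbitrary choices for the illustration.

```python
import math
import struct


def interleaved_s16le(channels):
    """Pack equal-length channels of floats in [-1, 1] as interleaved S16_LE."""
    out = bytearray()
    for frame in zip(*channels):          # one sample from each channel
        for sample in frame:
            clipped = max(-1.0, min(1.0, sample))
            # "<h" is a signed 16-bit little-endian integer
            out += struct.pack("<h", int(round(clipped * 32767)))
    return bytes(out)


def write_test_file(path="test.pcm", rate=44100, seconds=1):
    """Write a sine tone on the first channel and silence on the second."""
    count = rate * seconds
    left = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(count)]
    right = [0.0] * count
    with open(path, "wb") as handle:
        handle.write(interleaved_s16le([left, right]))
```

Calling write_test_file() creates a headerless file, so no skip value is needed when reading it back.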

It is also possible to read from and write to text files (X floating point ASCII values per line separated with whitespace, where X is the number of channels). Just add the option text: true;. The module will convert to/from 64 bit floating point, and thus requires that sample format (or use AUTO).

If the file I/O module is used for input, the input file can be looped, by setting loop to true.

By using /dev/stdin like this "file" { path: "/dev/stdin"; }, BruteFIR will read data from standard input, so it is then possible to do things like mpg123 -s test.mp3 | brutefir.

Writing your own I/O module

This will probably never be documented. The best way is to look at the source code to see how it is done.

Logic modules

Command line interface (cli)

The CLI logic module (named "cli") provides a command line interface available through telnet, a local socket, a pipe, or a serial line. The CLI is used for changing settings in runtime, which is of course only suitable when BruteFIR is used in realtime. It can be used interactively by hand, for example by connecting to it through telnet. It is also suitable for scripting BruteFIR, or using it as a means of inter-process communication if BruteFIR is used as the convolution engine for another program.

The context sensitive port field specifies which interface will be used as follows:

The CLI does not have much terminal functionality to speak of, and is thus a bit cumbersome to use interactively. It reads a whole line at a time, and can interpret backspace, but that is about it. There is no echo functionality, so the connecting client needs to handle that (telnet does, and terminal software for serial lines usually has a function to enable local echo).

Instead of specifying a port, one can specify a string of commands, which will be run in a loop as a script. Example: "cli" { script: "cfc 0 0;; sleep 10;; cfc 0 1;; sleep 10"; }. The script may span several lines. Each line is carried out atomically (this is also true for command line mode), so if there are several commands on a single line, separated with semicolons, they will be performed atomically (an atomic set of statements). The exception is when an empty statement is put in the line (just a semicolon), like in the script example. This works as a line break, and thus separates atomic sets of statements.

A typical use for an atomic set of statements is to change filter coefficients and volume at the same time.
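For scripting from another program, an atomic line can be assembled and sent over TCP. The Python sketch below is only an illustration: the function names are made up, port 3000 is taken from the configuration example, and the newline terminator and ASCII encoding are assumptions about what the CLI expects.

```python
import socket


def atomic_line(*commands):
    """Join CLI commands with semicolons so they form one atomic statement set."""
    return "; ".join(commands)


def send_line(line, host="localhost", port=3000, timeout=2.0):
    """Send one command line to a CLI on a TCP port and return any reply.

    Assumptions: the line is terminated with a newline, and whatever text
    arrives before the timeout is the reply.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((line + "\n").encode("ascii"))
        try:
            return conn.recv(4096).decode("ascii", errors="replace")
        except socket.timeout:
            return ""


# Change coefficient and output attenuation of filter 0 in one atomic set
print(atomic_line("cfc 0 1", "cfoa 0 0 3.0"))
```

With a running BruteFIR instance, send_line(atomic_line("cfc 0 1", "cfoa 0 0 3.0")) would submit both changes on a single line.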

The sleep function in the CLI allows for sleeping in seconds, milliseconds or blocks. One block is exactly the filter length in samples, and if partitioned, it is the length of the partition. Block sleep can only be used in script mode.

When in script mode, the first atomic statements will be executed just before the first block is processed, then the block is processed (and sent to the output), and then the next set of atomic statements is run. That is, each set of atomic statements is performed before the corresponding block is processed. The next atomic statement set is not performed until the next block is about to be processed.

The block sleep command (which only works in script mode) commences the sleep at the next block. The statement sleep b1; will thus cause the next block to be skipped. Note that since one block passes for each atomic statement set, a single line with only sleep b1; will skip two blocks, not one, since one block is consumed when parsing the sleep command, and the other is skipped by the sleep duration. That is, to skip only one block, either use sleep b0; alone, or use sleep b1 as the last statement together with other statements in an atomic statement set (recommended).

Sleep in seconds and milliseconds will start the timer when the command is issued (at the start of the block if in a script), and continue with the next command after at least the given time has passed. If run in a script, the timer is polled at the start of each block, and the next command is then executed at the start of the first block where the timer has expired.

If several sleep commands are executed in the same atomic statement set in a script, only the last will take effect, and will be executed only when all other commands in the set have been processed. To avoid confusion, it is thus recommended to employ sleep commands either alone, or as the last in the atomic statement set.

If the field echo is set to true, the CLI commands will be echoed back to the user (the whole line at a time). This is off per default.

When connected and you type "help" at the prompt, you will get the following output:

Commands:

lf -- list filters.
lc -- list coefficient sets.
li -- list inputs.
lo -- list outputs.
lm -- list modules.

cfoa -- change filter output attenuation.
        cfoa <filter> <output> <attenuation|Mmultiplier>
cfia -- change filter input attenuation.
        cfia <filter> <input> <attenuation|Mmultiplier>
cffa -- change filter filter-input attenuation.
        cffa <filter> <filter-input> <attenuation|Mmultiplier>
cfc  -- change filter coefficients.
        cfc <filter> <coeff>
cfd  -- change filter delay. (may truncate coeffs!)
        cfd <filter> <delay blocks>
cod  -- change output delay.
        cod <output> <delay> [<subdelay>]
cid  -- change input delay.
        cid <input> <delay> [<subdelay>]
tmo  -- toggle mute output.
        tmo <output>
tmi  -- toggle mute input.
        tmi <input>
imc  -- issue input module command.
        imc <index> <command>
omc  -- issue output module command.
        omc <index> <command>
lmc  -- issue logic module command.
        lmc <module> <command>

sleep -- sleep for the given number of seconds [and ms], or blocks.
         sleep 10 (sleep 10 seconds).
	 sleep b10 (sleep 10 blocks).
	 sleep 0 300 (sleep 300 milliseconds).
abort -- terminate immediately.
tp    -- toggle prompt.
ppk   -- print peak info, channels/samples/max dB.
rpk   -- reset peak meters.
upk   -- toggle print peak info on changes.
rti   -- print current realtime index.
quit  -- close connection.
help  -- print this text.

Notes:

- When entering several commands on a single line,
  separate them with semicolons (;).
- Inputs/outputs/filters can be given as index
  numbers or as strings between quotes ("").

Most commands are simple and need no further explanation. Naturally, any change will lag behind by the length of the I/O delay. The exceptions are the mute and change delay commands, which lag behind only by the period size of the sound card, which is most often smaller than the program's total I/O delay. However, when there is a virtual channel mapping, the mute and delay changes will lag by the full I/O delay as well.

The imc, omc and lmc commands are used to give commands to I/O modules and logic modules at run-time. To find out which modules are loaded and which indexes they have, use the command lm. Not all modules support run-time commands though.

Changing attenuations with cffa, cfia and cfoa can be done with dB numbers or simply by giving a multiplier, which is then prefixed with m, like this: cfoa 0 0 m-0.5. Changing the attenuation with dB will not change the sign of the current multiplier.

Run-time equalizer

The equalizer logic module takes control over one or more coefficient sets, and renders equalizer filters to them, as specified by the user. This can be done in the initial configuration, and also updated in runtime, through the CLI.

The startup configuration can look like this:

  "eq"  {
		debug_dump_filter: "/tmp/rendered-%d";
		{
			coeff: 0, 1;
			#bands: "ISO octave";
			#bands: "ISO 1/3 octave";
			bands: 100, 200, 500;
			magnitude: 20/-3.2, 100/8.5;
			phase: 20/0, 100/180;
		};
		{
			coeff: "eq-1";
			bands: "ISO octave";
			magnitude: 31.5/-3.2, 125/8.5;
			phase: 31.5/3.2;
		};
	};

If you want to analyze the rendered filters, the debug_dump_filter setting specifies a file name where the rendered coefficients will be written. It must contain %d, which will be replaced by the coefficient index. Then follow the equalizers. Each specifies which coefficient index (or name) it should render the equalizer filter to. These coefficient sets must be allocated and must be stored in shared memory, for example like this:

coeff 0 {
        filename: "dirac pulse";
        shared_mem: true;
        blocks: 4;
};

The dirac pulse will be replaced by the rendered filter. Each equalizer has a set of frequency bands (max 128), which can be specified manually or taken from the ISO octave band presets. Optionally, magnitude (in dB) and phase (in degrees) settings can be specified. The frequency value must then match one of the given bands.

If you specify two coefficient sets, the rendering will be double-buffered, meaning that the eq module will keep one coefficient set active in the filter(s), render to the other, and switch when ready. This means that there is no risk of playing an incomplete equalizer, which can cause some noise (usually in the form of a beep). It is thus recommended to use double-buffered mode if the equalizer will be altered at runtime. In the filter configuration, and when referring to the equalizer in the CLI, the first of the two coefficient sets should then be used.

At run-time, equalizers can be modified through the CLI. An example: lmc eq 0 mag 20/-10, 4000/10 will set the magnitude to -10 dB at 20 Hz and +10 dB at 4000 Hz for the equalizer rendering to coefficient 0. Instead of mag, phase can be given. The command lmc eq "eq-1" info will list the current settings for the equalizer stored in the coefficient called "eq-1".

The more heavily the computer is loaded by convolution, the longer it will take to render the new equalizer. If the coefficient set it renders to is very short, and the requested magnitude and phase responses are very detailed (sharp edges etc), the rendered filter will not be able to follow them fully.
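If the equalizer is driven from a script, the lmc eq line can be built from a list of frequency/value pairs. This Python sketch only reproduces the command format shown in the example above; the function name is invented, and quoting coefficient names while leaving indexes bare follows the two examples in the text.

```python
def eq_command(coeff, setting, points):
    """Build an "lmc eq" command line.

    `coeff` is a coefficient index (int) or name (str), `setting` is
    "mag" or "phase", and `points` is a list of (frequency_hz, value)
    pairs. The helper itself is hypothetical.
    """
    target = str(coeff) if isinstance(coeff, int) else '"%s"' % coeff
    pairs = ", ".join("%g/%g" % (freq, value) for freq, value in points)
    return "lmc eq %s %s %s" % (target, setting, pairs)


print(eq_command(0, "mag", [(20, -10), (4000, 10)]))
print(eq_command("eq-1", "phase", [(31.5, 3.2)]))
```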

Writing your own logic module

This will probably never be documented. Just look at the source code and see how it is done.

Tuning

Realtime index

The program calculates a realtime index which can be shown through the CLI, or will be printed periodically to the screen if the show_progress flag is set. The realtime index is a floating point value. When it is 1.0, 100% of the available processing power must be used at all times to be able to achieve realtime performance. If it is larger than 1.0, it means that with the current configuration, BruteFIR will not manage realtime performance.
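How BruteFIR computes the index internally is not described here. As a way of thinking about the number, the Python sketch below uses one plausible reading, namely the processing time spent on a block divided by the playing time of that block. Both the function name and this definition are assumptions made for the illustration.

```python
def realtime_index(processing_seconds, block_samples, sample_rate):
    """Ratio of time spent processing a block to the block's playing time.

    Assumed definition, not BruteFIR's actual calculation. A value above
    1.0 means the block took longer to compute than it takes to play.
    """
    block_seconds = block_samples / sample_rate
    return processing_seconds / block_seconds


# A 4096 sample block at 44100 Hz that took 50 ms to process
print(realtime_index(0.050, 4096, 44100))
```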

If your configuration is too demanding for realtime, you should shorten the filters (or remove channels) until the realtime index is just below 1.0, perhaps 0.95. This way you make full use of your computer. However, if you have multiple processors, it is not as simple. The realtime index shows how much is needed from the most loaded processor, but leaves proper load balancing to you. So, devise your configuration carefully if you have multiple processors. The number of input and output channels and the filter length are what consume processor time. The number of filters, dither, delay, mixing and attenuation are very cheap in comparison.

When testing with realtime indexes above 1.0, inputs and outputs must of course be files. For performance testing, you could use "/dev/zero" for input and "/dev/null" for output. Also note that it takes some time for the index to stabilize.

The realtime index typically matches the processor load, if running with a sound card. However, if input poll mode is employed, real time index can be considerably lower than the processor load, since input polling is performed in the spare processor time.

FFTW wisdom

When BruteFIR runs for the first time, it will generate FFTW wisdom, which takes some time. FFTW wisdom is benchmarking information which tells the FFTW library how to run FFTs in the most efficient way on the given computer. Since the information is hardware and binary dependent, the file should be removed when hardware is changed/upgraded or BruteFIR is recompiled. A wisdom file that was not generated on the hardware BruteFIR is running on, or not by the binary that is run, may yield sub-optimal performance. When BruteFIR is calculating FFTW wisdom, the computer should not be running other processor-demanding software.

Naturally, it is very important that FFTW was compiled with the correct optimization flags to achieve optimal performance.

The wisdom is loaded, used and updated each time BruteFIR is run. Each time BruteFIR uses a partition length it has not used before (and thus there is no wisdom available), it will need to generate new wisdom, which will take some time.

Low latency patch

If you are going to use BruteFIR in realtime, it is strongly recommended that you patch your kernel to reduce latency, or else the program may fail to keep up when a cron-job or a screen saver starts. The Linux kernel's latency problems have been reduced in the 2.4 kernel, but latency is still not satisfactory without the patch applied.

For the 2.4 kernel, Andrew Morton's low latency patches are recommended [24].

The new 2.6 kernel does have a low-latency setting in the kernel configuration, which should be activated. Although no extra patches should be required for a 2.6 kernel in the normal case, there still are low-latency patches out there for really demanding situations.

Sample clock problems

If you use digital input and output, as I would recommend, you may get problems if the sound card is not configured properly. It is very important that the input and output sample clocks use the same clock as reference. Otherwise, tiny differences between the input and output sample clocks will cause BruteFIR's I/O buffers to drift apart, and eventually make the program stop. Usually there is an option to set the digital sound card's sample clock to 'slave'.

If you have analog input or output or both, you cannot get this problem (unless you use several different sound cards, then it will fail due to differences in clocking).

Digital sound cards that work in slave mode allow the sample clock to change at runtime. Usually, this is not what one wants for BruteFIR, since the filters are designed for only one sample rate. Therefore BruteFIR can be configured to exit if it detects a sample clock different from the one given in the configuration file.

Double precision or not

BruteFIR can run with 32 or 64 bit floating point internal resolution. Traditionally, 32 bit is called "single precision", and 64 bit "double precision". The float_bits setting is used to change resolution. Per default, BruteFIR runs in 32 bit.

Depending on processor used, you may lose assembler optimizations when running in 64 bit. Also, memory bandwidth used by BruteFIR will naturally double, which reduces performance. Thus, although 64 bit and 32 bit operations are generally equally fast, due to increased memory usage, BruteFIR needs 30 - 50% extra processor time, not counting additional effects if assembler optimizations are lost.

When do you need double precision? If you are picky enough on sound quality that you would require dither on 24 bit output, then you need double precision. For most audio work however, 32 bit precision is enough.

Choosing number of partitions

There is no formula for calculating the optimal number of partitions to get maximum throughput. It varies between hardware platforms, so trial and error is the only working method. More than about 16 partitions are generally not recommended though.

If you are using partitioned filters to reduce the I/O-delay for realtime filtering, make sure that it does not get too low. If I/O-delay is too low, the sound card can get overflowed/underflowed causing the program to exit with a broken pipe signal.

Realtime issues

Extreme low latencies, such as 64 sample partitions, will probably not work for long periods of time, even with a low latency patched kernel.
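To get a feeling for how short such partitions are, the playing time of one block can be computed from the partition size and the sample rate. The Python sketch below does just that; the function name and the 44100 Hz rate are arbitrary choices for the illustration, and the figure is the duration of a single block, not the total I/O delay.

```python
def block_duration_ms(partition_samples, sample_rate):
    """Playing time of one partition in milliseconds."""
    return 1000.0 * partition_samples / sample_rate


# Block durations for a range of power-of-two partition sizes
for size in (64, 256, 1024, 4096):
    print(size, round(block_duration_ms(size, 44100), 2))
```

Each block has to be computed within its own playing time, so any scheduling hiccup longer than that is enough to cause an overflow or underflow.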

The processor cannot be loaded more than typically 85% for safe realtime operation. For very low latencies, this number could go down to 70%. The reason for this is that computing time will vary somewhat, that is how modern computers work, and to be able to cope with the maximum computing times, some spare processor time must be left.

Request features

Which new features get into BruteFIR is decided by its users. If you need a feature, let me know, and I'll see what I can do (and want to do).

References

  1. Advanced Micro Devices, Inc. website. http://www.amd.com.
    Makers of the Athlon processor.
  2. A. Bagnara, J. Kysela et al ALSA, Advanced Linux Sound Architecture. http://www.alsa-project.org.
    A powerful and flexible audio applications API developed primarily for Linux.
  3. D.J. Bernstein djbfft. http://cr.yp.to/djbfft.html.
    A compact FFT library implemented in C, faster than most, including FFTW.
  4. J. M. P. Borallo, M. G. Otero On the implementation of a partitioned block frequency domain adaptive filter (PBFDAF) for long acoustic echo cancellation. Elsevier Signal Processing, vol 27 No 3 June 1992, page 301-315.
  5. J. W. Cooley, J. W. Tukey An Algorithm for the Machine Computation of the Complex Fourier Series. Mathematics of Computation, Vol. 19, April 1965, pp. 297-301.
  6. Free Software Foundation GNU General Public License. http://www.gnu.org/copyleft.
    One of the most common free software licenses. Its main purpose is to make sure that the software is kept free and open source.
  7. M. Frigo, S. G. Johnson FFTW. http://www.fftw.org.
    A fast and full-featured FFT library implemented in C. Called "Fastest Fourier Transform in the West".
  8. M. Frigo, S. G. Johnson FFTW: An Adaptive Software Architecture for the FFT. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, 1998, pp. 1381-1384.
  9. GNU Compiler Collection. http://gcc.gnu.org.
    A free software multi-platform compiler supporting the programming languages C, C++, Objective C and Fortran.
  10. Intel Corporation website. http://www.intel.com.
    Makers of the Pentium processor.
  11. Linux Online website. http://www.linux.org.
    Linux is a free Unix-type operating system originally created by Linus Torvalds with the assistance of developers around the world.
  12. P. C. W. Sommen Adaptive Filtering Methods. Ph. D. dissertation, Tech. Univ. Eindhoven, Eindhoven, The Netherlands, 1992.
  13. P. C. W. Sommen Partitioned frequency domain adaptive filters. Proc Asilomar Conf. Signals, Systems and Computers, 1989, pp. 676 - 681.
  14. J. S. Soo, K. K. Pang A new structure for block FIR adaptive digital filters. Proc. IREECON, vol 38, pp. 364 - 367, 1987.
  15. J. S. Soo, K. K. Pang Multidelay block frequency adaptive filter, IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-38, No. 2, February 1990.
  16. T. G. Stockham Jr. High-speed convolution and correlation. AFIPS Proc. 1966 Spring Joint Computer Conf., Vol 28, Spartan Books, 1966, pp. 229 - 233.
  17. B. D. Kulp Digital Equalization using Fourier Transform Techniques. AES preprint 2694, 1988.
  18. A. Torger NWFIIR Audio Tools. http://www.ludd.ltu.se/~torger/filter.html.
    A set of tools for measuring and processing impulse responses, room equalisation being the target application.
  19. Intel Signal Processing Library. http://developer.intel.com/software/products/perflib/spl/index.htm.
  20. STREAM: Sustainable Memory Bandwidth in High Performance Computers. http://www.cs.virginia.edu/stream/.
    A portable and simple memory benchmark program.
  21. RME Audio. http://www.rme-audio.com.
  22. D. Sbragion Digital Room Correction. http://freshmeat.net/projects/drc.
    A program which generates room correction FIR filters to be used in HiFi systems.
  23. P. Davis et al JACK audio server. http://jackit.sourceforge.net/.
    A low-latency audio server, written primarily for the GNU/Linux operating system.
  24. A. Morton Linux Scheduling Latency. http://www.zip.com.au/~akpm/linux/schedlat.html.
    A collection of notes and tools related to an effort to decrease the typical scheduling latency of the 2.4.x kernel.
  25. Open Sound System. http://www.opensound.com.
    A highly portable sound card API available on a large variation of (Unix) platforms.





© Copyright 2001 – 2006, 2009 – 2016, 2020 Anders Torger