DSP Filtering

DSP filters implement operations such as low-pass or peaking filters. They work in the time domain and do not transform data from the time domain to the frequency domain; see sc_dsp_transforms for that.

There are shelves of textbooks that cover filtering; a two-page summary is necessarily incomplete. Typical characterisations of filters include:

Sample rate. Input and output sample rates may be identical, or the filter may perform some form of sample-rate conversion (e.g., 44.1 kHz to 48 kHz audio).
Number of channels. The number of channels to which identical filters are applied. For example, two audio channels that have identical treble and bass settings.
Precision. The number of bits before and after the binary point that contain valid data. Typical precisions are 16 bits or 24 bits for audio signals, or XXX for motor control. Inside the filter, data is accumulated with a certain precision, which may be higher than that of the input data in order to avoid cumulative rounding errors.
Overflow. What to do on overflow: saturate, wrap around, or trap.
Rounding. After each filter pass some bits are discarded, and some form of rounding needs to take place. Typical choices include dithering (adding triangular noise in the range [-0.5 .. 0.5]) and ordinary rounding.
Type of filter. The two main classes are Finite Impulse Response (FIR) and Infinite Impulse Response (IIR). For FIR filters the number of “taps” needs to be specified (that is, the number of coefficients to filter with).
The filter coefficients. This is often where a company adds value. There are many methods for computing coefficients for filters such as low-shelf or bandpass, but all have imperfect responses. A domain expert can create filter coefficients that have a good response for a particular domain.
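The two rounding styles named in the list above can be contrasted with a small sketch. It works on floating-point values for readability rather than on fixed-point words, and the function names are hypothetical. The noise range is the [-0.5 .. 0.5] quoted above, with the peak of the triangle assumed to be at zero.

```python
import math
import random


def round_plain(value: float) -> int:
    """Ordinary rounding: nearest integer, with halves rounded up."""
    return math.floor(value + 0.5)


def round_dithered(value: float, rng: random.Random) -> int:
    """Add triangular noise in [-0.5, 0.5] (peak at 0), then round as above."""
    noise = rng.triangular(-0.5, 0.5, 0.0)
    return math.floor(value + noise + 0.5)


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so that runs are repeatable
    print(round_plain(0.3), [round_dithered(0.3, rng) for _ in range(8)])
```

Plain rounding always maps the same input to the same output, whereas the dithered version spreads the quantisation error over neighbouring values, which is why it is usually reserved for the final output stage.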

As a rule of thumb, computational requirements are linear in the sample rate and linear in the number of channels. Double either, and the computational requirements double. Double both, and the computations go up four-fold.

Most filters are based around 32-bit input data, with a 64-bit accumulator. The coefficients should be worked out so that there is sufficient “headroom” in the accumulator and in the data. For example, if all data samples are between -1 and +1 represented as a 24-bit signed number, then there are 7 bits of headroom (32 - 24 data bits - 1 sign bit). All numbers are represented as fixed-point numbers, and the precision is defined by the filter.
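As an illustration of the fixed-point representation and the headroom arithmetic, here is a minimal sketch for a 32-bit word with 24 bits after the binary point. The helper names are hypothetical and are not part of any module described here.

```python
FRAC_BITS = 24                      # bits after the binary point
WORD_BITS = 32                      # width of one data sample
INT32_MIN = -(1 << (WORD_BITS - 1))
INT32_MAX = (1 << (WORD_BITS - 1)) - 1


def to_fixed(value: float) -> int:
    """Convert a real number to a 32-bit word with 24 fractional bits."""
    scaled = round(value * (1 << FRAC_BITS))   # Python rounds ties to even
    if not INT32_MIN <= scaled <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in {WORD_BITS} bits")
    return scaled


def from_fixed(word: int) -> float:
    """Convert a fixed-point word back to a real number."""
    return word / (1 << FRAC_BITS)


def headroom_bits(data_bits: int = 24, sign_bits: int = 1) -> int:
    """Bits left above the data: 32 - 24 data bits - 1 sign bit."""
    return WORD_BITS - data_bits - sign_bits


if __name__ == "__main__":
    print(hex(to_fixed(1.0)), from_fixed(to_fixed(-0.5)), headroom_bits())
```

With 7 bits of headroom, intermediate values can grow to just under 128 times full scale before the 32-bit word overflows.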

Typically, overflow leads to saturation, but this adds computational complexity, and may be removed if not desired. Numbers are typically rounded to the nearest value, with 0.5 being rounded up.
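The overflow and rounding behaviour described above can be sketched as follows. The sketch assumes a 64-bit accumulator that carries 24 extra fractional bits and a 32-bit result; both the shift amount and the function names are assumptions made for illustration. A wrapping variant is included to show what is left when the saturation step is removed.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def round_to_nearest(acc: int, shift: int = 24) -> int:
    """Drop `shift` low bits, rounding to nearest with 0.5 rounded up."""
    # Adding half an LSB and then flooring (an arithmetic shift right)
    # computes floor(x + 0.5), so exact halves go towards +infinity.
    return (acc + (1 << (shift - 1))) >> shift


def round_and_saturate(acc: int, shift: int = 24) -> int:
    """Round, then clamp the result to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, round_to_nearest(acc, shift)))


def round_and_wrap(acc: int, shift: int = 24) -> int:
    """Round, then keep only the low 32 bits (two's-complement wrap)."""
    rounded = round_to_nearest(acc, shift)
    return ((rounded + (1 << 31)) & 0xFFFFFFFF) - (1 << 31)


if __name__ == "__main__":
    too_big = (INT32_MAX + 1) << 24
    print(round_and_saturate(too_big), round_and_wrap(too_big))
```

The clamp costs a comparison or two per output sample, which is the extra computational complexity that saturation adds.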

There are at present three modules: an FIR filter, a Biquad filter and an Asynchronous Sample Rate Converter. The Biquad module implements an IIR filter. Scripts are provided to compute coefficients, but it is worth noting that these are textbook computations, and that better coefficients can be designed by domain experts.

module_cascading_biquad

This module provides a function that filters a data stream through a series of N Biquads. The input data is in 8.24 format: a sign bit, seven bits before the binary point, and 24 bits after the binary point. The figures below assume a single 50 MIPS thread:

Functionality provided		Resources required			Status
Channels	Filters	Thread cycles	Max rate	Memory	Status
1	1	53	943 kHz	? KB	Implemented
2	1	106	471 kHz	? KB	Implemented
1	2 in series	77	649 kHz	? KB	Implemented
M	N in series	M (29+24 N)	50/...	? KB	Implemented
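The numeric rows of the table can be reproduced from the cycle formula in its last row. The sketch below assumes that the maximum rate is simply the thread's instruction rate divided by the cycles spent per sample, which matches the three numeric rows; the function names are hypothetical.

```python
THREAD_MIPS = 50  # single 50 MIPS thread, as assumed by the table


def biquad_thread_cycles(channels: int, filters_in_series: int) -> int:
    """Thread cycles per sample: M * (29 + 24 * N)."""
    return channels * (29 + 24 * filters_in_series)


def max_sample_rate_khz(cycles: int, mips: float = THREAD_MIPS) -> float:
    """Highest sample rate (kHz) if every cycle of the thread is used."""
    return mips * 1000.0 / cycles


if __name__ == "__main__":
    for m, n in [(1, 1), (2, 1), (1, 2)]:
        cycles = biquad_thread_cycles(m, n)
        print(m, n, cycles, int(max_sample_rate_khz(cycles)), "kHz")
```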

Biquads that are executed in parallel can be implemented at a similar cost, but lead to interesting phase shifts.
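To make the idea of a series of biquads concrete, here is a minimal sketch of a cascade operating on 8.24 samples. It is not the module's API: the class and function names are hypothetical, the direct form I structure is an assumption, and the coefficients are assumed to be in 8.24 format as well. Python integers are unbounded, so the 64-bit accumulator is only modelled, not enforced.

```python
from dataclasses import dataclass

FRAC_BITS = 24
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


@dataclass
class Biquad:
    """One direct form I section; coefficients and state are 8.24 words."""
    b0: int
    b1: int
    b2: int
    a1: int
    a2: int
    x1: int = 0
    x2: int = 0
    y1: int = 0
    y2: int = 0

    def step(self, x: int) -> int:
        # y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]
        acc = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
               - self.a1 * self.y1 - self.a2 * self.y2)
        y = (acc + (1 << (FRAC_BITS - 1))) >> FRAC_BITS   # round to nearest
        y = max(INT32_MIN, min(INT32_MAX, y))             # saturate
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


def cascade(sections, x: int) -> int:
    """Feed one sample through every section in series."""
    for section in sections:
        x = section.step(x)
    return x


if __name__ == "__main__":
    one = 1 << FRAC_BITS
    halves = [Biquad(one // 2, 0, 0, 0, 0) for _ in range(2)]
    print(cascade(halves, one) / one)   # two gains of 0.5 in series
```

Each section keeps its own four state words, so N sections in series need 4 N words of state per channel in this structure.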

module_fir

The FIR module implements a function that performs a single FIR. If multiple FIRs are to be applied, the coefficients can be summed at compile time. Its performance depends on the number of taps and the number of logical cores that are deployed. There are two versions of the FIR: one is simple and not efficient, but works on a filter of any length; the other is highly optimised but requires the filter length to be a multiple of 12. The numbers below are for a single channel; multiple channels scale linearly. All times are measured on an empty 400 MHz processor, i.e., this assumes there are 1, 2, 3, or 4 threads that have 100 MIPS each.
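The remark about summing coefficients can be checked with a short sketch. It reads "multiple FIRs" as filters that all see the same input and whose outputs are added together; for filters applied one after another, the coefficient sets would need to be convolved instead. The function names are hypothetical and the arithmetic is plain integer arithmetic, without the fixed-point scaling of the real module.

```python
def fir(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:          # samples before the start count as zero
                acc += c * samples[n - k]
        out.append(acc)
    return out


def sum_coefficients(a, b):
    """Add two coefficient sets tap by tap, padding the shorter with zeros."""
    length = max(len(a), len(b))
    a = list(a) + [0] * (length - len(a))
    b = list(b) + [0] * (length - len(b))
    return [p + q for p, q in zip(a, b)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    a, b = [1, 2], [3, 0, -1]
    separate = [p + q for p, q in zip(fir(a, x), fir(b, x))]
    combined = fir(sum_coefficients(a, b), x)
    print(separate == combined)
```

Because the sum is computed once ahead of time, the run-time cost is that of a single FIR with as many taps as the longest of the original filters.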

General and simple version:

Functionality provided		Resources required			Status
Threads	Taps	Thread cycles	Max rate	Memory	Status
1	1	43	1.1 MHz	? KB	Implemented, TBC
1	2	53	943 kHz	? KB	Implemented, TBC
1	N	(33+10 N)	50/...	? KB	Implemented, TBC

Efficient version, requires number of taps to be a multiple of 12:

Functionality provided		Resources required			Status
Threads	Taps	Time in ns	Max rate	Memory	Status
1	48	2760	362 kHz	? KB	Implemented, TBC
4	48	1160	862 kHz	? KB	Implemented, TBC
1	288	13950	71 kHz	? KB	Implemented, TBC
4	288	3940	253 kHz	? KB	Implemented, TBC
1	large N	46.625 N	21/N MHz	? KB	Implemented, TBC
4	large N	11.583 N	86/N MHz	? KB	Implemented, TBC

(Note: it may be possible to jump into the loop part-way through and so obtain an efficient version that supports any number of coefficients; this is still to do.)
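Until such a version exists, one common workaround is to pad the coefficient list with zeros up to the next multiple of 12. This is a different technique from the loop-entry idea in the note, and the helper below is a hypothetical sketch rather than part of the module. Zero-valued taps do not change the filter output, but they are still multiplied and accumulated, so the padded filter costs as much as a genuine filter of the padded length.

```python
def pad_taps(coeffs, multiple: int = 12):
    """Return a copy of coeffs, zero-padded to a multiple of `multiple` taps."""
    remainder = len(coeffs) % multiple
    if remainder == 0:
        return list(coeffs)
    return list(coeffs) + [0] * (multiple - remainder)


if __name__ == "__main__":
    taps = list(range(1, 14))            # 13 taps
    padded = pad_taps(taps)
    print(len(taps), "->", len(padded))  # padded up to 24 taps
```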

module_asrc

The ASRC module implements a function that performs an Asynchronous Sample Rate Conversion. The module delays the signal by five samples, or around 100 us at 48 kHz. The noise floor is below -90 dB, with some noise near the main frequency at -60 dB. The maximum sample rate given assumes a single 50 MIPS thread solely dedicated to this task. Performance figures given here are absolute worst case; see the performance discussion in the API section (Performance) for a more detailed explanation.

Functionality provided			Resources required			Status
Chans	Order	Upsampling	Thread cycles	Max rate	Memory	Status
1	8	128	210	238 kHz	2630 B	Implemented
2	8	128	420	119 kHz	2710 B	Implemented
4	8	128	840	59 kHz	2870 B	Implemented
N	8	128	210 N	238/N	2550+80N	Implemented
1	8	64	210	238 kHz	1630 B	Implemented
2	8	64	420	119 kHz	1710 B	Implemented
4	8	64	840	59 kHz	1870 B	Implemented
N	8	64	210 N	238/N	1550+80N	Implemented
1	4	256	170	294 kHz	2630 B	Implemented
2	4	256	340	147 kHz	2710 B	Implemented
4	4	256	680	73 kHz	2870 B	Implemented
N	4	256	170 N	294/N	2550+80N	Implemented

This implementation optimises per-sample latency. If throughput is important then a multi-channel version can be implemented that simultaneously operates on all channels, reducing overhead.
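The per-channel formulas in the table, and the five-sample delay quoted above, can be turned into a small resource estimate. The sketch covers only the three order and upsampling combinations listed in the table and assumes, as the table does, a single 50 MIPS thread; the function names are hypothetical.

```python
THREAD_MIPS = 50

# Worst-case thread cycles per channel, keyed by filter order.
CYCLES_PER_CHANNEL = {8: 210, 4: 170}

# Fixed memory in bytes, keyed by (order, upsampling); 80 more per channel.
BASE_MEMORY = {(8, 128): 2550, (8, 64): 1550, (4, 256): 2550}


def asrc_resources(channels: int, order: int = 8, upsampling: int = 128):
    """Return (thread cycles, memory in bytes, max rate in kHz)."""
    cycles = CYCLES_PER_CHANNEL[order] * channels
    memory = BASE_MEMORY[(order, upsampling)] + 80 * channels
    return cycles, memory, THREAD_MIPS * 1000.0 / cycles


def latency_us(sample_rate_hz: float, delay_samples: int = 5) -> float:
    """Signal delay in microseconds for a delay of five samples."""
    return delay_samples * 1e6 / sample_rate_hz


if __name__ == "__main__":
    print(asrc_resources(2, order=8, upsampling=64))
    print(round(latency_us(48_000)), "us")
```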
