BarbaBatch Technical Documents


On samplerate conversion quality


There are two different main approaches that are in use in soundfile converters at the moment:

1. Using a sinc filter to perform the conversion
2. Using linear interpolation

Linear interpolation is fast, but it produces many unwanted side effects.
It is used in Gallery software (samplesearch, gearbox) and Logic audio.

Sinc filtering (or polyphase) is the only theoretically correct way to go. It can produce high quality samplerate conversion when applied with enough precision. When not enough precision is exercised, side effects become audible. Generally, the longer the impulse response of the sinc filter the better the quality (the higher the precision) of the sample rate converter. BarbaBatch uses the most precise algorithm available.

The polyphase conversion


The main part of a good sample rate converter is its filter (actually, one could view a good SRC as nothing but a filter). The filter is needed for the following reason: two representations of a signal, having different sample rates, cannot contain exactly the same information. The maximum frequency that can be represented with a particular sample rate, the so-called Nyquist frequency, is exactly half the sample rate. An Analog-to-Digital converter (ADC) uses a filter to remove everything from the analog input above the Nyquist frequency, because otherwise a so-called aliasing effect can occur: frequencies above the Nyquist are translated into other frequencies (aliases) below the Nyquist.
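
The folding of frequencies around the Nyquist frequency can be worked out with a few lines of arithmetic. The sketch below is only an illustration; the function name `alias_frequency` and the example rates are assumptions, not part of BarbaBatch.

```python
def alias_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a tone of f_hz shows up when sampled without a filter."""
    nyquist = sample_rate_hz / 2.0
    folded = f_hz % sample_rate_hz          # the sampled spectrum repeats every sample_rate_hz
    if folded > nyquist:
        folded = sample_rate_hz - folded    # mirror the upper half back below Nyquist
    return folded


print(alias_frequency(30000.0, 44100.0))  # 14100.0
print(alias_frequency(10000.0, 44100.0))  # 10000.0
```

A 30 kHz tone sampled at 44.1 kHz lands at 14.1 kHz, well inside the audible range, while a 10 kHz tone is below the Nyquist frequency and is left where it is.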

So when converting to a lower sample rate, everything above the new Nyquist frequency should be removed, to prevent aliasing effects.
When converting to a higher rate, the highest part of the new bandwidth should not contain anything, because the old (lower) rate could not represent signals above its Nyquist frequency. Without filtering, however, signals below the old Nyquist frequency will be mirrored into this new part of the spectrum. This is comparable to the reason why a Digital-to-Analog converter (DAC) also uses a filter: the digital, step-like signal must be smoothed.

Converting up or down, the same filter response is needed: ideally a 'brick-wall' filter at the Nyquist of the lowest sample rate. With such a filter, conversion upward will conserve the whole signal, and will not introduce anything irregular. Converting down will conserve everything that CAN be conserved in the new (lower) rate, and will effectively remove anything else, preventing aliasing effects.

The polyphase method of sample rate conversion is a good approximation of a brick-wall filter, implemented for all possible phase relationships between samples in the old and the new signal (hence the name polyphase).
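
The idea can be sketched in a few lines of Python. This is a minimal illustration and not BarbaBatch's actual algorithm: it evaluates a Hann-windowed sinc directly for every output sample instead of using precomputed polyphase tables, and the names `resample_sinc` and `half_width` are assumptions made for the example. The cutoff is placed at the Nyquist frequency of the lower of the two rates, as described above.

```python
import math


def sinc(x: float) -> float:
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def resample_sinc(samples, src_rate, dst_rate, half_width=16):
    """Resample with a Hann-windowed sinc low-pass at the lower Nyquist frequency."""
    ratio = dst_rate / src_rate
    cutoff = min(1.0, ratio)           # cutoff relative to the source Nyquist
    support = half_width / cutoff      # kernel reach in source samples (wider when downsampling)
    n_out = int(len(samples) * ratio)
    out = []
    for m in range(n_out):
        t = m / ratio                  # output instant, measured in source samples
        lo = max(0, math.ceil(t - support))
        hi = min(len(samples) - 1, math.floor(t + support))
        acc = 0.0
        for n in range(lo, hi + 1):
            d = t - n                  # distance (phase) between output and input sample
            window = 0.5 * (1.0 + math.cos(math.pi * d / support))
            acc += samples[n] * cutoff * sinc(cutoff * d) * window
        out.append(acc)
    return out


tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(200)]
doubled = resample_sinc(tone, 8000, 16000)
print(len(doubled))  # 400
```

A larger `half_width` means a longer impulse response and a closer approximation of the brick-wall filter, at the cost of more calculations per output sample. Samples near the start and end of the file see a truncated kernel in this sketch.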

The linear conversion


The linear SRC does not use a filter. It simply calculates each new sample by linear interpolation between the two nearest samples in the original file. In some cases this sounds less bad than theory suggests. Occasionally it can even sound better than the polyphase method: with noise-like signals played back at a relatively low sample rate, where the Nyquist frequency lies in the audible range, a certain amount of aliasing can be perceived as a welcome high-frequency boost. In general, what is 'best' depends on the program material, on the playback hardware and, of course, on the need for speed.
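
A linear converter of this kind fits in a dozen lines, which also shows why it is fast. The sketch below is illustrative only; the function name `resample_linear` is an assumption.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by interpolating linearly between the two nearest input samples."""
    ratio = src_rate / dst_rate                    # input samples per output sample
    n_out = int((len(samples) - 1) / ratio) + 1
    out = []
    for m in range(n_out):
        pos = m * ratio                            # position in the original file
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


print(resample_linear([0.0, 1.0, 0.0, -1.0], 1, 2))
# [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0]
```

Each output sample needs only two input samples and one multiply-add pair, and no filtering takes place, so nothing stops aliases or mirrored components from appearing.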

 


Dithering

As with sample rate conversion, the theory of requantization using dither is best explained by looking at what happens within an ADC.
A good ADC adds a small amount of noise to the analog signal before quantizing. This linearizes the quantization process and increases the dynamic range of the resulting digital signal.

Let's take an example to get an understanding of this process.
Assume a very low-level sine wave is digitized. Because it's so soft, it will be represented in the least significant bit (LSB) only. So the digital signal contains a square wave!
This is the origin of quantization noise, which in fact isn't noise at all, but higher 'harmonics' introduced by the steps of the digital representation. (In the case of a sinewave at the input, these are really harmonics, but with more complex signals it will give rise to a more or less noisy sound.)
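
The effect is easy to reproduce. The sketch below quantizes one period of a sine wave whose amplitude is 0.9 LSB; the amplitude, the period length and the round-to-nearest quantizer are assumptions chosen for illustration.

```python
import math


def quantize(x: float) -> int:
    """Round to the nearest integer number of LSBs."""
    return math.floor(x + 0.5)


period = 16
sine = [0.9 * math.sin(2 * math.pi * n / period) for n in range(period)]
steps = [quantize(v) for v in sine]
print(steps)
```

The smooth sine comes out as a stepped, square-like wave that only takes the values -1, 0 and 1. The sharp edges of those steps are the extra harmonics described above.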

What happens if we add some noise to the input?
The digital signal will also be noisy, of course, but the important thing is that this noise will contain information about the original waveform: when the sine (or any other signal) is at a peak, the probability of a high digital level is greater than when the signal is low.
Even a signal that is below the LSB can be represented this way, so the dynamic range is increased.
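
A small experiment makes this concrete. The sketch assumes a constant input of 0.3 LSB and uniformly distributed noise of plus or minus half an LSB; both are illustrative choices, and the function name `average_quantized` is hypothetical.

```python
import math
import random


def quantize(x: float) -> int:
    """Round to the nearest integer number of LSBs."""
    return math.floor(x + 0.5)


def average_quantized(level, dither, trials=20000, seed=1):
    """Average digital value obtained when quantizing the same level many times."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        noise = (rng.random() - 0.5) if dither else 0.0   # +/- 0.5 LSB rectangular noise
        total += quantize(level + noise)
    return total / trials


print(average_quantized(0.3, dither=False))  # 0.0
print(average_quantized(0.3, dither=True))   # close to 0.3
```

Without noise the level of 0.3 LSB always becomes 0 and is lost. With noise, the output is 1 in roughly 30% of the cases, so the average of the noisy digital signal still carries the original level.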

Whenever the wordsize of a signal is reduced (the signal is said to be requantized), e.g. from 16 to 8 bits, the original dither is lost with the least significant bits, and again a staircase-like signal will result.
In the requantizing process the signal should be redithered, with a noise level appropriate to the new wordsize.
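
As a hedged sketch of what redithering means for a 16 to 8 bit reduction: the noise is scaled to the new LSB, which is 256 units of the old wordsize. The triangular noise spanning plus or minus one new LSB is an assumption chosen for the example, not necessarily the method BarbaBatch uses, and `requantize_16_to_8` is a hypothetical name.

```python
import math
import random


def requantize_16_to_8(sample16, rng, dither=True):
    """Reduce a signed 16-bit sample to signed 8 bits, optionally with dither."""
    step = 256                                   # one 8-bit LSB in 16-bit units
    # Difference of two uniform values gives triangular noise of +/- 1 new LSB.
    noise = (rng.random() - rng.random()) * step if dither else 0.0
    value = math.floor((sample16 + noise) / step + 0.5)
    return max(-128, min(127, value))            # clip to the 8-bit range


rng = random.Random(42)
quiet = 100                                      # about 0.39 of an 8-bit LSB
plain = [requantize_16_to_8(quiet, rng, dither=False) for _ in range(20000)]
dithered = [requantize_16_to_8(quiet, rng, dither=True) for _ in range(20000)]
print(sum(plain) / len(plain))        # 0.0
print(sum(dithered) / len(dithered))  # close to 100 / 256
```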

Even when the final wordsize is the same as the original, redithering can be necessary.
Requantizing occurs in most digital processes, because most calculations are done with a higher precision than the original wordsize. (Common resolutions are 24 or 32 bits).

If 'Use dither' is off, quantization is performed by 'round to nearest'.
This is much better than truncation, because very-low-amplitude noise around zero, and below the least significant bit of the new wordsize, will be removed by the rounding process. Truncation would result in a 1-bit noise signal.
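
The difference can be shown with a few lines of Python. The sketch assumes that truncation behaves like dropping the low bits of a two's complement value, which rounds toward minus infinity; the noise amplitude of 0.4 LSB is an arbitrary illustrative choice.

```python
import math
import random

rng = random.Random(0)
# Very-low-amplitude noise around zero, below the new least significant bit.
noise = [rng.uniform(-0.4, 0.4) for _ in range(1000)]

rounded = sorted({math.floor(v + 0.5) for v in noise})   # round to nearest
truncated = sorted({math.floor(v) for v in noise})       # drop the low bits

print(rounded)    # [0]
print(truncated)  # [-1, 0]
```

Rounding turns the whole input into digital silence, while truncation flips between -1 and 0 depending on the sign of each sample, which is the 1-bit noise signal mentioned above.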

Although it is theoretically better, the question whether redithering is necessary (or desirable) depends on the program material and the purpose of the conversion.

Dithering method

In the above, we described adding noise to the input as a method of dithering.
Apart from varying the frequency spectrum of the noise, which of course changes the color of the noise in the resulting signal, there are many more methods of dithering with quite different results. As could be expected, the better methods involve more elaborate calculations.

If you want to read more about dithering methods, then the following book might be interesting for you:

The Art Of Digital Audio, John Watkinson, Focal Press 1988


 


Brief overview on the origin and characteristics of several of the BarbaBatch soundfile types

 

AIFF

The Audio Interchange File Format (AIFF) is used on Apple, SGI, and other computers, and was developed by Apple Computer, Inc. A variant of the format, called AIFC, supports compression. The standard file name extension for this soundfile type is ".aif(f)".

 

AIFC IMA ADPCM 4:1

A goodie for Macromedia content developers. These files run in projector files saving a lot of space. The 16 bit words are crammed into 4 bits while audio suffers remarkably little. You need Sound Manager 3.2 for this since BarbaBatch makes use of operating system encoders.

 

Amiga IFF/8SVX

This type is used on Amiga computers, and was originally developed by Electronic Arts. It supports only 8-bit sample data. The standard file name extension for this soundfile type is ".iff".

 

AVR

This type originates from Audio Visual Research and is mainly used on Atari computers. It doesn't support compression schemes.

 

Dialogic telephony filetypes

The Dialogic filetypes are standards for telephony applications such as voice response systems. BarbaBatch 2.2 supports four different Dialogic filetypes: Dialogic ADPCM, Dialogic a-law, Dialogic µ-law and Dialogic PCM. Only two samplerates are allowed in these filetypes: 6000 Hz and 8000 Hz. Dialogic ADPCM files are the smallest and offer the best quality of the four.

 

Ensoniq Paris files

The native filetype of the Ensoniq Paris harddisk editing system is also supported by BarbaBatch.

 

Microsoft ADPCM

The Microsoft ADPCM file is a WAV file that contains data compressed using a technique called Adaptive Differential Pulse Code Modulation. This method packs every 16-bit or 8-bit sample into 4 bits while maintaining very high fidelity. Support for real-time playback is built into the Windows 95 operating system.

 

MPEG1 layer I and layer II

MPEG audio is a standard for highly compressed audio data, developed by the Moving Picture Experts Group of the International Organisation for Standardisation (ISO). The high compression ratio is obtained by analyzing the data in a way that resembles the human auditory system, and then encoding the more important parts with higher accuracy than signals to which our ears are less sensitive, or which would be masked by other signals anyway.

Different bit rates can be used, depending on the purpose, e.g. 192 kbit/s for HiFi stereo and 32 kbit/s for low quality internet audio. There are three so-called Layers (different standards). Layer II yields better audio than Layer I at the same bit rate; therefore more low bit rates are available in Layer II, and more high bit rates in Layer I. Layer III is designed for low bit rates.
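
To put these bit rates in perspective, they can be compared with the bit rate of uncompressed audio. The sketch below assumes a 16 bit, 44.1 kHz stereo source, which is an illustrative choice; the function name `pcm_bit_rate` is hypothetical.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


uncompressed = pcm_bit_rate(44100, 16, 2)
print(uncompressed)                      # 1411200
print(round(uncompressed / 192000, 2))   # 7.35
print(round(uncompressed / 32000, 1))    # 44.1
```

At 192 kbit/s the data is reduced by a factor of about 7, and at 32 kbit/s by a factor of about 44.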

 

NeXT/Sun linear (.snd)

This type is mainly used on NeXT, Sun and DEC computers. It doesn't support compression schemes. The standard file name extension for this soundfile type is ".snd"

 

NeXT/Sun µ-law (.au)

This type is mainly used on NeXT, Sun and DEC computers and has become a standard for audio on the Internet. µ-Law is pronounced as "mu-law". µ-Law is a simple companding scheme: samples are stored on a logarithmic scale, which gives a better dynamic range (8-bit µ-law sounds approximately like 12-bit linear). The standard file name extension for this soundfile type is ".au".
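
The gain in dynamic range can be illustrated with the continuous µ-law curve (µ = 255). This is a sketch under stated assumptions: real µ-law files use a segmented approximation of this curve, and the helper names `codec_8bit` and `linear_8bit` are made up for the example.

```python
import math

MU = 255.0


def mulaw_compress(x: float) -> float:
    """Map a sample in -1..1 onto the logarithmic mu-law scale (also -1..1)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def codec_8bit(x: float) -> float:
    """Store x as an 8-bit mu-law code and decode it again."""
    code = round(mulaw_compress(x) * 127)
    return mulaw_expand(code / 127)


def linear_8bit(x: float) -> float:
    """Store x as an 8-bit linear code and decode it again."""
    return round(x * 127) / 127


quiet = 0.01   # a soft sample, 40 dB below full scale
print(abs(codec_8bit(quiet) - quiet))    # error with mu-law
print(abs(linear_8bit(quiet) - quiet))   # error with linear 8 bit
```

For soft samples the µ-law error is far smaller than the linear error, because the logarithmic scale spends most of its 256 codes on low levels. Loud samples are stored less accurately in return.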

 

NeXT/Sun a-law files (.au)

Earlier versions of BarbaBatch already supported NeXT/Sun µ-law files. a-law is the other (less common and most often referred to as 'European') flavour of '.AU'-files.

 

QuickTime Movie

This type is mainly used on PCs and Apple Macintosh computers. BarbaBatch cannot take QuickTime soundfiles as input. The standard file name extension for this soundfile type is ".mov".

 

RealAudio 2.0 through 5.0 and Real G2

These types are mainly used for streaming audio on the internet. Depending on your bandwidth requirements, you can choose from many RealAudio encoding algorithms, each with a different fixed data rate.

 

Sound Designer I

This type is used on Apple Macintosh computers and was created by Digidesign. It only supports mono soundfiles. There is no standard file name extension for this soundfile type.

 

Sound Designer II

This type is used on Apple Macintosh computers and was created by Digidesign. It is one of the most popular pro-audio soundfile types on the Mac. There is no standard file name extension for this soundfile type.

Voc

This type was created by Creative Labs for the PC Soundblaster. It supports 8-bit sample data only. The standard file name extension for this soundfile type is ".voc".

 

Voc 16 bit

This type is used by the PC Soundblaster to support 16-bit sample data. (It supports 16-bit sample data only.) The standard file name extension for this soundfile type is ".voc".

 

Wave

This type is mainly used on PCs and was created by Microsoft. The standard file name extension for this soundfile type is ".wav".
