
The Soundfile Utility Programs

AUTHOR: Dan Ellis, dpwe@media-lab.media.mit.edu

The Csound Utilities are soundfile preprocessing programs that return information on a soundfile or create some analyzed version of it for use by certain Csound generators. Though different in goals, they share a common soundfile access mechanism and are describable as a set. The Soundfile Utility programs can be invoked in two equivalent forms:

          csound -U utilname  [flags]  filenames  . . .
          utilname  [flags]  filenames  . . .

In the first, the utility is invoked as part of the Csound executable, while in the second it is called as a standalone program. The second is smaller by about 200K, but the two forms are identical in function. The first is convenient in not requiring the maintenance and use of several independent programs: one program does all. When using this form, a -U flag detected in the command line will cause all subsequent flags and names to be interpreted as per the named utility; i.e. Csound generation will not occur, and the program will terminate at the end of utility processing.

Directories. Filenames are of two kinds, source soundfiles and resultant analysis files. Each has a hierarchical naming convention, influenced by the directory from which the Utility is invoked. Source soundfiles with a full pathname (one that begins with a dot (.) or slash (/), or, for ThinkC, includes a colon (:)) will be sought only in the directory named. Soundfiles without a path will be sought first in the current directory, then in the directory named by the SSDIR environment variable (if defined), then in the directory named by SFDIR. An unsuccessful search will return a "cannot open" error.

Resultant analysis files are written into the current directory, or to the named directory if a path is included. It is tidy to keep analysis files separate from sound files, usually in a separate directory named by the SADIR variable. Analysis is conveniently run from within the SADIR directory. When an analysis file is later invoked by a Csound generator (adsyn, lpread, pvoc) it is sought first in the current directory, then in the directory defined by SADIR.
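The search order for source soundfiles can be sketched in Python as below. This is only an illustration of the rules described above, not the code Csound uses; the function names are hypothetical, the ThinkC colon rule is left out, and POSIX-style paths are assumed.

```python
import os
import posixpath


def soundfile_candidates(name, env, cwd="."):
    """Return the paths that would be tried, in order, for a source soundfile."""
    # A name beginning with a dot or slash counts as a full pathname and is
    # sought only where it points (the ThinkC colon rule is ignored here).
    if name.startswith((".", "/")):
        return [name]
    dirs = [cwd]
    # SSDIR is consulted before SFDIR, and only if it is defined.
    for var in ("SSDIR", "SFDIR"):
        if env.get(var):
            dirs.append(env[var])
    return [posixpath.join(d, name) for d in dirs]


def find_soundfile(name, env, cwd=".", exists=os.path.isfile):
    """Return the first candidate that exists, else raise a 'cannot open' error."""
    for path in soundfile_candidates(name, env, cwd):
        if exists(path):
            return path
    raise FileNotFoundError(f"cannot open {name}")
```

With SFDIR = /u/bv/sound and SSDIR = /so/bv/Samples, a bare name such as "test" would be tried as ./test, then /so/bv/Samples/test, then /u/bv/sound/test.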

Soundfile Formats. Csound can read and write audio files in a variety of formats. Write formats are described by Csound command flags. On reading, the format is determined from the soundfile header, and the data automatically converted to floating-point during internal processing. When Csound is installed on a host with local soundfile conventions (SUN, NeXT, Macintosh) it may conditionally include local packaging code which creates soundfiles not portable to other hosts. However, Csound on any host can always generate and read AIFF files, which is thus a portable format. Sampled sound libraries are typically AIFF, and the variable SSDIR usually points to a directory of such sounds. If defined, the SSDIR directory is in the search path during soundfile access. Note that some AIFF sampled sounds have an audio looping feature for sustained performance; the analysis programs will traverse any loop segment once only.

For soundfiles without headers, an SR value may be supplied by a command flag (or its default). If both header and flag are present, the flag value will over-ride.
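The precedence between header and flag amounts to a three-way choice, sketched below. The function is hypothetical; the fallback of 10000 is the default documented for the -s flag of the analysis utilities later in this section.

```python
def effective_srate(header_srate=None, flag_srate=None, default=10000):
    """Pick the sampling rate an analysis utility would apply."""
    # A command-line flag over-rides whatever the header says.
    if flag_srate is not None:
        return flag_srate
    # Otherwise the soundfile header applies, if there is one.
    if header_srate is not None:
        return header_srate
    # Neither present: fall back to the documented default.
    return default
```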

When sound is accessed by the audio analysis programs, only a single channel is read. For stereo or quad files, the default is channel one; alternate channels may be obtained on request.



SNDINFO

SNDINFO - get basic information about one or more soundfiles.

     csound -U sndinfo  soundfilenames  . . .
or   sndinfo   soundfilenames  . . .

sndinfo will attempt to find each named file, open it for reading, read in the soundfile header, then print a report on the basic information it finds. The order of search across soundfile directories is as described above. If the file is of type AIFF, some further details are listed first.

EXAMPLE

     csound -U sndinfo  test  Bosendorfer/"BOSEN mf A0 st"  foo  foo2

where the environment variables SFDIR = /u/bv/sound, and SSDIR = /so/bv/Samples, might produce the following:

     util  SNDINFO:      
     /u/bv/sound/test:
           srate 22050, monaural, 16 bit shorts, 1.10 seconds
           headersiz 1024, datasiz 48500  (24250 sample frames)

     /so/bv/Samples/Bosendorfer/BOSEN mf A0 st:
           AIFF, 197586 stereo samples, base Frq 261.6 (midi 60), sustnLp: mode 1, 121642 to 197454, relesLp: mode 0
           AIFF soundfile, looping with modes 1, 0
           srate 44100, stereo, 16 bit shorts, 4.48 seconds
           headersiz  402, datasiz 790344  (197586 sample frames)

     /u/bv/sound/foo:
           no recognizable soundfile header

     /u/bv/sound/foo2:
            couldn't find
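The figures in such a report are related by simple arithmetic, which the following Python sketch reproduces for the two readable files above. The function name is hypothetical; 2 bytes per sample follows from the "16 bit shorts" in the report.

```python
def frames_and_seconds(datasiz, srate, channels, bytes_per_sample=2):
    """Derive sample frames and duration from the byte count of the audio data."""
    # One sample frame holds one sample for every channel.
    frames = datasiz // (bytes_per_sample * channels)
    return frames, frames / srate


for datasiz, srate, channels in [(48500, 22050, 1), (790344, 44100, 2)]:
    frames, seconds = frames_and_seconds(datasiz, srate, channels)
    print(f"{frames} sample frames, {seconds:.2f} seconds")
```

The mono file gives 24250 frames and the stereo file 197586, matching the sample-frame counts reported by sndinfo.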



HETRO

HETRO - heterodyne filter analysis for the Csound adsyn generator.

     csound -U hetro  [flags]  infilename  outfilename
or   hetro  [flags]  infilename  outfilename

hetro takes an input soundfile, decomposes it into component sinusoids, and outputs a description of the components in the form of breakpoint amplitude and frequency tracks. Analysis is conditioned by the control flags below. A space is optional between flag and value.

-s<srate> sampling rate of the audio input file. This will over-ride the srate of the soundfile header, which otherwise applies. If neither is present, the default is 10000. Note that for adsyn synthesis the srate of the source file and the generating orchestra need not be the same.

-c<channel> channel number sought. The default is 1.

-b<begin> beginning time (in seconds) of the audio segment to be analyzed. The default is 0.0.

-d<duration> duration (in seconds) of the audio segment to be analyzed. The default of 0.0 means to the end of the file. Maximum length is 32.766 seconds.

-f<begfreq> estimated starting frequency of the fundamental, necessary to initialize the filter analysis. The default is 100 (cps).

-h<partials> number of harmonic partials sought in the audio file. Default is 10, maximum 50.

-M<maxamp> maximum amplitude summed across all concurrent tracks. The default is 32767.

-m<minamp> amplitude threshold below which a single pair of amplitude/frequency tracks is considered dormant and will not contribute to output summation. Typical values: 128 (48 dB down from full scale), 64 (54 dB down), 32 (60 dB down), 0 (no thresholding). The default threshold is 64 (54 dB down).

-n<brkpts> initial number of analysis breakpoints in each amplitude and frequency track, prior to thresholding (-m) and linear breakpoint consolidation. The initial points are spread evenly over the duration. The default is 256.

-l<cutfreq> substitute a 3rd order Butterworth low-pass filter with cutoff frequency cutfreq (in cps), in place of the default averaging comb filter. The default is 0 (don't use).
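The decibel figures quoted for the -m threshold follow from the usual amplitude ratio formula. A small Python check, assuming full scale is taken as 32768 (2 to the power 15) for 16-bit audio; the function name is hypothetical:

```python
import math


def db_below_full_scale(minamp, full_scale=32768):
    """Return how many dB an amplitude lies below full scale."""
    # 20 * log10 of the amplitude ratio; each halving costs about 6 dB.
    return 20 * math.log10(full_scale / minamp)


for minamp in (128, 64, 32):
    print(minamp, round(db_below_full_scale(minamp)), "dB down")
```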

EXAMPLE

hetro -s44100 -b.5 -d2.5 -h16 -M24000 audiofile.test adsynfile7

This will analyze 2.5 seconds of channel 1 of a file "audiofile.test", recorded at 44.1 kHz, beginning .5 seconds from the start, and place the result in a file "adsynfile7". We request just the first 16 harmonics of the sound, with 256 initial breakpoint values per amplitude or frequency track, and a peak summation amplitude of 24000. The fundamental is estimated to begin at 100 Hz. Amplitude thresholding is at 54 dB down.

The Butterworth LPF is not enabled.

FILE FORMAT

The output file contains time-sequenced amplitude and frequency values for each partial of an additive complex audio source. The information is in the form of breakpoints (time, value, time, value, ....) using 16-bit integers in the range 0 - 32767. Time is given in milliseconds, and frequency in Hertz (cps). The breakpoint data is exclusively non-negative, and the values -1 and -2 uniquely signify the start of new amplitude and frequency tracks. A track is terminated by the value 32767. Before being written out, each track is data-reduced by amplitude thresholding and linear breakpoint consolidation.

A component partial is defined by two breakpoint sets: an amplitude set, and a frequency set. Within a composite file these sets may appear in any order (amplitude, frequency, amplitude ....; or amplitude, amplitude..., then frequency, frequency,...). During adsyn resynthesis the sets are automatically paired (amplitude, frequency) from the order in which they were found. There should be an equal number of each.

A legal adsyn control file could have the following format:

-1  time1  value1  ...   timeK  valueK   32767   ; amplitude breakpoints for partial 1
-2  time1  value1  ...   timeL   valueL  32767   ; frequency breakpoints for partial 1
-1  time1  value1  ...   timeM   valueM  32767   ; amplitude breakpoints for partial 2
-2  time1  value1  ...   timeN   valueN  32767   ; frequency breakpoints for partial 2
-2  time1  value1  ..........
-2  time1  value1  ..........                    ; pairable tracks for partials 3 and 4
-1  time1  value1  ..........
-1  time1  value1  ..........
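The track markers and pairing rule described above can be sketched as a small Python reader. It works on an already decoded sequence of integers and makes no claim about how the 16-bit values are laid out on disk; the function names are hypothetical, and the pairing simply matches the n-th amplitude set with the n-th frequency set in order of appearance.

```python
AMP_MARK, FRQ_MARK, END_MARK = -1, -2, 32767


def read_tracks(values):
    """Split a flat integer sequence into amplitude and frequency tracks."""
    amps, freqs = [], []
    it = iter(values)
    for mark in it:
        if mark not in (AMP_MARK, FRQ_MARK):
            raise ValueError(f"expected -1 or -2 at start of track, got {mark}")
        points = []
        while True:
            t = next(it, None)
            if t is None:
                raise ValueError("track not terminated by 32767")
            if t == END_MARK:  # terminator appears where a time would be
                break
            v = next(it, None)
            if v is None:
                raise ValueError("breakpoint is missing its value")
            points.append((t, v))  # (time in ms, amplitude or frequency)
        (amps if mark == AMP_MARK else freqs).append(points)
    return amps, freqs


def pair_partials(values):
    """Pair amplitude and frequency sets in the order each kind was found."""
    amps, freqs = read_tracks(values)
    if len(amps) != len(freqs):
        raise ValueError("unequal numbers of amplitude and frequency tracks")
    return list(zip(amps, freqs))
```

Because 32767 in a time position ends a track, a breakpoint time can be at most 32766 ms, which is consistent with the 32.766 second maximum given for the -d flag.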



LPANAL

LPANAL - linear predictive analysis for the Csound lp generators

     csound -U lpanal   [flags]   infilename   outfilename
or   lpanal   [flags]   infilename   outfilename

lpanal performs both lpc and pitch-tracking analysis on a soundfile to produce a time-ordered sequence of frames of control information suitable for Csound resynthesis. Analysis is conditioned by the control flags below. A space is optional between the flag and its value.

-s<srate> sampling rate of the audio input file. This will over-ride the srate of the soundfile header, which otherwise applies. If neither is present, the default is 10000.

-c<channel> channel number sought. The default is 1.

-b<begin> beginning time (in seconds) of the audio segment to be analyzed. The default is 0.0.

-d<duration> duration (in seconds) of the audio segment to be analyzed. The default of 0.0 means to the end of the file.

-p<npoles> number of poles for analysis. The default is 34, the maximum 50.

-h<hopsize> hop size (in samples) between frames of analysis. This determines the number of frames per second (srate / hopsize) in the output control file. The analysis framesize is hopsize * 2 samples. The default is 200, the maximum 500.

-C<string> text for the comments field of the lpfile header. The default is the null string.

-P<mincps> lowest frequency (in cps) of pitch tracking. -P0 means no pitch tracking.

-Q<maxcps> highest frequency (in cps) of pitch tracking. The narrower the pitch range, the more accurate the pitch estimate. The defaults are -P70, -Q200.

-v<verbosity> level of terminal information during analysis. 0 = none, 1 = verbose, 2 = debug. The default is 0.

EXAMPLE

          lpanal  -p26  -d2.5  -P100  -Q400  audiofile.test  lpfil22

will analyze the first 2.5 seconds of file "audiofile.test", producing srate/200 frames per second, each containing 26-pole filter coefficients and a pitch estimate between 100 and 400 Hertz. Output will be placed in "lpfil22" in the current directory.
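The quantities implied by the -h and -p flags can be worked out as in the Python sketch below. The function is hypothetical and only restates the relations given above (frames per second = srate / hopsize, framesize = hopsize * 2) together with the frame layout described under FILE FORMAT.

```python
def lpanal_frame_info(srate, hopsize=200, npoles=34):
    """Summarise the frame geometry implied by lpanal's -h and -p flags."""
    if not 0 < hopsize <= 500:
        raise ValueError("hopsize must be between 1 and 500")
    if not 0 < npoles <= 50:
        raise ValueError("npoles must be between 1 and 50")
    return {
        "frames_per_second": srate / hopsize,
        "framesize": hopsize * 2,         # analysis window in samples
        "values_per_frame": 4 + npoles,   # pitch/gain values plus coefficients
    }


print(lpanal_frame_info(44100, npoles=26))
```

For a 44.1 kHz source and the example's 26 poles this gives 220.5 frames per second, a 400-sample analysis frame, and 30 values per frame.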

FILE FORMAT

Output is a file comprised of an identifiable header plus a set of frames of floating point analysis data. Each frame contains four values of pitch and gain information, followed by npoles filter coefficients. The file is readable by Csound's lpread.

lpanal is an extensive modification of Paul Lansky's lpc analysis programs.



PVANAL

AUTHOR: Dan Ellis, dpwe@media-lab.media.mit.edu

PVANAL - Fourier analysis for the Csound pvoc generator

     csound -U pvanal  [flags]  infilename  outfilename
or   pvanal  [flags]  infilename  outfilename

pvanal converts a soundfile into a series of short-time Fourier transform (STFT) frames at regular timepoints (a frequency-domain representation). The output file can be used by pvoc to generate audio fragments based on the original sample, with timescales and pitches arbitrarily and dynamically modified. Analysis is conditioned by the flags below. A space is optional between the flag and its argument.

-s<srate> sampling rate of the audio input file. This will over-ride the srate of the soundfile header, which otherwise applies. If neither is present, the default is 10000.

-c<channel> channel number sought. The default is 1.

-b<begin> beginning time (in seconds) of the audio segment to be analyzed. The default is 0.0.

-d<duration> duration (in seconds) of the audio segment to be analyzed. The default of 0.0 means to the end of the file.

-n<frmsiz> STFT frame size, the number of samples in each Fourier analysis frame. Must be a power of two, in the range 16 to 16384. For clean results, a frame must be larger than the longest pitch period of the sample. However, very long frames result in temporal "smearing" or reverberation. The bandwidth of each STFT bin is determined by sampling rate / frame size. The default framesize is the smallest power of two that corresponds to more than 20 milliseconds of the source (e.g. 256 points at 10 kHz sampling, giving a 25.6 ms frame).

-w<windfact> Window overlap factor. This controls the number of Fourier transform frames per second. Csound's pvoc will interpolate between frames, but too few frames will generate audible distortion; too many frames will result in a huge analysis file. A good compromise for windfact is 4, meaning that each input point occurs in 4 output windows, or conversely that the offset between successive STFT frames is framesize/4. The default value is 4. Do not use this flag with -h.

-h<hopsize> STFT frame offset. Converse of above, specifying the increment in samples between successive frames of analysis (see also lpanal). Do not use with -w.
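The default frame size rule and the relation between -w and -h can be sketched in Python as follows. This is an illustration of the rules stated above rather than pvanal's own code, and the function names are hypothetical.

```python
def default_frame_size(srate, min_ms=20.0):
    """Smallest power of two covering more than min_ms of audio (16 to 16384)."""
    n = 16
    while n < 16384 and n / srate * 1000.0 <= min_ms:
        n *= 2
    return n


def stft_geometry(srate, frmsiz=None, windfact=4):
    """Return frame size, hop size, bin bandwidth and frame rate for an analysis."""
    if frmsiz is None:
        frmsiz = default_frame_size(srate)
    hop = frmsiz // windfact  # offset between successive frames
    return {
        "frmsiz": frmsiz,
        "hopsize": hop,
        "bin_bandwidth_hz": srate / frmsiz,
        "frames_per_second": srate / hop,
    }


print(stft_geometry(10000))
```

At 10 kHz this selects a 256-point frame, a 64-sample hop, and a bin bandwidth of about 39 Hz. Giving -h64 instead of -w4 would describe the same analysis.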

EXAMPLE

          pvanal  asound  pvfile 

will analyze the soundfile "asound" using the default frmsiz and windfact to produce the file "pvfile" suitable for use with pvoc.

FILES

The output file has a special pvoc header containing details of the source audio file, the analysis frame rate and overlap. Frames of analysis data are stored as float, with the magnitude and 'frequency' (in Hz) for the first N/2 + 1 Fourier bins of each frame in turn. 'Frequency' encodes the phase increment in such a way that for strong harmonics it gives a good indication of the true frequency. For low amplitude or rapidly moving harmonics it is less meaningful.
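As a rough Python illustration of the frame layout, the sketch below splits one frame of already decoded floats into per-bin pairs. It assumes the magnitude and frequency values are interleaved bin by bin, which is one reading of the description above; the function name is hypothetical and the header is not handled.

```python
def split_pvoc_frame(frame, frmsiz):
    """Turn one flat analysis frame into (magnitude, frequency) pairs per bin."""
    bins = frmsiz // 2 + 1  # only the first N/2 + 1 bins are stored
    if len(frame) != 2 * bins:
        raise ValueError(f"expected {2 * bins} values, got {len(frame)}")
    # Assumed layout: mag0, frq0, mag1, frq1, ...
    return [(frame[2 * k], frame[2 * k + 1]) for k in range(bins)]
```

A 256-point analysis therefore stores 129 bins, or 258 floats, per frame.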

DIAGNOSTICS

Prints total number of frames, and frames completed on every 20th.



CVANAL

CVANAL - Impulse Response Fourier Analysis for CONVOLVE operator

        csound -U cvanal [flags] infilename outfilename

cvanal converts a soundfile into a single Fourier transform frame. The output file can be used by the convolve operator to perform Fast Convolution between an input signal and the original impulse response. Analysis is conditioned by the flags below. A space is optional between the flag and its argument.

-s<rate> sampling rate of the audio input file. This will over-ride the srate of the soundfile header, which otherwise applies. If neither is present, the default is 10000.

-c<channel> channel number sought. If omitted, the default is to process all channels. If a value is given, only the selected channel will be processed.

-b<begin> beginning time (in seconds) of the audio segment to be analysed. The default is 0.0.

-d<duration> duration (in seconds) of the audio segment to be analysed. The default of 0.0 means to the end of the file.

EXAMPLE:

                cvanal asound cvfile

will analyse the soundfile "asound" to produce the file "cvfile" for use with CONVOLVE.

HINT: To use data that is not already contained in a soundfile, a soundfile converter that accepts text files (e.g. the .DAT format of SoX) may be used to create a standard audio file. This is useful for implementing FIR filters.
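As one possible way of following this hint, the Python sketch below writes FIR coefficients as text in the style of SoX's .dat files. The layout used here (two comment lines giving sample rate and channel count, then one "time value" line per sample, with values between -1 and 1) is an assumption based on SoX's text data format and should be checked against the SoX documentation; the function name and the 4-tap averaging filter are hypothetical.

```python
def fir_to_sox_dat(coeffs, srate):
    """Format FIR coefficients as SoX-style text data (assumed .dat layout)."""
    if any(abs(c) > 1.0 for c in coeffs):
        raise ValueError("coefficients must lie between -1 and 1")
    lines = [f"; Sample Rate {srate}", "; Channels 1"]
    for i, c in enumerate(coeffs):
        # Each line: time of the sample in seconds, then its value.
        lines.append(f"{i / srate:.8f} {c:.8f}")
    return "\n".join(lines) + "\n"


# A 4-tap moving-average filter as a simple example impulse response.
text = fir_to_sox_dat([0.25, 0.25, 0.25, 0.25], 44100)
print(text)
```

The resulting text would be saved with a .dat extension, converted to a soundfile with the converter, and that soundfile then passed to cvanal.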

FILES

The output file has a special convolve header, containing details of the source audio file. The analysis data is stored as 'float', in rectangular (real/imaginary) form.

***NOTE***: The analysis file is NOT system independent! Ensure that the original impulse recording/data is retained. If/when required, the analysis file can be recreated.

AUTHOR: Greg Sullivan, sullivan@aussie.enet.dec.com (based on the algorithm given in 'Elements Of Computer Music', by F. Richard Moore).