New Csound Tools for Real-time Spectral Processing.
Richard Dobson, August 2001

pvsinfo,pvsanal,pvsynth,pvsadsyn,pvsfread,pvscross,pvsmaska,pvsftw,pvsftr


Introduction.


With these opcodes, two new core facilities are added to Csound. They
offer improved audio quality, and fast performance, enabling
high-quality analysis and resynthesis (together with transformations)
to be applied in real-time to live signals. The original Csound phase
vocoder remains unaltered; the new opcodes use an entirely separate
set of functions based on "pvoc.c" in the CARL distribution, written
by Mark Dolson. The Csound dnoise and srconv utilities (also by
Dolson, from CARL) also use this pvoc engine.  CARL pvoc is also the
basis for the phase vocoder included in the Composers Desktop
Project.. A few small but important modifications have been made to
the original CARL code to support real-time streaming.

Primary facilities.

(1) support for the new PVOC-EX analysis file format. This is a fully
portable (cross-platform) open file format, supporting three analysis
formats, and multi-channel signals. Currently only the standard
amplitude+frequency format has been implemented in the opcodes, but
the file format itself supports amplitude+phase and complex
(real-imaginary) formats. In addition to the new opcodes, the original
Csound pvoc opcodes have been extended (and thereby with enhanced
audio quality in some cases) to read PVOC-EX files as well as the
original (non-portable) format. Full details of the structure of a
PVOC-EX file are available via the website:

http://www.bath.ac.uk/~masjpf/NCD/researchdev/pvocex/pvocex.html

This site also gives details of the freely available console programs
pvocex and pvocex2 which can be used to create PVOC-EX files in all
supported formats.

(2) a new frequency-domain signal type, fully streamable, with 'f' as
the leading character.  In this document it is conveniently referred
to as an 'fsig'. Primary support for fsigs is provided by the opcodes
pvsanal and pvsynth, which perform conventional phase vocoder
overlap-add analysis and resynthesis, independently of the orchestra
control-rate. The only requirement is that the control-rate kr be
higher than or equal to the analysis rate, which can be expressed by
the requirement that ksmps <= overlap, where overlap is the distance
in samples between analysis frames, as specified for pvsanal. As
overlap is typically at least 128, and more usually 256, this is not
an onerous restriction in practice. The opcode pvsinfo can be used at
init time to acquire the properties of an fsig.

The fsig enables the nominal separation between the analysis and
resynthesis stages of the phase vocoder to be exposed to the Csound
programmer, so that not only can alternatives be employed for either
or both of these stages (not only oscillator-bank resynthesis, but
also the generation of synthetic fsig streams), but opcodes, operating
on the fsig stream, can themselves become more elemental. Thus the
fsig enables the creation of a true streaming plugin framework for
frequency domain signals. With the old pvoc opcodes, each opcode is
required to act as a resynthesiser, so that facilities such as pitch
scaling are duplicated in each opcode; and in many cases the opcodes
are parameter-rich. The separation of analysis and synthesis stages by
means of the fsig encourages the development of a wide range of simple
building-block opcodes implementing one or two functions, with which
more elaborate processes can be constructed.

This is very much a preliminary and experimental release, and it is
possible that the precise definition of the opcodes may change, in
response to user feedback. Also, clearly, many new possibilities for
opcodes are opened up; these factors may also have a retrospective
influence on the opcodes presented here.

Note that some opcode parameters currently have restricted or missing
implementation. This is at least in part in order to keep the opcodes
simple at this stage, and also because they highlight important design
issues on which no decision has yet been made, and on which opinions
from users are sought.

One important point about the new signal type is that because the
analysis rate is typically much lower than kr, new analysis frames are
not available on each k-cycle. Internally, the opcodes track ksmps,
and also maintain a frame counter, so that frames are read and written
at the correct times; this process is generally transparent to the
user. However, it means that k-rate signals only act on an fsig at the
analysis rate, not at each k-cycle.  The opocde pvsftw returns a
k-rate flag that is set when new fsig data is valid.

Because of the nature of the overlap-add system, the use of these
opcodes incurs a small but significant delay, or latency, determined
by the window size (max(ifftsize,iwinsize)). This is typically around
23msecs. In this first release, the delay is slightly in excess of the
theoretical minimum, and it is hoped that it can be reduced, as the
opcodes are further optimized for real-time streaming.


The opcodes.


fsig   pvsanal  ain,ifftsize,ioverlap,iwinsize,iwintype[,iformat,iinit]

Generate an fsig from a mono audio source ain, using phase vocoder
overlap-add analysis.

	ifftsize: the FFT size in samples. Need not be a power of two
(though these are especially efficient), but must be even. Odd numbers
are rounded up internally. ifftsize determines the number of analysis
bins in fsig, as ifftsize/2 + 1. For example, where ifftsize = 1024,
fsig will contain 513 analysis bins, ordered linearly from the
fundamental to Nyquist.  The fundamental of analysis (which in
principle gives the lowest resolvable frequency) is determined as
sr/ifftsize. Thus, for the example just given and assuming sr = 44100,
the fundamental of analysis is 43.07Hz. In practice, due to the
phase-preserving nature of the phase vocoder, the frequency of any bin
can deviate bilaterally, so that DC components are recorded. Given a
strongly pitched signal, frequencies in adjacent bins can bunch very
closely together, around partials in the source, and the lowest bins
may even have negative frequencies.

As a rule, the only reason to use a non power-of-two value for
ifftsize would be to match the known fundamental frequency of a
strongly pitched source. Values with many small factors can be almost
as efficient as power-of-two sizes; for example: 384, for a source
pitched at around low A=110Hz.

	ioverlap: the distance in samples ('hop size') between
overlapping analysis frames. As a rule, this needs to be at least
ifftsize/4, e.g. 256 for the example above. ioverlap determines the
underlying analysis rate, as sr/ioverlap. ioverlap does not require to
be a simple factor of ifftsize; for example a value of 160 would be
legal. The choice of ioverlap may be dictated by the degree of pitch
modification applied to the fsig, if any.  As a rule of thumb, the
more extreme the pitch shift, the higher the analysis rate needs to
be, and hence the smaller the value for ioverlap. A higher analysis
rate can also be advantageous with broadband transient sounds, such as
drums (where a small analysis window gives less smearing, but more
frequency-related errors).

Note that is is possible, and reasonable, to have distinct fsigs in an
orchestra (even in the same instrument), running at different analysis
rates. Interactions between such fsigs is currently unsupported, and
the fsig assignment opcode does not allow copying between fsigs with
different properties, even if the only difference is in
ioverlap. However, this is not a closed issue, as it is possible in
theory to achieve crude rate conversion (especially with regard to
in-memory analysis files) in ways analogous to time-domain techniques.

	iwinsize: the size in samples of the analysis window filter
(as set by iwintype). This must be at least ifftsize, and can usefully
be larger.  Though other proportions are permitted, it is recommended
that iwinsize always be an integral multiple of ifftsize, e.g. 2048
for the example above.  Internally, the analysis window (Hamming, von
Hann) is multiplied by a sinc function, so that amplitudes are zero at
the boundaries between frames. The larger analysis window size has
been found to be especially important for oscillator bank resynthesis
(e.g. using pvsadsyn), as it has the effect of increasing the
frequency resolution of the analysis, and hence the accuracy of the
resynthesis. As noted above, iwinsize determines the overall latency
of the analysis/resynthesis system. In many cases, and especially in
the absence of pitch modifications, it will be found that setting
iwinsize=ifftsize works very well, and offers the lowest latency.

	iwintype:    the shape of the analysis window. Currently only
two choices are implemented:
			0 = Hamming window
			1 = von Hann window.

Both are also supported by the PVOC-EX file format.  The window type
is stored as an internal attribute of the fsig, together with the
other parameters (see pvsinfo). Other types may be implemented later
on (e.g. the Kaiser window, also supported by PVOC-EX), though an
obvious alternative is to enable windows to be defined via a function
table. The main issue here is the constraint of f-tables to
power-of-two sizes, so this method does not offer a complete solution.
Most users will find the Hamming window meets all normal needs, and
can be regarded as the default choice.

	iformat:  the analysis format. Currently only one format is 
implemented by this opcode:
	0	=  amplitude + frequency
	This is the classic phase vocoder format; easy to process, and
a natural format for oscillator-bank resynthesis. It would be very
easy (tempting, one might say) to treat an fsig frame not purely as a
phase vocoder frame but as a generic additive synthesis frame. It is
indeed possible to use an fsig this way, but it is important to bear
in mind that the two are not, strictly speaking, directly equivalent. 

	Other important formats (supported by PVOC-EX) are:
	1      = amplitude + phase
	2      = complex (real + imaginary)

	iformat is provided in case it proves useful later to add
support for these other formats. Formats 0 and 1 are very closely
related (as the phase is 'wrapped' in both cases - it is a trivial
matter to convert from one to the other), but the complex format might
warrant a second explicit signal type (a 'csig') specifically for
convolution-based processes, and other processes where the full
complement of arithmetic operators may be useful.
	
	iinit: skip reinitialzation. This is not currently implemented
for any of these opcodes, and it remains to be seen if it is even
practical.

Example:
	ain	in			       ; live source
	fin	pvsanal  ain,1024,256,2048,0   ; analyse, using Hamming
	fout	pvsmaska fin,1,0.75	       ; apply eq from f-table
	aout	pvsynth	 fout		       ; and resynthesize


ar	pvsynth 	fsrc
ar	pvsadsyn	fsrc,inoscs,kfmod[,ibinoffset,ibinincr,iinit]

Resynthesise fsrc using either FFT overlap-add (pvsynth) or fast 
oscillator-bank (pvsadsyn).

pvsadsyn is experimental, and implements the oscillator bank using a
fast direct calculation method, rather than a lookup table. This takes
advantage of the fact, empirically arrived at, that for the analysis
rates generally used, (and presuming analysis using pvsanal, where
frequencies in a bin change only slightly between frames) it is not
necessary to interpolate frequencies between frames, only
amplitudes. Accurate resynthesis is often contingent on the use of
pvsanal with iwinsize = ifftsize*2. This opcode is the most likely to
change, or be much extended, according to feedback and advice from
users. It is likely that a full interpolating table-based method will
be added, via a further optional iarg. The parameter list to pvsadsyn
mimics that for pvadd, but excludes spectral extraction.

	inoscs: the number of analysis bins to synthesise. Cannot be
larger than the size of fsrc (see pvsinfo, below), e.g. as created by
pvsanal.  Processing time is directly proportional to inoscs.

	kfmod:	     scale all frequencies by factor kfmod. 1.0 = no change, 
                     2 =  up one octave.

	ibinoffset:  the first (lowest) bin to resynthesise, counting from
                     0 (default = 0). 

	ibinincr:    starting from bin ibinoffset, resnthesise bins 
                     ibinincr apart.


Example:
; resynth the first 100 odd-numbered bins, with pitch scaling envelope.
kpch	linseg	    1,p3/3,1,p3/3,1.5,p3/3,1
aout	pvsadsyn    fsrc, 100,kpch,1,2	


ioverlap,inumbins,iwinsize,iformat 	pvsinfo 	fsrc

Get format information about an fsig, whether created by an opcode
such as pvsanal, or obtained from a PVOCEX file by pvsfread. This
information is available at init time, and can be used to set
parameters for other pvs~ opcodes, and in particular for creating
function tables (e.g. for pvsftw), or setting the number of
oscillators for pvsadsyn.

	ioverlap:   the stream overlap size.  
	inumbins:   the number of analysis bins (amplitude+frequency)
                    in fsrc.  The underlying FFT size is calculated as
                    (inumbins-1) * 2.  
	iwinsize:   the analysis window size. May be larger than the
                    FFT size. 
	iformat:    the analysis frame format. If fsrc is created by an 
                    opcode, iformat will always be 0, signifying
                    amplitude+frequency. If fsrc is defined from a
	            PVOC-EX file, iformat may also have the value 1 or
	            2 (amplitude+phase, complex).  

Example:

fin		    pvsfread	"test.pvx"   ; import pvocex file
iovl,inb,iws,ifmt   pvsinfo	fin          ; get inumbins info
ifn		    ftgen  	0,0,inb,10,1 ; and create f-table


fsig	pvsfread	ktimpt,ifn[,ichan]

Create an fsig stream by reading a selected channel from a PVOC-EX
analysis file loaded into memory, with frame interpolation. Only
format 0 files (amplitude+frequency) are currently supported. The
operation of this opcode mirrors that of pvoc, but outputs an fsig
instead of a resynthesized signal.

	ktimpt: Time pointer into analysis file, in seconds. See the
description of the same parameter of pvoc for usage.

	ifn:	name of the analysis file. This must have the .pvx 
file extension.

	A multi-channel PVOC-EX file can be generated using the
extended pvanal utility (see below).

	ichan: the channel to read (counting from 0). Default is 0.

Note that analysis files can be very large, especially if
multi-channel.  Reading such files into memory will very likely incur
breaks in the audio during real-time performance. As the file is read
only once, and is then available to all other interested opcodes, it
can be expedient to arrange for a dedicated instrument to preload all
such analysis files at startup.


Example:

	idur	filelen   "test.pvx" ; find duration of (stereo) analysis file
	kpos	line      0,p3,idur  ; to ensure we process whole file
	fsigr	pvsfread  kpos,"test.pvx",1 ; create fsig from R channel

(NB: as this example shows, the filelen opcode has been extended to
accept both old and new analysis file formats).


	fsig	pvscross	fsrc,fdest,kamp1,kamp2

Perform cross-synthesis between two source fsigs.

The operation of this opcode is identical to that of pvcross (q.v.),
except in using fsigs rather than analysis files, and the absence of
spectral envelope preservation. The amplitudes from fsrc are applied
to fdest, using scale factors kamp1 and kamp2 respectively. kamp1 and
kamp2 must not exceed the range 0 to 1.

With this opcode, cross-synthesis can be performed on real-time audio
input, by using pvsanal to generate fsrc and fdest. These must have
the same format.

Example:

        kcross  linseg	   0,p3/3,0,p3/3,1,p3/3,1 ; progressive cross-synthesis
        fcross  pvscross   fsig1,fsig2,1-kcross,kcross
        across  pvsynth	   fcross


	fsig	pvsmaska	fsrc,ifn,kdepth

Modify amplitudes of fsrc using function table, with dynamic scaling.

	ifn : the f-table to use. Given fsrc has N analysis bins,
table ifn must be of size N or larger. The table need not be
normalized, but values should lie within the range 0 to 1. It can be
supplied from the score in the usual way, or from within the orchestra
by using pvsinfo (see below) to find the size of fsrc, (returned by
pvsinfo in inbins), which can then be passed to ftgen to create the
f-table.

	kdepth : controls the degree of modification applied to fsrc,
using simple linear scaling. 0 leaves amplitudes unchanged, 1 applies
the full profile of ifn.

Note that power-of-two FFT sizes are particularly convenient when
using table-based processing, as the number of analysis bins (inbins)
is then a power-of-two plus one, for which an exactly matching f-table
can be created.  In this case it is important that the f-table be
created with a size of inbins, rather than as a power of two, as the
latter will copy the first table value to the guard point, which is
inappropriate for this opcode.


Example (using score-supplied f-table, assuming fsig  fftsize = 1024):
	; score f-table using cubic spline to define shaped peaks 
	f1 0 513 8 0 2 1 3 0 4 1 6 0 10 1 12 0 16 1 32 0 1 0 427 0

	asig	buzz	 20000,199,50,3        ; pulsewave source
	fsig	pvsanal  asig,1024,256,1024,0  ; create fsig
	kmod	linseg   0,p3/2,1,p3/2,0       ; simple control sig	
	fsig	pvsmaska fsig,2,kmod           ; apply weird eq to fsig
	aout	pvsynth  fsig                  ; resynthesize,
                dispfft  aout,0.1,1024         ; and view the effect

This also illustrates that the usual Csound behaviour applies to
fsigs; the same name can be used for both input and output.


	kflag	pvsftw 	fsrc,ifna [,ifnf]
		pvsftr 		fsrc,ifna [,ifnf]

Copy fsig amplitude and/or frequency data to (pvsftw) and from
(pvsftr) function tables, for external processing.
	
	kflag:   has the value 1 when new data is available, 0 otherwise.
	ifna :	 table at least inbins in size that stores amplitude data. 
                 Ignored if ifna  = 0
	ifnf :   table at least inbins in size that stores frequency data. 
                 Ignored if ifnf  = 0

These opcodes enable the contents of fsrc to be exchanged with
function tables, for custom processing. Except when the frame overlap
equals ksmps (which will generally not be the case), the frame data is
not updated each control period, and the data in ifna, ifnf should
only be processed when kflag is set to 1. To process only frequency
data, set ifna to zero.

As the functions tables are required only to store data from fsrc,
there is no advantage in defining then in the score, and they should
generally be created in the instrument, using ftgen.

By exporting amplitude data, say, from one fsig and importing it into
another, basic cross-synthesis (as in pvscross) can be performed, with
the option to modify the data beforehand using the table manipulation
opcodes.  Note that the format data in the source fsig is not written
to the tables.  Used this way, these opcodes become potentially
pathological, and can be relied upon to produce unexpected results. In
such cases, resynthesis using pvsadsyn would almost certainly be
required.


Example:

ifn	ftgen		0,0,inbins,10,1	; make ftable	
kflag	pvsftw		fsrc,ifn              ; export  amps to table,
kamp	init 		0		
if	kflag==0 	kgoto contin	; only proc when frame is ready
; kill lowest bins, for obvious effect
	tablew		kamp,1,ifn
	tablew		kamp,2,ifn
	tablew		kamp,3,ifn
	tablew		kamp,4,ifn	
; read modified data back to fsig
	pvsftr		fsrc,ifn  	
contin:
; and resynth
aout	pvsynth		fsrc


Pvanal extension to create a PVOC-EX file.

The standard Csound utility program pvanal has been extended to enable
a PVOC-EX format file to be created, using the existing interface. To
create a PVOC-EX file, the file name must be given the required
extension, ".pvx", e.g.  "test.pvx". The requirement for the FFT size
to be a power of two is here relaxed, and any positive value is
accepted; odd numbers are rounded up internally. However, power-of-two
sizes are still to be preferred for all normal applications.

The channel select flags are ignored, and all source channels will be
analysed and written to the output file, up to a compiler-set limit of
eight channels. The analysis window size (iwinsize) is set internally
to double the FFT size.

Use of PVOC-EX files with the old Csound pvoc opcodes.

All the original pvoc opcodes can now read a PVOC-EX file, as well as
the native non-portable file format. As the PVOC-EX file uses a
double-size analysis window, users may find that this gives a useful
improvement in quality, for some sounds and processes, despite the
fact that the resynthesis does not use the same window size.

Apart from the window size parameter, the main difference between the
original .pv format and PVOC-EX is in the amplitude range of analysis
frames.  While rescaling is applied, so that no significant difference
in output level is experienced, whichever file format is used, some
slight loss of amplitude can still arise, as the double window usage
itself modifies frame amplitudes, of which the resynthesis code is
unaware. Note that all the original pv~ opcodes expect a mono analysis
file, and multi-channel PVOC-EX files will accordingly be rejected.