Prosody speech processing: API: sm_record_start
Prototype Definition
int sm_record_start(struct sm_record_parms *recordp)
Parameters
- *recordp
- a structure of the following type:
typedef struct sm_record_parms {
    tSMChannelId channel;                        /* in */
    tSMChannelId alt_data_source;                /* in */
    enum kSMDataFormat type;                     /* in */
    /* Only in Prosody version 1 */
    enum kSMDRecordElimination {
        kSMDRecordNoElimination,
        kSMDRecordSilenceElimination,
        kSMDRecordToneElimination,
    } elimination;                               /* in */
    /* End of part only in Prosody version 1 */
    /* Only in Prosody version 2 (TiNG) */
    tSM_UT32 silence_elimination;                /* in */
    enum kSMToneDetection tone_elimination_mode; /* in */
    tSM_UT32 tone_elimination_set_id;            /* in */
    /* End of part only in Prosody version 2 (TiNG) */
    tSM_UT32 max_octets;                         /* in */
    tSM_UT32 max_elapsed_time;                   /* in */
    tSM_UT32 max_silence;                        /* in */
    tSM_INT agc;                                 /* in */
    tSM_INT volume;                              /* in */
    enum kSMRecordAltSource {
        kSMRecordAltSourceDefault,
        kSMRecordAltSourceInput,
        kSMRecordAltSourceOutput,
    } alt_data_source_type;                      /* in */
    /* Only in Prosody version 2 (TiNG) */
    tSM_UT32 sampling_rate;                      /* in */
    double min_noise_level;                      /* in */
    double grunt_threshold;                      /* in */
    /* End of part only in Prosody version 2 (TiNG) */
} SM_RECORD_PARMS;
Description
This call starts a new recording job using the specified
channel.
Normally alt_data_source is set to kSMNullChannelId and the data
recorded is that switched to this input channel. If, however,
alt_data_source is set to the channel id of another existing channel,
then the data source for the recording is determined by the value of
alt_data_source_type. Note that the channel specified in
alt_data_source must not be reconfigured while this recording is in
progress.
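As an illustration of the parameters described so far, the following minimal sketch fills in a Prosody version 2 (TiNG) structure for a plain A-law recording with a time limit. The typedefs, the value given to kSMNullChannelId, the shortened enums and the helper example_init_record_parms() are stand-ins introduced only so that the sketch compiles on its own; they are assumptions, not the definitions from the Prosody headers.

```c
#include <string.h>

/* Local stand-ins so that this sketch compiles without the Prosody headers.
 * In a real build, delete this section and include the Prosody API header.
 * The underlying types and numeric values here are assumptions. */
typedef int tSMChannelId;
typedef unsigned int tSM_UT32;
typedef int tSM_INT;
#define kSMNullChannelId ((tSMChannelId)-1)
enum kSMDataFormat { kSMDataFormatNone, kSMDataFormatALawPCM };
enum kSMToneDetection { kSMToneDetectionNone };

/* Prosody version 2 (TiNG) field layout, following the prototype definition. */
typedef struct sm_record_parms {
    tSMChannelId channel;
    tSMChannelId alt_data_source;
    enum kSMDataFormat type;
    tSM_UT32 silence_elimination;
    enum kSMToneDetection tone_elimination_mode;
    tSM_UT32 tone_elimination_set_id;
    tSM_UT32 max_octets;
    tSM_UT32 max_elapsed_time;
    tSM_UT32 max_silence;
    tSM_INT agc;
    tSM_INT volume;
    enum kSMRecordAltSource {
        kSMRecordAltSourceDefault,
        kSMRecordAltSourceInput,
        kSMRecordAltSourceOutput
    } alt_data_source_type;
    tSM_UT32 sampling_rate;
    double min_noise_level;
    double grunt_threshold;
} SM_RECORD_PARMS;

/* Hypothetical helper: zero every field, then set only what a simple
 * recording from the channel's own input needs. */
void example_init_record_parms(SM_RECORD_PARMS *parms,
                               tSMChannelId channel,
                               tSM_UT32 max_elapsed_ms)
{
    memset(parms, 0, sizeof *parms);
    parms->channel = channel;
    parms->alt_data_source = kSMNullChannelId;  /* record this channel's input */
    parms->type = kSMDataFormatALawPCM;
    parms->tone_elimination_mode = kSMToneDetectionNone;
    parms->max_elapsed_time = max_elapsed_ms;   /* 0 would mean no limit */
    parms->alt_data_source_type = kSMRecordAltSourceDefault;
}
```

An application would then pass the structure to sm_record_start() and treat any non-zero return value as an error.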
The PCM data received will be encoded into buffers in the format
specified by the type parameter, which is a value from the same range
of values permitted in the type parameter of sm_replay_start().
Note that, for compatibility with earlier releases of Prosody, many
other values are permitted for the type field. These compatibility
values specify a combination of data type and sampling rate. When one
of these is used in the type field, the sampling_rate field must be
zero, and the actual rate used will be as listed here. They are:
| compatibility code | new code: type | new code: sampling rate |
| kSMDataFormat8KHzALawPCM | kSMDataFormatALawPCM | 8000 |
| kSMDataFormat8KHzULawPCM | kSMDataFormatULawPCM | 8000 |
| kSMDataFormat8KHzOKIADPCM | kSMDataFormatOKIADPCM | 8000 |
| kSMDataFormat8KHzACUBLKPCM | kSMDataFormatACUBLKPCM | 8000 |
| kSMDataFormat6KHzALawPCM | kSMDataFormatALawPCM | 6000 |
| kSMDataFormat6KHzULawPCM | kSMDataFormatULawPCM | 6000 |
| kSMDataFormat6KHzOKIADPCM | kSMDataFormatOKIADPCM | 6000 |
| kSMDataFormat6KHzACUBLKPCM | kSMDataFormatACUBLKPCM | 6000 |
| kSMDataFormat8KHz16bitMono | kSMDataFormat16bit | 8000 |
| kSMDataFormat8KHz8bitMono | kSMDataFormat8bit | 8000 |
| kSMDataFormat8KHzSigned8bitMono | kSMDataFormatSigned8bit | 8000 |
| kSMDataFormatIMAADPCM | kSMDataFormatIMAADPCM | 8000 |
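The mapping in the table above can be sketched as a lookup. To avoid guessing at the numeric values of the real enumerators, this sketch works on the code names as strings; struct example_compat_entry and example_lookup_compat_code() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One row of the compatibility table: old combined code, new type code,
 * and the sampling rate implied by the old code. */
struct example_compat_entry {
    const char *compat_code;
    const char *new_code;
    unsigned sampling_rate;
};

static const struct example_compat_entry example_compat_table[] = {
    { "kSMDataFormat8KHzALawPCM",        "kSMDataFormatALawPCM",    8000 },
    { "kSMDataFormat8KHzULawPCM",        "kSMDataFormatULawPCM",    8000 },
    { "kSMDataFormat8KHzOKIADPCM",       "kSMDataFormatOKIADPCM",   8000 },
    { "kSMDataFormat8KHzACUBLKPCM",      "kSMDataFormatACUBLKPCM",  8000 },
    { "kSMDataFormat6KHzALawPCM",        "kSMDataFormatALawPCM",    6000 },
    { "kSMDataFormat6KHzULawPCM",        "kSMDataFormatULawPCM",    6000 },
    { "kSMDataFormat6KHzOKIADPCM",       "kSMDataFormatOKIADPCM",   6000 },
    { "kSMDataFormat6KHzACUBLKPCM",      "kSMDataFormatACUBLKPCM",  6000 },
    { "kSMDataFormat8KHz16bitMono",      "kSMDataFormat16bit",      8000 },
    { "kSMDataFormat8KHz8bitMono",       "kSMDataFormat8bit",       8000 },
    { "kSMDataFormat8KHzSigned8bitMono", "kSMDataFormatSigned8bit", 8000 },
    { "kSMDataFormatIMAADPCM",           "kSMDataFormatIMAADPCM",   8000 },
};

/* Returns the table row for a compatibility code name, or NULL if the
 * name is not listed as a compatibility code. */
const struct example_compat_entry *example_lookup_compat_code(const char *name)
{
    size_t count = sizeof example_compat_table / sizeof example_compat_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(example_compat_table[i].compat_code, name) == 0)
            return &example_compat_table[i];
    }
    return NULL;
}
```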
The only record types supported are those listed as compatibility
codes in the above table, with two groups of exceptions.
Firstly, kSMDataFormat8KHz16bitMono and kSMDataFormat8KHz8bitMono are
not supported (refer to appendix B for the record data formats
supported by each firmware type). Secondly, an additional pair of type
codes is recognised, kSMDataFormat8KHzPCM and kSMDataFormat6KHzPCM,
which mean that the data should be supplied to the application PCM
encoded in a way compatible with the loaded firmware: either as
kSMDataFormat8KHzALawPCM or kSMDataFormat8KHzULawPCM for 8 kHz, or as
kSMDataFormat6KHzALawPCM or kSMDataFormat6KHzULawPCM for 6 kHz.
(Thus if sp30a.smf firmware is loaded, the data is A-law encoded,
while if sp30u.smf firmware is loaded, it is mu-law encoded.)
Any form of record requires the module
inchan
to have been downloaded in addition to the module
that is required for the specific type of record, and any
module required for the sampling rate:
The sampling rate firmware:
| sampling rate | extra firmware required |
| 8000 | - |
| 6000 | sixkin |
| 11000 | 8_to_11 |
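The sampling rate table can be expressed as a small lookup, for example when an application wants to check which modules to download before recording. The function name example_extra_firmware_for_rate() is hypothetical, and the use of an empty string for "no extra module" is a convention of this sketch only.

```c
#include <stddef.h>

/* Returns the name of the extra firmware module needed for a sampling
 * rate, "" if none is needed, or NULL if the rate is not in the table. */
const char *example_extra_firmware_for_rate(unsigned sampling_rate)
{
    switch (sampling_rate) {
    case 8000:
        return "";          /* no extra module */
    case 6000:
        return "sixkin";
    case 11000:
        return "8_to_11";
    default:
        return NULL;        /* rate not listed */
    }
}
```

The inchan module and the module for the chosen data format are still required in every case.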
See
Prosody application note: speech processing replay and record data formats
for more details on data formats supported by
Prosody and their appropriate use.
The
volume
parameter is reserved for future use and must be set to zero.
The
volume
parameter is the change in volume compared to the level of
the data (i.e. set this to -6 to attenuate by
6dB). If AGC and volume are both applied, the change in volume
requested is applied after AGC.
The
agc
parameter controls whether automatic gain control is applied to
the recorded data. If
agc
is non-zero then automatic gain control is applied. Even if this
is the case, the recording level is still governed by volume.
The behaviour of the AGC algorithm may be controlled by changing
its parameters; see
sm_adjust_agc_module_params()
for more details.
The recorded data may be retrieved by the application through
periodic calls to
sm_get_recorded_data().
The amount of data recorded is determined by the termination
criteria specified in the parameters:
| max_octets | max octets of data to record, 0 if no limit |
| max_elapsed_time | max recording period in ms, 0 if no limit |
| max_silence | max period of silence in ms before recording terminated, 0 if no limit |
and also by the function
sm_record_abort()
which will terminate a recording directly.
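When choosing max_octets for a format with a fixed number of bits per sample, the limit can be estimated from the intended duration. The following is a rough sketch of that arithmetic; example_octets_for_duration() is a hypothetical helper, and it assumes that max_octets counts encoded octets as delivered to the application. It does not apply to kSMDataFormatACUBLKPCM, which has no exact bit rate per sample.

```c
#include <stdint.h>

/* Estimate the number of encoded octets produced by a recording of the
 * given length. Intended for telephony-scale inputs (rates of a few
 * thousand samples per second, durations of minutes or hours). */
uint32_t example_octets_for_duration(uint32_t sampling_rate,
                                     uint32_t bits_per_sample,
                                     uint32_t duration_ms)
{
    /* Total bits, computed in 64-bit arithmetic to avoid overflow
     * for realistic inputs. */
    uint64_t bits = (uint64_t)sampling_rate * bits_per_sample;
    bits = bits * duration_ms / 1000u;

    /* Round up to whole octets and saturate at the 32-bit maximum. */
    uint64_t octets = (bits + 7u) / 8u;
    return octets > UINT32_MAX ? UINT32_MAX : (uint32_t)octets;
}
```

For example, 30 seconds of G.711 A-law (8 bits per sample) at 8000 samples per second comes to 240000 octets.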
If an event has been previously associated with a channel (see
sm_channel_set_event()),
then the driver will notify the application with that event
whenever (for that channel):
- recorded data becomes newly available for collection by
sm_get_recorded_data()
- recording terminates due to one of the termination
criteria being met
Fields
- channel
- The channel to perform the record.
- alt_data_source
- kSMNullChannelId, or another
channel whose input or output is to be recorded. If this
specifies a channel, that channel must not be reconfigured
while recording is taking place.
- type
- The format in which to record. (See the main text above for
compatibility codes that can also be
used in this field.)
One of these values:
- kSMDataFormatNone
- Special value for test purposes only. This indicates that the
channel should prepare as if it was about to play or record
data, but not actually transfer any data.
- kSMDataFormatALawPCM
- G.711 A-law. This uses 8 bits per sample.
- kSMDataFormatULawPCM
- G.711 mu-law. This uses 8 bits per sample.
- kSMDataFormatOKIADPCM
- A 4-bit coding scheme.
- kSMDataFormatACUBLKPCM
- Aculab proprietary compressed speech. It averages two bits per
sample, but encodes blocks of speech, so it does not have an
exact bit rate per sample.
- kSMDataFormat16bit
- 16-bit linear coding, where each sample is a signed value
(-32768 to 32767). The first octet of each sample is the less
significant one.
- kSMDataFormat8bit
- 8-bit unsigned linear coding, where each sample is an unsigned value
(0 to 255). This is Microsoft's 8-bit format.
- kSMDataFormatSigned8bit
- 8-bit linear coding, where each sample is a signed value (-128 to 127).
- kSMDataFormatIMAADPCM
- A 4-bit coding scheme standardised by the Interactive Multimedia
Association (IMA).
- elimination (Only in Prosody version 1)
- What should be eliminated from the recording.
One of these values:
- kSMDRecordNoElimination
- No silence or tone elimination will occur.
- kSMDRecordSilenceElimination
- Periods of silence longer than a threshold value of 1000
milliseconds during the recording will be eliminated from the
recorded data. If, however, grunt detection is enabled for this
channel, the silence threshold will be the specified
grunt_latency value (see
sm_listen_for()).
- kSMDRecordToneElimination
- Any tones in the active tone set specified by the last call to
sm_listen_for()
will be eliminated from the recorded data.
- silence_elimination (Only in Prosody version 2 (TiNG))
- The maximum duration (in ms) of silence to record. Silences longer
than this are truncated to this length. The value zero disables
silence elimination.
Requires the module
grunt.
- tone_elimination_mode (Only in Prosody version 2 (TiNG))
- What types of tones to eliminate from the recording. This
allows the same tone detection as
sm_listen_for().
Requires the module
td
unless the value is
kSMToneDetectionNone.
- tone_elimination_set_id (Only in Prosody version 2 (TiNG))
- The tone set to use (only relevant if
tone_elimination_mode
is not
kSMToneDetectionNone). See
sm_listen_for()
for details of how to select an input tone set.
- max_octets
- The maximum amount of data to record. The value zero indicates
no maximum.
- max_elapsed_time
- The maximum duration of the recording in ms. The value zero
indicates no maximum.
Requires the module
timerx.
- max_silence
- The maximum silence permitted (in ms). The value zero indicates
no maximum. Silences longer than this cause the recording to
terminate.
Requires the module
grunt.
- agc
- Indicator of whether automatic gain control is to be enabled
(non-zero) or not (zero).
Requires the module
gainbg.
- volume
- The desired adjustment to the volume (dB). The range of gain
supported is at least +8 to -22 dB.
Requires the module
gainbg.
- alt_data_source_type
- If an
alt_data_source
channel is specified, which kind of data associated with that
channel should be recorded.
One of these values:
- kSMRecordAltSourceDefault
- If
alt_data_source
is an input only channel, then data switched to this channel
input will be recorded (not all version 1 firmwares support this
mode), otherwise the data being generated on this channel output
will be recorded (this feature is normally used to record
conferenced outputs).
- kSMRecordAltSourceInput
- Data switched to
alt_data_source
input will be recorded (not all version 1 firmwares support this mode).
This value is deprecated since several channels can take input
from the same timeslot and that is a more straightforward way of
achieving the same result.
- kSMRecordAltSourceOutput
- Data generated on
alt_data_source
output will be recorded.
- sampling_rate (Only in Prosody version 2 (TiNG))
- The sampling rate at which to record the data. Currently supported
values are:
- 0 - record at the rate reported via
sm_record_status().
- 8000 - the typical rate for telephony, since it is the rate at
which telephone networks themselves operate.
- 6000 - a rate which reduces file sizes at the cost of lower
quality.
- 11000 - a rate convenient for use with typical PC soundcards.
This is sufficiently close to a quarter of the rate used
by CDs (44100 Hz) that the difference is not significant,
allowing almost universal compatibility with cheap PC
soundcards which can handle 11025 Hz sampling.
Note that when you specify a non-zero value here, this function
assumes that the source of the data to be recorded is providing
data at 8000 samples per second. The use of data at other rates is
not supported and will cause the data to be recorded at an incorrect
sampling rate. Consequently, the use of a non-zero value in this
field is deprecated.
- min_noise_level (Only in Prosody version 2 (TiNG))
- The minimum level, in dBm0, that the noise estimate of the grunt detector may
reach. The default is -55 dBm0. Only used if
silence_elimination
or
max_silence
is non-zero.
Requires the module
grunt.
- grunt_threshold (Only in Prosody version 2 (TiNG))
- The threshold, in dB, above the noise estimate of the grunt detector at
which a signal is considered present. The default is 15 dB. Only used if
min_noise_level
is non-zero.
Requires the module
grunt.
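A few of the field descriptions above can be sketched as small helpers: reading one kSMDataFormat16bit sample, the effect of silence_elimination on the length of a recorded silence, and the relationship between min_noise_level and grunt_threshold. All function names are hypothetical. The last helper in particular is only a simplified model built from the wording of the field descriptions; the actual grunt detector is internal to the firmware and may behave differently (for instance at the exact threshold).

```c
#include <stdint.h>

/* kSMDataFormat16bit: two octets per sample, less significant octet
 * first, giving a signed value in the range -32768 to 32767. */
int example_decode_16bit_sample(const unsigned char octets[2])
{
    int raw = octets[0] | (octets[1] << 8);
    return raw >= 0x8000 ? raw - 0x10000 : raw;
}

/* silence_elimination: silences longer than the limit are truncated to
 * the limit; a limit of zero disables silence elimination. */
uint32_t example_recorded_silence_ms(uint32_t silence_ms,
                                     uint32_t silence_elimination)
{
    if (silence_elimination == 0 || silence_ms <= silence_elimination)
        return silence_ms;
    return silence_elimination;
}

/* Simplified model (an assumption): the noise estimate is not allowed
 * to fall below min_noise_level (dBm0), and a signal counts as present
 * when its level exceeds that estimate by more than grunt_threshold (dB). */
int example_signal_present(double level_dbm0,
                           double noise_estimate_dbm0,
                           double min_noise_level,
                           double grunt_threshold)
{
    double noise = noise_estimate_dbm0 < min_noise_level
                       ? min_noise_level
                       : noise_estimate_dbm0;
    return level_dbm0 > noise + grunt_threshold;
}
```

With the documented defaults of -55 dBm0 and 15 dB, this model treats anything above -40 dBm0 as signal on a very quiet line.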
Returns
0 if the call completed successfully, otherwise a standard error such as:
- ERR_SM_DEVERR - device error
- ERR_SM_WRONG_CHANNEL_STATE - if already recording
- ERR_SM_WRONG_CHANNEL_TYPE - if attempt to record using output channel
- ERR_SM_NOT_SAME_MODULE - alt_data_source channel not located on same module
This function is part of the Prosody speech processing API.