Measuring Voice Quality
This page has provided guidelines on how to design a network to ensure good voice quality. However, we still need a way to measure and compare voice quality. A number of different techniques are available for this.
Mean Opinion Score (MOS)
Described in ITU-T P.800, MOS is the most well-known measure of voice quality. It is a subjective method of quality assessment. There are two test methods: conversation-opinion test and listening-opinion test.
Test subjects judge the quality of the voice transmission system either by carrying on a conversation or by listening to speech samples. They then rank the voice quality using the following scale:
- 5 – Excellent
- 4 – Good
- 3 – Fair
- 2 – Poor
- 1 – Bad
MOS is then computed by averaging the scores of the test subjects. On this scale, an average score of 4 or above is considered toll quality. MOS was originally designed to assess the quality of different coding standards. The following is a summary of the MOS for different coding algorithms.
|Coding algorithm|Bit rate|MOS|
|---|---|---|
|G.711|64 kbps|4.3 – 4.4|
|G.726|32 kbps|4.0 – 4.2|
|G.728|16 kbps|4.0 – 4.2|
|G.729|8 kbps|4.0 – 4.2|
|G.723.1|6.3 kbps|3.8 – 4.0|
|G.723.1|5.3 kbps|3.5|
MOS is the most relevant test because it is humans who use the voice network and it is humans whose opinions count. However, a subjective test that involves human subjects can be time-consuming to administer. Hence, there is a lot of interest in devising objective tests that can be used to approximate human perception of voice quality.
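As a small illustration of the averaging step described above, the following Python sketch computes a MOS from a panel of listener scores and checks it against the toll-quality threshold of 4. The function names and the sample panel are hypothetical and serve only as an example; a formal ITU-T P.800 test involves far more procedure than this arithmetic.

```python
from statistics import mean

# Opinion scale as listed above (ITU-T P.800).
SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(scores):
    """Average the listener scores; every score must be on the 1-5 scale."""
    if not scores:
        raise ValueError("at least one score is required")
    for s in scores:
        if s not in SCALE:
            raise ValueError(f"score {s!r} is not on the 1-5 scale")
    return mean(scores)


def is_toll_quality(mos, threshold=4.0):
    """An average of 4 or above is treated as toll quality."""
    return mos >= threshold


# Hypothetical panel of eight listeners (illustrative data only).
panel = [4, 5, 4, 3, 4, 5, 4, 4]
mos = mean_opinion_score(panel)
print(f"MOS = {mos:.2f}, toll quality: {is_toll_quality(mos)}")
```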
Perceptual Speech Quality Measure (PSQM)
Described in ITU-T P.861, PSQM (img. below) uses a psychoacoustic model to mathematically compute the differences between the input and output signals.
Using this method, if the input and output signals are identical, the PSQM score will be zero. The greater the differences, the higher the score, up to a maximum of 6.5. However, unlike more traditional measurements such as signal-to-noise ratio (SNR), the emphasis of PSQM is on differences that will affect human perception of speech quality.
One criticism of PSQM is that it was originally designed to measure the quality of coding standards. Therefore, it does not fully take into account the effect of various transmission impairments. PSQM+ was proposed in December 1997 and accounts for:
- Different perceptions due to volume or loud distortions
- Speech that has dropouts
With PSQM+, the correlation between the objective score and MOS is improved.
Other Speech Quality Measures
There are a number of other objective measures that either have been proposed or are in use, including:
- Measuring Normalizing Blocks (MNB)
- Perceptual Analysis Measurement System (PAMS) – a proprietary system developed by British Telecom
- Perceptual Evaluation of Speech Quality (PESQ) – a proposed standard being considered by the ITU-T
Transmission Characteristics and the E-Model
In a VoIP network, transmission impairments play a very important role in determining voice quality. As discussed in “Voice Quality” on page 3, these transmission impairments include frame loss, delay, and jitter. Another approach to voice quality testing is to measure those transmission impairments directly and then predict what the voice quality will be given those impairments.
The E-Model, as described in ITU-T G.107, provides a useful computational model for predictive analysis. The basic equation of the model is as follows:
R = Ro – Is – Id – Ie + A
R – Transmission rating factor
Ro – Basic signal-to-noise ratio. This is computed from all circuit noise powers.
Is – Simultaneous impairment factor. This accounts for impairments caused by non-optimum sidetone and quantizing distortion.
Id – Delay impairment factor. This accounts for impairments caused by a delay in the network.
Ie – Equipment impairment factor. This accounts for impairments caused by low bit rate coders as well as the effect of frame loss on the coder. This was discussed in detail earlier under Frame Loss.
A – Expectation factor. This is a correction factor that adjusts perceived quality based on user expectation. For example, if users are aware that they are communicating with a hard-to-reach location via multi-hop satellite connections, they may be more willing to tolerate impairments due to long delays.
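To make the basic equation concrete, here is a minimal Python sketch that combines the five terms into R. The class name, field names, and all numeric values are hypothetical and chosen only for illustration; they are not taken from ITU-T G.107, which defines how each term is derived from measured network parameters.

```python
from dataclasses import dataclass


@dataclass
class EModelInputs:
    """The five terms of the basic E-Model equation (names are illustrative)."""
    ro: float       # Ro: basic signal-to-noise ratio
    is_: float      # Is: simultaneous impairment factor
    id_: float      # Id: delay impairment factor
    ie: float       # Ie: equipment impairment factor
    a: float = 0.0  # A: expectation factor


def transmission_rating(x: EModelInputs) -> float:
    """R = Ro - Is - Id - Ie + A"""
    return x.ro - x.is_ - x.id_ - x.ie + x.a


# Hypothetical values, not measurements and not standard defaults.
terrestrial = EModelInputs(ro=94.0, is_=1.5, id_=5.0, ie=10.0)
# Same codec, much longer delay, but users expect a hard-to-reach link.
satellite = EModelInputs(ro=94.0, is_=1.5, id_=30.0, ie=10.0, a=20.0)

print("terrestrial R =", transmission_rating(terrestrial))
print("satellite   R =", transmission_rating(satellite))
```

In this made-up comparison the expectation factor offsets part of the extra delay impairment, which is the role the definition of A describes.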
For example, once the transmission impairments in an IP network have been measured, the E-Model can be used to calculate the transmission rating factor. The transmission rating factor can then be transformed into MOS using the following equations:
- For R < 0: MOS = 1
- For 0 < R < 100: MOS = 1 + 0.035R + 7R(R – 60)(100 – R) × 10^-6
- For R > 100: MOS = 4.5
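The three-case mapping from R to MOS can be sketched in Python as follows. The function name is hypothetical, and treating the boundary values R = 0 and R = 100 as belonging to the outer cases is an assumption of this sketch; the polynomial evaluates to 1 and 4.5 at those points, so the result is the same either way.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-Model transmission rating factor R into an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    # MOS = 1 + 0.035R + 7R(R - 60)(100 - R) x 10^-6
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


# Illustrative R values only.
for r in (50, 60, 70, 80, 90):
    print(f"R = {r:3d} -> MOS = {r_to_mos(r):.2f}")
```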
Which Voice Quality Measure Should be Used
Given the plethora of measurement methods, which one should be used? In practice, a number of methods can be used in combination. As mentioned previously, MOS is the most relevant measure because it is the human opinion that counts the most. So it should always be used as a reality check. Instead of conducting a formal MOS test, you may choose to run a pilot of a VoIP network, let a select group of users try out the system, and then provide you with feedback.
However, when you are configuring the system, you may make many adjustments, and getting human subjects to assess the effect may be impractical. In these cases, an objective test system such as PSQM, PAMS, or PESQ may be more convenient.
In VoIP, QoS is a very important component. In measuring the effectiveness of QoS, measuring the transmission impairments (frame loss, delay, and jitter) is more useful because it helps to directly answer questions such as:
- If the network is congested, are voice frames given higher priority compared with data frames?
- What is the average delay experienced by voice frames?
- What is the average jitter seen by voice frames?
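The delay and jitter questions above can be answered from per-frame timestamps. The following Python sketch assumes a hypothetical capture format: two dictionaries that map a frame sequence number to its send time and arrival time in milliseconds, taken from synchronized clocks. Jitter is computed here as the mean absolute difference between the delays of consecutively received frames, which is one simple definition assumed for this sketch and not the only one in use.

```python
from statistics import mean


def summarize_voice_frames(sent_ms, received_ms):
    """Return frame loss ratio, average delay, and average jitter.

    sent_ms / received_ms: dict of sequence number -> timestamp in ms.
    Frames missing from received_ms are counted as lost.
    """
    if not sent_ms:
        raise ValueError("no frames were sent")
    # One-way delay of every frame that arrived, in sequence order.
    delays = [received_ms[seq] - sent_ms[seq]
              for seq in sorted(sent_ms) if seq in received_ms]
    loss_ratio = 1 - len(delays) / len(sent_ms)
    avg_delay = mean(delays) if delays else None
    # Delay variation between consecutively *received* frames
    # (a lost frame is simply skipped over).
    variation = [abs(b - a) for a, b in zip(delays, delays[1:])]
    avg_jitter = mean(variation) if variation else 0.0
    return {"loss_ratio": loss_ratio,
            "avg_delay_ms": avg_delay,
            "avg_jitter_ms": avg_jitter}


# Hypothetical capture: five frames sent 20 ms apart, frame 3 lost.
sent = {0: 0, 1: 20, 2: 40, 3: 60, 4: 80}
received = {0: 50, 1: 72, 2: 95, 4: 131}
print(summarize_voice_frames(sent, received))
```

Running the same summary separately for voice and data frames under congestion gives a direct view of whether voice is being prioritized.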
In testing a VoIP network, it is necessary to create a realistic test environment. Typically, this means that there are a number of concurrent voice sessions and that both voice and data traffic are present and competing for bandwidth.