Measuring Voice Quality

Voice quality is a very important factor in VoIP. On this page, guidelines were provided on how to design a network to ensure good voice quality. However, we still need a way to measure and compare voice quality, and a number of different techniques are available for this.

Mean Opinion Score (MOS)

Described in ITU-T P.800, MOS is the best-known measure of voice quality. It is a subjective method of quality assessment with two test methods: the conversation-opinion test and the listening-opinion test.

Test subjects judge the quality of the voice transmission system either by carrying on a conversation or by listening to speech samples. They then rate the voice quality using the following scale:

  • 5 – Excellent
  • 4 – Good
  • 3 – Fair
  • 2 – Poor
  • 1 – Bad

MOS is then computed by averaging the scores of the test subjects. On this scale, an average score of 4 or above is considered toll quality. MOS was originally designed to assess the quality of different coding standards. The following table summarizes the MOS for different coding algorithms.


Coding Standard    Bit Rate    MOS
G.711              64 kbps     4.3 – 4.4
G.726              32 kbps     4.0 – 4.2
G.728              16 kbps     4.0 – 4.2
G.729              8 kbps      4.0 – 4.2
G.723.1            6.3 kbps    3.8 – 4.0
G.723.1            5.3 kbps    3.5
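
The averaging step described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample ratings are made up, and the toll-quality threshold of 4 is taken from the text above.

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Average listener ratings given on the 1 (Bad) to 5 (Excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be whole numbers from 1 to 5")
    return mean(ratings)

def is_toll_quality(mos, threshold=4.0):
    """Apply the rule of thumb that an average of 4 or above is toll quality."""
    return mos >= threshold

# Hypothetical ratings from eight listeners (not real test data).
ratings = [4, 5, 4, 3, 4, 5, 4, 4]
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.3f}, toll quality: {is_toll_quality(mos)}")
```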


MOS is the most relevant test because it is humans who use the voice network and it is humans whose opinions count. However, a subjective test that involves human subjects can be time-consuming to administer. Hence, there is a lot of interest in devising objective tests that can be used to approximate human perception of voice quality.

Perceptual Speech Quality Measure (PSQM)

Described in ITU-T P.861, PSQM (see the image below) uses a psychoacoustic model to mathematically compute the differences between the input and output signals.

Perceptual Speech Quality Measure (PSQM)


Using this method, if the input and output signals are identical, the PSQM score is zero. The larger the differences, the higher the score, up to a maximum of 6.5. However, unlike more traditional measurements such as signal-to-noise ratio (SNR), PSQM emphasizes differences that affect human perception of speech quality.

One criticism of PSQM is that it was originally designed to measure the quality of coding standards, so it does not fully take into account the effect of various transmission impairments. PSQM+ was proposed in December 1997 and accounts for:

  • Different perceptions due to volume or loud distortions
  • Speech that has dropouts

With PSQM+, the correlation between the objective score and MOS is improved.

Other Speech Quality Measures

A number of other objective measures have either been proposed or are in use, including:

  • Measuring Normalizing Blocks (MNB)
  • Perceptual Analysis Measurement System (PAMS) – a proprietary system developed by British Telecom
  • Perceptual Evaluation of Speech Quality (PESQ) – a proposed standard being considered by the ITU-T

Transmission Characteristics and the E-Model

In a VoIP network, transmission impairments play a very important role in determining voice quality. As discussed in “Voice Quality” here, these transmission impairments include frame loss, delay, and jitter. Another approach to voice quality testing is to measure those transmission impairments directly and then predict what the voice quality will be given those impairments.

The E-Model, as described in ITU-T G.107, provides a useful computational model for predictive analysis. The basic equation of the model is as follows:

R = Ro – Is – Id – Ie + A

R – Transmission rating factor

Ro – Basic signal-to-noise ratio. This is computed from all circuit noise powers.

Is – Simultaneous impairment factor. This accounts for impairments caused by non-optimum sidetone and quantizing distortion.

Id – Delay impairment factor. This accounts for impairments caused by a delay in the network.

Ie – Equipment impairment factor. This accounts for impairments caused by low bit rate coders as well as the effect of frame loss on the coder. This was discussed earlier in detail in Frame Loss here.

A – Expectation factor. This is a correction factor that adjusts perceived quality based on user expectation. For example, if users are aware that they are communicating with a hard-to-reach location via multi-hop satellite connections, they may be more willing to tolerate impairments due to long delays.

Once the transmission impairments in an IP network have been measured, the E-Model can be used to calculate the transmission rating factor. The transmission rating factor can then be transformed into MOS using the following equations:

  • For R < 0:        MOS = 1
  • For 0 < R < 100:  MOS = 1 + 0.035R + 7R(R - 60)(100 - R) × 10⁻⁶
  • For R > 100:      MOS = 4.5
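
As a rough sketch, the basic E-Model equation and the R-to-MOS conversion above can be written in Python as follows. The function names are illustrative, and the input values in the example are made-up numbers chosen only to exercise the formulas, not measured or recommended values. The boundary points R = 0 and R = 100 are folded into the outer cases, which is consistent because the polynomial evaluates to 1 and 4.5 there.

```python
def transmission_rating(ro, is_, id_, ie, a=0.0):
    """Basic E-Model equation: R = Ro - Is - Id - Ie + A."""
    return ro - is_ - id_ - ie + a

def r_to_mos(r):
    """Convert a transmission rating factor R into an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

# Hypothetical inputs (assumed for illustration only).
r = transmission_rating(ro=94.0, is_=1.5, id_=5.0, ie=11.0)
print(f"R = {r:.1f}, estimated MOS = {r_to_mos(r):.2f}")
```

Because the conversion is a plain function of R, it can be applied to each measurement interval of a test run to see how the predicted MOS moves as delay or frame loss changes.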

Which Voice Quality Measure Should be Used

Given the plethora of measurement methods, which one should be used? In practice, a number of methods can be used in combination. As mentioned previously, MOS is the most relevant measure because it is the human opinion that counts the most. So it should always be used as a reality check. Instead of conducting a formal MOS test, you may choose to run a pilot of a VoIP network, let a select group of users try out the system, and then provide you with feedback.

However, when you are configuring the system, you may make many adjustments, and getting human subjects to assess the effect may be impractical. In these cases, an objective test system such as PSQM, PAMS, or PESQ may be more convenient. In VoIP, QoS is a very important component. In measuring the effectiveness of QoS, measuring the transmission impairments (frame loss, delay, and jitter) is more useful because it helps to directly answer questions such as:

  • If the network is congested, are voice frames given higher priority compared with data frames?
  • What is the average delay experienced by voice frames?
  • What is the average jitter seen by voice frames?
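
To make these questions concrete, the sketch below derives frame loss, average delay, and a jitter estimate from per-frame timestamps. It rests on several assumptions: the `sent` and `received` dictionaries are hypothetical capture data keyed by sequence number, the sender and receiver clocks are synchronized, and jitter is smoothed in the style of the RFC 3550 interarrival jitter estimator (gain 1/16), applied here in sequence order rather than arrival order for simplicity.

```python
def summarize_flow(sent, received):
    """Return (loss_rate, average_delay, jitter) for one flow.

    sent:     dict mapping sequence number -> send time in seconds
    received: dict mapping sequence number -> arrival time in seconds
    """
    if not sent:
        raise ValueError("no frames were sent")
    arrived = sorted(seq for seq in sent if seq in received)
    loss_rate = 1 - len(arrived) / len(sent)
    delays = [received[seq] - sent[seq] for seq in arrived]
    average_delay = sum(delays) / len(delays) if delays else None

    # Smoothed jitter: J += (|D| - J) / 16, where D is the change in
    # one-way delay between consecutive frames that arrived.
    jitter = 0.0
    for previous, current in zip(delays, delays[1:]):
        jitter += (abs(current - previous) - jitter) / 16
    return loss_rate, average_delay, jitter

# Hypothetical voice flow: 20 ms frame spacing, frame 3 lost.
sent = {1: 0.000, 2: 0.020, 3: 0.040, 4: 0.060}
received = {1: 0.020, 2: 0.050, 4: 0.080}
loss, delay, jitter = summarize_flow(sent, received)
print(f"loss={loss:.1%} delay={delay * 1000:.1f} ms jitter={jitter * 1000:.2f} ms")
```

Running the same summary separately on the voice flow and the data flow during congestion gives a direct comparison of how the network treats each class of traffic.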

In testing a VoIP network, it is necessary to create a realistic test environment. Typically, this means that there are a number of concurrent voice sessions and that both voice and data traffic are present and competing for bandwidth.

Testing VoIP

In this section, we will examine the different test configurations and the objectives of the tests.

IP Network Analysis

The main objective of this test is to measure the transmission characteristics of an IP network to determine if it can support VoIP applications. It is also an important test to measure the effectiveness of QoS mechanisms. The image below shows the typical test configuration.

IP Network Analysis (IP network traffic generator)

To test the effectiveness of QoS, you must be able to simulate a mixture of voice and data traffic. By measuring the transmission characteristics of each flow, you can test whether the IP network provides different treatment for voice and data traffic. Note that this test does not involve the use of any VoIP equipment.

End-to-End Voice Analysis

The objective is to test the ability of the VoIP network to transmit voice and other related signals from end to end. The most important part of this test is to assess speech quality. The image below shows the typical test configuration. This configuration allows a number of different tests, as described below.

End-to-End Voice Analysis


Using the handsets, a voice call can be placed and human subjects can be used to assess the quality of the voice transmission.

Using a PSQM test system, the same test can be done objectively. This allows different gateway configurations (changing the CODEC or turning voice activity detection on and off) to be rapidly tested.

The traffic generator can also be used in this configuration to increase the load on the IP network, thereby testing the QoS of the network.

In addition to voice, other types of information may be transmitted over the VoIP network and need to be tested as well. These include DTMF tones, fax, and modem signals.

For example, if you configure the gateway to use a low bit-rate coder such as G.729 with VAD, a fax will not transmit properly. However, some gateways can detect the presence of the fax tone and will switch over to G.711 and turn off VAD automatically.

Other gateways implement fax-relay that extracts the fax data and only transmits the data across the IP network. These mechanisms need to be tested.

Another common test configuration involves the use of an impairment simulator. When testing voice terminals, it is often desirable to test their performance under degraded operating conditions. For example, you may want to determine what happens to voice quality if the frame loss rate is 1%, 2%, 3%, and so on.

To a certain extent, the insertion of data traffic into the network to cause congestion can achieve that. However, it is difficult to adjust the data traffic level to cause a frame loss rate of, say, exactly 3%.

An impairment simulator can be used to precisely create a set of degraded operating conditions. The resulting voice quality can again be measured either subjectively by human listeners or objectively using a PSQM test system. This test configuration is illustrated in the image below.

Impairment Simulation
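
The idea of imposing an exact loss rate can be illustrated offline with a short Python sketch. This is not how a commercial impairment simulator works internally, since those act on live traffic. It only shows, under the assumption that the frames are available as a list, how a precise fraction of them can be removed at random positions. The function name and the seed parameter are illustrative.

```python
import random

def drop_frames(frames, loss_rate, seed=0):
    """Return a copy of frames with round(loss_rate * len(frames)) removed.

    The dropped positions are chosen at random but reproducibly (fixed seed),
    so the same degraded condition can be replayed for each configuration.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    rng = random.Random(seed)
    drop_count = round(loss_rate * len(frames))
    dropped = set(rng.sample(range(len(frames)), drop_count))
    return [frame for i, frame in enumerate(frames) if i not in dropped]

# Hypothetical stream of 1000 frames, degraded at 1%, 2%, and 3% loss.
frames = list(range(1000))
for rate in (0.01, 0.02, 0.03):
    kept = drop_frames(frames, rate)
    print(f"loss rate {rate:.0%}: {len(frames) - len(kept)} frames dropped")
```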


Signaling Stress Test

This test focuses on the scalability of a VoIP network. In the PSTN, traditional telephone switches or PBXs have been tested extensively to ensure that they can handle a large number of calls.

In a VoIP network, call routing is handled by devices such as gatekeepers (in H.323), call agents (in MGCP and MEGACO) and SIP servers (in SIP). These devices must also be tested in a similar way. This test can be performed with the help of a bulk call generator as shown in the image below:

Signaling Stress Test


Typical measurements include:

  • What is the highest call rate the servers can sustain?
  • What is the highest number of calls the servers can maintain simultaneously?
  • What is the call setup time in relation to the load?
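
As a minimal sketch of how such measurements might be summarized, the Python below processes a log of calls from a bulk call generator. The record layout (request time, answer time, end time, in seconds) and the function name are assumptions made for illustration and do not correspond to any particular product's output. Note that it reports what was observed in one run. Finding the highest sustainable rate would require repeating the run at increasing load.

```python
def summarize_calls(calls, window_seconds):
    """Summarize one stress-test run.

    calls: list of (request_time, answer_time, end_time) tuples in seconds
    window_seconds: length of the period over which calls were offered
    """
    if not calls or window_seconds <= 0:
        raise ValueError("need at least one call and a positive window")
    setup_times = [answer - request for request, answer, _ in calls]

    # Sweep over answer (+1) and end (-1) events to find peak concurrency.
    # At equal timestamps, -1 sorts first, so an ending call frees its slot.
    events = []
    for _, answer, end in calls:
        events.append((answer, 1))
        events.append((end, -1))
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)

    return {
        "call_rate": len(calls) / window_seconds,
        "peak_concurrent": peak,
        "mean_setup_time": sum(setup_times) / len(setup_times),
        "max_setup_time": max(setup_times),
    }

# Hypothetical log of four calls offered over a 10-second window.
calls = [(0.0, 0.2, 10.0), (1.0, 1.3, 5.0), (2.0, 2.4, 12.0), (6.0, 6.1, 8.0)]
print(summarize_calls(calls, window_seconds=10.0))
```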