Objective measures are used in sound quality assessment to predict human response without the effort of undertaking jury testing. However, not all objective measures are applicable to every product, so before relying on them it is necessary to carry out jury testing to see which measures are useful. You might even need to devise new metrics that correlate with the jury's response. For this reason, at the end of jury testing you may end up comparing the objective and subjective results. So how is this done?

The objective measures vary greatly between the wash, drain and spin cycles, so there is no simple overall objective measure. We therefore compare the objective measures with the subjective scores only within each of the three cycles, and we must ignore the overall subjective scores.

We correlate the objective and subjective measures using a correlation coefficient. Take the wash cycle as an example, comparing the jury's answers to the question about loudness with the objective loudness metric for the five washing machines A to E:

| | A | B | C | D | E |
|---|---|---|---|---|---|
| Subjective loudness averaged over all subjects | -0.740 | 0.0365 | -1.349 | -0.2636 | -0.1758 |
| Objective metric loudness | 4.35 | 5.71 | 2.86 | 4.61 | 4.28 |

The correlation coefficient between these two sets of values can be calculated in Excel using CORREL() and is found to be about 0.9.
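For readers without Excel, the same calculation can be sketched in Python. This is a minimal illustration, not the method used to produce the spreadsheet: the function name `pearson_r` and the variable names are made up for this example, and only the ten numbers come from the table above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sum of cross products divided by the root of the product of the
    # sums of squares gives the correlation coefficient.
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    return cross / math.sqrt(ss_x * ss_y)


# Wash-cycle values for machines A to E, copied from the table above.
subjective_loudness = [-0.740, 0.0365, -1.349, -0.2636, -0.1758]
objective_loudness = [4.35, 5.71, 2.86, 4.61, 4.28]

r = pearson_r(subjective_loudness, objective_loudness)
print(f"r = {r:.2f}")  # prints "r = 0.90"
```

The result agrees with the value of about 0.9 obtained from the spreadsheet function.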

The Pearson correlation coefficient lies between -1 and +1.

- If the value is +1, then the two scales are perfectly correlated, and the judgements are inter-related.
- If the value is -1, they are also perfectly correlated, but the scales go in opposite directions (when there is a high score on one scale, you get a low score on the other scale, and vice versa).
- If the value is 0, there is no linear relationship between the scales.

In most cases, the coefficient does not lie at one of these extremes, so a significance test should be used to find out whether the scales are significantly related. You compare the magnitude of the correlation coefficient to the values in the **Correlation coefficient significance table**. If the magnitude of your correlation coefficient exceeds the value in the table, the correlation is significant.

In this case the threshold is 0.878, because there are 5 - 2 = 3 degrees of freedom (the number of washing machines minus 2). The coefficient of about 0.9 exceeds this threshold, so the correlation is significant.
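The table threshold can be cross-checked with a short sketch. It relies on the standard relationship between a critical t value and the equivalent critical correlation, r = t / sqrt(t² + df). The critical t of 3.182 is an assumption taken from standard two-tailed t tables at the 5% level for 3 degrees of freedom, and the function name `critical_r` is made up for this example.

```python
import math


def critical_r(t_crit, df):
    """Convert a critical t value into the equivalent critical |r|."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


n_machines = 5
df = n_machines - 2          # degrees of freedom for a correlation
T_CRIT_DF3 = 3.182           # assumed: two-tailed 5% critical t for df = 3
r_observed = 0.90            # wash-cycle loudness correlation from the text

threshold = critical_r(T_CRIT_DF3, df)
is_significant = abs(r_observed) > threshold

print(f"critical |r| = {threshold:.3f}")  # prints "critical |r| = 0.878"
print("significant" if is_significant else "not significant")
```

With only five machines the threshold is very high, which is why even fairly strong-looking correlations can fail the test.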

So we have the rather unsurprising result that subjective loudness correlates with the objective loudness metric. There are a large number of inter-relations to compare, and you can find them summarized in this spreadsheet. Look at the worksheets "washing", "spinning" and "draining".

Loudness is the most important objective metric. For the spin cycle perceived monaurally, it correlates with all subjective scales except one ("the sound tells you what it does"). For the spin cycle perceived binaurally, it correlates with loudness and pleasantness (and possibly robustness). We have previously found that the subjective scale "the sound tells you what it does" was not useful for this product, so the lack of correlation here is not important.

There are correlations between objective loudness and monaurally perceived pleasantness, robustness and quality for the draining cycle. There are no significant correlations between any of the scales and objective loudness for the draining cycle when the sounds were presented binaurally.

For the washing cycle there are correlations between subjectively perceived and objectively measured loudness for both monaural and binaural presentation. Pleasantness correlates with the objective loudness metric when the sounds were presented binaurally, but not when they were presented monaurally.

There are few significant correlations with any of the other objective metrics. For example, for the spin cycle the objective measure tonality is useful in some cases. To get a better understanding of which objective measures are useful, it would probably be necessary now to test more washing machines. However, one must be wary of picking out odd correlations here and there if only a few are significant, because when you make such a large number of comparisons, you are almost certain to find some correlations just by chance. Remember what the threshold means: it is set at the 95% level, so even when two measures are completely unrelated, about one comparison in twenty will still exceed it by chance.
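To get a feel for how quickly chance findings accumulate, the sketch below computes the probability of at least one spurious "significant" correlation among a number of comparisons. It assumes the comparisons are independent, which is not strictly true here because the scales and metrics are themselves related, and the comparison counts are hypothetical round numbers rather than the number of tests in the spreadsheet.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests of truly
    unrelated measures exceeds the significance threshold by chance."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of chance 'significant' results among n tests."""
    return n_tests * alpha


# Hypothetical numbers of comparisons, for illustration only.
for n_tests in (1, 10, 20, 50):
    p_any = chance_of_false_positive(n_tests)
    n_expected = expected_false_positives(n_tests)
    print(f"{n_tests:3d} comparisons: P(at least one by chance) = {p_any:.2f}, "
          f"expected chance findings = {n_expected:.1f}")
```

Under these assumptions, twenty comparisons already give a better than even chance of at least one spurious result, which is why isolated significant correlations deserve caution.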

These results show the dominance of loudness in subjective preference, which is a common finding in many perceptual tests across acoustics. The lack of correlation between the other objective measures and subjective response has been found by others; informal discussion with experts in sound quality testing revealed that most objective measures have not proved useful for domestic appliances. At this point, therefore, it is necessary to revisit the sounds and to look for other aspects that might correlate with subjective response; in other words, to draw up new metrics. This approach is common in the automobile industry. It is, however, a slow process, and at this point it might be worth deciding to just continue using jury testing.