Thursday 27 March 2014

The Listening Tests and Preliminary Results

There are two distinct listening tests. The first will take the form of a SurveyMonkey online survey, with a download link to the selected track versions. The purpose of this test is to establish a baseline for 'average' end-users of music, as well as for professionals, within their usual listening environments. The second will be a controlled test in a calibrated listening environment, following a similar protocol.

After preliminary testing it is clear that the differences between bitrates fall into several distinct brackets. The drop in information from a CD-quality 16-bit/44.1kHz PCM file to even the top-tier 320kbps .mp3 is audible through high-quality professional equipment. However, this discrepancy is extremely subtle - it is perceived as a very slight 'boxiness' in the vocal range and in the 8-10kHz transients, as though the attack of the sound is somehow distorted.

This is not surprising: CD quality runs at 1,411.2kbps, so a drop to 320kbps is a significant reduction in data. Yet the perceived change is so slight that perception is revealed to depend on the playback technology almost as much as on the human ear.
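That 1,411.2kbps figure follows directly from the format's parameters, as this quick illustrative Python calculation shows:

```python
# Uncompressed CD audio bitrate: sample rate x bit depth x channels.
sample_rate = 44100  # Hz
bit_depth = 16       # bits per sample
channels = 2         # stereo

bitrate_kbps = sample_rate * bit_depth * channels / 1000
print(f"{bitrate_kbps} kbps")  # prints: 1411.2 kbps
```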

For example, a sighted A/B comparison (not blind) was performed between versions of the same track at the following bitrates (a sketch for generating a similar set of files follows the list):
-CD-Quality, 16-Bit 44.1kHz .wav PCM - 1,411.2kbps
-Magix Samplitude .mp3 files at:
->320kbps
->256kbps
->192kbps
->160kbps
->112kbps
->96kbps
->48kbps (mono)
-iTunes .mp3 files at:
->192kbps
->160kbps
->128kbps
-iTunes AAC files at:
->256kbps
->128kbps
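The encoders actually used were Samplitude and iTunes, as listed above. Purely for readers who wish to approximate a similar bitrate ladder, here is a sketch using ffmpeg with the LAME encoder - an assumed substitute, not the chain used in this test:

```python
# Sketch: build an MP3 bitrate ladder from a source WAV via ffmpeg.
# Assumes ffmpeg (with libmp3lame) is on the PATH. Filenames are
# hypothetical; this is NOT the Samplitude/iTunes chain used above.
import subprocess

SOURCE = "master_16bit_44k1.wav"
BITRATES = [320, 256, 192, 160, 112, 96]  # kbps, stereo versions

for kbps in BITRATES:
    subprocess.run(
        ["ffmpeg", "-y", "-i", SOURCE,
         "-c:a", "libmp3lame", "-b:a", f"{kbps}k",
         f"test_{kbps}kbps.mp3"],
        check=True,
    )

# The 48kbps version additionally needs a mono downmix (-ac 1).
subprocess.run(
    ["ffmpeg", "-y", "-i", SOURCE,
     "-c:a", "libmp3lame", "-b:a", "48k", "-ac", "1",
     "test_48kbps_mono.mp3"],
    check=True,
)
```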

The result was that a slight change, as described above, was perceived between CD quality and the 320kbps .mp3. However, this change was only audible on the main studio monitors, which excel at transient reproduction. On the secondary monitors (NAD 8020e) the subtle 8-10kHz dynamic change was inaudible. The change was demonstrably there, but perhaps a majority of consumers would not possess the equipment necessary to hear it. As expected, the changes became more obvious as the bitrate of the lower-quality file decreased.

This raises the first emergent question for the project which would be difficult to answer conclusively:
'Is the fidelity of the sound reproduction technology used as important an issue as human listening ability?'

This is entirely possible. Consider the following:
A large number of music consumers listen to their music on the headphones that come as standard with an iPod or other .mp3 player. While the playback mechanism in an iPod is theoretically similar to any other digital-to-analogue converter, the bundled headphones themselves may not be capable of reproducing the transient detail (or frequency response) necessary to hear differences as subtle as those we find between common audio codecs.

---

The second question follows logically from the first:
'If technology fidelity is the issue, at what point does it become a problem?'
Can it be assumed that consumers are, on the whole, satisfied with their listening experience? One might think that if they were not, they would upgrade. But is that a necessary truth? Perhaps not, as information on the topic is rife with advertising, which may distort the facts in order to sell more headphones. This motivates one of the objectives of the Wiki side of the project.

However, if the limitations of a listener's technology prevent them from hearing such subtle drops in quality, is that not arguably a benefit rather than a flaw? Is it an issue similar to dithering, where we choose the lesser evil of a wide-band (or rather, generalised) set of flaws in place of one more obvious fault? This is a very difficult question to answer.
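To make the dithering analogy concrete, here is a minimal numpy sketch (illustrative only, and no part of the test methodology) of how TPDF dither trades the obvious fault - truncation distortion correlated with the signal - for a benign, broadband noise floor:

```python
import numpy as np

def quantise_16bit(x, dither=True):
    """Quantise a float signal in [-1, 1] to signed 16-bit samples.

    With dither=True, triangular (TPDF) noise of roughly +/-1 LSB is
    added before rounding, swapping signal-correlated truncation
    distortion for a constant, signal-independent hiss.
    """
    scale = 2 ** 15  # 16-bit signed full scale
    if dither:
        tpdf = (np.random.uniform(-0.5, 0.5, x.shape)
                + np.random.uniform(-0.5, 0.5, x.shape))
        x = x + tpdf / scale
    return np.clip(np.round(x * scale), -scale, scale - 1).astype(np.int16)

# Quantisation error is at its most audible on very quiet material,
# e.g. a -60dBFS 1kHz tone:
t = np.arange(44100) / 44100.0
tone = 0.001 * np.sin(2 * np.pi * 1000.0 * t)
undithered = quantise_16bit(tone, dither=False)
dithered = quantise_16bit(tone, dither=True)
```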

---

The third question arises directly from the preliminary listening test results. A significant drop in quality was perceived between 128kbps and 112kbps, to the point where this margin could confidently be defined as the absolute cut-off point for quality perception. I would hypothesise that the vast majority of music end-consumers will be able to perceive this drop - and most certainly the drop from CD quality to 112kbps.
Therefore, the question:
'Considering that human hearing perceives volume changes in a logarithmic fashion (Fletcher & Munson, 1933), could this also apply to the perception of quality?' Is there some tolerance we have for fidelity loss, within which differences pass largely unnoticed, beyond which perceived quality drops off dramatically?
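As a rough way of framing this question, the sketch below compares the linear and logarithmic size of each step in the preliminary test's bitrate ladder. This is purely illustrative arithmetic, not a perceptual model:

```python
import math

# Bitrate ladder from the preliminary test, in kbps.
bitrates = [1411.2, 320, 256, 192, 160, 128, 112, 96, 48]

# If quality perception were logarithmic, the salient quantity would
# be the ratio between consecutive steps, not the absolute difference.
for hi, lo in zip(bitrates, bitrates[1:]):
    ratio_db = 10 * math.log10(hi / lo)
    print(f"{hi:>7.1f} -> {lo:>6.1f} kbps: "
          f"linear drop {hi - lo:7.1f} kbps, log drop {ratio_db:4.2f} dB")
```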

It is unlikely that this project will be able to address these questions conclusively with the data it collects. However, it may provide some insight into trends within these areas. For example:
The online survey will include a section asking participants to rate their audio knowledge on a scale from occasional listener to veteran professional of many years. Another section will ask participants to list their listening equipment to the best of their knowledge.

This will, provided the participants volunteer this information honestly, provide a large amount of supporting data from which trends could be established. For example, is accurate quality perception more strongly correlated with self-rated audio knowledge, or with the playback technology itself? The uncontrolled listening environment will be one of the unavoidable flaws of the online test, but it will intentionally be used as its principal strength: it captures the listening experience users have every day of their lives, as opposed to a clinical, high-pressure test environment.
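Once the survey data are in, a first-pass check along these lines might look as follows. This is only a sketch; the filename and column names are hypothetical stand-ins for whatever the SurveyMonkey export actually contains:

```python
import pandas as pd

# Hypothetical export: one row per participant, with a self-rated
# knowledge score, an equipment-quality score derived from the gear
# they list, and their accuracy across the listening questions.
df = pd.read_csv("survey_results.csv")

knowledge_r = df["self_rated_knowledge"].corr(df["accuracy"])
equipment_r = df["equipment_score"].corr(df["accuracy"])

print(f"knowledge vs accuracy: r = {knowledge_r:.2f}")
print(f"equipment vs accuracy: r = {equipment_r:.2f}")
```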

Additionally, the test will run from supposedly obvious comparisons through to supposedly impossible ones. For example, at the hardest level: two consecutive bitrates from the middle of the scale, switched between A and B multiple times within a single audio file, with the participant asked to list the sequence correctly.
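One way such a spliced file might be assembled is sketched below, using numpy and the soundfile library. The filenames, switch interval, and answer key are hypothetical, and the two decodes are assumed to be sample-aligned and of equal length:

```python
import numpy as np
import soundfile as sf

# Two decodes of the same passage at consecutive bitrates.
a, rate = sf.read("clip_192kbps.wav")
b, _ = sf.read("clip_160kbps.wav")

sequence = "ABBABA"        # the hidden answer key
segment_len = rate * 5     # switch source every 5 seconds

chunks = []
for i, label in enumerate(sequence):
    start, stop = i * segment_len, (i + 1) * segment_len
    chunks.append((a if label == "A" else b)[start:stop])

sf.write("ab_sequence_test.wav", np.concatenate(chunks), rate)
```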

---

The Controlled Test
As discussed, the variety of listening apparatus and environments among average consumers approaches the infinite. All the possible combinations of listening space, source material, user experience, digital-to-analogue converters, transducers in speakers/headphones, external noise, listening level, etc. make it impossible to isolate the human element accurately in this test format.
Therefore the necessity for a second test is clear.

This second test will take place in a controlled (calibrated) listening environment - most likely the HB04 Mastering Studio at Confetti ICT in Nottingham. Here, a set (or several sets) of participants at distinct levels of experience in audio will take a similar set of tests and answer similar questions to the online survey. The difference, of course, is that the technology variable is now fixed and controlled, reducing the relative technological bias between participants to zero.

It is hoped that the combined results from the online and controlled tests will provide insight into the effects of human perception versus the limits of technology, and will define the common points at which marked, obvious changes in perceived quality occur for each demographic.

---


Reference:

Fletcher, H. & Munson, W. A., 1933. Loudness, its definition, measurement and calculation. Journal of the Acoustical Society of America, 5(2), pp. 82-108.
