Audio Data Quality Project Blog

Thursday, 1 May 2014

Project Report Writing and Wiki Feedback

The wiki is now complete and submitted as the product segment of this project.

I am currently writing up the project report, which is going well and is nearing completion as I approach the conclusions section.

In order to establish the efficacy of the wiki as a product, I have created another [much shorter] survey which has been running for the past few days: https://www.surveymonkey.com/s/3BXTDPZ

The wiki as it stands is a proof-of-concept experiment, and the feedback from this survey should hopefully help to work out whether or not a wiki of this type is perceived as beneficial and how it would be received, were it to be launched on a larger scale with more individuals working to keep it complete and up to date.

--

Once the project has been completed, marked and returned, I plan to upload the results of the listening tests and indeed the report itself to this blog and the wiki for interested parties to examine and discuss. The focus of this project has been on consumer accessibility and that is a goal which I hope can continue to be relevant and fulfilled once the project has reached its academic conclusion.

Sunday, 13 April 2014

Digital Audio Wiki Progress and Listening Test Results Collation

I have been working on the Digital Audio Wiki, which constitutes the end product for this project. As it stands, the wiki broadly covers the process of recording and creating digital audio, from the basic principles of sound propagation in air up to binary encoding. From these basics, it expands upon the options available in digital (as well as analogue) audio storage and distribution, as well as discussing some of the supporting theory. The central page entitled 'What is Digital Audio?' covers most of this information, but I have begun creating a large number of related pages for more detail on the specific topics it touches on. Thus far I am very pleased with how it has come out. The wiki can be found, in whatever state it happens to be in at the time of reading, here: http://digital-audio.wikia.com/wiki/Digital_Audio_Wiki

In terms of assessment for the product I believe I will attempt to complete the majority of the wiki's core content as soon as possible, and then publish it to the 'public' as best I can, alongside one more survey regarding the wiki's usefulness and clarity. This should aid assessors in evaluating the success of the product as agreed in my proposal.

Additionally I have begun collating the data collected from the listening test study which forms the other major element to the project. So far I have examined the data from the controlled listening tests, and the results are largely inconclusive, which is arguably a conclusion in itself. It appears that the majority of participants have serious difficulty telling any difference between most of the comparisons made in the test. This is an interesting (and actually, quite positive) result, although it differs from my personal expectations in some ways. If the online version of the listening test (which has gathered substantially more responses, while still being too few to be statistically significant) indicates similar trends then I will be satisfied to report that any concerns regarding perceived nefarious or negative effects of widespread digital audio data compression are largely without cause. However, this potential outcome still leaves plenty to be discussed.
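
As a rough illustration of how "difficulty telling any difference" can be checked against plain guessing, the sketch below computes a one-sided exact binomial p-value for a single participant's score. It assumes each question has two possible answers (so chance is 0.5); the function name and the example score of 12 out of 20 are made up for illustration and are not figures from the study.

```python
from math import comb

def p_value_at_least(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of scoring `correct` or more out of `trials` by pure guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical score, NOT taken from the study's data.
p = p_value_at_least(correct=12, trials=20)
print(f"p = {p:.3f}")
```

A large p-value here would mean the score is easily explained by guessing, which is the pattern described above. With few participants or few questions, even a real ability to hear the difference may not show up as a small p-value.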

Thursday, 27 March 2014

Technical Information Regarding Testing Protocol

The online survey has a [very] general upward difficulty curve. All source materials are verified as CD-quality [16-bit, 44.1kHz] or above. Specifically, Questions 13-19 make use of EBU SQAM (Sound Quality Assessment Material) recordings, which are losslessly encoded and sourced online from the EBU website: https://tech.ebu.ch/publications/sqamcd

The questions in the survey are variously composed of the original source files at CD quality and a number of different compressed versions. The files range from AIFF at 1411kbps and .mp3 at 320kbps down to .mp3 at 112kbps. Each of these files has been individually tested in a professional mastering environment, and in a non-blind A/B test an audible difference was found to be present throughout the .mp3 range on a number of listening systems. However, this survey and its accompanying controlled test (performed at Confetti ICT in Nottingham, UK) aim to establish in a blind scenario whether participants from various demographics can in fact perceive this difference. The codecs used are split between Magix Samplitude's .mp3 converters and iTunes' .mp3 and AAC converters. The test includes some control questions, where two identical files are used. The question files were bounced from Samplitude to PCM .wav.

The Listening Tests and Preliminary Results

There are two distinct listening tests. The first will take the form of a SurveyMonkey online survey, with a download link to the selected track versions. The purpose of this test is to establish a baseline for 'average' end-users of music as well as professionals within their usual listening environment. The second will be a controlled test in a calibrated listening environment with a similar protocol.

After preliminary testing it is clear that there are several brackets of differences between bitrates. The drop in information from a CD-quality 16-bit/44.1kHz PCM file to even the top-tier 320kbps .mp3 is audible through quality professional equipment. However, this discrepancy is extremely subtle - it is perceived as a very slight 'boxiness' of the vocal range and 8-10kHz transients. The attack of the sound is in some way distorted.

This is not surprising considering that CD quality is 1,411.2kbps; a drop to 320kbps is a significant change. However, the perceived change is so slight that perception turns out to depend on the playback technology almost as much as on the human ear.
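
For reference, the 1,411.2kbps figure follows directly from the CD format's parameters. The short sketch below reproduces the arithmetic and shows how much smaller a few of the compressed bitrates are; the variable and function names are only illustrative.

```python
SAMPLE_RATE_HZ = 44_100   # CD sampling rate
BIT_DEPTH = 16            # bits per sample
CHANNELS = 2              # stereo

cd_kbps = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS / 1000
print(cd_kbps)            # 1411.2

def ratio_to_cd(kbps: float) -> float:
    """How many times smaller a stream at `kbps` is than CD-quality PCM."""
    return cd_kbps / kbps

for kbps in (320, 128, 112):
    print(f"{kbps}kbps is about {ratio_to_cd(kbps):.1f}x smaller than CD")
```

Note that this only compares data rates. It says nothing about how audible the reduction is, which is the question the listening tests address.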

For example, a non-blind A/B test was performed between versions of the same track at the following bitrates:
-CD-Quality, 16-Bit 44.1kHz .wav PCM - 1,411.2kbps
-Magix Samplitude .mp3 files at:
->320kbps
->256kbps
->192kbps
->160kbps
->112kbps
->96kbps
->48kbps (mono)
-iTunes .mp3 files at:
->192kbps
->160kbps
->128kbps
-iTunes AAC files at:
->256kbps
->128kbps

The result was that a slight change, as described above, was perceived between CD and 320kbps .mp3. However, this change was only audible on the main studio monitors, which excel in transient reproduction. The secondary monitors, NAD 8020e, rendered this subtle 8-10kHz dynamic change inaudible. So the change was obviously there, but perhaps a majority of consumers would not possess the equipment necessary to hear it. As expected, the changes became more obvious as the bitrate of the lower-quality file decreased.

This raises the first emergent question for the project which would be difficult to answer conclusively:
'Is the fidelity of the sound reproduction technology used as important an issue as human listening ability?'

This is entirely possible. Consider the following:
A large number of music consumers listen to their music on headphones which come as standard with an iPod or other .mp3 player. While the playback mechanisms in an iPod are theoretically similar to those of any other digital-to-analogue converter, the headphones themselves may not even be capable of reproducing the transient detail (or frequency response) necessary to hear such subtle differences as we find between common audio codecs.

---

The second question follows logically from the first:
'If technology fidelity is the issue, at what point does it become a problem?'
Can it be assumed that consumers are, on the whole, satisfied with their listening experience? One may think that if not, they would upgrade. Is that a necessary truth? Perhaps not, as information on the topic is rife with advertising which may distort the facts in order to sell more headphones. This creates one of the objectives of the Wiki side of the project.

However, if the limitation of a listener's technology prevents them from hearing such subtle drops in quality, then arguably is that not a benefit rather than a flaw? Is it an issue similar to dithering, where we choose the lesser evil of a wide-band (or rather, generalised) set of flaws in place of one more obvious fault? This is a very difficult question to answer.

---

The third question arises directly from the preliminary listening test results. A significant drop in quality was perceived between 128kbps and 112kbps, to the point where this margin could happily be defined as the absolute cut-off point for quality perception. I would hypothesise that the absolute majority of music end-consumers will be able to perceive this drop - most certainly from CD quality to 112kbps.
Therefore, the question:
'Considering that human hearing perceives volume changes in a logarithmic fashion (Fletcher & Munson, 1933), could this also apply to perception of quality?' Is there some tolerance we have for fidelity loss, but for some reason beyond this bracket then perceived quality drops off dramatically?

It is unlikely that this project will be able to adequately address these questions with the data it collects. However it may provide some insight into trends within these areas. For example:
The online survey will include a section asking the participant to rank themselves for audio knowledge on a scale, from occasional listener to years-veteran professional. Another section will ask the participant to list their listening equipment to the best of their knowledge.

This will, provided the participants volunteer this information honestly, provide a large amount of supporting data from which trends could be established. For example, is there a more significant correlation between self-rated audio knowledge and accurate quality perception, or between the technology itself and the perceived quality? This uncontrolled listening environment will be one of the unavoidable flaws of the online test, but this will intentionally be used as its principal strength. The advantage will be to capture the listening experience users have every day of their lives, as opposed to in a clinical, high-pressure test environment.
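
One way such a comparison might be made is with a correlation coefficient for each pairing. The sketch below is a minimal, assumption-laden illustration: the numbers are placeholders rather than survey results, and coding each participant's equipment as a numeric "tier" is a hypothetical step that the survey itself does not define.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder values purely to show the shape of the comparison;
# they are NOT results from the survey.
knowledge = [1, 2, 3, 4, 5]              # self-rated audio knowledge
equipment = [2, 1, 4, 3, 5]              # hypothetical equipment-quality tier
accuracy = [0.4, 0.5, 0.5, 0.7, 0.9]     # fraction of questions answered correctly

print("knowledge vs accuracy:", round(pearson(knowledge, accuracy), 2))
print("equipment vs accuracy:", round(pearson(equipment, accuracy), 2))
```

The function divides by zero if either list has no variation, and a correlation computed from a handful of self-selected respondents should be read as a hint of a trend at most.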

Additionally, the test will run from a supposedly obvious comparison to the supposedly impossible. For example, at the highest level, the participant hears two consecutive bitrates from the middle of the scale, switched between A and B multiple times within a single audio file, and is asked to correctly list the sequence.
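
A sequence question of this kind could be prepared and marked along the following lines. This is only a sketch: the function names, the six-segment length and the example answer are assumptions, and the actual test files were assembled by hand rather than with code like this.

```python
import random

def make_sequence(length, seed=None):
    """Random order in which versions A and B are spliced into one file."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(length)]

def score(answer, truth):
    """Fraction of segments the participant labelled correctly."""
    if len(answer) != len(truth):
        raise ValueError("answer and truth must be the same length")
    return sum(a == t for a, t in zip(answer, truth)) / len(truth)

truth = make_sequence(6, seed=1)     # fixed seed so the key is reproducible
guess = list("ABABAB")               # a hypothetical participant's answer
print(truth, score(guess, truth))
```

Randomising the order matters because a predictable pattern (strict alternation, for instance) could be guessed without hearing any difference at all.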

---

The Controlled Test
As discussed, the variety of listening apparatus and environments for the average consumer is of a diversity approaching the infinite. All the possible combinations of listening space, source material, user experience, Analogue/Digital Converters, transducers in speakers/headphones, external noise, listening level etc make accurately determining the human element impossible in this test format.
Therefore the necessity for a second test is clear.

This second test will take place in a controlled (calibrated) listening environment - most likely the HB04 Mastering Studio at Confetti ICT in Nottingham. Here, a set (or several sets) of participants with distinct levels of experience in audio will listen to a similar set of tests and answer similar questions to the online survey. The difference is, of course, that the technology variable is now fixed and controlled, reducing the relative technological bias between participants to zero.

It is hoped that the combined results from both the online and controlled tests will provide an insight into the effects of human perception vs. technology limits, and define the common points at which marked and obvious perceived quality changes occur for each demographic.

---

Reference:

Fletcher, H. & Munson, W. A., 1933. Loudness, its definition, measurement and calculation. Bell System Technical Journal, 12(4), pp. 377-430.

Saturday, 8 March 2014

Project Introduction

Audio Data Compression Quality Project

This project aims to be open to everyone for its duration. My name is Chris, and I wholly encourage any and all contributions that my friends and colleagues, as well as any member of the wider public, care to make to this endeavour. Firstly, to this blog - if I am correct, then anyone should be able to leave comments on posts on this blog anonymously with no necessary registration, or with an existing Google account. Therefore any questions, suggestions, or discussions regarding the project's topic are gratefully received as they will contribute to the completeness of the work.

The topic of the project is straightforward on the surface, but complex in the detail. The generalised, primary research question I am investigating is:
"Does the current trend in audio data compression codecs point towards a lossless standard in the near future?"

In order to understand and tackle this question, it is first necessary to establish some definitions, especially considering that this project is concerned in equal measure with the everyday consumer of music as a product, as well as with established professionals in the audio industries.

A codec is defined as 'a device or program that compresses data to enable faster transmission and decompresses received data.' Currently, the most popular and well-known example of a codec is most likely MPEG Audio Layer III, or mp3. Other formats within the audio sphere include AAC, AIFF and FLAC, to name a few (strictly speaking, AIFF normally stores uncompressed PCM and so is a file format rather than a compression codec). FLAC is the odd one out among the compressing codecs listed, as it is the only one which describes itself as 'Lossless.' In fact, according to FLAC's developers at the website https://xiph.org/flac/ :
"FLAC stands for Free Lossless Audio Codec, an audio format similar to MP3, but lossless, meaning that audio is compressed in FLAC without any loss in quality."

The advantage of FLAC as a format should be immediately obvious, based on that definition. What hypothetical person would knowingly choose a lower quality over a higher one, for any product? The answer is of course that the situation is not as simple as it might seem. For example, a lossless codec generally produces a larger file than a lossy one, and so as a result storage space and internet bandwidth (for transmission) become limiting factors. An entire collection of your favourite albums all in FLAC format may take up considerable room on a hard drive. Lossy codecs were created as an attempt to solve this issue, which was particularly serious in an era where digital audio was just emerging from a world of analogue tape storage and very little digital memory was available (remember floppy disks?).
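
To put rough numbers on the storage problem, the sketch below estimates the size of a single track at a few fixed bitrates. The four-minute track length is an assumption chosen for illustration, and FLAC is deliberately left out because its file size depends on the material being encoded rather than on a fixed bitrate.

```python
def size_mb(kbps: float, seconds: float) -> float:
    """Approximate audio payload size in megabytes (1 MB = 1,000,000 bytes)."""
    return kbps * 1000 * seconds / 8 / 1_000_000

TRACK_SECONDS = 4 * 60  # an assumed four-minute track

for label, kbps in [("CD-quality PCM", 1411.2), ("mp3 320kbps", 320), ("mp3 128kbps", 128)]:
    print(f"{label}: {size_mb(kbps, TRACK_SECONDS):.1f} MB")
```

Multiplied across a whole music collection, the gap between roughly 42 MB and roughly 4 MB per track is the reason lossy formats became attractive when storage and bandwidth were scarce.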

To clarify: the term 'Lossless' means that during the compression process (in which the data making up a given audio file is re-encoded more compactly in order to make the resultant file smaller, as its stated purpose) no audio data is 'lost': the decoded file is bit-for-bit identical to the original. The logical opposite of this type of codec is therefore one described as 'Lossy.'

The term lossy applies to every type of codec which permanently discards some of the original audio data, so that what is eventually played back on the user's computer, iPod, hi-fi system or other sound device is no longer identical to the original copy. This includes .mp3 and AAC. Naturally when data about a sound is removed, the sound becomes...less than it was. In some cases, this change is audible. The first part of this project intends to shed some light on exactly what parameters make that change audible to the consumer and to the professional.
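
The distinction can be demonstrated with a toy example. In the sketch below, zlib stands in for a lossless codec and simple bit truncation stands in for a lossy one; neither is how FLAC or mp3 actually works (mp3 in particular uses a psychoacoustic model rather than dropping bits), so this illustrates only the principle, using a generated test tone instead of real music.

```python
import math
import struct
import zlib

# One second of a 440 Hz sine as 16-bit mono PCM (a stand-in for real audio).
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(44100)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: zlib is used here only as a generic stand-in; it is not FLAC.
packed = zlib.compress(pcm)
assert zlib.decompress(packed) == pcm      # every bit comes back

# Lossy (crude): keep only the top 8 bits of each 16-bit sample.
reduced = [s >> 8 for s in samples]        # half the bits per sample
restored = [r << 8 for r in reduced]
print(restored == samples)                 # False: the discarded bits are gone
print(max(abs(a - b) for a, b in zip(samples, restored)))  # error stays below 256
```

The lossless round trip returns exactly the bytes that went in, whereas the truncated version can only ever approximate the original, however it is played back.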

The hypothesis: The majority of end-consumers will be able to perceive a difference between lossless (FLAC) and heavily compressed non-VBR .mp3 (e.g. 128kbps) but will be unable to point to exactly what the difference is. Not knowing that, they will not be concerned and would not go out of their way to use lossless codecs such as FLAC instead.

In order to provide some [hopefully] useful information to music consumers who wish to know more about the topic, part of this project will involve the creation of a Wiki-style online resource with information about different compression options and the implications of each.

The course of this project manifests in two distinct stages: firstly, there is a need to evaluate the ability (or, indeed, the necessity) of a music consumer to perceive the difference between two given compression formats of a different audio quality. The second stage is dependent (to a degree) on the first, and involves the application of web-based information sources to communicate more information about the topic to said consumers. The decision-making process for how this second stage can be approached roughly follows this flow:

The first step is to find out whether this difference can be heard. If it can, that is no guarantee that the difference is worth paying for, and that is the deciding factor to most music consumers. Likewise, it is unlikely that non-professional music listeners are able to pick out the same changes in a sound as professionals, because A) they do not have the same experience of critical listening and B) they are unaware of the processes involved. Neither of these points is implied to be negative; it is, however, a discrepancy which this project will account for and attempt to remedy if, indeed, such action is warranted or desired in general.

A secondary (but pre-requisite) research question must then be to ascertain whether there is in fact a need for consumer-facing lossless audio codecs. If the difference is found to be simply negligible, then that is a satisfactory conclusion to the project. That is not to say however, that the value of a wiki resource on the topic would necessarily be diminished, and indeed such a conclusion would, in itself, raise new questions regarding psychoacoustics and, sociologically, consumer habits.

The wiki should be arranged and produced in a way which is useful to a user at any level of knowledge. That is, segregating information into tiers and pages centred around what the user already knows. More complex concepts will be explained in easily circumnavigated sub-sections, and an extensive glossary will be necessary linking to supporting information elsewhere on the web.

---

The purpose of this blog is to keep any interested parties following the project informed of its progress, and to present a chronological (if at times retrospective) log of activities and thought patterns for assessment.

The blog will cover some complex topics regarding digital audio theory, but these sections can be elaborated upon as necessary and upon request for anyone interested in learning more. Perhaps an apt objective for the emergent Wiki would be to ensure that all the topics covered in this blog could be entirely understood by somebody with no prior knowledge of digital audio technologies through its use.

The next blog post will cover some of the details of the listening tests to be performed, the first of which - a controlled test in a calibrated listening environment - will take place on Tuesday 11th March 2014 in the mastering studio of Confetti ICT in Nottingham. If any parties reading this would be interested in attending this session then please do not hesitate to get in contact, though some participants have already been sourced. Please also be aware that there will be an online version of the test to be performed in the participant's everyday listening environment, and the potential for repeat controlled tests in the future as time allows.

Thank you for reading, do leave a comment with your thoughts.

Chris