### If I've not heard from you by Friday, you're taking the test in 50 minutes, in this classroom

- Last-minute accommodation requests can't be accommodated!
---

# Capturing and Storing Sound

### Will Styler - LIGN 113

---

### Today's Plan

- Capturing Pressure variations
- Computer Audio, Sampling, and Quantization
- Audio Compression
- Noise Reduction

---

### Sound is compression and rarefaction in a medium
---

### Timeshifted sound is a novelty

- For most of our species' history, this wasn't a thing
- *How do we capture and recreate the pattern of sound pressure?*

---

### Analog Recording

- "Let's capture the pressure pattern in a physical medium"

---

### The Phonograph

- Air pressure pushes a stylus into a very soft wax cylinder
---

### Playback from Phonographs

- Put a stylus attached to a membrane into the groove, and let it 'trace the wave'
---

### These recordings are ephemeral and bad

- The stylus wears away the groove
- The limited power of air pressure restricted how hard the recording medium could be

'The Lost Chord' by Arthur Sullivan (1888)
---

### There's an inherent tradeoff

- You want a soft medium for capture
- ... and a hard medium for playback
- Air pressure only provides so much power
---

### Electric Recording fixes this!

- Electrical signals are easy to amplify
- ... and easier to store

---

### Microphones

- A microphone *transduces air pressure patterns into electrical patterns*
- 'Give me a pattern of voltage that matches the pattern of compression and rarefaction'
---

### Dynamic Microphones

- Air pressure pushes a membrane, moving a coil of wire around a magnet, inducing voltage
- Durable, but less sensitive
---

### There are many other kinds of microphones!

- We can nerd out about this in office hours!

---

### Now you have sound as a voltage on an electrical line

- You can amplify it, transmit it, modify it, and store it
- You can even recreate the air pressure movements

---

### Speakers

- Dynamic microphones in reverse
- Changes in voltage move a membrane attached to a coil
- This 'kicks' the air in the desired pattern of compression and rarefaction
---

### There are many types of speakers, some are different!
---

## Any Questions so far?

---

### So, that's how we capture sound

- ... and that's how we worked with sound for a good while!

---

### There are other *analog* ways to store sound

- Magnetic field strength on tapes
- [Optical Audio Storage on Film](https://www.youtube.com/watch?v=tg--L9TKL0I)
- ... and more!

---

### But then everything changed
---

## Computer Audio

---

### Computers don't do waves

![](phonmedia/sampling_raw.jpg)

010001110010101000100101101010101010

---

### Sound is analog, computers are digital

- How do we deal with that?

---

### Quantization

- Also known as 'digitization', 'discretization', or 'sampling'
- "Let's just measure the sound a LOT and store those values"

---

### Quantization
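
As a minimal sketch of "measure the sound a LOT and store those values", the Python below takes regular measurements of a sine wave and rounds each one to a whole number. The 440 Hz tone, the 8,000 Hz rate, the 16-bit depth, and the function name `sample_and_quantize` are all illustrative assumptions, not values from the course materials.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, bit_depth):
    """Measure a sine wave at regular intervals and round each value to an integer."""
    max_level = 2 ** (bit_depth - 1) - 1          # 32767 for 16-bit audio
    n_samples = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # time of this measurement
        pressure = math.sin(2 * math.pi * freq_hz * t)  # 'analog' value in [-1, 1]
        samples.append(round(pressure * max_level))     # stored as a whole number
    return samples

# Hypothetical example: 10 ms of a 440 Hz tone, 8,000 measurements per second
samples = sample_and_quantize(freq_hz=440, sample_rate_hz=8000,
                              duration_s=0.01, bit_depth=16)
print(len(samples))   # number of stored measurements
print(samples[:5])    # the first few stored values
```

The list of integers is all the computer keeps. Everything that happened between two measurements, or between two integer levels, is gone.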
---

### Analog-to-digital conversion

- Sample the wave many times per second
- Record the amplitude at each sample
- With enough samples, the resulting wave will faithfully capture the signal

---

### How often do we sample?

- This is called the 'Sampling Rate'
- Measured in samples per second (Hz)

---

### Sampling Rate
---

### Sampling Rate (low rate)
---

### Sampling Rate (lower rate)
---

### Use the right sampling rate
---

### Bad sampling makes for bad waves
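
One way to see a 'bad wave' is to sample a tone that is too high for the sampling rate. In the sketch below, assuming a 6,000 Hz sampling rate, a 5,000 Hz cosine and a 1,000 Hz cosine produce the same stored samples, so the computer cannot tell them apart. The frequencies and the helper name `sample_cosine` are illustrative choices, not taken from the slides.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, n_samples):
    """Return n_samples measurements of a cosine wave at the given sampling rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

rate = 6000                               # samples per second (assumed)
high = sample_cosine(5000, rate, 12)      # tone above half the sampling rate
low = sample_cosine(1000, rate, 12)       # tone below half the sampling rate

# Compare the two lists of samples, allowing for tiny floating-point error
indistinguishable = all(math.isclose(h, l, abs_tol=1e-9)
                        for h, l in zip(high, low))
print(indistinguishable)
```

Once sampled, the 5,000 Hz tone is stored as if it were a 1,000 Hz tone, which is the kind of distortion a too-low sampling rate introduces.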
---

## Nyquist Theorem

The highest frequency captured by a sampled signal is one half the sampling rate

---

### Sampling Rates (Shpongle - 'Nothing is something worth doing')

44,100 Hz
22,050 Hz
11,025 Hz
6000 Hz
---

### Sampling Rates (Shpongle - 'Nothing is something worth doing')

44,100 Hz
6000 Hz
3000 Hz
1500 Hz
800 Hz
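
To connect the clips above to the Nyquist theorem, this short sketch computes the highest frequency each of the listed sampling rates can capture. The function name `nyquist_limit` is a made-up helper for illustration.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can capture: half the rate."""
    return sample_rate_hz / 2

# Sampling rates of the clips on the two slides above
rates = [44100, 22050, 11025, 6000, 3000, 1500, 800]
for rate in rates:
    print(f"{rate:>6} Hz sampling -> frequencies up to {nyquist_limit(rate):g} Hz")
```

At the lowest rates, everything above a few hundred Hz is missing, which is why those clips sound so muffled.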
---

### Different media use different sampling rates

- Radio was historically less than this
- CDs are at 44,100 Hz
- DVDs are at 48,000 Hz
- High-End Audio DVDs are at 96,000 Hz
- Some people want 192,000 Hz
    - Likely they are dolphins

---

### This all means that 'vinyl captures more detail' people are provably wrong

- Any audible audio signal can be captured digitally, cf. the Nyquist theorem
- We can capture greater bit depth than we can hear
- 'More detail' means 'the noise and distortion I appreciate'
- **Audiophiles are generally slightly insane**

---

### Capturing the samples into a file gives you uncompressed sound files!

- WAV files are effectively large lists of amplitudes, with a sampling rate and channel info at the top
- This is what programs like Praat or Audacity capture
- There are ways to compress *losslessly*, by cleverly storing exactly the same information in fewer bits
- But this has limited effectiveness, as you're still stuck with all the data

---

### ... but what if you need your sound to take up less space?

- You're trying to store a bunch of sounds in a limited space
- You're trying to save bandwidth costs when sending sound or music
- You need to allow people with slow internet to talk synchronously by voice
- You want to *encrypt* the signal so that others can't hear it without a key
- **You want to send something smaller than large lists of samples!**

---

## Lossy Audio Compression
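
For contrast with the lossy codecs in this section, here is a rough sketch of the uncompressed baseline: a WAV file as header info plus a list of amplitudes, followed by a lossless round trip. It uses Python's standard `wave` module. `zlib` stands in for a lossless compressor purely as an assumption for illustration, since it is a general-purpose compressor rather than an audio codec. The 440 Hz test tone is likewise arbitrary.

```python
import io
import math
import struct
import wave
import zlib

SAMPLE_RATE = 44100   # CD sampling rate
N_SAMPLES = 4410      # 0.1 seconds of audio

# 16-bit amplitudes for a 440 Hz tone (an arbitrary test signal)
amplitudes = [round(32767 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
              for n in range(N_SAMPLES)]
pcm_bytes = struct.pack(f"<{N_SAMPLES}h", *amplitudes)

# Write a WAV file into memory: channel and rate info first, then the amplitudes
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(1)             # mono
    wav_out.setsampwidth(2)             # 2 bytes per sample = 16-bit
    wav_out.setframerate(SAMPLE_RATE)
    wav_out.writeframes(pcm_bytes)

# Read it back: the header tells us how to interpret the list of amplitudes
buffer.seek(0)
with wave.open(buffer, "rb") as wav_in:
    header = (wav_in.getnchannels(), wav_in.getsampwidth(), wav_in.getframerate())
    stored = wav_in.readframes(wav_in.getnframes())

# Lossless compression: smaller or not, decompressing gives back every byte
packed = zlib.compress(stored)
restored = zlib.decompress(packed)
print(header, len(stored), len(packed), restored == stored)
```

The key property is the last comparison: after a lossless round trip, the data is byte-for-byte identical, which is exactly what lossy codecs give up.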
---

### Lossy Compression

- Lossy compression asks "What can I throw away to make the file smaller while keeping the human from noticing?"
- Lossy compression *is tuned to human perception*!
- This contrasts with *lossless* compression, which keeps all the data
- We use 'codec' to refer to methods for encoding/decoding (**co**der-**dec**oder) audio

---

### Lossy codecs are everywhere

- mp3 is the best-known lossy codec
- AAC/M4A is the successor format that Apple popularized
- Your cell phone uses EVS, EVRC, AMR, or GSM
    - This is one of the reasons old phones need to be replaced
    - It's also why hold music sounds like garbage

---

### Lossy codecs are everywhere

- Bluetooth has its own compression codecs
- Zoom uses the Opus codec
    - Free and open format, hooray!

---

### Lossy Compression throws away information strategically

- Using things like the Discrete Cosine Transform and Linear Predictive Coding (LPC)
- Also uses psychoacoustic knowledge
    - "The human won't be able to hear this part anyways"
    - More on this in a few weeks!
- "Let's throw away or simplify the stuff that doesn't matter as much to the human!"

---

### It's a lot like image compression!

---
---

### Here's what it looks like when you make it lossless again

---
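
The same 'throw away what matters least' idea can be sketched for audio with a Discrete Cosine Transform. The toy below describes a signal as a sum of cosines, keeps only the strongest few, and rebuilds the wave. It is a simplified illustration under stated assumptions, not how mp3 or any other real codec decides what to discard: real codecs use psychoacoustic models, and the test signal, the block size of 64, and all function names here are invented for the example.

```python
import math

def dct(signal):
    """Orthonormal DCT-II: describe the signal as weights on a set of cosines."""
    size = len(signal)
    coeffs = []
    for k in range(size):
        scale = math.sqrt((1 if k == 0 else 2) / size)
        total = sum(x * math.cos(math.pi * (n + 0.5) * k / size)
                    for n, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs

def inverse_dct(coeffs):
    """Inverse of dct(): rebuild the samples from the cosine weights."""
    size = len(coeffs)
    signal = []
    for n in range(size):
        total = sum(math.sqrt((1 if k == 0 else 2) / size) * c
                    * math.cos(math.pi * (n + 0.5) * k / size)
                    for k, c in enumerate(coeffs))
        signal.append(total)
    return signal

def lossy_round_trip(signal, keep):
    """Keep only the `keep` strongest cosine weights, zero the rest, and decode."""
    coeffs = dct(signal)
    ranked = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(ranked[:keep])
    reduced = [c if k in kept else 0.0 for k, c in enumerate(coeffs)]
    return inverse_dct(reduced)

def error_energy(original, decoded):
    """Sum of squared differences between the original and decoded samples."""
    return sum((a - b) ** 2 for a, b in zip(original, decoded))

# Toy signal: a loud low tone plus a much quieter high tone
SIZE = 64
signal = [math.sin(2 * math.pi * 3 * n / SIZE)
          + 0.05 * math.sin(2 * math.pi * 20 * n / SIZE)
          for n in range(SIZE)]

for keep in (SIZE, 16, 8):
    decoded = lossy_round_trip(signal, keep)
    print(keep, error_energy(signal, decoded))
```

Keeping every weight rebuilds the signal almost exactly. Keeping fewer weights means storing fewer numbers, at the cost of a growing difference between the original and the decoded wave.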
---

### You can choose how much to compress the sounds!

- The *Bitrate* dictates how many bits are used to store each second of audio
- The unit is 'kbps', kilobits per second
- 'Variable Bitrate' (VBR) is the same idea, but adapts the bitrate to the complexity of the audio
- Lower bitrate means more compression, but more data loss
- This is independent of bit depth
    - Bit depth is the precision of the measurement you store for each sample

---

### Sound Compression (Again, Shpongle 'Nothing is something worth doing')

Uncompressed WAV
320kbps mp3
192kbps mp3
128kbps mp3
---

### Sound Compression (Again, Shpongle 'Nothing is something worth doing')

Uncompressed WAV
64kbps mp3
48kbps mp3
32kbps mp3
8kbps mp3
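
To put the bitrates above in perspective, the arithmetic below estimates file sizes for a hypothetical three-minute track. The uncompressed figure assumes 16-bit stereo at 44,100 Hz. The slides do not state the bit depth or channel count of the WAV clip, so treat those values, the track length, and the function names as assumptions.

```python
def uncompressed_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of raw audio: samples per second x bits per sample x channels."""
    return sample_rate_hz * bit_depth * channels / 1000

def size_megabytes(bitrate_kbps, duration_s):
    """File size in megabytes for a given bitrate and duration (8 bits per byte)."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000

DURATION_S = 180  # hypothetical three-minute track
wav_kbps = uncompressed_kbps(44100, 16, 2)  # assumed 16-bit stereo

clips = [("Uncompressed WAV", wav_kbps), ("320kbps mp3", 320),
         ("192kbps mp3", 192), ("128kbps mp3", 128), ("64kbps mp3", 64),
         ("48kbps mp3", 48), ("32kbps mp3", 32), ("8kbps mp3", 8)]
for label, kbps in clips:
    print(f"{label:>16}: {kbps:7.1f} kbps, {size_megabytes(kbps, DURATION_S):6.2f} MB")
```

Under these assumptions, the uncompressed audio runs at roughly 1,411 kbps, so even the 320kbps mp3 is less than a quarter of the original size.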
---

### Lossy compression makes decisions!

- Lossy compression throws away data irreversibly using a specified algorithm
- These codecs were tuned for a data type and language
- [mp3 was tuned using Suzanne Vega's "Tom's Diner"](https://observer.com/2008/09/suzanne-vega-is-the-mother-of-the-mp3/)
- Opus is meant for speech and makes decisions based on contributors' languages
- **Saving or collecting your data with lossy compression changes it irrecoverably!**

---

## 'Noise Reduction'

---

### The World is Noisy

- Non-speech noise
- Room echo and feedback
- Typing and mouse clicks
- Background clatter
- **Hearing aids want to send the voice, not the noise!**

---

### Filtering

- Notch filtering to remove specific frequency bands or whines
- Detecting constant noise and subtracting it
- Using multiple mics to remove non-target speech
- Using phase cancellation to remove external noise

---

### Noise Reduction Algorithms

- Discord, Zoom, Skype, and phones use speech-tuned 'noise reduction' methods
- These are increasingly neural-network-based filters
- 'Noise Reduction' algorithms are usually trained on language data
- They can adversely affect classes of phones found in languages outside of the training data
- "That sound isn't found in the language I learned about, so it's noise!"

---

### Why do we give a damn in LIGN 113?!?

- Speakers and Microphones are an important part of audio-focused disciplines
- All hearing technology is based on quantized data these days
- Compression is found in most of the digital audio we hear
- Filtering and noise reduction are important steps in CIs and Hearing Aids
- It's really, really cool

---
Thank you!