To most of us, it’s a complete mystery how wireless earphones or headphones (earbuds) work.
With wired headphones, it makes sense that the electricity flows from your smartphone, through the plug, up the tangle of wires, and to the headphones.
But with wireless headphones, how does the audio of your favorite music or podcast get transmitted from your smartphone, through the air, and into these wireless earbuds?
Well, in this article we’re going to answer a part of that question. Wireless headphones are incredibly complex, and inside these tiny plastic earbuds there are 9 distinct technologies that we’re going to explore.
These technologies are:
- Speaker
- Audio codecs
- Bluetooth
- System on a Chip or SoC
- Printed Circuit board or PCB
- Accelerometers
- Lithium-Ion Battery
- MEMS microphones
- Noise cancellation
Each of these topics is rather involved, and thus a separate article is dedicated to each technology, rather than fitting all 9 into a single, feature-length piece.
Here, as you may have figured from the title, we’re going to focus on audio codecs, and we’ll explore how sound waves can be represented digitally using just a bunch of numbers. This article is broken up into 4 parts.
1. First, we’ll break open these Apple AirPods 2 and explore the different components and where they’re located.
2. Second, we’ll give you a conceptual overview of how these earbuds work.
3. Third, we’ll explore the basics of how the audio, or the sound waves of your favorite song, can be represented as digital information using numbers alone.
4. Finally, we’ll provide a set of increasingly complicated details that fill out this explanation, along with a brief discussion of audio file formats.
So, let’s begin. There are many different types of wireless headphones and earbuds, but they all use the same basic principles and techniques.
Here’s a pair of Apple AirPods 2 we decided to tear open and use as our example. One thing to note is that opening up these AirPods is much more difficult and destructive than we show in these animations.
There’s a lot of glue inside, and if you try this at home with your earbuds, they probably won’t work afterward. We did it so you don’t have to.
Now, let’s take a look inside. Underneath the outer plastic earpiece and mesh dust cover, we have a rubber protective shell, along with an optical sensor.
Below these covers is the speaker that generates the sound, with its 4 key components:
1. Diaphragm
2. Suspension (or spider)
3. Voice coil
4. Magnet
Behind the speaker, we find some insanely complex circuitry folded into a teeny tiny package.
Must Read: Working Of Smartphone Speakers
Let’s pull out this circuitry and see what makes the earbuds tick. It’s essentially three separate printed circuit boards neatly folded into the earbud, joined by flexible connections so that they act as a single PCB.
On this top board, which is glued to the backside of the speaker, we find two points where the speaker connects, along with a larger microchip, which handles Bluetooth connectivity and decodes the compressed audio stream sent from the smartphone.
We also have a set of accelerometers, along with a programmable SoC.
This circuit board is connected to and folded on top of a second circuit board, which contains a low-power stereo audio processing chip, or audio codec.
Also connected to this circuitry is a flat cable for the antenna which lies adjacent to the battery, a microphone situated at the back of the earbud which is used for noise reduction, and a third circuit board which is connected to the battery.
Down below the battery is an additional small circuit board that holds the main MEMS microphone which is about the size of a grain of rice, and below that is a mesh dust cover, along with the contacts for charging the earbuds.
It’s truly an impressive amount of engineering that goes into making these earbuds so lightweight and so incredibly small.
Now that we’ve seen everything that’s packed inside these earbuds, let’s talk about how they work.
Working Of Wireless Earphones or Headphones:
When you turn on your already paired wireless headphones near your smartphone, a Bluetooth communication channel is established in order to send information back and forth.
As soon as you start playing music or a podcast, your smartphone grabs the audio data from its flash storage chip, decompresses the audio, and stores it in your phone’s working memory.
This audio is represented digitally as a long set of numbers, and in order to send it to your earbuds, your smartphone compresses and divides the information into packets according to Bluetooth specifications.
Next, your smartphone converts these packets into electromagnetic waves or photons and sends the data to the earbuds over the Bluetooth connection.
The earbuds receive the data and disassemble and decompress the packets back into long sets of values.
These values are then sent to the audio codec, which converts the digital values into an analog electrical waveform.
This waveform is then sent to the voice coil, which is attached to the back of the diaphragm. The voice coil moves back and forth depending on the given waveform, thus moving the diaphragm which in turn creates pressure waves in the air.
These pressure waves are sensed by your ear and interpreted as sound by your brain.
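To make the “divide it into packets, send it, and reassemble it” step a bit more concrete, here’s a minimal Python sketch of the idea. This is not Apple’s firmware or the real Bluetooth packet format; the packet size, the `seq` field, and the function names are invented purely for illustration.

```python
# A toy sketch of packetizing audio samples and reassembling them.
# Packet size and field names are made up; real Bluetooth audio uses
# its own framing, compression, and error handling.

def packetize(samples, packet_size=80):
    """Split a flat list of sample values into small, numbered packets."""
    packets = []
    for seq, start in enumerate(range(0, len(samples), packet_size)):
        packets.append({"seq": seq, "payload": samples[start:start + packet_size]})
    return packets

def reassemble(packets):
    """Sort packets back into order and rebuild the original sample list."""
    samples = []
    for packet in sorted(packets, key=lambda p: p["seq"]):
        samples.extend(packet["payload"])
    return samples

audio = list(range(1000))            # stand-in for 1,000 audio sample values
packets = packetize(audio)
assert reassemble(packets) == audio  # the round trip preserves every sample
```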
But wait, what’s a codec? Well, codec is short for coder-decoder, and in general it’s a piece of software or hardware that converts data from one format into another, compressing or decompressing it along the way.
In the scenario we just talked about, the audio codec converts the music or podcast data from a set of digital values or numbers, into an analog waveform.
This is the process of decoding the audio file; in this scenario, the codec performs a digital-to-analog conversion, or DAC.
However, audio codecs can also do the reverse, encoding the analog signal from the microphone into a digital set of values, which is an analog-to-digital conversion, or ADC.
Codecs show up in nearly every piece of technology we use, converting one type of data into another while compressing or decompressing it.
In fact, any video you’re watching is downloaded as a compressed video file, and a video codec is actively decompressing that video’s data as you watch it.
Now that we have a conceptual overview, let’s take a look at how an analog sound waveform can be turned into and represented by 1’s and 0’s.
Here’s an analog audio waveform. It has some peaks and troughs, it’s incredibly detailed, and, depending on the length of the audio file, it can get pretty long in duration.
So, how do you turn this audio waveform into a long list of digital numbers?
You might think that it involves some crazy mathematics with a ton of sines and cosines, along with multivariable equations, but it’s actually a lot less complicated than that.
Rather, the analog audio waveform is placed on a graph, and the value it passes through at each fixed time interval is recorded into a list of values.
That’s it: the digital version of the audio waveform is just a long list of values, or points that the waveform passes through, and in this scenario we’re going to have roughly 23 microseconds between each data point.
When the digital information, or the long list of numbers, is fed into the audio codec, the codec plots all the points on a graph, connects the dots, smooths the line between the points, and sends the resulting analog waveform to the speaker, which generates sound.
In fact, if you were to open up an audio file in some audio editing software and zoom in, you would see all the points that constitute the audio.
A music file or podcast’s data isn’t an analog waveform like this; rather, it’s just a long series of equally spaced points in time, each with an associated value.
The process of turning an analog waveform into a set of numbers is called digitization, or analog to digital conversion. That’s the basic concept.
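If you’d like to see that digitization step as code, here’s a small, self-contained Python sketch. The 440 Hz test tone and the linear “connect the dots” reconstruction are simplifying assumptions for illustration; real codecs use more sophisticated reconstruction filters.

```python
import math

SAMPLE_RATE = 44_100   # samples per second, i.e. roughly 23 microseconds apart
FREQ = 440.0           # an assumed 440 Hz test tone standing in for your favorite song

def analog_wave(t):
    """Stand-in for the continuous analog waveform: a 440 Hz sine."""
    return math.sin(2 * math.pi * FREQ * t)

# Digitization: record the waveform's value once every 1/44,100 of a second.
samples = [analog_wave(n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 100)]  # 10 ms of audio

# Playback side: "connect the dots" by interpolating between neighboring samples.
def reconstruct(samples, t):
    position = t * SAMPLE_RATE
    i = int(position)
    frac = position - i
    if i + 1 >= len(samples):
        return samples[-1]
    return samples[i] * (1 - frac) + samples[i + 1] * frac

print(len(samples), "samples for 10 ms of audio")   # 441 samples
print(round(reconstruct(samples, 0.0005), 3))        # close to analog_wave(0.0005)
```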
Next, we’ll add on a few more details, and then further, more complex details.
Let’s move on and dive into the details of how sound is represented digitally.
There are two key aspects that need to be addressed regarding this system.
First, on this graph, the X-axis is time, and as we mentioned, every data point or sample of the analog audio waveform is 23 microseconds apart.
If we wanted, the spacing could be smaller, at, let’s say 1 microsecond between values, or samples, which would yield a million samples every second, or a million-hertz sampling rate, and would result in audio files that would be over a hundred megabytes for 60 seconds of audio.
So, why 23 microseconds? Well, the short answer is that the spacing between each data point depends directly on the average human ear’s ability to perceive sound.
The human ear can hear sounds up to around 20 kilohertz, or one wave every 50 microseconds. If the waves were any closer together, the sound would have a higher frequency, which humans wouldn’t be able to hear.
Scientists and engineers decided to not really concern themselves with frequencies that humans can’t hear.
So, in order to capture a waveform with a maximum frequency of 20 kilohertz, at least two data points are required per full wave, and thus the standard uses one data point roughly every 23 microseconds (1/44,100 of a second), which is a rate of 44.1 kilohertz, or 44,100 samples every second.
Note that this sampling rate or sampling frequency of 44.1 kilohertz is the most common rate for recorded audio such as music and podcasts, and 48 kilohertz is the second most common sampling rate.
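The arithmetic behind those numbers is easy to check yourself; here’s a quick Python sketch using just the figures quoted above.

```python
# Quick arithmetic behind the 23 microsecond / 44.1 kHz figures above.
HUMAN_HEARING_LIMIT_HZ = 20_000   # roughly the upper limit of human hearing
CD_SAMPLE_RATE_HZ = 44_100        # the common 44.1 kHz sampling rate

spacing_us = 1 / CD_SAMPLE_RATE_HZ * 1_000_000   # time between samples, in microseconds
nyquist_hz = CD_SAMPLE_RATE_HZ / 2               # highest frequency this rate can capture

print(f"time between samples: {spacing_us:.1f} microseconds")   # ~22.7 µs (rounded to 23 above)
print(f"highest capturable frequency: {nyquist_hz:.0f} Hz")      # 22,050 Hz, just above 20,000 Hz
print(f"samples per 20 kHz wave: {CD_SAMPLE_RATE_HZ / HUMAN_HEARING_LIMIT_HZ:.2f}")  # ~2.2
```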
But why is the number slightly more than double the frequency of human hearing? Well, you can find those specifics, along with details on why music played over the telephone always sounds terrible and the finer points of the Nyquist theorem and aliasing, in the creator’s comments.
Also, we would greatly appreciate it if you could take a quick second to tell us what you think about this article in the comments below.
Knowing whether you find a section confusing, boring, really interesting, whether it has great graphics, or whatever, is extremely useful, and it helps us to improve on future content.
Let’s get back to our long list of digital values that represent the analog waveform.
Our next question is, how do we represent these values in binary?
Or, essentially, how many 1’s or 0’s are we going to use for each sample?
Let’s try representing each sample using a single bit: either 1 or 0. To do that, we take the original waveform, assign each sample either a 1 or a 0, and here we have the resulting digital data.
But how accurate is our analog-to-digital conversion? To check, we reassemble the graph using these values, placing the 1’s at the top and the 0’s at the bottom, and smooth a line between the points.
Now we have an analog waveform created from the digital data, which was created from the original audio waveform, and… this recreation looks nothing like the original audio, and thus 1 bit isn’t good enough.
So, let’s say we use two bits, which means each sample could be one of 4 different values.
Let’s take the original audio, round each value to the closest of the 4 possible 2-bit values, and here we have the long set of values.
When recreating the audio, we again plot all the points on the graph and smooth out the line, but… it still looks pretty bad, as it doesn’t really match the original waveform.
Really, the question is, what’s the optimal number of values in the vertical axis needed to accurately represent the original audio waveform?
And the answer is that it varies, but an audio CD, for example, uses 16 bits for every single sample.
With 16 bits, there are 2 to the 16, or 65,536, different values along the Y-axis; using technical jargon, we say our audio file has a bit depth of 16 bits.
The process of rounding each sample to the nearest of these allowed values is called quantization, and storing the stream of quantized samples as binary numbers is called pulse code modulation, or PCM.
Furthermore, an audio bit depth of 16 bits is pretty common, however higher quality audio files use 24 or 32 bits per sample or higher.
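Here’s a toy Python sketch of quantization at different bit depths. It rounds samples in the -1.0 to 1.0 range to evenly spaced levels, which is a simplification of how real PCM encoders map samples to signed integers, but it shows how the rounding error shrinks as the bit depth grows.

```python
import math

def quantize(value, bits):
    """Round a sample in the range -1.0..1.0 to the nearest of 2**bits allowed levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)                      # spacing between allowed values
    return round((value + 1.0) / step) * step - 1.0

# One cycle of a test sine wave, sampled 100 times.
original = [math.sin(2 * math.pi * n / 100) for n in range(100)]

for bits in (1, 2, 16):
    quantized = [quantize(s, bits) for s in original]
    worst_error = max(abs(a - b) for a, b in zip(original, quantized))
    print(f"{bits:>2}-bit depth: {2 ** bits:>6} levels, worst rounding error = {worst_error:.6f}")
```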
Okay, so if you came to this article wondering about MP3, AAC, WAV, FLAC, or other audio file formats, we’ll briefly talk about them here.
Digital audio data, which is this long list of 16-bit values at 44.1 kilohertz, is uncompressed and takes up a lot of space: around 10 and a half megabytes per 60 seconds of two-channel (stereo) audio.
MP3 files at 320 kbps stereo reduce the file size to around 2.4 megabytes by processing the audio millisecond by millisecond and finding elements in the uncompressed audio that humans aren’t good at hearing.
A psychoacoustic algorithm identifies elements in the audio, such as exceptionally quiet sounds, very high-pitched sounds, or quiet sounds masked by louder ones very close together in time, which the MP3 compression discards, thereby saving space and making it a lossy compression format.
Lossless compression formats such as ALAC or FLAC don’t discard any data; instead, they compress by finding patterns, or redundant data, and representing those patterns more efficiently than the uncompressed audio does.
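As a sanity check on the file sizes mentioned above, here’s the back-of-the-envelope arithmetic in Python (assuming 1 megabyte = 1,000,000 bytes).

```python
# Rough file sizes for 60 seconds of stereo audio.
SECONDS = 60
SAMPLE_RATE = 44_100     # samples per second, per channel
BIT_DEPTH = 16           # bits per sample
CHANNELS = 2             # stereo

uncompressed_bytes = SAMPLE_RATE * (BIT_DEPTH // 8) * CHANNELS * SECONDS
mp3_320_bytes = 320_000 // 8 * SECONDS     # a constant 320 kbps MP3 stream

print(f"uncompressed PCM: {uncompressed_bytes / 1_000_000:.1f} MB")   # ~10.6 MB
print(f"320 kbps MP3:     {mp3_320_bytes / 1_000_000:.1f} MB")        # ~2.4 MB
```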
But file compression is a complex topic, and we’re planning entire articles dedicated to just that.
As mentioned in the intro, there’s a lot of technology that goes into these wireless earbuds. Always remember to consider the conceptual simplicity, yet structural complexity, of the world around you.
Hey, if you want to give any suggestions to the Tech Geek, then comment down below.
Disclaimer: Tech Geek is an affiliate for one or more products listed in this article. If you click a link and complete a purchase, we may earn a commission.
Check out the best wireless headphones on the market here.