Comprehensive Analysis of USB Audio: Core Technologies and Diverse Applications¶

The Foundation of USB: The Cornerstone of Connection and Data Transmission¶

In the surging tide of technological advancements, USB, the Universal Serial Bus, shines like a brilliant star, high above the vast sky of personal computing for over a decade. Its extensive connectivity is like an invisible web, easily linking microphones, speakers, external drives, network cameras, and various other peripheral devices.

This article focuses on USB audio—this digital audio standard that sparkles in the realms of personal computers, smartphones, and tablets. It acts as a bridge, seamlessly connecting speakers, microphones, and mixers, opening a new chapter in audio transmission.

Let’s first explore the foundation of USB. USB operates according to a specific protocol, much like a meticulously organized symphony, where the host computer plays the role of the conductor, issuing transmission commands to devices like USB speakers. Each transmission is directed precisely at a specific device and endpoint, like an arrow hitting its target. There are several types of data transmission: Bulk Transfer, Isochronous Transfer, Interrupt Transfer, and Control Transfer.

Bulk Transfer¶

Bulk Transfer is akin to a reliable messenger, focused on securely transmitting data between the host and device. All USB transmissions are safeguarded by CRC (Cyclic Redundancy Check), much like a loyal sentinel, constantly ensuring the integrity of data. In Bulk Transfer, the receiver must check the CRC, and if it is correct, the transmission is confirmed, indicating no errors. If the CRC is incorrect, the transmission is not confirmed and will be retried. If the device is not ready to receive data, it can send a NAK (Negative Acknowledgment), prompting the host to retry. Bulk Transfer is not subject to strict timing requirements, acting like a flexible dancer navigating around stricter timing transmission modes.

Isochronous Transfer¶

Isochronous Transfer is like a punctual trailblazer, used for real-time data transmission between the host and device. When the host sets an isochronous endpoint, it allocates specific bandwidth and periodically performs input or output transmissions. For example, the host may output 1K bytes of data to the device every 125 milliseconds. However, due to the fixed and limited bandwidth, if a transmission issue arises, there is no time for retransmission. While the data has a valid CRC, the receiver will not request a retransmission in case of error.

Interrupt Transfer¶

Despite its name, Interrupt Transfer is like a diligent patrol officer, where the host periodically polls devices to check for events of interest. For instance, the host may poll an audio device to confirm whether the MUTE button has been pressed. Though it is named "Interrupt," it functions like a periodic polling mechanism rather than a traditional interrupt.

Control Transfer¶

Control Transfer is similar to Bulk Transfer but is non-real-time, designed for operations outside of normal data flows, such as querying device features or endpoint statuses. It is the backstage coordinator for tasks like adjusting device settings. These pre-defined classes, such as "USB Audio Class" or "USB Mass Storage Class," assist in achieving cross-platform interoperability. All transfers are performed in USB frames, with a high-speed USB frame interval of 125μs (1ms for full-speed USB), marked by the host sending a Start of Frame (SOF) message. Isochronous and Interrupt transfers transmit only once per frame.

USB Audio: The Intricate Arrangement of Data Transmission¶

Next, let's dive into the charm of USB Audio. USB Audio uses Isochronous, Interrupt, and Control Transfers to transmit audio data. Isochronous Transfer serves as the high-speed channel for audio data transmission; Interrupt Transfer acts as the guardian of the audio clock, monitoring the availability of the audio clock; Control Transfer serves as the audio settings controller, adjusting parameters like volume and sample rate.

The data requirements vary depending on the number of channels, sample bit depth, and sample rate. Common channel counts include 2 (stereo), 6 (5.1), and even more for studio and DJ setups. Typical sample bit depths are 24 bits, with 16 bits commonly used for standard audio, and 32 bits for high-quality audio. Common sample rates include 44.1, 48, 96, and 192 kHz, with the latter often used for high-quality audio production.

For example, let's design a stereo speaker with a sample rate of 96 kHz and a sample size of 24 bits. To simplify data transmission between the host and the device, the 24-bit value is often padded with zero bytes, resulting in a total data throughput of 96,000 x 2 channels x 4 bytes = 768,000 bytes per second. The Isochronous endpoint operates at a speed of one transmission every 125μs, which means 8000 transmissions per second. Dividing the required bytes per second by the transmission frequency gives the bytes per transmission: 768,000 / 8,000 = 96 bytes per transmission.

Clock Synchronization: The Key to Time Coordination¶

USB clock synchronization is also a critical aspect. In the world of digital audio, it’s like navigating a time labyrinth, where a common time concept must be agreed upon. As mentioned earlier, USB frames are transmitted 8000 times per second, while the speaker plays 96,000 samples per second. Harmony can only be achieved when both the host and the speaker agree on the length of one second.

USB Audio provides three modes to ensure that the host and speaker stay in sync over time. In Synchronous Mode, the host defines the length of a second, sending data at a fixed rate, with the device matching that rate. In Asynchronous Mode, the device defines the second's length, with the host adjusting accordingly. In Adaptive Mode, the data flow determines the clock’s direction.

However, both Adaptive and Synchronous Modes are not flawless, as personal computers are not always stable in maintaining clock synchronization, and external audio sources, such as digital soundcards, can interfere. Asynchronous Mode uses an external clock source or a low-jitter clock inside the device as the master clock. Typically, both modes rely on a crystal-based Phase-Locked Loop (PLL). Thus, the system requires at least two independent clocks: one for driving the USB transfer frequency (8,000 times per second) and another for driving the sample rate (e.g., 96,000Hz).

These clock frequencies differ slightly and may change subtly over time. For example, at a 96,000Hz sample rate, the average number of samples could be 12.001. To ensure that the host sends the right amount of data, the host queries the current sample rate through an Interrupt Transfer. Every few milliseconds, the average sample rate from the previous stage is reported as a fixed-point value. If the final cycle’s average is 12.001 frames, the reported value is 0x000C0041 (65536 * 12.001).

With this average rate, the host can calculate when to send additional samples during transmission. For instance, eight transmissions per second will carry one extra sample. The host can use this value to synchronize with the audio device, ensuring that the video and audio remain in sync. Without this synchronization, the audio will eventually get ahead of the video, and after two hours, the sound will be a second ahead of the image.

To maintain a short feedback loop, it is crucial to avoid unnecessary buffering of audio data packets and feedback packets. Any additional buffering causes delays, making smooth flow difficult. This means the lower-level USB protocol stack and USB audio protocol stack must tightly integrate, without unnecessary buffering. While this is challenging on application processors, it is relatively easy on embedded processors with predictable execution times.

In summary, maintaining consistent time concepts is crucial in the world of digital audio. The three modes of USB Audio—Synchronous, Asynchronous, and Adaptive—each offer unique approaches to synchronize the host and peripheral devices, with Asynchronous Mode being more reliable due to external clock sources.

Complex Devices: The Precise Choice of Multiple Clocks¶

In complex devices like mixers, there may be multiple devices providing sample rates via different interfaces. USB Audio allows designers to deploy clock selectors to choose the clock source for the sample rate from multiple input clocks, such as an S/PDIF-connected clock, a local oscillator, or an ADAT-connected clock. Users can select the clock source through Control Transfers.

Compliance Support: The Smooth Bridge of System Integration¶

In terms of compliance and native support, devices that comply with the USB Audio Class 2.0 standard can seamlessly interface with operating systems, allowing users to control parameters like volume and sample rate through standard operating system dialogs.

The Outstanding Contribution of USB Audio¶

In conclusion, leveraging the powerful USB 2.0's high-speed capabilities, USB Audio Class 2.0 successfully bridges the gap between PCs and audio devices, ensuring low-latency transmission, high throughput, and exceptional audio quality. Its broad applicability ranges from complex mixers to surround sound systems, PC speakers, microphones, and many more, continuously infusing the audio field with vitality and possibilities.