Since the early 2000s, the audio industry has used Cat5 cabling and Ethernet networks for audio transport (beginning with, among others, EtherSound and CobraNet). The convenience of using the network to transport digital audio was compelling, and costs were lower thanks to commercial off-the-shelf (COTS) switches and very-low-cost cabling. Many users adopted audio-over-IP (AoIP) transport within and between studios and venues. By the early 2010s, however, many different, non-interoperable ecosystems had established themselves, each in its own market. The industry needed a standard that would allow easy communication between those ecosystems while maintaining high performance.
The key word here is “interoperability,” because a network is by nature an open system, where standardization and interoperability are paramount. Today, everyone is used to plugging devices into a network and expects them to work with one another, at least at a basic communication level. This is the motivation behind the development of the AES67 standard. AES67 does not seek to replace existing ecosystems but, rather, to enable smooth audio exchange between them with as little performance penalty as possible.
In 2010, manufacturers of networked audio equipment and some of their users met to form the AES SC-02-12 standards working group. Through diligent effort, the AES67 standard was published in September 2013. It was revised in 2015, and a new revision has just been published for comment.
What Is AES67 Made Of?
AES67 builds on several existing standards, most of them Internet Engineering Task Force (IETF) Requests for Comments (RFCs), to enable low-latency, sample-accurate transport over the network. Four of these are critical to achieving interoperability and high network performance.
- RTP: The Real-time Transport Protocol was designed to carry real-time media, such as audio, over a network with minimal delay. Each packet carries a sequence number and a timestamp, enabling the receiving device to detect out-of-order or missing packets.
- PTPv2 (IEEE 1588-2008): The Precision Time Protocol distributes a common clock across the network. With PTP, samples are referenced not only by sequence number and sampling rate, but also by absolute time, which allows receivers to correct for transport delay and keep streams phase-aligned.
- QoS: Differentiated services (DiffServ), a quality of service (QoS) mechanism, gives clock and media packets priority over other traffic on the network, so critical traffic is still forwarded promptly when the network is congested. In AES67, clock traffic always has the highest priority, ahead of media, ensuring the highest clock accuracy.
- SDP: The Session Description Protocol is used to share the stream information (destination address, payload format, etc.) between the sending and receiving devices.
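The timing roles of PTP and RTP in the list above come down to a little arithmetic. The Python sketch below is illustrative only; the function names are hypothetical and not taken from any AES67 implementation. It shows the IEEE 1588 delay request-response calculation a slave clock uses to estimate its offset from the master (assuming a symmetric network path), how an RTP timestamp can be derived from PTP time at 48 kHz (assuming, as AES67 does, that the media clock counts samples from the PTP epoch and the RTP clock adds a constant offset), and how a receiver can count lost packets from 16-bit RTP sequence numbers.

```python
from __future__ import annotations


def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> tuple[float, float]:
    """Delay request-response arithmetic from IEEE 1588 (timestamps in ns).

    t1: master sends Sync            (master's clock)
    t2: slave receives Sync          (slave's clock)
    t3: slave sends Delay_Req        (slave's clock)
    t4: master receives Delay_Req    (master's clock)
    Assumes the network delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2  # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2   # one-way path delay
    return offset, delay


def rtp_timestamp(ptp_ns: int, sample_rate: int = 48_000,
                  mediaclk_offset: int = 0) -> int:
    """RTP timestamp for a given PTP time (hypothetical helper).

    Assumes the media clock counts samples since the PTP epoch and the RTP
    clock adds a constant offset, i.e. the value an SDP
    a=mediaclk:direct=<offset> line would carry.
    """
    samples_since_epoch = ptp_ns * sample_rate // 1_000_000_000
    return (samples_since_epoch + mediaclk_offset) % 2**32  # 32-bit field wraps


def lost_packets(prev_seq: int, seq: int) -> int:
    """Packets missing between two in-order RTP sequence numbers.

    Sequence numbers are 16 bits and wrap from 65535 to 0. A reordered or
    duplicate packet shows up here as a very large gap, which a real
    receiver would have to handle separately.
    """
    return (seq - prev_seq - 1) % 65536
```

For example, `ptp_offset_and_delay(1000, 1600, 2000, 2400)` gives an offset of 100 ns and a path delay of 500 ns, and one millisecond of PTP time advances the RTP timestamp by 48, the sample count of a 1 ms packet at 48 kHz.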
AES67 can handle both multicast (one sender to multiple receivers) and unicast (one sender to one receiver) streams. Multicast is typically used on small or managed networks; because the sender transmits a single copy of each packet no matter how many receivers subscribe, it allows maximum bandwidth efficiency. When using unicast, AES67 mandates a specific connection protocol, the Session Initiation Protocol (SIP), which is widely used in telephony and remote contribution.
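To make the stream description concrete, here is a hedged Python sketch that renders an SDP session description for a multicast stream of 48 kHz, two-channel, 24-bit audio in 1 ms packets. The function name, the example addresses and the PTP grandmaster identity are hypothetical placeholders. The attribute lines (`rtpmap` with the L24 payload, `ptime`, `ts-refclk` and `mediaclk`) follow the SDP conventions of RFC 4566 and RFC 7273 that AES67 streams use, but this is a minimal sketch, not a complete or validated AES67 description.

```python
def aes67_sdp(session_name: str, origin_ip: str, mcast_group: str, port: int,
              ptp_gmid: str, ptp_domain: int = 0, channels: int = 2,
              sample_rate: int = 48_000, ptime_ms: float = 1,
              payload_type: int = 96, ttl: int = 32) -> str:
    """Render a minimal SDP description for a multicast L24 audio stream."""
    lines = [
        "v=0",
        f"o=- 1 1 IN IP4 {origin_ip}",            # username, session id, version, origin
        f"s={session_name}",
        f"c=IN IP4 {mcast_group}/{ttl}",           # multicast destination and TTL
        "t=0 0",                                   # session not bounded in time
        f"m=audio {port} RTP/AVP {payload_type}",  # 96 and up: dynamic payload types
        f"a=rtpmap:{payload_type} L24/{sample_rate}/{channels}",
        f"a=ptime:{ptime_ms:g}",                   # packet time in milliseconds
        f"a=ts-refclk:ptp=IEEE1588-2008:{ptp_gmid}:{ptp_domain}",  # PTP reference clock
        "a=mediaclk:direct=0",                     # RTP clock offset from the media clock
    ]
    return "\r\n".join(lines) + "\r\n"             # SDP lines end with CRLF


if __name__ == "__main__":
    # Placeholder addresses and grandmaster identity, for illustration only.
    print(aes67_sdp("Stage box 1", "192.168.1.10", "239.69.1.10", 5004,
                    "00-1D-C1-FF-FE-12-34-56"))
```

A unicast stream would instead carry the receiver's unicast address in the `c=` line, without the TTL suffix, and in AES67 it would be set up through SIP rather than simply announced.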
Although AES67 can accommodate multiple sampling rates, channel counts, resolutions and packet sizes (see Table 1), the main exchange format (the so-called pivot format) is 48 kHz, two channels, 24-bit resolution, in streams of 1 ms packets (48 samples per packet). Any device that claims AES67 compliance must be able to receive pivot-format streams.
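The arithmetic behind the pivot format is easy to check. The Python sketch below (function and constant names are hypothetical) computes the packet size and bit rate of one stream, assuming IPv4 without options, a fixed 12-byte RTP header with no extensions, no VLAN tag, and standard Ethernet framing overhead including preamble and inter-frame gap. It also compares the load on the sender's link for unicast and multicast delivery.

```python
# Header sizes in bytes (assumptions: IPv4 without options, RTP without
# CSRCs or header extensions, untagged Ethernet).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER_AND_FCS = 14 + 4
ETH_MIN_FRAME = 64          # header + payload + FCS, padded if shorter
PREAMBLE_AND_GAP = 8 + 12   # preamble/SFD plus inter-frame gap on the wire


def stream_bandwidth(sample_rate: int = 48_000, channels: int = 2,
                     bytes_per_sample: int = 3, ptime_s: float = 0.001) -> dict:
    """Per-stream packet size and bit rate for linear PCM over RTP/UDP/IPv4."""
    samples_per_packet = round(sample_rate * ptime_s)
    payload = samples_per_packet * channels * bytes_per_sample
    ip_packet = payload + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    frame = max(ip_packet + ETH_HEADER_AND_FCS, ETH_MIN_FRAME)
    wire = frame + PREAMBLE_AND_GAP
    packets_per_second = round(1 / ptime_s)
    return {
        "samples_per_packet": samples_per_packet,
        "payload_bytes": payload,
        "wire_bytes": wire,
        "packets_per_second": packets_per_second,
        "payload_mbps": payload * 8 * packets_per_second / 1e6,
        "wire_mbps": wire * 8 * packets_per_second / 1e6,
    }


def sender_link_mbps(stream_mbps: float, receivers: int, multicast: bool) -> float:
    # Multicast: the sender emits one copy and the network replicates it.
    # Unicast: the sender must emit a separate copy for every receiver.
    return stream_mbps if multicast else stream_mbps * receivers
```

For the pivot format, this gives 48 samples per packet, 288 bytes of audio payload and 366 bytes on the wire per packet, and 1,000 packets per second: about 2.3 Mbit/s of audio and 2.9 Mbit/s of link capacity. An eight-channel, 24-bit stream at the same packet time occupies roughly 9.8 Mbit/s. Sending one pivot stream to ten receivers by unicast would put ten times the load on the sender's link, whereas multicast sends a single copy.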
AES67 is concerned only with audio transport on the network: moving audio from the sender to the receiver with the best possible performance and making sure the two can talk to each other. It does not address standardized device control or connection management. These are the purview of AES70 and NMOS, both of which natively handle AES67 connection management.
Measuring Interoperability
How do you actually measure interoperability? The Audio Engineering Society uses interoperability events (aka Plugfests) to assess the validity and adoption of the AES67 standard. Four of these Plugfests have been held to date (2014 at IRT, Munich, Germany; 2015 at NPR, Washington, D.C.; 2017 at the BBC, London, England; and 2018 at FOX, The Woodlands, Texas). Interoperability requires cooperation and openness, so the events are conducted at the engineering level, and reporting individual results is prohibited. Attendance and participation have increased with each event, and interoperability has steadily improved as participants corrected implementation bugs and gained experience with the standard. Plugfests are also the ideal place to experiment with and improve the standard, so that companies implementing it later do not run into the same issues. The AES releases a report after each Plugfest so that interested parties can learn which issues were discovered and what solutions were developed.
Industry adoption of AES67 has been rapid and broad, and the standard was chosen as the basis for audio transport in SMPTE ST 2110-30. ST 2110-30 and AES67 are so closely aligned that the most recent Plugfest was a joint ST 2110 and AES67 event. It was a tremendous success, achieving a very high level of audio interoperability, with more than 100 devices from 60 different manufacturers tested on the same network.
Industry-based public interoperability events, called “IP Showcases,” are also regularly presented at major trade shows, allowing prospective users to see the current state of interoperability and the flexibility of open network standards. And, of course, AES67 is a key element of these.
What Comes Next?
All major ecosystems (Dante, Livewire, Q-SYS, Ravenna, WheatNet-IP) now provide some degree of AES67 interoperability, and all are actively participating in AES67 interoperability events. This directly benefits end users, who can choose a device from almost any vendor of IP-based audio and connect it to their network using AES67.
With its wide adoption and its use as the basis for other standards, AES67 is truly a stepping stone toward an open audio-networking experience, and it is well positioned to become a foundation for networked audio in the years ahead.