Ambisonic reproduction systems

From The Right Wiki
Jump to navigationJump to search

The design of speaker systems for Ambisonic playback is governed by several constraints:

  • the desired spatial operating range (horizontal-only, hemispherical, full-sphere),
  • the predominant resolution (= Ambisonic order) of the expected program material,
  • the desired localisation performance and size of listening area versus the available number of speakers and amplification channels, and
  • the theoretically optimal distribution of speakers versus the actually available placement and/or rigging options.

This page attempts to discuss the interaction of these constraints and their various trade-offs in theory and practice, as well as perceptional advantages or drawbacks of specific speaker layouts which have been observed in actual deployments.

General considerations

Near-field effect

In its original formulation, Ambisonics assumed plane-wave sources for reproduction, which implies speakers that are infinitely far away. This assumption will lead to a pronounced bass boost for speaker rigs of small diameter, which increases with Ambisonic order. The cause is the very same proximity effect that occurs with directional microphones. Therefore, appropriate near-field compensation (bass equalisation) is beneficial.

Speaker distance vs. angles

This same plane-wave assumption makes it possible to vary the distance of speakers within reasonable limits without upsetting the correct function of the decoder, provided that the difference is compensated with delay, the power is adjusted for uniform loudness at the center, and that per-speaker near-field compensation is used. Distance does not affect the decoder matrix. Variable speaker distance is therefore the most important degree of freedom when deploying idealized layouts in actual rooms. It is constrained by the reverberation of the room which leads to uneven direct-to-reverb ratios between speakers at different distances, and the power handling capability of the most distant speaker. If speakers have to be moved very close, care must be taken to ensure they still cover the entire listening area with reasonably flat frequency response. Speaker angles on the other hand should be adhered to as precisely as possible, unless an optimised irregular decoder can be generated in-the-field.

Horizontal vs. full-sphere accuracy

For horizontal-only content, horizontal systems provide more stable localisation at high frequencies than full-sphere ones, as shown by a simulation of the energy vector rE. Therefore, if occasional horizontal-only reproduction at the highest precision is desired, full-sphere layouts with a dense horizontal ring are preferable.

Phasing

Since multiple speakers will inevitably radiate very highly correlated content, a moving listener may experience a phasing effect that affects the perceived timbre and can upset localisation. Phasing artefacts are most prominent in dry rooms on very precisely calibrated systems. They can be reduced by adding height speakers, which tend to smoothen the effect, or tuned to a subjective minimum by introducing staggered delays to the speakers, with the understanding that this may adversely affect low-frequency localisation if overdone. Phasing problems usually become evident in walk-around environments, and are of less concern for a seated audience, unless the interference pattern is so dense that it is perceived by small head movements.

Loudspeaker occlusion

For multi-listener environments and auditoria, the occlusion of speakers by other listeners must not be under-estimated. Generally, the higher the order and the more physically accurate the reproduction, the more robust it is, up to the point where occlusion produces realistic effects that are consistent with the affected listener's visual perception. For low order systems however, reconstruction can easily fail entirely when line-of-sight to speakers is blocked, which has led to odd seating arrangements in listening tests.[1] With-height systems usually provide more unhindered lines-of-sight per direction for a given audience, which might increase their robustness.

Number of loudspeakers vs. source material resolution

Solvang[2] and others have shown that using much more than the minimally required number of speakers can be detrimental. The reason is simple: more speakers with constant angular resolution means higher crosstalk and thus higher correlation between speakers. If not managed, this leads to stronger comb-filtering effect and phasing artefacts when the listener moves. Therefore, with some decoding techniques, it may be advisable to consider if and how a reasonably regular lower-order decoder that omits some speakers can be fitted into any higher-order system design. For example, the third-order octagon allows for a perfectly regular first-order square using only every other speaker.

Horizontal-only systems

Horizontal-only playback rigs are the most commonly deployed and most extensively researched Ambisonic speaker systems, because they constitute an economic next step after conventional stereo. They can reproduce full-sphere content, but elevated sources will be projected onto the horizontal plane, and sources at zenith and nadir will be reproduced in mono by all available speakers. The literature is rife with horizontal decoders based on the simpler cylindrical harmonics, which do not depend on the elevation angle ϕ. Their use is discouraged, because they wrongly assume cylindrical waves which would require perfect line sources for reproduction. Actual speakers are point sources and will inevitably leak energy along the vertical axis, which has consequences for near-field compensation and the tuning of dual-band decoders. Hence, cylindrical decoders do not usually fulfill the Ambisonic criteria.

Triangle

The theoretical minimum of speakers for horizontal playback is 2+1, or the number of Ambisonic components. However, the triangle demonstrates that at least one more speaker is necessary for proper soundfield reconstruction, since it exhibits extreme speaker detention: when panned around, sounds will stick to speaker locations and then jump across to the next speaker, rather than showing uniform motion. As a consequence, the directions of rV and rE do not match between speakers, which causes localisation errors.[3] Hence, the triangle is a suitable setup for Ambisonic reproduction only at low frequencies.

Square or rectangular setups

Four-speaker setups are the most economical way of reproducing first-order horizontal material, and a rectangular layout is most easily fit into a living room, which makes these setups the most common in domestic environments. With rectangles, there is a localisation performance trade-off: the short sides will localise more stably than the square, the long sides worse. Consequently, for predominantly frontal sound stages, Benjamin, Lee, and Heller (2008) have observed a preference for rectangular layouts over squares.[4] All legacy domestic hardware decoders supported rectangular layouts, usually with variable aspect ratios.

ITU 5.1

It is tempting to consider 5.1 systems for Ambisonic playback due to their wide availability, but the ITU-R BS775 layout is quite hostile to Ambisonics due to its extreme irregularity. The three front speakers are so close together (-30°, 0°, +30°) that they will exhibit significant crosstalk in first-order, which causes irritating phasing artefacts without any benefit. Therefore, it is advisable to omit the center speaker and decode only for L, R, Ls and Rs, as has been done in all pre-decoded G-format releases for 5.1. These G-format disks also assume a rectangular layout. If first-order playback is desired, the rear speakers should be moved accordingly, otherwise the Ambisonic imaging will be very unstable due to the wide angle between the surround speakers. Decoding approaches to 5.1 were first suggested by Gerzon and Barton in 1992[5] and subsequently patented.[6] Adriansen provides a free second-order decoder obtained by genetic search,[7] and Wiggins (2007) has shown that source material as high as fourth order can be beneficial in order to 'steer' the decoding functions, even though the system is unable to reproduce the full spatial resolution.[8] Second and third-order material can be played satisfactorily over the ITU 5.1 layout, but due to the problems with first-order reproduction, it should not be considered for Ambisonics except as a compromise when 5.1 content predominates.

Hexagon

If six speakers and sufficient space are available, the hexagon is a very good option that has outperformed four-channel setups for first-order reproduction in listening tests[4] and is capable of second-order reproduction. It can be driven by an inexpensive 5.1 sound card and domestic 5.1 amplifier, provided the LFE output is full-range. When used with one speaker in front, the hexagon can be abused for native 5.1 playback at the expense of a significantly wider and more blurry stereo stage (120° as opposed to 60° between L and R as per ITU-R BS775). Alternatively, reasonably sharp virtual speakers at the canonical ITU locations can be created with second-order panners - this is an interesting option if a phantom center is tolerable, and it will also work with a two-in-front orientation, which leaves more room for a TV or projection screen.

Octagon

The Octagon is a flexible choice for up to third-order playback. When oriented one-in-front, it can be used for reasonably accurate native 5.1 playback (L and R at +/- 45° vs. 30°, and surrounds within the standardized sector at +/- 112.5°). For first order, phasing artefacts might become obvious under non-reverberant listening conditions due to the use of significantly more speakers than required, and Solvang's results (2008) suggest slightly increased timbral defects outside the sweet spot.[9] With eight channels, an octagon can be driven by affordable 7.1 consumer equipment, again as long as the LFE output is full-range. Driven in third order, it is a reasonable lower bound for concert sound reinforcement over an extended listening area, either for native Ambisonic content or to produce virtual speakers,[10] which has been found to scale to several hundred listeners under favourable conditions.[11]

Systems with limited height reproduction

Stacked rings

Stacked rings have been a popular way of obtaining limited with-height reproduction. Spatial resolution will be weak close to the zenith and nadir, but these are somewhat rare positions for sound sources. Rings are generally easier to rig than (hemi-)spherical setups because they do not require overhead trussing, speaker stands can be shared unless the rings are twisted, and entrances, fire escape routes etc. can be more easily accommodated for. Double hexagons and octagons are the most common variations. Since the introduction of #H#V mixed-order schemes by Travis (2009),[12] stacked rings can be operated at their full horizontal resolution even for elevated sources. #H#V decoding matrices for common layouts are available from Adriaensen (2012).[7] Triple rings are rare, but have been used to good effect.[13]

Upper hemisphere systems

Since stacked rings are somewhat wasteful at higher elevations and necessarily have a hole at the zenith, they have been largely surpassed by hemispherical layouts since mature methods for decoder generation have become available. As they are difficult to rig and require overhead points, hemispheres are usually found either in permanent installations or experimental studios, where expensive and visually intrusive trussing is not an issue.

Full-sphere systems: Platonic solids

The regular Platonic solids are the only full-sphere layouts for which closed-form solutions for decoding matrices exist. Before the development and adoption of modern mathematical tools for the optimisation of irregular layouts and the generation of T-designs and Lebedev grids with higher numbers of speakers, the regular polyhedra were the only tractable options.

Tetrahedron

Tetrahedral speaker setups were used in the 1970s for first trials of full-sphere sound reproduction. One such experiment conducted by the Oxford University Tape Recording Society was documented by Michael Gerzon in 1971.[14][15][16] In this setup, the tetrahedron was inscribed into a cuboid, using every other corner. Despite Gerzon's somewhat over-enthusiastic description (which pre-dates the introduction of Ambisonics and the proper formulation of its psychoacoustic criteria), the tetrahedron exhibits the same stability problems in 3D that plague the triangle for horizontal-only reproduction. It is a viable option for adequate full-sphere reproduction only at low frequencies.

Octahedron

The octahedron is difficult to set up in "upright" orientation, since the listener would occlude the floor speaker. Hence, a "slanted" setup is usually preferred. It provides basic full-sphere first-order reproduction for a single listener. Goodwin (2009) has suggested a slanted octahedron with separate front center (which he calls 3D7.1)[17] as an alternative way of using 7.1 systems to achieve with-height Ambisonic reproduction in games, and to allow reasonably accurate native 5.1 playback. An OpenAL game audio backend and decoder for this setup is commercially available.[18]

Cube

The most commonly encountered full-sphere systems are cubes or rectangular cuboids. The same localisation trade-offs apply as for square vs. rectangle (see above). Cuboids are easily fit into standard rooms and provide precise localisation in first order for a single listener plus enjoyable envelopment for one or two more, and they can be built using off-the shelf 7.1 components. If all speakers are placed in room corners, their acoustic loading and resulting bass boost will be uniform, which means they can all be equalised in the same way.

Icosahedron

For the sake of consistency, we consider the vertices of the regular polyhedra as speaker positions, which makes the twelve-vertex icosahedron the next in the list.[note 1] If suitable rigging options are available, it is capable of second-order full-sphere reproduction. A good and slightly more practical alternative is a horizontal hexagon complemented by two twisted triangles on floor and ceiling.

Dodecahedron

With twenty vertices,[note 1] the dodecahedron is capable of third-order full-sphere playback. Budget dodecahedra can be built by combining four domestic 5.1 sets as demonstrated at IRCAM's Studio 4,[19] which would also allow for a square horizontal subwoofer decode,

Irregular speaker layouts

It is possible to decode Ambisonics and Higher-Order Ambisonics onto fairly arbitrary speaker arrays, and this is a subject of ongoing research. A number of free decoding toolkits as well as a commercial implementation[20] are available.

Binaural stereo

Higher-Order Ambisonics can be decoded to produce 3D stereo headphone output similar to that produced using binaural recording. This can be done in a number of ways, including the use of virtual loudspeakers in combination with HRTF data.[21] Other methods are possible.[22]

Notes

  1. 1.0 1.1 Unfortunately, in the literature the icosahedral layout is commonly called a dodecahedron and vice versa, without justification as to why we should now consider faces rather than vertices.

References

  1. Stephen Thornton, Surround sound from two-channel stereo, see photos, retrieved 2014-01-02
  2. Audun Solvang, Spectral Impairment of Two-Dimensional Higher Order Ambisonics, JAES Vol.56 No. 4, April 2008
  3. Bruce Wiggins, Has Ambisonics Come of Age?, Reproduced Sound 24 - Proceedings of the Institute of Acoustics, Vol 30. Pt 6, 2008, Fig. 7
  4. 4.0 4.1 Eric Benjamin, Richard Lee, and Aaron Heller, Localisation in Horizontal-only Ambisonic Systems, 121st AES Convention, San Francisco 2006
  5. Michael A Gerzon, Geoffrey J Barton, "Ambisonic Decoders for HDTV", 92nd AES Convention, Vienna 1992. http://www.aes.org/e-lib/browse.cfm?elib=6788
  6. US 5757927, Gerzon, Michael Anthony & Barton, Geoffrey James, "Surround sound apparatus", published 1998-05-26, assigned to Trifield Productions Ltd. 
  7. 7.0 7.1 Fons Adriaensen, AmbDec Ambisonic Decoder, 2012
  8. Bruce Wiggins, The Generation of Panning Laws for Irregular Speaker Arrays Using Heuristic Methods Archived 2016-05-17 at the Portuguese Web Archive. 31st AES Conference, London 2007
  9. Audun Solvang, Spectral Impairment for Two-Dimensional Higher-Order Ambisonics, JAES Vol. 56, No. 4, April 2008, http://www.aes.org/e-lib/browse.cfm?elib=14385
  10. Jörn Nettingsmeier, General-purpose Ambisonic playback systems for electroacoustic concerts, 2nd International Symposium on Ambisonics and Spherical Acoustics, Paris 2010
  11. Jörn Nettingsmeier and David Dohrmann, Preliminary studies on large-scale higher-order Ambisonic sound reinforcement systems, Ambisonics Symposium 2011, Lexington (KY) 2011
  12. Travis, Chris, A new mixed-order scheme for Ambisonic signals Archived 2009-10-04 at the Wayback Machine, Ambisonics Symposium, Graz 2009
  13. Jörn Nettingsmeier, Field Report II A contemporary music recording in Higherorder Ambisonics, Linux Audio Conference 2012, Stanford 2012, p.8
  14. Michael Gerzon, Experimental Tetrahedral Recording: part one, Studio Sound, Vol. 13, August 1971, pp 396-398
  15. Michael Gerzon, Experimental Tetrahedral Recording: part two, Studio Sound, Vol. 13, September 1971, pp 472, 473 and 475
  16. Michael Gerzon, Experimental Tetrahedral Recording: part three, Studio Sound, Vol. 13, October 1971, pp 510, 511, 513 and 515
  17. Simon Goodwin, 3D sound for 3D games - beyond 5.1, AES 35th International Conference, London 2009
  18. Blue Ripple Sound, HOA Technical Notes - 3D7.1, retrieved 2014-01-02
  19. 2nd International Symposium on Ambisonics and Spherical Acoustics, IRCAM, Paris 2010, demo of Blue Ripple Sound's Rapture3D engine
  20. Blue Ripple Sound, HOA Technical Notes - Custom Layouts in Rapture3D Advanced Edition, retrieved 2014-01-24
  21. Richard Furse, Building an OpenAL Implementation Using Ambisonics, AES 35th International Conference, London 2009
  22. Blue Ripple Sound, HOA Technical Notes - Amber HRTF, retrieved 2014-01-24