|
ATRAC: Adaptive Transform Acoustic Coding for MiniDisc

Kyoya Tsutsui, Hiroshi Suzuki, Osamu Shimoyoshi, Mito Sonohara, Kenzo Akagiri, Robert M. Heddle

Sony Corporate Research Laboratories
6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo 141 Japan

Reprinted from the 93rd Audio Engineering Society Convention in San Francisco, 1992 October 1-4

Abstract

ATRAC is an audio coding system based on psychoacoustic principles. The input signal is divided into three subbands which are then transformed into the frequency domain using a variable block length. Transform coefficients are grouped into nonuniform bands to reflect the human auditory system, and then quantized on the basis of dynamic sensitivity and masking characteristics. ATRAC compresses compact disc audio to approximately 1/5 of the original data rate with virtually no loss in sound quality.

1 Introduction

Recently, there has been an increasing consumer demand for a portable, recordable, high-quality digital audio medium. The MiniDisc system was developed to meet this demand. The MiniDisc is based on a 64 mm optical or magneto-optical disc which has approximately 1/5 of the data storage capacity of a standard compact disc. Despite the reduced storage capacity, it was necessary that the MiniDisc maintain high sound quality and a playing time of 74 minutes. The ATRAC (Adaptive Transform Acoustic Coding) data compression system was therefore designed to meet the following criteria:

- Compression of 16-bit 44.1 kHz stereo audio into less than 1/5 of the original data rate with minimal reduction in sound quality.
- Simple and inexpensive hardware implementation suitable for portable players and recorders.

When digital audio data is compressed, there is normally a certain amount of quantization noise introduced into the signal. The goal of many audio coding systems [1-6] is to control the time-frequency distribution of this noise in such a way as to render it inaudible to the human ear.
If this is completely successful, the reconstructed signal will be indistinguishable from the original.

In general, audio coders operate by decomposing the signal into a set of units, each corresponding to a certain range in time and frequency. Using this time-frequency distribution, the signal is analyzed according to psychoacoustic principles. This analysis indicates which units are critical and must be coded with high precision, and which units are less sensitive and can tolerate some quantization noise without degrading the perceived sound quality. Based on this information, the available bits are allocated to the time-frequency units. The spectral coefficients in each unit are then quantized using the allocated bits. In the decoder, the quantized spectra are reconstructed according to the bit allocation and then synthesized into an audio signal.

The ATRAC system operates as above, with several enhancements. ATRAC uses psychoacoustics not only in the bit allocation algorithm, but also in the time-frequency splitting. Using a combination of subband coding and transform coding techniques, the input signal is analyzed in nonuniform frequency divisions which emphasize the important low-frequency regions. In addition, ATRAC uses a transform block length which adapts to the input signal. This ensures efficient coding of stationary passages without sacrificing time resolution during transient passages.

This paper begins with a review of the relevant psychoacoustic principles. The ATRAC encoder is then described in terms of time-frequency splitting, quantization of spectral coefficients, and bit allocation. Finally, the ATRAC decoder is described.

2 Psychoacoustics

2.1 Equi-loudness Curves

The sensitivity of the ear varies with frequency. The ear is most sensitive to frequencies in the neighbourhood of 4 kHz; sound pressure levels which are just detectable at 4 kHz are not detectable at other frequencies. In general, two tones of equal power but different frequency will not sound equally loud.
The perceived loudness of a sound may be expressed in sones, where 1 sone is defined as the loudness of a 40 dB tone at 1 kHz. Equi-loudness curves at several loudness levels are shown in Figure 1. The curve labeled "hearing threshold in quiet" indicates the minimum level (by definition, 0 sone) at which the ear can detect a tone at a given frequency.

These curves indicate that the ear is more sensitive at some frequencies than it is at others. Distortion at insensitive frequencies will be less audible than at sensitive frequencies.

2.2 Masking

Masking [7] occurs when one sound is rendered inaudible by another. Simultaneous masking occurs when the two sounds occur at the same time, such as when a conversation (the masked signal) is rendered inaudible by a passing train (the masker). Backward masking occurs when the masked signal ends before the masker begins; forward masking occurs when the masked signal begins after the masker has ended.

Masking becomes stronger as the two sounds get closer together in both time and frequency. For example, simultaneous masking is stronger than either forward or backward masking because the sounds occur at the same time. Masking experiments are generally performed by using a narrow band of white noise as the masking signal, and measuring the just-audible level of a pure tone at various times and frequencies. Examples of simultaneous masking and temporal masking are shown in Figure 2 and Figure 3 respectively.

Important conclusions may be drawn from these graphs. First, simultaneous masking is more effective when the frequency of the masked signal is equal to or higher than that of the masker. Second, while forward masking is effective for a considerable time after the masker has stopped, backward masking may only be effective for less than 2 or 3 ms before the onset of the masker.

2.3 Critical Bands

Critical bands [7] arose from the idea that the ear analyzes the audible frequency range using a set of subbands.
The frequencies within a critical band are similar in terms of the ear's perception, and are processed separately from other critical bands. Critical bands arose naturally from experiments in human hearing and can also be derived from the distribution of sensory cells in the inner ear. Critical bands can be thought of as the frequency scale used by the ear [8].

The critical band scale is shown in Table 1. It is clear that the critical bands are much narrower at lower frequencies than at high frequencies; in fact, three quarters of the critical bands are located below 5 kHz. This indicates that the ear receives more information from the low frequencies and less from higher frequencies.

Table 1: Discrete critical bands [7]

Critical Band   Low (Hz)   High (Hz)   Width (Hz)
      0              0        100         100
      1            100        200         100
      2            200        300         100
      3            300        400         100
      4            400        510         110
      5            510        630         120
      6            630        770         140
      7            770        920         150
      8            920       1080         160
      9           1080       1270         190
     10           1270       1480         210
     11           1480       1720         240
     12           1720       2000         280
     13           2000       2320         320
     14           2320       2700         380
     15           2700       3150         450
     16           3150       3700         550
     17           3700       4400         700
     18           4400       5300         900
     19           5300       6400        1100
     20           6400       7700        1300
     21           7700       9500        1800
     22           9500      12000        2500
     23          12000      15500        3500
     24          15500      22050        6550

3 The ATRAC Encoder

A block diagram of the encoder structure is shown in Figure 4. The encoder has three components. The analysis block decomposes the signal into spectral coefficients grouped into Block Floating units (BFU's). The bit allocation block divides the available bits between the BFU's, allocating fewer bits to insensitive units. The quantization block quantizes each spectral coefficient to the specified wordlength.

3.1 Time-Frequency Analysis

This block (Figure 6) generates the BFU's in three steps, combining techniques from subband coding and transform coding. First, the signal is broken down into three subbands: 0-5.5 kHz, 5.5-11 kHz, and 11-22 kHz. Each of these subbands is then transformed into the frequency domain, producing a set of spectral coefficients.
Finally, these spectral coefficients are grouped nonuniformly into BFU's.

The subband decomposition is performed using Quadrature Mirror Filters (QMF's) [9-10]. The input signal is divided into upper and lower frequency bands by the first QMF, and the lower frequency band is divided again by a second QMF. Use of QMF's ensures that time-domain aliasing caused by the subband decomposition will be cancelled during reconstruction.

Each of the three subbands is then transformed into the frequency domain using the Modified Discrete Cosine Transform (MDCT) [11-12]. The MDCT allows up to 50% overlap between time-domain windows, leading to improved frequency resolution while maintaining critical sampling. Instead of a fixed transform block length, however, ATRAC chooses the block length adaptively based on the signal characteristics in each band. There are two modes: long mode (11.6 ms) and short mode (1.45 ms in the high frequency band, 2.9 ms in the others). Normally long mode is used to provide good frequency resolution. However, problems may occur during attack portions of the signal. Specifically, the quantization noise is spread over the entire signal block, and the initial quantization noise is not masked (Figure 8a); this problem is called pre-echo. In order to prevent pre-echo, ATRAC switches to short mode (Figure 8b) when it detects an attack signal. In this case, because there is only a short segment of noise before the attack, the noise will be masked by backward masking (section 2.2). In long mode, backward masking cannot hide this noise, because backward masking lasts only a very short time. Thus, ATRAC achieves efficient coding in stationary regions while responding quickly to transient passages.

Note that short mode is not necessary for signal decay, because the quantization noise will be masked by forward masking, which lasts much longer than backward masking. For maximum flexibility, the block size mode can be selected independently for each band.

The MDCT spectral coefficients are then grouped into BFU's.
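
The statement that the MDCT combines 50% window overlap with critical sampling can be checked numerically. The following is a minimal sketch, not ATRAC's actual transform: the sine window, the block size, and the function names are assumptions chosen for illustration, since the paper does not specify a window shape. Each hop of N input samples yields N coefficients, and overlap-adding the windowed inverse transforms cancels the time-domain aliasing.

```python
import math

def sine_window(length):
    """Sine window (an assumed choice); satisfies w[n]^2 + w[n + N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Forward MDCT: 2N windowed input samples -> N coefficients."""
    n_half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (2/N scaling)."""
    n_half = len(coeffs)
    return [
        2.0 / n_half * sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                           for k in range(n_half))
        for n in range(2 * n_half)
    ]

def analysis_synthesis(signal, n_half):
    """Window, transform, inverse transform, window again, and overlap-add."""
    window = sine_window(2 * n_half)
    out = [0.0] * len(signal)
    # Blocks of 2N samples advance by N samples, so consecutive blocks overlap by 50%.
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = [signal[start + n] * window[n] for n in range(2 * n_half)]
        recon = imdct(mdct(block))
        for n in range(2 * n_half):
            out[start + n] += recon[n] * window[n]
    return out

signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(64)]
output = analysis_synthesis(signal, 8)
worst = max(abs(output[i] - signal[i]) for i in range(8, 56))
print("largest reconstruction error:", worst)
```

Only samples covered by two overlapping blocks are reconstructed, which is why the comparison skips the first and last N samples. A real coder would quantize the coefficients between `mdct` and `imdct`; that step is left out here so that the aliasing cancellation is visible on its own.
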
Each unit contains a fixed number of coefficients. In the case of long mode, the units reflect 11.6 ms of a narrow frequency band; in the case of short mode, each unit reflects a shorter time but a wider frequency band (Figure 9). Note that the concentration of BFU's is greater at low frequencies than at high frequencies; this reflects the psychoacoustic characteristics of the human ear.

3.2 Spectral Quantization

The spectral values are quantized using two parameters: wordlength and scale factor. The scale factor defines the full-scale range of the quantization, and the wordlength defines the precision within that scale. All coefficients within a BFU share the same wordlength and scale factor, reflecting the psychoacoustic similarity of the grouped frequencies.

The scale factor is chosen from a fixed list of possibilities, and reflects the magnitude of the spectral coefficients in each BFU. The wordlength is determined by the bit allocation algorithm (section 3.3).

For each sound frame (corresponding to 512 input points), the following information is stored on the disc:

- MDCT block size mode (long or short).
- Wordlength data for each Block Floating unit.
- Scale factor code for each Block Floating unit.
- Quantized spectral coefficients.

In order to guarantee accurate reconstruction of the input signal, critical data such as the block size mode, wordlength and scale factor data may be stored redundantly. Information about quantities of redundant data is also stored on the disc.

3.3 Bit Allocation

The bit allocation algorithm divides the available data bits between the various BFU's. Units with a large number of bits will have little quantization noise; units with few or no bits will have significant quantities of noise. For good sound quality, the bit allocation algorithm must ensure that critical units have sufficient bits, and that the noise in non-critical units is not perceptually significant.

ATRAC does not specify a bit allocation algorithm; any appropriate algorithm may be used.
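
The wordlength and scale factor scheme of section 3.2 can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the scale factor table (powers of two), the symmetric rounding quantizer, the treatment of wordlengths below 2 as "no bits", and all function names. The paper states only that the scale factor comes from a fixed list and that the wordlength sets the precision within that scale.

```python
# Hypothetical scale factor table: powers of two from 2^-8 to 2^7.
# The actual ATRAC list is not given in the paper.
SCALE_FACTORS = [2.0 ** e for e in range(-8, 8)]

def choose_scale_factor(coeffs):
    """Return the index of the smallest table entry covering the peak magnitude."""
    peak = max(abs(c) for c in coeffs)
    for index, scale in enumerate(SCALE_FACTORS):
        if scale >= peak:
            return index
    return len(SCALE_FACTORS) - 1  # peak exceeds the table; codes are clipped below

def quantize_bfu(coeffs, wordlength):
    """Quantize one BFU; every coefficient shares one scale factor and wordlength."""
    index = choose_scale_factor(coeffs)
    if wordlength < 2:  # assumed convention: too few bits to code anything
        return index, [0] * len(coeffs)
    steps = 2 ** (wordlength - 1) - 1  # largest code magnitude
    scale = SCALE_FACTORS[index]
    codes = [max(-steps, min(steps, round(c / scale * steps))) for c in coeffs]
    return index, codes

def dequantize_bfu(index, codes, wordlength):
    """Decoder side: rebuild coefficients from scale factor index and codes."""
    if wordlength < 2:
        return [0.0] * len(codes)
    steps = 2 ** (wordlength - 1) - 1
    scale = SCALE_FACTORS[index]
    return [q / steps * scale for q in codes]

coeffs = [0.30, -0.12, 0.05, 0.0]
for wordlength in (4, 8):
    index, codes = quantize_bfu(coeffs, wordlength)
    decoded = dequantize_bfu(index, codes, wordlength)
    error = max(abs(a - b) for a, b in zip(coeffs, decoded))
    print(wordlength, SCALE_FACTORS[index], codes, error)
```

The decoder needs only the scale factor index, the wordlength, and the codes, which matches the list of items stored per sound frame. In this sketch the worst-case error of an unclipped coefficient is half a quantizer step, so each additional bit of wordlength roughly halves the noise in that BFU.
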
The wordlength of each BFU is stored on the MiniDisc along with the quantized spectra, so the decoder is completely independent of the allocation algorithm. This provides for the evolutionary improvement of the encoder without changing the MiniDisc format or the decoder.

There are many possible algorithms, ranging from very simple to extraordinarily complex. For portable MiniDisc recorders, however, the possibilities are limited somewhat by the fact that they must be implemented on low-cost, low-power, compact hardware. Nevertheless, ATRAC is capable of good sound quality using even a simple bit allocation algorithm, provided it is soundly based on psychoacoustic principles. ATRAC's nonuniform adaptive time-frequency structure is already based on psychoacoustics, relieving the pressure on the bit allocation algorithm.

One suggested algorithm uses a combination of fixed and variable bits. The fixed bits emphasize the important low-frequency regions, allocating fewer bits to the BFU's in higher frequencies. The variable bits are allocated according to the logarithm of the spectral coefficients within each BFU. The total bit allocation $b_{tot}(k)$ is the weighted sum of the fixed bits $b_{fix}(k)$ and the variable bits $b_{var}(k)$. Thus, for each BFU $k$,

$b_{tot}(k) = T\,b_{var}(k) + (1 - T)\,b_{fix}(k)$

The weight $T$ is a measure of the tonality of the signal, taking a value close to 1 for pure tones, and close to 0 for white noise. This means that the proportion of fixed and variable bits is itself variable. Thus, for pure tones, the available bits will be concentrated in a small number of BFU's. For more noise-like signals, the algorithm will emphasize the fixed bits in order to reduce the number of bits allocated to the insensitive high frequencies.

The above equation is not concerned with overall bit rate, and will in general allocate more bits than are available. In order to ensure a fixed data rate, an offset $b_{off}$ (the same for all BFU's) is calculated.
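
The weighted sum and the common offset can be sketched as follows. This is an illustrative sketch under stated assumptions, not the encoder's actual procedure: the paper does not say how the offset is found, so the bisection search, the bit budget, the example values, and all function names are hypothetical. The final step (truncating to an integer and replacing negative wordlengths with 0) follows the rule the text goes on to describe, with "integer" read here as rounding down.

```python
import math

def total_bits(b_fix, b_var, tonality):
    """b_tot(k) = T * b_var(k) + (1 - T) * b_fix(k) for each BFU k."""
    return [tonality * v + (1.0 - tonality) * f for f, v in zip(b_fix, b_var)]

def wordlengths(b_tot, b_off):
    """b(k) = integer{b_tot(k) - b_off}; negative results become 0 bits."""
    return [max(0, math.floor(t - b_off)) for t in b_tot]

def find_offset(b_tot, coeffs_per_bfu, budget, iterations=40):
    """Bisect for a small offset whose allocation fits the budget (assumed search)."""
    def cost(b_off):
        # Each BFU spends its wordlength once per coefficient it contains.
        return sum(n * b for n, b in zip(coeffs_per_bfu, wordlengths(b_tot, b_off)))

    low, high = 0.0, max(b_tot)  # at high, every unit is allocated 0 bits
    if cost(low) <= budget:
        return low  # no offset needed
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if cost(mid) <= budget:
            high = mid  # fits; try a smaller offset
        else:
            low = mid   # too many bits; a larger offset is needed
    return high

# Hypothetical inputs: five BFU's of eight coefficients, a fairly tonal signal.
b_fix = [8, 7, 6, 4, 3]    # fixed bits, falling with frequency
b_var = [10, 2, 9, 1, 1]   # variable bits, e.g. from log spectral magnitude
b_tot = total_bits(b_fix, b_var, tonality=0.8)
b_off = find_offset(b_tot, [8] * 5, budget=120)
print(b_off, wordlengths(b_tot, b_off))
```

With the tonality close to 1, most of the budget lands on the two BFU's with large variable bits, which is the concentration the text describes for pure tones. Setting `tonality` near 0 makes the result follow `b_fix` instead.
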
The offset $b_{off}$ is subtracted from $b_{tot}(k)$ for each unit, giving the final bit allocation $b(k)$:

$b(k) = \mathrm{integer}\{b_{tot}(k) - b_{off}\}$

If the subtraction generates a negative wordlength, that BFU is allocated 0 bits. This algorithm is illustrated in Figure 10.

4 The ATRAC Decoder

A block diagram of the decoder structure is shown in Figure 5. The decoder first reconstructs the MDCT spectral coefficients from the quantized values, using the wordlength and scale factor parameters. These spectral coefficients are then used to reconstruct the original audio signal (Figure 7). The coefficients are first transformed back into the time domain by the inverse MDCT (IMDCT) using either long mode or short mode as specified in the parameters. Finally, the three time-domain signals are synthesized into the output signal by QMF synthesis filters.

5 Conclusions

Through a combination of various techniques including psychoacoustics, subband coding and transform coding, ATRAC succeeds in coding digital audio with virtually no perceptual degradation in sound quality. Listening tests indicate that the difference between ATRAC sound and the original source is not perceptually annoying, nor does it reduce the sound quality. Furthermore, the system is sufficiently compact to be installed in portable consumer products. Using ATRAC, the MiniDisc provides a practical solution for portable digital audio.

6 References

[1] MPEG/AUDIO CA11172-3, 1992.
[2] "ASPEC (Source: AT&T Bell Labs et al.)," Doc. No. 89/205, ISO-IEC JTC1/SC2/WG8 MPEG-AUDIO, Oct. 18, 1989.
[3] R. Veldhuis, M. Breeuwer and R. van der Wall, "Subband coding of digital audio signals without loss of quality," Proc. 1989 International Conference on Acoustics, Speech and Signal Processing, Glasgow, pp. 2009-2012.
[4] A. Sugiyama, F. Hazu, M. Iwadare and T. Nishitani, "Adaptive transform coding with an adaptive block size (ATCABS)," Proc. 1990 International Conference on Acoustics, Speech and Signal Processing, Albuquerque, pp. 1093-1096.
[5] G. Davidson, L. Fielder and M. Antill, "High-quality audio transform coding at 128 kbits/s," Proc. 1990 International Conference on Acoustics, Speech and Signal Processing, Albuquerque, pp. 1117-1120.
[6] G. Davidson, L. Fielder and M. Antill, "Low-complexity transform coder for satellite link applications," Audio Engineering Society 89th Convention preprint 2966, Sept. 1990.
[7] J. S. Tobias, Ed., Foundations of Modern Auditory Theory, Vol. 1, Academic Press, New York, 1970.
[8] E. Zwicker and U. T. Zwicker, "Audio engineering and psychoacoustics: Matching signals to the final receiver, the human auditory system," J. Audio Engineering Society, Vol. 39, No. 3, pp. 115-126, March 1991.
[9] D. Esteban and C. Galand, "Application of quadrature mirror filters to split band voice coding schemes," Proc. 1977 IEEE International Conference on Acoustics, Speech and Signal Processing, Hartford CT, pp. 191-195.
[10] P. P. Vaidyanathan, "Quadrature mirror filter banks, M-band extensions and perfect-reconstruction techniques," IEEE ASSP Magazine, Vol. 4, pp. 4-20, July 1987.
[11] J. Princen and A. Bradley, "Analysis/synthesis filter bank design based on time-domain aliasing cancellation," IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 34, pp. 1153-1161, 1986.
[12] J. Princen, A. Johnson and A. Bradley, "Subband/transform coding using filter bank designs based on time domain aliasing cancellation," Proc. 1987 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, pp. 2161-2164. |
|