The GSM 06.10 lossy speech compression library and its applications

GSM 06.10 lossy speech compression
- telephone quality speech
- 13 kbit/s
- free source code

In 1992, I was working as a tutor at the Technical University of Berlin. The research group I was in needed a speech compression algorithm to support its multimedia conferencing experiments. They found what they were looking for in the ETSI specifications of the Global System for Mobile telecommunication (GSM), Europe's currently most popular protocol suite for digital cellular phones. (John Scourias' overview of GSM does a good job introducing the overall architecture; hire him. Another, more recent, overview of the GSM system (with a list of Web links) comes from Javier Gozàlvez Sempere.)

The low-level speech compression algorithm of the GSM suite is called GSM 06.10 RPE-LTP (Regular-Pulse Excitation Long-Term Predictor). My colleague Dr. Carsten Bormann and I have implemented a GSM 06.10 RPE-LTP coder and decoder in C. Its source code is freely available, and we encourage you to use it, play with it, and invent new real-time media protocols and algorithms.

Our implementation consists of a C library and a stand-alone program. Both are meant to be compiled and used on a Unix-like environment with at least 32-bit integers, but others have ported them to VMS and a 16-bit MS-DOS environment.

GSM 06.10 is faster than code-book lookup algorithms such as CELP, but by no means cheap; to use it for real-time communication, you will need at least a medium-scale workstation.

When using the library, you create a gsm object that holds the state necessary to either encode frames of 160 16-bit PCM samples into 264-bit GSM frames, or to decode GSM frames into linear PCM frames.
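To make those frame sizes concrete, here is a small arithmetic sketch in C. It is not part of the library: the function names are made up for illustration, the 8000 Hz sampling rate is an assumption that the text above does not state, and padding a trailing partial frame to a full one is likewise only an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the text: 160 samples of 16 bits in, 264 bits out. */
enum {
    PCM_SAMPLES_PER_FRAME = 160,
    PCM_BITS_PER_SAMPLE   = 16,
    GSM_BITS_PER_FRAME    = 264
};

/* Assumption (not stated in the text): 8000 samples per second. */
enum { ASSUMED_SAMPLE_RATE_HZ = 8000 };

/* Hypothetical helper: bytes needed to hold one packed GSM frame. */
size_t sketch_gsm_frame_bytes(void)
{
    return GSM_BITS_PER_FRAME / 8;
}

/* Hypothetical helper: bytes of linear PCM consumed by one frame. */
size_t sketch_pcm_frame_bytes(void)
{
    return (size_t)PCM_SAMPLES_PER_FRAME * PCM_BITS_PER_SAMPLE / 8;
}

/* Hypothetical helper: compressed size for n_samples of input.
 * Assumption: a trailing partial frame is padded to a full frame. */
size_t sketch_compressed_bytes(size_t n_samples)
{
    size_t frames = (n_samples + PCM_SAMPLES_PER_FRAME - 1)
                    / PCM_SAMPLES_PER_FRAME;
    return frames * sketch_gsm_frame_bytes();
}

/* Hypothetical helper: bit rate of the packed frame stream
 * at the assumed sampling rate. */
long sketch_packed_bits_per_second(void)
{
    long frames_per_second = ASSUMED_SAMPLE_RATE_HZ / PCM_SAMPLES_PER_FRAME;

    /* The assumed rate must divide evenly into whole frames. */
    assert(ASSUMED_SAMPLE_RATE_HZ % PCM_SAMPLES_PER_FRAME == 0);
    return frames_per_second * GSM_BITS_PER_FRAME;
}
```

Under these assumptions, one second of audio (8000 samples, 16000 bytes of PCM) shrinks to 50 frames of 33 bytes each, or 1650 bytes. The packed stream works out to 13200 bit/s, slightly more than the nominal 13 kbit/s, because 264 bits per frame is four bits more than the 260 that 13 kbit/s at 50 frames per second implies.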
If you want to examine and change the individual parts of the GSM frame, you can ``explode'' it into an array of 70 parameters, change them there, and ``implode'' them back into a packed frame; you can also print a whole GSM frame to a file in human-readable format with a single function call.

Our library client, called toast, is modeled after the Unix compress program. Running toast myspeech will compress the file myspeech, remove it, and collect the result of the compression in a new file called myspeech.gsm, while untoast myspeech will reverse the process. The big difference between toast and compress is that toast loses information with each compression cycle. (After a few iterations, you can hear high-pitched chirps that I initially mistook for birds outside of my office window.)

Patent Issues with GSM 06.10

Philips is claiming intellectual property on GSM 06.10. They haven't contacted the authors of this library, but at least two large companies that wanted to integrate GSM 06.10 codecs into their products have been approached; one decided to pull their codec, another to pull just the encoder and leave the decoder. (So, apparently, at least some lawyers think the intellectual property applies only to one half of the process.) I don't know which parts of the patent are new, or whether it would hold up in court, but of course nobody wants to go to court over an issue as small as this.

The VPIM IETF workgroup is considering using GSM 06.10. The IETF can't standardize on technology that forces its users to pay license fees. If Philips doesn't release their intellectual property for use in VPIM, we'll be wasting a lot of bandwidth with voice mail. That's not the end of the world, but it would be nice to at least ask first. I don't know whom to ask. If you do, please contact me.

Get ETSI publications free of charge

ETSI is the European standards body that came up with GSM.
For a limited time, ETSI is making copies of its publications available over the Internet for the price of giving away an email address, among them the GSM 06.10 and GSM 06.06 drafts and attachments. Try it out at http://pda.etsi.org/pda/.

GSM 06.10: the current patchlevel is 12

The only difference between this and patchlevel 10 is an untested change in the WAV#49 portion of gsm_implode.c. I don't think anyone is actually using this, but since the ftp server became unavailable and I had to restore the archive somehow, I figured I might as well fix it.

Full release: reference version, gzip'ed tar file.

Warble, warble, warble

Leila: Are you using a scrambler?
J. Frank: I can't hear you, I'm using a scrambler!
- Repo Man

If you're using the library to encode and decode sound in your project, and the resulting audio is nowhere near telephony quality but sort of warbled, the most likely cause is that you're using the same gsm state to both encode and decode. Don't do that; allocate two different states instead, one for each direction.

Porting to a DEC Alpha

People porting the GSM 06.10 library to DEC Alphas have noticed that the test of the basic math routines fails. The test prints:

0xfffffffe (4294967294) != L_<< (2147483647, 1) -- expected 0xfffffffe (-2)
0x00000000 (-4294967296) != L_<< (-2147483648, 1) -- expected 0

This can be fixed by changing the definitions of the 32-bit types in inc/private.h from long and unsigned long to int and unsigned int. On the Alpha, a long has 64 bits; an int (at least with the unadorned native compiler I used) has 32 bits. The math tests that fail exploit properties specific to 32-bit integer math.

If you don't care about the math test, you don't have to change the types. In spite of the failing test, the library does work fine even with a 64-bit long. (It's been tested against byte-swapped ETSI test patterns.)

The .wav GSM format

There is a .wav chunk format #49 that encodes GSM 06.10 frames. Newer Windows versions support it natively.
It's a completely parallel version to ours, written from the same ETSI pseudocode, but ending up with incompatible framing and a different code order in the bytes.

After fretting over intellectual property rights for a few months, Microsoft has now registered the encoding inside the WAV chunk as a MIME type, particularly for use in the context of VPIM (Voice Profile for Internet Mail)'s spinoff IVM, a way of sending Voice Messages as MIME documents. The Microsoft ietf-draft used to be available as draft-ema-vpim-msgsm-00.txt from IETF draft repositories.

Long before that, Jeff Chilton figured out the format with trial-and-error when he needed to write compressed wave files for his shortwave radio application (see below).

The patchlevel 9 release of GSM integrates Jeff's ``unofficial'' patch 8 in slightly different form, breaking his sample source code along the way. The updated version has its GSM_OPT_WAV_FMT changed to GSM_OPT_WAV49, and (thanks to Dima Barsky) a more portable way of looking at fputs's result. If you couldn't get it to work earlier on a SysV-ish environment, try again.

GSM on the World-Wide Web

Jay Novello has gone ahead and used the audio/x-gsm MIME type. A page at the North Carolina Institute for Transportation Research and Education explains how users and web masters can configure their systems to conveniently handle GSM documents, and offers a few sounds to test with for those that do.

GSM 06.10 Errata

The list of tested overflow points for sequence 1 (coder part), table 5.2 of the GSM 06.10 draft, expects 49 overflows in the APCM quantizer's call to abs() (section 4.2.15). Rob Wubben of Philips Research Labs, who implemented a GSM 06.10 codec and counted, found 57 - ditto when he checked the same count in our library, and in a colleague's C simulation of the codec.
In our opinion the table is wrong. (Update: Pierre Larbier reports that the final ETSI release of the GSM 06.10 test sequences, attached to ETS 300 580-2 edition 2 (GSM 06.10 version 4.1.1), has corrected its SEQ01 to produce only the promised 49 overflows.)

Dr. Carsten Bormann

My co-author Carsten Bormann left the TU Berlin a few months ago to accompany Prof. Ute Bormann to the computer science department of the Universität Bremen, but both still visit Berlin regularly. Carsten will continue to be reachable as cabo@cs.tu-berlin.de; his email address in Bremen is cabo@informatik.uni-bremen.de.

The Schur recursion

The Linear Predictive Coding (LPC) part of the GSM algorithm uses an integer version of the ``Schur recursion'' described by Issai Schur in 1917. (The Levinson-Durbin algorithm from 1959 is better known, but the Schur recursion can be faster when parallelized.) Linear prediction means that the algorithm tries to find parameters for a filter that predicts the signal in the current frame as a weighted sum (or ``linear combination'') of the previous ones. (Wil Howitt offers a short tutorial about LPC and CELP.)

GSM for X

Java

Kudos to Steven Pickles for a free full-source Java 1.1 port of the GSM 06.10 Decoder side. Unlike the C library, the Java code is licensed under the Free Software Foundation's General Public License; if you use it, keep the library source available.

Chris Edwards did a Java port of the GSM 06.10 Encoder, but I'm not sure where he moved to - the old link I have for him doesn't work anymore.

An open-source applet that can play lots of different GSM variants (with or without .wav header) is MumboJumbo, from voxeo's Omi Chandiramani.
It's being extended to play other sound formats, too, and you can help.

DOS?

Louis Selvon <lselvon@usa.net> has created a new version of toast for DOS, based on the Patchlevel 10 release. As part of his EE thesis work, Louis also measured the objective and subjective performance (quality, not speed) of GSM 06.10 using MatLab (objective) and his family and neighbors (subjective).

Richard Elofsson <rel@ldecs.ericsson.se> has made his DOS port of the Patchlevel 4 release available. (He fixed bugs that it took me until Patchlevel 7 to find, though.) The source code, which compiles with Turbo C++ version 1.01, can be found as gsm-dos.zip in the toplevel GSM ftp directory.

Sergey A. Zhatchenko (zha@ergenm.comcen.nsk.su), from Novosibirsk, Russia, has donated a toast.exe, derived from a patchlevel 6 release of the GSM library. Make sure your input filenames have no suffix; this version of toast doesn't know that MS-DOS doesn't like more than one dot in its filenames.

GSM on the BeBox

Pierre-Emmanuel Chaut ported the GSM library to the BeBox, a PowerPC-based multiprocessing platform that excels with concurrent multimedia applications. It takes, he writes, "4 seconds to compress 20 seconds of sound". Way to go, Be.

Jake Bordens did his own port and implemented some GSM Coders as minimal sample applications. He's still too embarrassed to publish his code to just about anyone, but might be talked out of it; meanwhile, the binary is available from the webpage.

GSM DLL for OS/2

Terry Fry created, and is now distributing and maintaining, an OS/2 DLL version of the GSM 06.10 library. Next will be a .wav to .gsm converter for OS/2.

MacGSM

Paul C.H. Ho and Pink Elephant Technologies have used the Patchlevel 6 release to write a drag-and-drop GSM compressor/decompressor that converts between .au.gsm and .au. You'll need System 7.5, or System 7.0 or 7.1 with a Thread Manager extension; both 68K and Power PC hardware are fine.
The tool, initially written to decompress files broadcast by Radio Television Hong Kong, is freeware and is distributed via ftp as a binary.

GSM for the Amiga

Michael Cheng is responsible for the distribution of a toast binary compiled with Amiga gcc 2.7.2 on the Aminet repositories, path util/pack/GSMToast.lha. Michael also added some scripts that use toast to implement a streaming audio GSM mime type; they can be found on the same archives in comm/tcp/unrealaudio.lha.

GSM Applications

MythPhone

Paul Volkaerts has added a GSM 06.10 codec to MythPhone, a teleconferencing plugin for the homebrew PVR MythTV.

GSM for GBA

Damian Yerrick ported part of the library to the Game Boy Advance as part of a portable music player application that plays music off 256 Mbit flash cards.

ffmpeg

The ffmpeg project can now decode and encode GSM, in both the Microsoft and non-Microsoft flavors, including support for Microsoft-GSM in .wav files.

xine

The GPL'ed free video player xine now uses code from our library to help play GSM-encoded AppleTalk and Windows WAV/AVI/ASF audio tracks.

aRts

The KDE sound server aRts, short for analog realtime synthesizer, has grown a GSM de- and encoder in its kdenonbeta module, thanks to Matthias Kretz.

JusTalk 2

Jonas Tärnström released this compact Windows multiuser voice chat application.
It supports multiple sample rates, can function as a client or server, and can be set to stream audio either continuously or whenever the voice level rises above a threshold.

ElderVision

The makers of the TouchTown Internet package for seniors are using a Java GSM 06.10 client for low-bandwidth telephony.

linphone

Even if you don't speak French, you can now read about and download linphone, a web-phone application that uses the GSM 06.10 library (with a fresh autoconf Makefile from author Simon Morlat).

JVOIPLIB

Jori Liesenborgs's JVOIPLIB is an LGPL'ed voice-over-IP library written in C++, based on his thesis work. It supports multiple codecs and codec parameters, VoIP session creation and destruction, and 3D effects (!). Jori has just integrated the GSM library and will likely be shipping a subset of the GSM 06.10 release with his next version.

OpenH323

OpenH323 is an Open Source implementation of the ITU H.323 protocol stack which runs on Linux, Windows, Solaris and other Unix platforms. The OpenH323 client sample code can interoperate with NetMeeting in audio mode, and can receive H.261 format video. The GSM codec is the standard codec used by Linux implementations where G.723.1 hardware is not available.

Patches to the SOund eXchange tool, sox

Andrew Pam (avatar@aus.xanadu.com) has patched Lance Norskog's sox program to work with the GSM library. I wish I had thought of that.

Sox-12.16: Son of SOX

Chris Bagwell (you might remember him as maintainer of the Audio File Format FAQ) has snatched maintenance of the cryptic, resourceful Unix tool sox from its original author, Lance Norskog. Version 12.17 supports GSM and WAV#49.

Pulse Entertainment's 3d web animation plugin

Pulse3d is streaming GSM 06.10 audio to its real-time animated characters, along with the lip sync and body animation information that makes them come to life.

HotFoon

People with friends in Hyderabad, India, are in luck; hotfoon is offering a (so far) free gateway service to numbers in the local area there.
Their small, free client also serves as a gateway to an online chat system; as usual, if you and a friend both download the client, have Duplex sound cards and a reasonably fast Internet connection, you can talk for free across the Internet, no matter where you are.

ATR-ITL

Somewhere towards the tail fin of the Japanese-English telephone "babelfish" that the Advanced Telecommunications Research group's Interpreting Telecommunications Research Laboratories are trying to build, a GSM 06.10 codec is one of the options available for encoding the translated utterances.

The Audiograph Lecture Recorder and Player

The University of Surrey, UK, and Massey University, NZ, have developed a Mac-based authoring system and Windows/Mac Netscape plugin software for voice- and drawing-annotated slide shows; they now distribute it through www.nzedsoft.com. The viewers are free; version 1.2 of the authoring tool used to cost money, but is now free as well.

NTT's "InterSpace" Virtual Environment

The Virtual Campus of NTT's InterSpace project combines videoconferencing with 3D graphics and, recently added, an audio chat facility that uses our library. The site's entrance graphics show rendered avatars whose heads are replaced by video screens rendered into the scenery, rather ingeniously close to the Snow Crash ideal.

Books

On digital speech processing, I recommend Discrete-Time Processing of Speech Signals by John R. Deller, Jr., John G. Proakis, and John H. L.
Hansen; Macmillan Publishing Company, New York, 1993; ISBN 0-02-328301-7

For a well-written, interesting, 100% jargon-free introduction to language, speech, and the mind, see The Language Instinct by Steven Pinker; William Morrow and Company, Inc., New York, 1994; ISBN 0-688-12141-1

The book on GSM in general is self-published and can only be ordered from the authors.
The GSM System for Mobile Communications by Michel Mouly and Marie-Bernadette Pautet; 49, rue Louise Bruneau, F-91120 PALAISEAU, FRANCE
Tel: +33 1 69 31 03 18
Fax: +33 1 69 31 03 38
Web: http://perso.wanadoo.fr/cell.sys/
A new edition of the book is planned for autumn 1998.

Introductions and Demos

A set of introductory DSP classes is online at http://www.bores.com/courses/intro.

If you're learning about digital speech processing, visit Phil Karn's Digital/Analog Voice Demo at Qualcomm. Illustrated with mu-law sound samples, Phil takes you from an original sound sample, to a band-pass filtered version, to one with added noise, to a GSM version, a CELP-encoded version, Qualcomm's proprietary QCELP-encoded version at two different data rates, and an LPC-10 version, complete with running commentary about each encoding.

CELP source code sighted

Rick Ross found a set of speech compression engines at CMU, featuring a prehistoric version of GSM, an LPC, the CCITT-ADPCM, and various *ELPs that I haven't seen anywhere else.

Samples

The Vincent Voice Library at Michigan State University houses taped utterances of over 50,000 persons recorded over 100 years.

Sound applications on the World-Wide Web

Jeff Chilton's Shortwave Radio gives you access to the last 5 or 15 seconds from a user-selected frequency, as received in Reston, Virginia, USA.

Research

Voice Synthesizers On The Verge of a Nervous Breakdown: in 1989, Janet Cahn wrote her thesis at the MIT Media Lab about Expressive Synthesized Speech - how to make voice synthesizers express emotions.
The three sound samples she has online, three different sentences synthesized in ten tones expressing anything from impatience through anger to depression, are still hilarious to listen to.

Fun

If you want to learn more about sounds, why not pay a visit to the San Francisco Exploratorium and its duck call vowels? (If, conversely, you want to hear more about toasters, I recommend Patrick R. Michaud's report on Strawberry Pop-Tart Blow-Torches.)

The final word on telephone sex.

Indices

The maintainers of the following sites try to offer comprehensive and complete indices into their respective subjects; the documents should be large enough to get you within a few hops of your topic quickly.

Multimedia

Simon Gibbs' Index to Multimedia Information Sources
A long no-frills list of Multimedia links, with archives, standards, companies, research organisations, conference announcements, tutorial-type material, and FAQs.

Speech Processing

Andrew Hunt's comp.speech site
The site's hypertext version of the comp.speech Frequently Asked Questions posting has pointers to general information and tools concerned with speech encoding, compression, recognition, synthesis, and other forms of natural language processing.

Jason Woodard's descriptions of Speech Codecs
Rather than pointing to every speech processing gizmo in existence, this subtree explains principles and formats, and gives crucial software and theory references, for three general classes of speech codecs and the most important standards.

Digital signal processing

The comp.dsp Frequently Asked Questions list
Questions, answers, and resources for general digital signal processing.

Josip Juric's DSP homepage collects the FAQ and a number of other pointers to DSP resources; among them Guido van Rossum's Audio File Format FAQ and Appendix from comp.dsp.

Compression

The comp.compression Frequently Asked Questions list explains, and often provides references to software that implements, most lossy and non-lossy algorithms.
The hypertext FAQ archived at Ohio State University looks just like the ascii FAQ, but has been broken up and links directly to referenced documents where possible.

Telecommunication

Telecommunication sites from John Scourias. John is the author of the excellent overview referenced elsewhere on this page; this is his telecommunication hotlist.

GSM

Jürgen Morhöfer's GSM List, last updated on Sep 22nd 2000, lists GSM operators with network code and customer service phone number, sorted by country.

Supercall Cellular, a South African provider, maintains a page of links to general information about GSM, including codes, networks, coverage maps for Europe -- and a request for submissions of scanned-in SIMs.

jutta@pobox.com, March 2006. Comments and corrections are welcome.