About site: Algorithms - Huffman Coding Algorithm
  About site: http://www.huffmancoding.com/david/algorithm.html

Title: Algorithms - Huffman Coding Algorithm Contains a guide how to build it and a JAVA tutorial how to use it.
This is websites2007.org cache of m/ as retrieved on 2008.07.20. websites2007.org's cache is the snapshot that we took of the page as we crawled the web. The page may have changed since that time.
The Huffman Coding Procedure

To avoid a college assignment

The domain name of this website (www.huffmancoding.com) is from my uncle’s algorithm. In nerd circles, his algorithm is pretty well known. Often college computer science textbooks will refer to the algorithm as an example when teaching programming techniques. I wanted to keep the domain name in the family so I had to pay some domain squatter for the rights to it.

Back in the early 1950s, one of my uncle’s professors challenged him to come up with an algorithm that would calculate the most efficient way to represent data, minimizing the amount of memory required to store that information. It is a simple question, but one without an obvious solution. In fact, my uncle took the challenge from his professor to get out of taking the final. He wasn’t told that no one had solved the problem yet.

I’ve written a simple program to demonstrate Huffman Coding in Java. Because I have this web site, several times a year I receive a frantic e-mail from a college student stating, basically, “I have a homework assignment to code the Huffman Algorithm and it is due next week. I am too lazy or clueless to do the work myself, so can you just send me the source code so I can pass it off as my own.” I don’t normally accommodate them, but perhaps this will help them do their own homework.

A little bit of background

Computers store information in zeros and ones: binary “off”s and “on”s. The standard way of storing characters on a computer is to give each character a sequence of 8 bits (or “binary digits”), each of which can be 0 or 1. This allows for 256 possible characters (because 2 to the 8th power is 256). For example, the letter “A” is given the unique code of 01000001. Unicode encodings such as UTF-16 use 16 bits or more per character and handle even non-Roman alphabets. It is simply easier for computers to handle characters when they are all the same size.
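To make the fixed-size idea concrete, here is a minimal Java sketch that prints the 8-bit pattern for a character and the number of patterns 8 bits allow. The class and method names (FixedWidthSketch, toEightBits) are made up for this illustration and are not taken from the author's program.

```java
class FixedWidthSketch {
    /** Returns the low 8 bits of a character as a string of 0s and 1s. */
    static String toEightBits(char c) {
        String bits = Integer.toBinaryString(c & 0xFF);
        // Left-pad with zeros so every character takes exactly 8 digits.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toEightBits('A')); // prints 01000001
        System.out.println(1 << 8);           // prints 256, the number of 8-bit patterns
    }
}
```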
The more bits you allow per character, the more characters you can support in your alphabet. But when you make every character the same size, it can waste space. In written text, all characters are not created equal. The letter “e” is pretty common in English text, but rarely does one see a “Z.” But since it is possible to encounter both in text, each has to be assigned a unique sequence of bits. But if “e” were a 7-bit sequence and “Z” were 9 bits then, on average, a message would be slightly smaller than otherwise because there would be more short sequences than long sequences. You could compound the savings by adjusting the size of every character and by more than 1 bit.

Even before computers, Samuel Morse took this into account when assigning letters to his code. The very common letter “E” is the short sequence of “·” and the uncommon letter “Q” is the longer sequence of “— — · —.” He came up with Morse code by looking at the natural distribution of letters in the English alphabet and guessing from there. Morse code isn’t perfect because some common letters have longer codes than less common ones. For example the letter “O,” which is a long “— — —,” is more common than the letter “I,” which is the shorter code “· ·.” If these two assignments were swapped, then it would be slightly quicker, on average, to transmit Morse code.

Huffman Coding is a methodical way of determining how best to assign zeros and ones. It was one of the first algorithms for the computer age. By the way, Morse code is not really a binary code because it puts pauses between letters and words. If we were to put some bits between each letter to represent pauses, it wouldn’t result in the shortest messages possible.

This adjusting of the codes is called compression, and sometimes the computational effort in compressing data (for storage) and later uncompressing it (for use) is worth the trouble. The more space a text file takes up, the slower it is to transmit from one computer to another.
Other types of files, which have even more variability than the English language, compress even better than text. Uncompressed sound (.WAV) and image (.BMP) files are usually at least ten times as big as their compressed equivalents (.MP3 and .JPG respectively). Web pages would take ten times as long to download if we didn't take advantage of data compression. Fax pages would take longer to transmit. You get the idea.

All of these compressed formats take advantage of Huffman Coding. Again, the trick is to choose a short sequence of bits for representing common items (letters, sounds, colors, whatever) and a longer sequence for the items that are encountered less often. When you average everything out, a message will require less space if you come up with a good encoding dictionary.

Mixing art and computer science

You cannot just start assigning letters to unique sequences of 0’s and 1’s because there is a possibility of ambiguity if you do not do it right. For example, the four most common letters of the English alphabet are “E,” “T,” “O,” and “A.” You cannot just assign 0 to “E,” 1 to “T,” 00 to “O,” 01 to “A,” because if you encounter “…01…” in a message, you could not tell if the original message contained “A” or the sequence “ET.” The code for a letter cannot be the same as the front part of the code for a different letter. To avoid this ambiguity, we need a way of organizing the letters and their codes that prevents this. A good way of representing this information is something computer programmers call a binary tree.

Alexander Calder was an American artist who built mobiles and really liked the colors red and black. One of his larger works hangs from the East building atrium at the National Gallery, but he had made several similar to it. The mobile hangs from a single point in the middle of a pole. It slowly sways as the air circulates in the room. On each end of the pole you’ll see either a weighted paddle or a connection to the middle of another pole.
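Before going further with the mobile, the ambiguity rule stated above (no code may be the front part of another code) can be checked mechanically. The Java sketch below is only an illustration: the class and method names are invented here, the first table is the E/T/O/A assignment from the text, and the second table is a hypothetical repaired assignment of my own choosing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PrefixCheckSketch {
    /** Returns true when no code is the front part of another letter's code. */
    static boolean isPrefixFree(Map<Character, String> codes) {
        for (Map.Entry<Character, String> a : codes.entrySet()) {
            for (Map.Entry<Character, String> b : codes.entrySet()) {
                boolean differentLetters = !a.getKey().equals(b.getKey());
                if (differentLetters && b.getValue().startsWith(a.getValue())) {
                    return false;
                }
            }
        }
        return true;
    }

    /** The ambiguous assignment from the text: "01" could be "A" or "ET". */
    static Map<Character, String> ambiguousTable() {
        Map<Character, String> codes = new LinkedHashMap<>();
        codes.put('E', "0");
        codes.put('T', "1");
        codes.put('O', "00");
        codes.put('A', "01");
        return codes;
    }

    /** A hypothetical assignment (not from the text) that avoids the problem. */
    static Map<Character, String> safeTable() {
        Map<Character, String> codes = new LinkedHashMap<>();
        codes.put('E', "0");
        codes.put('T', "10");
        codes.put('O', "110");
        codes.put('A', "111");
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(isPrefixFree(ambiguousTable())); // prints false
        System.out.println(isPrefixFree(safeTable()));      // prints true
    }
}
```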
The lower poles, in turn, have things hanging off of them too. At the lowest levels, all the poles have weights on their ends.

[Figure: Calder mobile]

Programmers would look at this mobile and think of a binary tree, a common structure for storing program data. This is because every mobile pole has exactly two ends. For the sake of this algorithm, one end of the pole is considered “0” while the other end is “1.” The weights at the ends of the poles will have letters associated with them. If an inchworm were to travel from the top of the mobile to a letter, it would walk down multiple poles, sometimes encountering the “0” and sometimes the “1.” The sequence of binary digits on the way to the letter ends up being the encoding of that letter.

Let us build a mobile

So how do we build that perfectly balanced mobile? The first step of Huffman Coding is to count the frequency of all the letters in the text. Sticking with the mobile analogy, we need to create a bunch of loose paddles, each one painted with a letter in the alphabet. The weight of each paddle is proportional to the number of times that letter appears in the text. For example, if the letter “q” appears twice, then its paddle should weigh two ounces, and the “e” paddle would weigh 10 ounces if that many “e”s were present. Every paddle has a loop for hanging.

For our example, let’s assume that in our tiny file there were two “q”s, three “w”s, six “s”s, and ten “e”s.

Now let’s prepare some poles. We’ll need one fewer pole than unique characters. For example, with 4 unique characters we’ll need 3 poles. One end of each pole is “0” and the other end is “1.” Each pole will have a hook on both ends for holding things and a loop in the middle for being hung itself. In my imaginary world, poles weigh nothing.

Now let us line up all the paddles, then find the two lightest of them and connect them to opposite ends of a pole. In the example below, “q” and “w” were the lightest (least frequent).
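The counting step that produces the paddle weights can be sketched in a few lines of Java. This is an illustration under assumptions, not the author's program: the class name FrequencySketch, the method countLetters, and the sample string (built to contain two “q”s, three “w”s, six “s”s and ten “e”s) are all made up here.

```java
import java.util.Map;
import java.util.TreeMap;

class FrequencySketch {
    /** Counts how often each character occurs; the count is the paddle's weight. */
    static Map<Character, Integer> countLetters(String text) {
        Map<Character, Integer> weights = new TreeMap<>(); // sorted by letter for stable printing
        for (char c : text.toCharArray()) {
            weights.merge(c, 1, Integer::sum);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<Character, Integer> weights = countLetters("qqwwwsssssseeeeeeeeee");
        System.out.println(weights);            // prints {e=10, q=2, s=6, w=3}
        System.out.println(weights.size() - 1); // prints 3, the number of poles needed
    }
}
```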
From now on, we’ll consider the “q” and “w” paddles and their pole as one inseparable thing. The weight of the “q+w” object is the sum of the two individual paddles. Remember the pole itself weighs nothing. We’ll put down the object, then we’ll repeat the process. The two lightest things in the room now may be an individual paddle or possibly a previously connected contraption. In the picture below, “q+w” (with a weight of 5) and “s” (with a weight of 6) were the next two lightest objects. Then we are left with “q+w+s” (with a weight of 11) and “e” (with a weight of 10) as the last two groupings. We’ll attach those two together.

We are attaching the poles from the bottom up. We keep hooking up the two lightest things until we’ve got exactly one contraption that contains the weight of the entire text.

[Figure: Binary Tree]

So what do we do with this tree?

Now let’s hang up the mobile and admire our handiwork. The heaviest paddles (like the frequent “e”) will have a tendency to be nearer to the top because they were added later, while the lightest paddles (the infrequent “q”) will be at the bottom because they were grabbed first and connected to pole after pole, and so forth. In other words, the path from the top to the common letters will be the shortest binary sequence. The path from the top to the rare letters at the bottom will be much longer. The code for “e” is “0”, “s” is “10”, “w” is “111” and “q” is “110.”

We have built a Huffman Coding tree. To finish compressing the file, we need to go back and re-read the file. This time, instead of just counting the characters, we’ll look up, in our tree, each character encountered in the file and write its sequence of zeros and ones to a new file. Later, when we want to restore the original file, we’ll read the zeros and ones and use the tree to decode them back into characters. This implies that we must have the tree around at the time we decompress the file.
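The whole procedure (hook up the two lightest things until one contraption remains, read codes off the paths, then encode and decode) can be sketched in Java as follows. This is a minimal illustration, not the author's program: all names are invented here, the text is assumed to contain at least two distinct characters, and the bits are kept as a String of '0' and '1' characters for readability where a real compressor would pack them into bytes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    /** A paddle (leaf carrying a letter) or a pole (two things hanging from it). */
    static final class Node {
        final char symbol;    // meaningful only for paddles
        final int weight;
        final Node zero, one; // both null for a paddle
        Node(char symbol, int weight) {
            this.symbol = symbol; this.weight = weight; this.zero = null; this.one = null;
        }
        Node(Node zero, Node one) {
            this.symbol = '\0'; this.weight = zero.weight + one.weight; // poles weigh nothing
            this.zero = zero; this.one = one;
        }
        boolean isLeaf() { return zero == null; }
    }

    /** Assumes at least two distinct characters in the text. */
    static Node buildTree(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) counts.merge(c, 1, Integer::sum);
        PriorityQueue<Node> room =
            new PriorityQueue<Node>((a, b) -> Integer.compare(a.weight, b.weight));
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            room.add(new Node(e.getKey(), e.getValue()));
        }
        while (room.size() > 1) {
            Node lightest = room.poll();
            Node nextLightest = room.poll();
            room.add(new Node(lightest, nextLightest)); // hang both from a new pole
        }
        return room.poll();
    }

    /** Walks every path from the top, recording the 0s and 1s passed on the way. */
    static void collectCodes(Node node, String path, Map<Character, String> codes) {
        if (node.isLeaf()) { codes.put(node.symbol, path); return; }
        collectCodes(node.zero, path + "0", codes);
        collectCodes(node.one, path + "1", codes);
    }

    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) bits.append(codes.get(c));
        return bits.toString();
    }

    static String decode(String bits, Node root) {
        StringBuilder text = new StringBuilder();
        Node node = root;
        for (char bit : bits.toCharArray()) {
            node = (bit == '0') ? node.zero : node.one;
            if (node.isLeaf()) { text.append(node.symbol); node = root; } // back to the top
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = "qqwwwsssssseeeeeeeeee"; // two q, three w, six s, ten e
        Node root = buildTree(text);
        Map<Character, String> codes = new HashMap<>();
        collectCodes(root, "", codes);
        String bits = encode(text, codes);
        System.out.println(codes);
        System.out.println(bits.length() + " bits instead of " + text.length() * 8); // 37 bits instead of 168
        System.out.println(decode(bits, root).equals(text)); // prints true
    }
}
```

Which end of a pole counts as “0” is arbitrary, so the exact bit patterns this sketch prints may differ from the ones quoted above, but the code lengths come out the same: one bit for “e”, two for “s”, and three each for “w” and “q”.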
Commonly, the tree is made available at decompression time by writing its structure at the beginning of the compressed file. This will make the compressed file a little bigger, but it is a necessary evil. You have to have the secret decoder ring before you can pass notes in class.

Other ways of squeezing data

Since my uncle devised his coding algorithm, other compression schemes have come into being. Someone noticed that the distribution of characters may vary at different spots in the source, for example a lot of “a”s around the beginning of the file but later there might be a disproportionate number of “e”s. When that is the case, it is occasionally worth the effort to adjust how the Huffman tree hangs while running through the file. This is called Adaptive Huffman Coding. Alternatively, one could slice the file into smaller sections and have a different tree for each section.

Three other guys (Lempel, Ziv and Welch) realized that certain sequences of characters can be common, for example the letter “r” is often followed by the letter “e”, so we could treat the sequence “re” as just another letter when assigning codes.

Sometimes it is not necessary to re-create the original source exactly. For example, with image files the human eye cannot detect every subtle pixel color difference. The JPEG (“Joint Photographic Experts Group”) format “rounds” similar hues to the same value, then applies the Huffman algorithm to the simplified image. The MP3 music format uses a similar technique for sound files.

My uncle’s algorithm makes the world a smaller place.
 
