Javascript required
Skip to content Skip to sidebar Skip to footer

How to Draw a Huffman Tree

I remember when learning algorithms I got clear enough explanation of Huffman Codes and Trees but usually code examples were hard to read and understand for a newbie. So this motivated me to put in my to-do list a task of writing simplified example for Huffman Encoding and Decoding problem.

Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means ofHuffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called "prefix-free codes", that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol). Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm.

There are mainly two major parts in Huffman Coding:

1) Build a Huffman Tree from input characters.

2) Traverse the Huffman Tree and assign codes to characters.

Steps to build Huffman Tree

Input is an array of unique characters along with their frequency of occurrences and output is Huffman Tree.

1. Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is used as a priority queue. The value of frequency field is used to compare two nodes in min heap. Initially, the least frequent character is at root)

2. Extract two nodes with the minimum frequency from the min heap.

3. Create a new internal node with a frequency equal to the sum of the two nodes frequencies. Make the first extracted node as its left child and the other extracted node as its right child. Add this node to the min heap.

4. Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root node and the tree is complete.

No alt text provided for this image

Let's go to our example in Python

We are given a string and want to encode it with Huffman coding:

line = 'The world should be better!'        

Step 1. Calculate frequencies

def get_freq(words):     """     Given a string words, returns a dictionary with the frequency     of each character in the string called words.      """     dic = {}     for i in (words):         if i in dic.keys():             dic[i] += 1         else:             dic[i] = 1        
    return dic   dic = get_freq(line)  print(dic)  #{'T': 1, 'h': 2, 'e': 4, ' ': 4, 'w': 1, 'o': 2, 'r': 2, 'l': 2, 'd': 2, 's': 1, 'u': 1, 'b': 2, 't': 2, '!': 1}        

Step 2. Generate a Tree

First we are creating a class Node - to describe how our Nodes should look like

# Node class for our Huffman Tree class Node:     def __init__(self, data, freq):         self.data = data         self.freq = freq         self.left = None        
        self.right = None        

Next generate a list of Nodes (value - frequency pares) that correspond to each symbol in our line and sort them by frequency

list_nodes = [] for i in dic.items():     node = Node(i[0], i[1])     list_nodes.append(node)   s = sorted(list_nodes, key = lambda node: node.freq)        
print([(i.data, i.freq) for i in s])  #[('T', 1), ('w', 1), ('s', 1), ('u', 1), ('!', 1), ('h', 2), ('o', 2), ('r', 2), ('l', 2), ('d', 2), ('b', 2), ('t', 2), ('e', 4), (' ', 4)]        

Create a Huffman tree

# function to make Huffman tree from prepared nodes def makeTree(list_nodes):     if len(list_nodes) > 1:         while len(list_nodes)>1:             '''             sort list of nodes and select 2 with the lowest frequency             '''             s = sorted(list_nodes, key = lambda node: node.freq)             node1, node2 = s[0], s[1]             '''             create Huffman tree part with selected nodes             '''             parent = Node('-', node1.freq + node2.freq)             parent.left = node1                         parent.right = node2             root = parent             '''             update the list of nodes by replacing 2 selected nodes with a new one for next level iteration             '''             list_nodes.remove(node1)             list_nodes.remove(node2)                          list_nodes.append(parent)     else:         root = list_nodes[0]     return root        
root = makeTree(list_nodes)        

Well, we have now root - it is our generated Huffman tree. Let's generate code for each string from our line to encode. To do this we need to have an algorithm to walk through the tree and reading code as +"0" each time when we go LEFT and +"1" each time when we go RIGHT

# walk through tree recursively and collect Code for each Original String def makeCode(root, string, dic):     '''     Recursive function to print the tree.     Here the string is the huffman code generated     The dictionary dic get's the code for each string.     '''     # If the left and the right of the root are none     # Then it is a leaf node so we just print its value     if root.left == None and root.right == None:         # Make the string its Huffman Code for future use         dic[root.data] = string         return dic       # if we go to left then add "0" to the code.     # if we go to the right add "1" to the code.          makeCode(root.left, string+"0",dic)     makeCode(root.right, string+"1",dic)        
        

Now we have everything for generating encoded line function:

def encode_huffman(line, root):     """     line - original string that should be encoded     root - generated Huffman tree.      """     # Get a dictionary that stores the codes     dic = {}     makeCode(root,"",dic)          # Make an encoded string     enc_s = ""     # Replace each character by it Huffman Code.     for char in line:         enc_s = enc_s + str(dic[char])       #printing results     print('original length:', len(line))     print('encoded length:', len(enc_s))     for k,v in dic.items():         print(k+':', v)     print('original string:', line)        
    print('encoded string:', enc_s)   encode_huffman(line, root)        
original length: 27 encoded length: 100 T: 0000 w: 0001 s: 0010 u: 0011 !: 0100 h: 0101 e: 011  : 100 o: 1010 r: 1011 l: 1100 d: 1101 b: 1110 t: 1111 original string: The world should be better!        
encoded string: 0000010101110000011010101111001101100001001011010001111001101100111001110011100111111111101110110100        

BAM! :)

Let's decode our line back (assuming we have our original Huffman tree prepared)

def decode_huffman(enc_s, root):     '''     enc_s - encoded string     root - Huffman tree for decoding     '''     dec_s = ""     cur = root     # For each element in string s do a tree traversal depending on its value.       for i in range(len(enc_s)):         if enc_s[i] == '0':             cur = cur.left         else:             cur = cur.right         # leaf nodes contain the character corresponding to a certain huffman code.          if cur.left == None and cur.right == None:             dec_s += cur.data             cur = root          print('Encoded string:',enc_s)        
         print('Decoded string:',dec_s)   decode_huffman(enc_s, root) print('Original string:', line)        
Encoded string: 0000010101110000011010101111001101100001001011010001111001101100111001110011100111111111101110110100 Decoded string: The world should be better!        
Original string: The world should be better!        

Double BAM! :)

Hope you enjoy this simplified example.

Best regards,

threlkeldpless1969.blogspot.com

Source: https://www.linkedin.com/pulse/huffman-code-tree-simplified-igor-onyshchenko-phd