Text Compression - Eliminating vowels and spaces

118 days ago by bennoms

# Load the text file attached to this worksheet (DATA is the worksheet's data directory).
original = (open(DATA+'text.txt')).read()
#original = "FACT IS STRONGER THAN FICTION"

# Remove every vowel (lower and upper case) and every space.
compressed = original
compressed = compressed.replace('a','')
compressed = compressed.replace('e','')
compressed = compressed.replace('i','')
compressed = compressed.replace('o','')
compressed = compressed.replace('u','')
compressed = compressed.replace('A','')
compressed = compressed.replace('E','')
compressed = compressed.replace('I','')
compressed = compressed.replace('O','')
compressed = compressed.replace('U','')
compressed = compressed.replace(' ','')

# Print each text with its length in characters, then compressed length / original length.
print original + "(" + str(len(original)) + ")"
print
print compressed + "(" + str(len(compressed)) + ")"
print
print "Compression Rate = " +"%1.2f" % (float(len(compressed))/float(len(original)))
       
The Code Book covers a diverse set of historical topics including the
Man in the Iron Mask, Arabic cryptography, Charles Babbage, the
mechanisation of cryptography, the Enigma Machine, and the decipherment
of Linear B and other ancient writing systems. Later sections cover the
development of public key cryptography and some of this material is
based on interviews with the participants, including those who worked in
secret at GCHQ. The book concludes with a discussion of PGP, quantum
computing, and quantum cryptography.


(530)

ThCdBkcvrsdvrsstfhstrcltpcsncldngthMnnthrnMsk,rbccryptgrphy,ChrlsBbbg,th\
mchnstnfcryptgrphy,thngmMchn,ndthdcphrmntfLnrBndthrncntwrtngsystms.Ltrsc\
tnscvrthdvlpmntfpblckycryptgrphyndsmfthsmtrlsbsdnntrvwswththprtcpnts,ncl\
dngthswhwrkdnscrttGCHQ.ThbkcncldswthdscssnfPGP,qntmcmptng,ndqntmcryptgrp\
hy.


(297)

Compression Rate = 0.56
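
The same removal can be done in one pass. The following is a minimal sketch in plain Python 3 rather than the Sage cell above. The helper name strip_vowels_and_spaces is an assumption made for illustration, and the input is the short sample string that is commented out in the worksheet. The last line illustrates that the scheme is lossy: different words can collapse to the same consonant skeleton, so the original cannot always be recovered.

```python
# Plain Python 3 sketch, not the worksheet's Sage code.
# strip_vowels_and_spaces is a hypothetical helper name.
REMOVED = "aeiouAEIOU "

def strip_vowels_and_spaces(text):
    """Delete every vowel and space from text in a single pass."""
    return text.translate(str.maketrans("", "", REMOVED))

sample = "FACT IS STRONGER THAN FICTION"
compressed = strip_vowels_and_spaces(sample)

print(compressed, "(%d)" % len(compressed))  # FCTSSTRNGRTHNFCTN (17)
print("Compression Rate = %1.2f" % (len(compressed) / len(sample)))

# Lossy: "than" and "then" both become "thn".
print(strip_vowels_and_spaces("than") == strip_vowels_and_spaces("then"))  # True
```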
# Relative frequency of each symbol in the original text.
# In Sage the literal 1 is an exact Integer, so 1 / length is an exact rational;
# plain Python 2 would need 1.0 / length here.
alphabet = dict()
length = len(original)
for symbol in original:
    if symbol in alphabet:
        alphabet[symbol] += 1 / length
    else:
        alphabet[symbol] = 1 / length

# Shannon entropy in bits per symbol.
entropy = 0
for symbol, fraction in alphabet.iteritems():
    entropy -= fraction * log(fraction, 2)
    #print symbol, float(fraction)
print "\nEntropy Original: " + str(float(entropy))

# Same calculation for the compressed text.
alphabet = dict()
length = len(compressed)
for symbol in compressed:
    if symbol in alphabet:
        alphabet[symbol] += 1 / length
    else:
        alphabet[symbol] = 1 / length

entropy = 0
for symbol, fraction in alphabet.iteritems():
    entropy -= fraction * log(fraction, 2)
    #print symbol, float(fraction)
print "\nEntropy Compressed: " + str(float(entropy))
       
Entropy Original: 4.49845624995

Entropy Compressed: 4.32942578129
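
The two cells above repeat the same calculation, H = -sum(p * log2(p)) over the relative frequency p of each symbol. A possible way to factor it out is sketched below in plain Python 3 (not Sage). The function name shannon_entropy is an assumption made for illustration. This is a zero-order estimate: it looks only at single-character frequencies and ignores any dependence between neighbouring characters. The final loop multiplies the lengths and entropies reported above to give a rough total information content for each text under that model.

```python
from collections import Counter
from math import log2

def shannon_entropy(text):
    """Zero-order Shannon entropy of text, in bits per symbol (hypothetical helper)."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Two equally likely symbols carry exactly one bit each.
print(shannon_entropy("aabb"))  # 1.0

# Rough totals from the figures reported in this worksheet: length * bits per symbol.
reported = [("original", 530, 4.49845624995), ("compressed", 297, 4.32942578129)]
for label, length, entropy in reported:
    print(label, round(length * entropy), "bits")
```

The entropy per symbol barely changes, so under this model nearly all of the saving comes from the text having fewer characters, not from each remaining character carrying less information.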