A text lossy compression scheme

syntotic · Sep 1, 2012

How come I did not think of it before! For large quantities of text, if you lose a few letters it does not matter at all... it is a typo, period. It happens to anyone who writes without proofreading. The Human processor will be able, in MOST CASES, to recover the original word. In a worse case a whole sentence loses its meaning, but without the meaning being changed into something else (for instance, inverted). Languages like English have repeated letters as a matter of orthography, so letter is the same as leter when it comes to meaning...

So what if we implement a lossy compression scheme where, before using any standard text compression algorithm, we lose some letters here and there? It can be calculated very formally and implemented with a dictionary approach. To begin with, single syllable words would remain the same, but they can be simplified if there is no way, within a single language, that they can be confused with another word. For instance, all OR can be substituted by R, all NOT by O, all YES by Y and so on. Selection of the words to be simplified can be done after compiling a language dictionary and calculating the probability of confusion when dropping random letters, that is, a measure of word uniqueness. Then we take the droppings that minimize confusion across the whole dictionary while maximizing the number of letters dropped. Some heuristics, like dropping repeated letters a priori, can also be applied as further preprocessing. To simplify the process, the idea can be taken down to the level of syllables: instead of taking the full language, only known syllabic combinations are considered and substituted by a syllabic dropping...
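A minimal Python sketch of the dictionary approach described above. It is an illustration under assumptions, not a definitive implementation: the names collapse_repeats, candidates, build_codebook and encode are hypothetical, the word list is a toy, and the "probability of confusion" is reduced to a crude test (a shortened form must not be a dictionary word and must not already be assigned to another word).

```python
from itertools import combinations


def collapse_repeats(word):
    """Heuristic from the post: drop doubled letters ('letter' -> 'leter')."""
    out = []
    for ch in word:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def candidates(word, max_drop):
    """Yield forms of `word` with up to max_drop letters deleted, shortest first."""
    n = len(word)
    for drop in range(min(max_drop, n - 1), 0, -1):
        for keep in combinations(range(n), n - drop):
            yield "".join(word[i] for i in keep)


def build_codebook(words, max_drop=2):
    """Map each word to its shortest form that cannot be confused with another.

    Assumption: "confusion" means the form is a dictionary word or a form
    already taken. Words earlier in the list get first pick.
    """
    words = list(dict.fromkeys(w.lower() for w in words))  # dedupe, keep order
    vocabulary = set(words)
    taken = set()
    codebook = {}
    for word in words:
        base = collapse_repeats(word)
        # Try the shortest forms first, then the collapsed word, then the word.
        for cand in [*candidates(base, max_drop), base, word]:
            if cand == word or (cand not in vocabulary and cand not in taken):
                codebook[word] = cand
                taken.add(cand)
                break
    return codebook


def encode(text, codebook):
    """Replace every known word by its shortened form; pass others through."""
    return " ".join(codebook.get(w, w) for w in text.lower().split())


if __name__ == "__main__":
    toy_dictionary = ["or", "not", "yes", "letter", "later"]
    book = build_codebook(toy_dictionary)
    print(book)
    print(encode("yes or not later letter", book))
```

With a real dictionary the list would presumably be sorted by word frequency, so the most common words get the shortest forms. Words outside the dictionary pass through unchanged and could collide with a shortened form, which this sketch does not check.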

(This is more or less what is achieved with Huffman, but the heuristics would give a better compression ratio a priori, since the text would not be the SAME text but a different text altogether to begin with, albeit the SAME text given a native speaker Human processor...)
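One way to check that claim on a given text is to measure a standard compressor's output with and without the lossy step. The sketch below is only a measuring harness under assumptions: it uses Python's zlib (DEFLATE, which combines LZ77 with Huffman coding) as the "standard" algorithm, the only lossy step is the repeated-letter heuristic, and the function names are hypothetical.

```python
import re
import zlib


def drop_doubled_letters(text):
    """Lossy step: collapse runs of the same letter ('letter' -> 'leter')."""
    return re.sub(r"([A-Za-z])\1+", r"\1", text)


def compressed_size(text):
    """Size in bytes after DEFLATE at the highest compression level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    sample = (
        "Languages like English have repeated letters as a matter of "
        "orthography, so letter is the same as leter when it comes to meaning. "
    ) * 50
    lossy = drop_doubled_letters(sample)
    print("raw characters:      ", len(sample), "->", len(lossy))
    print("compressed (bytes):  ", compressed_size(sample), "->", compressed_size(lossy))
```

The numbers depend entirely on the input, so the sketch prints them rather than promising a gain.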

(WHAT THE HELL HAPPENED TO CROSS-POSTING? THOUSANDS OF POSTS WITH ONLY THREE OR FOUR NON-UNDERSTANDING REPLIES. IT IS FOR SAFETY. OTHERWISE POSTS DISAPPEAR BECAUSE YOU DO NOT WANT TO ADMIT THE THEFT OF COMPUTERS. IMBECILE MUSICIANS, IT IS ISLAMIC TERRORISM WHAT THEY DO IN RADIO BUT CANNOT UNDERSTAND IT. GIVE BACK THE MUSIC FILES. GOOGLE LOST TRACK AND THEY CANNOT BELIEVE IT..)

Danilo J Bonsignore

syntotic · Sep 2, 2012

Sorry, again FIGHTING to reach a Wi-Fi outlet before... inspiration down. Well, anyway.

....
The plaintext can be thought of as the normal B temperature, then the cypher-plaintext in this method is equivalent to cooling down the B temperature of the text.
....
Substitutions for common words can be optimized to give bit-based compression minimums; e.g., select either n or d or... for AND (but obviously not a).
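A rough sketch of one possible reading of this: among the letters of a common word, pick the one that would cost the fewest bits in a later entropy coder, skipping letters that are already words by themselves. This is an interpretation, not something the post spells out. The cost estimate (-log2 of the letter's relative frequency in a sample text), the RESERVED set and the function name are all assumptions.

```python
import math
from collections import Counter

# Assumption: single letters that are already English words must not be reused.
RESERVED = {"a", "i"}


def best_single_letter(word, sample_text, reserved=RESERVED):
    """Return (letter, estimated_bits) for the cheapest usable letter of `word`.

    The estimate is the ideal code length -log2(p) of the letter, with p taken
    from letter frequencies in `sample_text`. Returns (None, inf) if no letter
    of the word is usable.
    """
    counts = Counter(ch for ch in sample_text.lower() if ch.isalpha())
    total = sum(counts.values())
    best, best_bits = None, math.inf
    for ch in dict.fromkeys(word.lower()):  # each distinct letter, in order
        if ch in reserved or counts[ch] == 0:
            continue
        bits = -math.log2(counts[ch] / total)
        if bits < best_bits:
            best, best_bits = ch, bits
    return best, best_bits


if __name__ == "__main__":
    sample = "the cat and the dog ran and ran"
    letter, bits = best_single_letter("and", sample)
    print(letter, round(bits, 3))
```

Note that the frequencies are measured before any substitution is made; a more careful version would re-estimate them after each replacement.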
....
Hey! I am thinking of Hebrew! Maybe it is the process they went through...?!* And forgot to decipher the cypher-plaintext!
....
A similar idea can be applied to pictorial text by taking pairs of letters and saving them as a single superimposed symbol. Then the width of the picture is that of the line that accepted the least compression. Counterexample: co* cannot be compressed this way, obviously; example: de* would give the equivalent of a strikethrough d and would be easy to decode by sight and by machine. A standard OCR algorithm can be trained to decode the new symbols, an easy problem if the plaintext is a typographic picture rather than handwritten script.
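The width rule can be sketched in a few lines of Python. Everything here is hypothetical: which pairs can actually be superimposed depends on the glyph shapes, so the MERGEABLE set below just encodes the two examples given (de yes, co no), and the greedy left-to-right pairing is an assumption.

```python
# Assumption: the set of letter pairs that stay legible when superimposed.
# Only "de" comes from the post; a real set would be derived from the font.
MERGEABLE = {"de"}


def merge_pairs(line, mergeable=MERGEABLE):
    """Greedy left-to-right pass: fuse adjacent mergeable pairs into one symbol."""
    symbols = []
    i = 0
    while i < len(line):
        pair = line[i:i + 2]
        if len(pair) == 2 and pair in mergeable:
            symbols.append(pair)  # two letters drawn in one cell
            i += 2
        else:
            symbols.append(line[i])
            i += 1
    return symbols


def picture_width(lines, mergeable=MERGEABLE):
    """Width in symbol cells: set by the line that compressed the least."""
    return max((len(merge_pairs(line, mergeable)) for line in lines), default=0)


if __name__ == "__main__":
    text = ["decode", "cocoa"]
    for line in text:
        print(line, "->", merge_pairs(line))
    print("width:", picture_width(text))
```

Here "decode" shrinks from 6 cells to 4, but "cocoa" stays at 5, so the picture is 5 cells wide.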
....

Danilo J Bonsignore
