Syntotic
How come I did not think of it before! For big quantities of text, if you lose a few letters it does not matter at all... it is a typo, period. It happens to anyone writing without proofreading. The Human processor will be able, in MOST CASES, to recover the original word. In the worst case a whole sentence loses its meaning (for instance, by having it inverted). Languages like English have repeated letters as a matter of orthography, so letter is the same as leter when it comes to meaning...
So what if we implement a lossy compression scheme where, before applying any standard text compression algorithm, we lose some letters here and there? It can be calculated very formally and implemented with a dictionary approach. To begin with, single-syllable words would remain the same, but they can be simplified if there is no way, within a single language, they can be confused with another word. For instance, all OR can be substituted by R, all NOT by O, all YES by Y and so on. Selection of words to be simplified can be done after compiling a language dictionary and calculating the probability of confusion when dropping random letters, that is, a measure of word uniqueness. Then we take the droppings that minimize confusion across the whole dictionary while maximizing the number of letters dropped. Some heuristics, like dropping repeated letters a priori, can also be applied as further preprocessing. This idea can be taken further, down to the level of syllables: instead of taking the full language, only known syllabic combinations are considered and each is substituted by a syllabic dropping...
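The dictionary approach above can be sketched in a few lines. This is only an illustration under my own assumptions: the toy word list is mine, uniqueness is taken as "no other dictionary word could have produced this dropping", and the tie-breaking rule (shortest dropping, preferring prefixes) is an arbitrary choice, so it will not always match the post's examples.

```python
# Sketch of the lossy dictionary preprocessor: for each word, find the
# shortest letter-dropping that no other word in the dictionary could
# have produced, so a Human processor can recover the original.
from itertools import combinations

def droppings(word):
    """All strings obtainable by dropping one or more letters, keeping order.

    Exponential in word length -- fine for a sketch on short words.
    """
    out = set()
    for k in range(1, len(word)):  # keep k letters, k < len(word)
        for idx in combinations(range(len(word)), k):
            out.add("".join(word[i] for i in idx))
    return out

def build_table(dictionary):
    """Map each word to an unambiguous shortened form, where one exists."""
    words = set(dictionary)
    # Record which words can produce each dropping; a dropping shared by
    # two words, or equal to a real word, would confuse the reader.
    owners = {}
    for w in words:
        for d in droppings(w):
            owners.setdefault(d, set()).add(w)
    table = {}
    for w in words:
        candidates = [d for d in droppings(w)
                      if d not in words and owners[d] == {w}]
        if candidates:
            # Prefer shorter droppings, then prefixes (YES -> Y), then
            # alphabetical order for determinism.
            table[w] = min(candidates,
                           key=lambda d: (len(d), 0 if w.startswith(d) else 1, d))
    return table

table = build_table(["or", "yes", "not", "no", "on"])
print(table)
```

Note that NO and ON get no entry at all: every dropping of theirs (N or O alone) could have come from another word, which is exactly the confusion measure doing its job.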
(This is more or less what is achieved with Huffman, but the heuristics would yield a better compression ratio a priori, since the text would not be the SAME text but a different text altogether to begin with, albeit the SAME text to a native-speaker Human processor...)
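The repeated-letter heuristic composes with any standard compressor, which is the point of the parenthetical: the preprocessor shrinks the input itself before the entropy coder ever sees it. A rough sketch, using zlib as the stand-in compressor and an example text of my own choosing:

```python
# Drop repeated letters as lossy preprocessing, then hand the result to a
# standard compressor (zlib here); the reader reinflates "leter" mentally.
import re
import zlib

def drop_repeats(text):
    """Collapse runs of the same character: 'letter' -> 'leter'."""
    return re.sub(r"(.)\1+", r"\1", text)

original = "bookkeeper committee addressing letters"
reduced = drop_repeats(original)
print(len(original), len(reduced))                      # raw lengths
print(len(zlib.compress(original.encode())),
      len(zlib.compress(reduced.encode())))             # compressed lengths
```

Unlike Huffman coding alone, this step is irreversible by the machine; only the dictionary in the reader's head undoes it, which is exactly the trade the post proposes.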
(WHAT THE HELL HAPPENED TO CROSS-POSTING? THOUSANDS OF POSTS WITH ONLY THREE OR FOUR NON-UNDERSTANDING REPLIES. IT IS FOR SAFETY. OTHERWISE POSTS DISAPPEAR BECAUSE YOU DO NOT WANT TO ADMIT THE THEFT OF COMPUTERS. IMBECILE MUSICIANS, IT IS ISLAMIC TERRORISM WHAT THEY DO IN RADIO BUT THEY CANNOT UNDERSTAND IT. GIVE BACK THE MUSIC FILES. GOOGLE LOST TRACK AND THEY CANNOT BELIEVE IT...)
Danilo J Bonsignore