Is there a function to remove escape characters from a string?

Stef Mientki · Dec 25, 2008

hello,

Is there a function to remove escape characters from a string?
(preferably all escape characters except "\n").

thanks,
Stef

James Stroud · Dec 25, 2008

Stef said:
hello,

Is there a function to remove escape characters from a string ?
(preferable all escape characters except "\n").

thanks,
Stef

import string

WANTED = string.printable[:-5] + "\n"

def descape(s, w=WANTED):
    return "".join(c for c in s if c in w)

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com

John Machin · Dec 25, 2008

hello,

Is there a function to remove escape characters from a string ?
(preferable all escape characters except "\n").

"\n" is not what most people would call an escape character. The "\"
is what most people would call an escape character when it is used in
a manner like in a Python non-raw string (e.g.
"1\tStef\r\n2\tJames\r\n").

Assuming (as James has done) that you meant you want to remove all but
"truly visible ASCII characters, plus newline", I'd have to ask: Are
you sure?? Do you really want to throw away tabs, when they might be
separating fields, as in the above example?
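
To make the concern about tabs concrete, here is a minimal Python 3 sketch that runs James's filter over the example string above and compares it with splitting on the tabs. `descape` and `WANTED` come from James's post; `sample`, `filtered` and `rows` are names made up for illustration.

```python
import string

# Printable ASCII without the trailing "\t\n\r\x0b\x0c", plus newline (as in James's post).
WANTED = string.printable[:-5] + "\n"

def descape(s, w=WANTED):
    """Keep only the characters of s that appear in w."""
    return "".join(c for c in s if c in w)

sample = "1\tStef\r\n2\tJames\r\n"

# Filtering drops the tabs and carriage returns, so the two fields run together.
filtered = descape(sample)
print(repr(filtered))   # '1Stef\n2James\n'

# Splitting on the tab first keeps the field boundaries.
rows = [line.split("\t") for line in sample.splitlines()]
print(rows)             # [['1', 'Stef'], ['2', 'James']]
```

Whether the tabs matter depends on the data, which is why the questions that follow are worth answering first.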

Let's start at the beginning:

Python 2.x or 3.x?
Type of your data objects is str/bytes or unicode/str or both?
If str/bytes, what encoding(s)?
What exactly are these "escape characters"?
Are you sure that you need to remove them all i.e. you don't want to
replace some with other characters?

HTH,
John

Steven D'Aprano · Dec 25, 2008

hello,

Is there a function to remove escape characters from a string ?
(preferable all escape characters except "\n").

Can you explain what you mean? I can think of at least four alternatives:

(1) Remove literal escape sequences (backslash-char):
"abc\\t\\ad" => "abcd"
r"abc\t\ad" => "abcd"

(2) Replace literal escape sequences with the character they represent:
"abc\\t\\ad" => "abc\t\ad"

(3) Remove characters generated by escape sequences:
"abc\t\ad" => "abcd"
"abc" => "abc" but "a\x62c" => "ac"

This is likely to be impossible without deep magic.

(4) Remove so-called binary characters which are typically inserted using
escape sequences:
"abc\t\ad" => "abcd"
"abc" => "abc" but "a\x62c" => "abc"

This is probably the easiest, assuming you have bytes instead of unicode.

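import string

# Python 2 only: string.maketrans and string.translate do not exist in Python 3.
table = string.maketrans('', '')  # identity translation table
# Characters 0-31 (the ASCII control characters); this includes "\n" (chr(10)).
delchars = ''.join(chr(n) for n in range(32))

s = string.translate(s, table, delchars)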
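
For readers on Python 3, alternatives (1), (2) and (4) might be sketched as follows. This is an illustration under assumptions, not the code from the post: the function names are invented here, the input is assumed to be an ASCII-only `str`, and "\n" is kept because the original question asked for that.

```python
import re

def remove_literal_escapes(s):
    """Alternative (1): drop each backslash together with the character after it."""
    return re.sub(r"\\.", "", s)

def interpret_literal_escapes(s):
    """Alternative (2): turn backslash sequences into the characters they name.

    Assumes s is pure ASCII; the unicode_escape codec works on bytes.
    """
    return s.encode("ascii").decode("unicode_escape")

# Alternative (4): delete control characters 0-31, except newline.
CONTROL_CHARS = {n: None for n in range(32) if chr(n) != "\n"}

def remove_control_chars(s):
    """Delete the characters listed in CONTROL_CHARS from s."""
    return s.translate(CONTROL_CHARS)

print(remove_literal_escapes(r"abc\t\ad"))            # abcd
print(repr(interpret_literal_escapes(r"abc\t\ad")))   # 'abc\t\x07d'
print(repr(remove_control_chars("abc\t\ad\n")))       # 'abcd\n'
```

Alternative (3) is left out: once a string exists, nothing records whether a character was typed directly or produced by an escape sequence.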

Stef Mientki · Dec 25, 2008

Steven said:
Can you explain what you mean? I can think of at least four alternatives:

I have the following kind of strings,
the funny "þ" is ASCII character 254, used as a separator character

[FSM]
Counts = "1þ11þ16" ==> 1,11,16
Init1 = "1þ\BCtrl" ==> 1,Ctrl
State5 = "8þ\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"
==> 8, JUMP_COMPL\n>PCWrite = 1\n>PCSource = 10

Seeing and testing all your answers, with great solutions that I've
never seen before,
knowing nothing of escape sequences (I'm a windows guy ;-)
I now see that the characters I need to remove, like \B and \b are
not "official" escape sequences.
So in this case the best (easiest to understand) method is a few replace
statements:
s = s.replace ( '\b', '' ).replace( '\B', '' )

Nevertheless, thank you all for the other examples,

cheers,
Stef

John Machin · Dec 26, 2008

I have the following kind of strings,
the funny "þ" is ASCII character 254, used as a separator character

ASCII ends at 127. Just refer to it as chr(254).

[FSM]
Counts = "1þ11þ16" ==> 1,11,16
Init1 = "1þ\BCtrl" ==> 1,Ctrl
State5 = "8þ\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"
==> 8, JUMP_COMPL\n>PCWrite = 1\n>PCSource = 10

After making those substitutions, what are you going to do with it?
Split it up into fields using the csv module or stuff.split(",") or
some other DIY method? Is there a possibility that whoever "designed"
that data format used chr(254) as a separator because the data fields
contained "," sometimes and so "," could not be used as a separator?

Seeing and testing all your answers, with great solutions that I've
never seen before,

As far as str methods and built-ins that work on str objects are
concerned, there is no corpus of secret knowledge known only to a
cabal of wizards; it's all in the manual, and you don't need special
magical spectacles to see it

knowing nothing of escape sequences (I'm a windows guy ;-)

Why do you think that whether or not you are a "windows guy" is
relevant to knowing anything about escape sequences?

I now see that the characters I need to remove, like \B and \b are
not "official" escape sequences.

\b *is* an "official" escape sequence, just like \n; see below:

| >>> x = '\b'; print len(x), repr(x)
| 1 '\x08'
| >>> x = r'\b'; print len(x), repr(x)
| 2 '\\b'
| >>> x = '\B'; print len(x), repr(x)
| 2 '\\B'
| >>> x = r'\B'; print len(x), repr(x)
| 2 '\\B'

So in this case the best (easiest to understand) method is a few replace
statements:
s = s.replace ( '\b', '' ).replace( '\B', '' )

It's probable that \b and \B are both TWO-byte sequences, in which
case you should use r'\b' so that it does what you want it to do, and
use r'\B' for consistency.
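
A minimal Python 3 sketch of that advice, under the assumption John describes: that the data really contains a backslash followed by a letter (two characters) rather than a backspace. The variable names are made up for illustration, and the record is built with `chr(254)` as the separator.

```python
SEP = chr(254)  # the separator character in Stef's data

# Assumption: the markers are stored as two characters, backslash + letter.
record = "8" + SEP + r"\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"

# Without the r prefix, "\b" is a single backspace (chr(8)), which never occurs
# in this record, so the replace leaves it unchanged.
unchanged = record.replace("\b", "")

# Raw strings match the two-character sequences.
cleaned = record.replace(r"\b", "").replace(r"\B", "")

# Split on the separator rather than converting it to "," first.
fields = cleaned.split(SEP)
print(unchanged == record)  # True
print(fields)
```

Splitting directly on `chr(254)` also sidesteps the question of whether a field might itself contain a comma.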

Stef Mientki · Dec 26, 2008

I have the following kind of strings,
the funny "þ" is ASCII character 254, used as a separator character

ASCII ends at 127. Just refer to it as chr(254).

note 1)

[FSM]
Counts = "1þ11þ16" ==> 1,11,16
Init1 = "1þ\BCtrl" ==> 1,Ctrl
State5 = "8þ\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"
==> 8, JUMP_COMPL\n>PCWrite = 1\n>PCSource = 10

After making those substitutions, what are you going to do with it?
Split it up into fields using the csv module or stuff.split(",") or
some other DIY method? Is there a possibility that whoever "designed"
that data format used chr(254) as a separator because the data fields
contained "," sometimes and so "," could not be used as a separator?

Yep, chr(254), because it's not in the human range of characters
and it's accepted by windows ini-files.

As far as str methods and built-ins that work on str objects are
concerned, there is no corpus of secret knowledge known only to a
cabal of wizards; it's all in the manual, and you don't need special
magical spectacles to see it

note 2)

Why do you think that whether or not you are a "windows guy" is
relevant to knowing anything about escape sequences?

Just a windows guy,
or maybe better, "being a windows guy for many years",
windows users are wysiwyg users, they are not dealing with individual bits.
I personally left escape sequences and values of ASCII characters behind
me more than 25 years ago.
And now maybe you might understand note 1) and note 2) .

cheers,
Stef

John Machin · Dec 26, 2008

Yep, chr(254), because it's not in the human range of characters
and it's accepted by windows ini-files.

.... s = chr(254)
.... enc = 'cp125' + str(i)
.... try:
.... u = s.decode(enc)
.... except UnicodeDecodeError:
.... continue
.... print enc, 'U+%04X' % ord(u), ucd.name(u)
....
cp1250 U+0163 LATIN SMALL LETTER T WITH CEDILLA
cp1251 U+044E CYRILLIC SMALL LETTER YU
cp1252 U+00FE LATIN SMALL LETTER THORN
cp1253 U+03CE GREEK SMALL LETTER OMEGA WITH TONOS
cp1254 U+015F LATIN SMALL LETTER S WITH CEDILLA
cp1257 U+017E LATIN SMALL LETTER Z WITH CARON
cp1258 U+20AB DONG SIGN

Either you have a strange and narrow definition of "human", or you are
so brave as to cheerfully insult (inter alia) Romanians, Russians,
Icelanders, Greeks, Turks, Czechs, Estonians, Finns, Slovaks,
Slovenians, and Vietnamese
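
The session above is Python 2. A rough Python 3 equivalent, limited to the code pages listed in that output, might look like this; `describe_byte` and `CODEPAGES` are names invented for the sketch.

```python
import unicodedata

# Only the Windows code pages that appear in the output above.
CODEPAGES = ["cp1250", "cp1251", "cp1252", "cp1253", "cp1254", "cp1257", "cp1258"]

def describe_byte(value, codepages=CODEPAGES):
    """Return (codepage, character) pairs for one byte value decoded in each code page."""
    raw = bytes([value])
    return [(cp, raw.decode(cp)) for cp in codepages]

for cp, ch in describe_byte(254):
    print(cp, "U+%04X" % ord(ch), unicodedata.name(ch))
```

The same byte value maps to a different letter in each code page, which is the point being made: what 254 "means" depends entirely on the encoding.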

Stef Mientki · Dec 27, 2008

John said:
... s = chr(254)
... enc = 'cp125' + str(i)
... try:
... u = s.decode(enc)
... except UnicodeDecodeError:
... continue
... print enc, 'U+%04X' % ord(u), ucd.name(u)
...
cp1250 U+0163 LATIN SMALL LETTER T WITH CEDILLA
cp1251 U+044E CYRILLIC SMALL LETTER YU
cp1252 U+00FE LATIN SMALL LETTER THORN
cp1253 U+03CE GREEK SMALL LETTER OMEGA WITH TONOS
cp1254 U+015F LATIN SMALL LETTER S WITH CEDILLA
cp1257 U+017E LATIN SMALL LETTER Z WITH CARON
cp1258 U+20AB DONG SIGN

Either you have a strange and narrow definition of "human", or you are
so brave as to cheerfully insult (inter alia) Romanians, Russians,
Icelanders, Greeks, Turks, Czechs, Estonians, Finns, Slovaks,
Slovenians, and Vietnamese

Sorry if I offended someone, that was certainly not my intention.
And I guess you will be surprised if I tell you that I don't (want to)
understand any bit of the above code ;-)
Come on, the home computer was invented about 1980.
If we look at hardware, it follows Moore's law;
for software I would expect at least 0.1 of Moore's law ;-)
I hope that clarifies my point.

cheers,
Stef

Steven D'Aprano · Dec 27, 2008

Sorry if I offended someone, that was certainly not my intention. And I
guess you will be surprised, if I tell you, I don't (want) to understand
any bit of the above code ;-) Come on, the home computer was invented
about 1980. If we look at hardware, it follows the Moore's law, for
software I would expect at least 0.1 of Moore's law ;-) I hope that
clarifies my point.

No, that only makes it even more confusing. What does Moore's Law have to
do with your willful ignorance about the existence of human languages
other than English?

Stef Mientki · Dec 27, 2008

Steven said:
No, that only makes it even more confusing. What does Moore's Law have to
do with your willful ignorance about the existence of human languages
other than English?

Nothing.
I don't even (want to) see what bits / bytes / escape sequences have to
do with modern programming techniques,
so I certainly don't see any relation between these and human languages.

But the lack of Moore's law in software explains why we still need to
concern ourselves with bits and bytes ;-)

cheers,
Stef

Martin · Dec 28, 2008

2008/12/27 Stef Mientki said:
Nothing.
I even don't (want to) see what bits / bytes / escape sequences have to do
with modern programming techniques,
so I certainly don't see any relation between these and human languages.

But the lack of Moore's law in software explains why we still need to
concern about bits and bytes ;-)

http://www.joelonsoftware.com/articles/Unicode.html

--
http://soup.alt.delete.co.at
http://www.xing.com/profile/Martin_Marcher
http://www.linkedin.com/in/martinmarcher

You are not free to read this message,
by doing so, you have violated my licence
and are required to urinate publicly. Thank you.

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
