Is there a function to remove escape characters from a string ?

Discussion in 'Python' started by Stef Mientki, Dec 25, 2008.

  1. Stef Mientki

    Stef Mientki Guest

    hello,

    Is there a function to remove escape characters from a string ?
    (preferable all escape characters except "\n").

    thanks,
    Stef
    Stef Mientki, Dec 25, 2008
    #1

  2. James Stroud

    James Stroud Guest

    Re: Is there a function to remove escape characters from a string?

    Stef Mientki wrote:
    > hello,
    >
    > Is there a function to remove escape characters from a string ?
    > (preferable all escape characters except "\n").
    >
    > thanks,
    > Stef



    import string

    WANTED = string.printable[:-5] + "\n"

    def descape(s, w=WANTED):
        return "".join(c for c in s if c in w)


    James


    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com
    James Stroud, Dec 25, 2008
    #2

  3. John Machin

    John Machin Guest

    On Dec 25, 9:00 pm, Stef Mientki <> wrote:
    > hello,
    >
    > Is there a function to remove escape characters from a string ?
    > (preferable all escape characters except "\n").


    "\n" is not what most people would call an escape character. The "\"
    is what most people would call an escape character when it is used in
    a manner like in a Python non-raw string (e.g.
    "1\tStef\r\n2\tJames\r\n").

    Assuming (as James has done) that you meant you want to remove all but
    "truly visible ASCII characters, plus newline", I'd have to ask: Are
    you sure?? Do you really want to throw away tabs, when they might be
    separating fields, as in the above example?

    Let's start at the beginning:

    Python 2.x or 3.x?
    Type of your data objects is str/bytes or unicode/str or both?
    If str/bytes, what encoding(s)?
    What exactly are these "escape characters"?
    Are you sure that you need to remove them all i.e. you don't want to
    replace some with other characters?

    HTH,
    John
    John Machin, Dec 25, 2008
    #3
  4. Steven D'Aprano

    On Thu, 25 Dec 2008 11:00:18 +0100, Stef Mientki wrote:

    > hello,
    >
    > Is there a function to remove escape characters from a string ?
    > (preferable all escape characters except "\n").



    Can you explain what you mean? I can think of at least four alternatives:

    (1) Remove literal escape sequences (backslash-char):
    "abc\\t\\ad" => "abcd"
    r"abc\t\ad" => "abcd"


    (2) Replace literal escape sequences with the character they represent:
    "abc\\t\\ad" => "abc\t\ad"


    (3) Remove characters generated by escape sequences:
    "abc\t\ad" => "abcd"
    "abc" => "abc" but "a\x62c" => "ac"

    This is likely to be impossible without deep magic.


    (4) Remove so-called binary characters which are typically inserted using
    escape sequences:
    "abc\t\ad" => "abcd"
    "abc" => "abc" but "a\x62c" => "abc"

    This is probably the easiest, assuming you have bytes instead of unicode.

    import string
    table = string.maketrans('', '')  # identity table (Python 2 only)
    delchars = ''.join(chr(n) for n in range(32))  # control characters 0-31

    s = string.translate(s, table, delchars)
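
On Python 3, where string.maketrans and string.translate no longer exist, a rough equivalent of option (4) might look like the sketch below. It assumes the data is a str rather than bytes, and it keeps "\n" as the original question asked. The name strip_control_chars is made up for illustration.

```python
# Map every control character (code points 0-31) to None so that
# str.translate deletes it; "\n" is left out of the table and survives.
DELETE_TABLE = {n: None for n in range(32) if chr(n) != "\n"}

def strip_control_chars(s):
    """Return s without ASCII control characters, keeping newlines."""
    return s.translate(DELETE_TABLE)

print(repr(strip_control_chars("abc\t\ad\n")))   # 'abcd\n'
print(repr(strip_control_chars("a\x62c")))       # 'abc'
```

Note that tabs are removed as well; drop 9 from the table if they separate fields in your data.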



    --
    Steven
    Steven D'Aprano, Dec 25, 2008
    #4
  5. Stef Mientki

    Stef Mientki Guest

    Re: Is there a function to remove escape characters from a string?

    Steven D'Aprano wrote:
    > On Thu, 25 Dec 2008 11:00:18 +0100, Stef Mientki wrote:
    >
    >
    >> hello,
    >>
    >> Is there a function to remove escape characters from a string ?
    >> (preferable all escape characters except "\n").
    >>

    >
    >
    > Can you explain what you mean? I can think of at least four alternatives:
    >

    I have the following kind of strings,
    the funny "þ" is ASCII character 254, used as a separator character

    [FSM]
    Counts = "1þ11þ16" ==> 1,11,16
    Init1 = "1þ\BCtrl" ==> 1,Ctrl
    State5 = "8þ\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"
    ==> 8, JUMP_COMPL\n>PCWrite = 1\n>PCSource = 10

    Seeing and testing all your answers, with great solutions that I've
    never seen before,
    knowing nothing of escape sequences (I'm a windows guy ;-)
    I now see that the characters I need to remove, like \B and \b are
    not "official" escape sequences.
    So in this case the best (easiest to understand) method is a few replace
    statements:
    s = s.replace ( '\b', '' ).replace( '\B', '' )

    Nevertheless, thank you all for the other examples,

    cheers,
    Stef


    > (1) Remove literal escape sequences (backslash-char):
    > "abc\\t\\ad" => "abcd"
    > r"abc\t\ad" => "abcd"
    >
    >
    > (2) Replace literal escape sequences with the character they represent:
    > "abc\\t\\ad" => "abc\t\ad"
    >
    >
    > (3) Remove characters generated by escape sequences:
    > "abc\t\ad" => "abcd"
    > "abc" => "abc" but "a\x62c" => "ac"
    >
    > This is likely to be impossible without deep magic.
    >
    >
    > (4) Remove so-called binary characters which are typically inserted using
    > escape sequences:
    > "abc\t\ad" => "abcd"
    > "abc" => "abc" but "a\x62c" => "abc"
    >
    > This is probably the easiest, assuming you have bytes instead of unicode.
    >
    > import string
    > table = string.maketrans('', '')
    > delchars =''.join(chr(n) for n in range(32))
    >
    > s = string.translate(s, table, delchars)
    >
    >
    >
    >
    Stef Mientki, Dec 25, 2008
    #5
  6. John Machin

    John Machin Guest

    On Dec 26, 8:53 am, Stef Mientki <> wrote:
    > Steven D'Aprano wrote:
    > > On Thu, 25 Dec 2008 11:00:18 +0100, Stef Mientki wrote:

    >
    > >> hello,

    >
    > >> Is there a function to remove escape characters from a string ?
    > >> (preferable all escape characters except "\n").

    >
    > > Can you explain what you mean? I can think of at least four alternatives:

    >
    > I have the following kind of strings,
    > the funny "þ" is ASCII character 254, used as a separator character


    ASCII ends at 127. Just refer to it as chr(254).

    >
    > [FSM]
    > Counts = "1þ11þ16"     ==>   1,11,16
    > Init1 = "1þ\BCtrl"     ==>    1,Ctrl
    > State5 = "8þ\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"
    >          ==> 8, JUMP_COMPL\n>PCWrite = 1\n>PCSource = 10


    After making those substitutions, what are you going to do with it?
    Split it up into fields using the csv module or stuff.split(",") or
    some other DIY method? Is there a possibility that whoever "designed"
    that data format used chr(254) as a separator because the data fields
    contained "," sometimes and so "," could not be used as a separator?
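
If the fields can themselves contain commas, splitting directly on the separator avoids the ambiguity. A minimal Python 3 sketch, assuming the value has already been decoded to str so that the separator is the single character "\xfe"; split_fields and the sample values are hypothetical:

```python
SEPARATOR = "\xfe"  # chr(254)

def split_fields(value, sep=SEPARATOR):
    """Split a raw value into fields without converting to commas first."""
    return value.split(sep)

print(split_fields("1\xfe11\xfe16"))             # ['1', '11', '16']

# A field that contains a comma stays in one piece.
tricky = SEPARATOR.join(["8", "a, b", "c"])
print(split_fields(tricky))                      # ['8', 'a, b', 'c']
```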

    > Seeing and testing all your answers, with great solutions that I've
    > never seen before,


    As far as str methods and built-ins that work on str objects are
    concerned, there is no corpus of secret knowledge known only to a
    cabal of wizards; it's all in the manual, and you don't need special
    magical spectacles to see it :)

    > knowing nothing of escape sequences (I'm a windows guy ;-)


    Why do you think that whether or not you are a "windows guy" is
    relevant to knowing anything about escape sequences?

    > I now see that the characters I need to remove, like  \B  and \b  are
    > not "official" escape sequences.


    \b *is* an "official" escape sequence, just like \n; see below:

    | >>> x = '\b'; print len(x), repr(x)
    | 1 '\x08'
    | >>> x = r'\b'; print len(x), repr(x)
    | 2 '\\b'
    | >>> x = '\B'; print len(x), repr(x)
    | 2 '\\B'
    | >>> x = r'\B'; print len(x), repr(x)
    | 2 '\\B'

    > So in this case the best (easiest to understand) method is a few replace
    > statements:
    > s = s.replace ( '\b', '' ).replace( '\B',  '' )


    It's probable that \b and \B are both TWO-byte sequences, in which
    case you should use r'\b' so that it does what you want it to do, and
    use r'\B' for consistency.
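
To make the difference concrete, here is a small Python 3 sketch. It assumes, as guessed above, that the data holds literal two-character sequences (a backslash followed by b or B); the helper name strip_markers is made up.

```python
def strip_markers(s):
    """Remove the literal two-character sequences \\b and \\B."""
    # Raw strings keep the backslash, so each pattern is two characters long.
    return s.replace(r"\b", "").replace(r"\B", "")

raw = r"\BJUMP_COMPL\b"              # backslashes are literal here
print(strip_markers(raw))            # JUMP_COMPL

# Without the r prefix, "\b" is the single backspace character (0x08),
# so this replace leaves the literal sequences untouched.
print(raw.replace("\b", "") == raw)  # True
```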
    John Machin, Dec 26, 2008
    #6
  7. Stef Mientki

    Stef Mientki Guest

    Re: Is there a function to remove escape characters from a string?


    >> I have the following kind of strings,
    >> the funny "þ" is ASCII character 254, used as a separator character
    >>

    >
    > ASCII ends at 127. Just refer to it as chr(254).
    >
    >

    note 1)
    >> [FSM]
    >> Counts = "1þ11þ16" ==> 1,11,16
    >> Init1 = "1þ\BCtrl" ==> 1,Ctrl
    >> State5 = "8þ\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"
    >> ==> 8, JUMP_COMPL\n>PCWrite = 1\n>PCSource = 10
    >>

    >
    > After making those substitutions, what are you going to do with it?
    > Split it up into fields using the csv module or stuff.split(",") or
    > some other DIY method? Is there a possibility that whoever "designed"
    > that data format used chr(254) as a separator because the data fields
    > contained "," sometimes and so "," could not be used as a separator?
    >
    >

    Yep, chr(254), because it's not in the human range of characters
    and it's accepted by windows ini-files.
    >> Seeing and testing all your answers, with great solutions that I've
    >> never seen before,
    >>

    >
    > As far as str methods and built-ins that work on str objects are
    > concerned, there is no corpus of secret knowledge known only to a
    > cabal of wizards; it's all in the manual, and you don't need special
    > magical spectacles to see it :)
    >
    >

    note 2)
    >> knowing nothing of escape sequences (I'm a windows guy ;-)
    >>

    >
    > Why do you think that whether or not you are a "windows guy" is
    > relevant to knowing anything about escape sequences?
    >
    >

    Just a windows guy,
    or maybe better, "being a windows guy for many years",
    windows users are wysiwyg users, they are not dealing with individual bits.
    I personally left escape sequences and values of ASCII characters behind
    me more than 25 years ago.
    And now maybe you might understand note 1) and note 2) .

    cheers,
    Stef
    Stef Mientki, Dec 26, 2008
    #7
  8. John Machin

    John Machin Guest

    On Dec 27, 12:05 am, Stef Mientki <> wrote:

    > Yep, chr(254), because it's not in the human range of characters
    > and it's accepted by windows ini-files.


    >>> import unicodedata as ucd
    >>> for i in (0,1,2,3,4,7,8):
    ...     s = chr(254)
    ...     enc = 'cp125' + str(i)
    ...     try:
    ...         u = s.decode(enc)
    ...     except UnicodeDecodeError:
    ...         continue
    ...     print enc, 'U+%04X' % ord(u), ucd.name(u)
    ...
    ....
    cp1250 U+0163 LATIN SMALL LETTER T WITH CEDILLA
    cp1251 U+044E CYRILLIC SMALL LETTER YU
    cp1252 U+00FE LATIN SMALL LETTER THORN
    cp1253 U+03CE GREEK SMALL LETTER OMEGA WITH TONOS
    cp1254 U+015F LATIN SMALL LETTER S WITH CEDILLA
    cp1257 U+017E LATIN SMALL LETTER Z WITH CARON
    cp1258 U+20AB DONG SIGN

    Either you have a strange and narrow definition of "human", or you are
    so brave as to cheerfully insult (inter alia) Romanians, Russians,
    Icelanders, Greeks, Turks, Czechs, Estonians, Finns, Slovaks,
    Slovenians, and Vietnamese :)
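
The session above is Python 2, where chr(254) is a one-byte str. A rough Python 3 rendering of the same check, as a sketch only; describe_byte and CODE_PAGES are names invented here:

```python
import unicodedata

CODE_PAGES = ["cp125%d" % i for i in (0, 1, 2, 3, 4, 7, 8)]

def describe_byte(value, encodings=CODE_PAGES):
    """Return (encoding, code point, Unicode name) for one byte value."""
    raw = bytes([value])  # Python 3: build the byte explicitly
    rows = []
    for enc in encodings:
        try:
            ch = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in that code page
        rows.append((enc, "U+%04X" % ord(ch), unicodedata.name(ch)))
    return rows

for row in describe_byte(254):
    print(*row)
```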
    John Machin, Dec 26, 2008
    #8
  9. Stef Mientki

    Stef Mientki Guest

    Re: Is there a function to remove escape characters from a string?

    John Machin wrote:
    > On Dec 27, 12:05 am, Stef Mientki <> wrote:
    >
    >
    >> Yep, chr(254), because it's not in the human range of characters
    >> and it's accepted by windows ini-files.
    >>

    >
    >
    > >>> import unicodedata as ucd
    > >>> for i in (0,1,2,3,4,7,8):
    > ...     s = chr(254)
    > ...     enc = 'cp125' + str(i)
    > ...     try:
    > ...         u = s.decode(enc)
    > ...     except UnicodeDecodeError:
    > ...         continue
    > ...     print enc, 'U+%04X' % ord(u), ucd.name(u)
    > ...
    > cp1250 U+0163 LATIN SMALL LETTER T WITH CEDILLA
    > cp1251 U+044E CYRILLIC SMALL LETTER YU
    > cp1252 U+00FE LATIN SMALL LETTER THORN
    > cp1253 U+03CE GREEK SMALL LETTER OMEGA WITH TONOS
    > cp1254 U+015F LATIN SMALL LETTER S WITH CEDILLA
    > cp1257 U+017E LATIN SMALL LETTER Z WITH CARON
    > cp1258 U+20AB DONG SIGN
    >
    > Either you have a strange and narrow definition of "human", or you are
    > so brave as to cheerfully insult (inter alia) Romanians, Russians,
    > Icelanders, Greeks, Turks, Czechs, Estonians, Finns, Slovaks,
    > Slovenians, and Vietnamese :)
    >

    Sorry if I offended someone, that was certainly not my intention.
    And I guess you will be surprised, if I tell you, I don't (want) to
    understand any bit of the above code ;-)
    Come on, the home computer was invented about 1980.
    If we look at hardware, it follows the Moore's law,
    for software I would expect at least 0.1 of Moore's law ;-)
    I hope that clarifies my point.

    cheers,
    Stef
    Stef Mientki, Dec 27, 2008
    #9
  10. Steven D'Aprano

    On Sat, 27 Dec 2008 01:41:40 +0100, Stef Mientki wrote:

    > Sorry if I offended someone, that was certainly not my intention. And I
    > guess you will be surprised, if I tell you, I don't (want) to understand
    > any bit of the above code ;-) Come on, the home computer was invented
    > about 1980. If we look at hardware, it follows the Moore's law, for
    > software I would expect at least 0.1 of Moore's law ;-) I hope that
    > clarifies my point.


    No, that only makes it even more confusing. What does Moore's Law have to
    do with your willful ignorance about the existence of human languages
    other than English?



    --
    Steven
    Steven D'Aprano, Dec 27, 2008
    #10
  11. Stef Mientki

    Stef Mientki Guest

    Re: Is there a function to remove escape characters from a string?

    Steven D'Aprano wrote:
    > On Sat, 27 Dec 2008 01:41:40 +0100, Stef Mientki wrote:
    >
    >
    >> Sorry if I offended someone, that was certainly not my intention. And I
    >> guess you will be surprised, if I tell you, I don't (want) to understand
    >> any bit of the above code ;-) Come on, the home computer was invented
    >> about 1980. If we look at hardware, it follows the Moore's law, for
    >> software I would expect at least 0.1 of Moore's law ;-) I hope that
    >> clarifies my point.
    >>

    >
    > No, that only makes it even more confusing. What does Moore's Law have to
    > do with your willful ignorance about the existence of human languages
    > other than English?
    >
    >

    Nothing.
    I even don't (want to) see what bits / bytes / escape sequences have to
    do with modern programming techniques,
    so I certainly don't see any relation between these and human languages.

    But the lack of Moore's law in software explains why we still need to
    concern about bits and bytes ;-)

    cheers,
    Stef
    Stef Mientki, Dec 27, 2008
    #11
  12. Martin

    Martin Guest

    2008/12/27 Stef Mientki <>:
    > Steven D'Aprano wrote:
    >> No, that only makes it even more confusing. What does Moore's Law have to
    >> do with your willful ignorance about the existence of human languages other
    >> than English?
    >>

    > Nothing.
    > I even don't (want to) see what bits / bytes / escape sequences have to do
    > with modern programming techniques,
    > so I certainly don't see any relation between these and human languages.
    >
    > But the lack of Moore's law in software explains why we still need to
    > concern about bits and bytes ;-)


    http://www.joelonsoftware.com/articles/Unicode.html



    --
    http://soup.alt.delete.co.at
    http://www.xing.com/profile/Martin_Marcher
    http://www.linkedin.com/in/martinmarcher

    You are not free to read this message,
    by doing so, you have violated my licence
    and are required to urinate publicly. Thank you.

    Please avoid sending me Word or PowerPoint attachments.
    See http://www.gnu.org/philosophy/no-word-attachments.html
    Martin, Dec 28, 2008
    #12
