RE: ascii character - removing chars from string

Discussion in 'Python' started by bruce, Jul 4, 2006.

  1. bruce

    bruce Guest

    simon...

    the issue that i'm seeing is not a result of simply using the
    'string.replace' function. it appears that there's something else going on
    in the text....

    although i can see the nbsp in the file, the file is manipulated by a number
    of other functions prior to me writing the information out to a file.
    somewhere the 'nbsp' is changed, so there's something else going on...

    however, the error i get indicates that the char 'u\xa0' is what's causing
    the issue.. as far as i can determine, the string.replace can't/doesn't
    handle non-ascii chars. i'm still looking for a way to search/replace
    non-ascii chars...

    this would/should resolve my issue..

    -bruce


    -----Original Message-----
    From: python-list-bounces+bedouglas=
    [mailto:python-list-bounces+bedouglas=]On Behalf
    Of Simon Forman
    Sent: Monday, July 03, 2006 11:28 PM
    To:
    Subject: Re: ascii character - removing chars from string


    bruce wrote:
    > simon...
    >
    > the '&nbsp;' is not to be seen/viewed as text/ascii.. it's a
    > representation of a hex 'u\xa0' if i recall...


    Did you not see this part of the post that you're replying to?

    > 'nbsp': '\xa0',


    My point was not that '\xa0' is an ascii character... It was that your
    initial request was very misleading:

    "i'm running into a problem where i'm seeing non-ascii chars in the
    parsing i'm doing. in looking through various docs, i can't find
    functions to remove/restrict strings to valid ascii chars."

    That's why you got three different answers to the wrong question.

    You weren't "seeing non-ascii chars" at all. You were seeing ascii
    representations of html entities that, in the case of '&nbsp;', happen
    to represent non-ascii values.

    >
    > i'm looking to remove or replace the instances with a ' ' (space)


    Simplicity:

    s.replace('&nbsp;', ' ')
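
A minimal sketch of the two options discussed here: replacing the literal entity text, or decoding entities first and then replacing the real character. It assumes Python 3, where `html.unescape` does the job of a lookup in the `htmlentitydefs` table; the sample string is made up for illustration.

```python
import html

# Hypothetical sample: the six ASCII characters & n b s p ; appear literally.
text = "foo cat&nbsp;&nbsp;"

# Option 1: treat the entity as plain text and swap it for a space.
stripped = text.replace("&nbsp;", " ").strip()

# Option 2: decode entities first, which yields the real U+00A0 character,
# then replace that single character.
decoded = html.unescape(text)
cleaned = decoded.replace("\xa0", " ").strip()

print(repr(stripped))
print(repr(decoded))
print(repr(cleaned))
```

Option 2 is the one to reach for if other entities may also appear in the text, since it handles all of them in one pass.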

    ~Simon

    "You keep using that word. I do not think it means what you think it
    means."
    -Inigo Montoya, "The Princess Bride"

    >
    > -bruce
    >
    >
    > -----Original Message-----
    > From: python-list-bounces+bedouglas=
    > [mailto:python-list-bounces+bedouglas=]On Behalf
    > Of Simon Forman
    > Sent: Monday, July 03, 2006 7:17 PM
    > To:
    > Subject: Re: ascii character - removing chars from string
    >
    >
    > bruce wrote:
    > > hi...
    > >
    > > update. i'm getting back html, and i'm getting strings like
    > > "&nbsp;foo&nbsp;" which is valid HTML as the '&nbsp;' is a space.

    >
    > &, n, b, s, p, ; Those are all ascii characters.
    >
    > > i need a way of stripping/removing the '&nbsp;' from the string
    > >
    > > the &nbsp; needs to be treated as a single char...
    > >
    > > text = "foo cat &nbsp;"
    > >
    > > ie ok_text = strip(text)
    > >
    > > ok_text = "foo cat"

    >
    > Do you really want to remove those html entities? Or would you rather
    > convert them back into the actual text they represent? Do you just
    > want to deal with &nbsp;'s? Or maybe the other possible entities that
    > might appear also?
    >
    > Check out htmlentitydefs.entitydefs (see
    > http://docs.python.org/lib/module-htmlentitydefs.html) it's kind of
    > ugly looking so maybe use pprint to print it:
    >
    > >>> import htmlentitydefs, pprint
    > >>> pprint.pprint(htmlentitydefs.entitydefs)

    > {'AElig': 'Æ',
    > 'Aacute': 'Á',
    > 'Acirc': 'Â',
    > .
    > .
    > .
    > 'nbsp': '\xa0',
    > .
    > .
    > .
    > etc...
    >
    >
    > HTH,
    > ~Simon
    >
    > "You keep using that word. I do not think it means what you think it
    > means."
    > -Inigo Montoya, "The Princess Bride"
    >
    > --
    > http://mail.python.org/mailman/listinfo/python-list


    bruce, Jul 4, 2006
    #1

  2. On Tue, 04 Jul 2006 08:09:53 -0700, bruce wrote:

    > simon...
    >
    > the issue that i'm seeing is not a result of simply using the
    > 'string.replace' function. it appears that there's something else going on
    > in the text....
    >
    > although i can see the nbsp in the file, the file is manipulated by a number
    > of other functions prior to me writing the information out to a file.
    > somewhere the 'nbsp' is changed, so there's something else going on...
    >
    > however, the error i get indicates that the char 'u\xa0' is what's causing
    > the issue..


    As you have written it, that's not a character, it is a string of length
    two. Did you perhaps mean the Unicode character u'\xa0'?

    >>> len('u\xa0')
    2
    >>> len(u'\xa0')
    1


    > as far as i can determine, the string.replace can't/doesn't
    > handle non-ascii chars. i'm still looking for a way to search/replace
    > non-ascii chars...


    Seems to work for me:

    >>> c = u'\xa0'
    >>> s = "hello " + c + " world"
    >>> s
    u'hello \xa0 world'
    >>> s.replace(c, "?")
    u'hello ? world'
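
For comparison, a rough sketch of the same idea in Python 3, where every `str` is already Unicode and no `u` prefix is needed. It also shows two common ways to restrict a string to ASCII, which was the original request. The variable names are illustrative only.

```python
s = "hello \xa0 world"

# Replace one specific non-ASCII character with a plain space.
replaced = s.replace("\xa0", " ")

# Drop every non-ASCII character by round-tripping through the ascii codec.
ascii_only = s.encode("ascii", "ignore").decode("ascii")

# Or substitute a placeholder for each non-ASCII character instead.
masked = "".join(ch if ord(ch) < 128 else "?" for ch in s)

print(repr(replaced))
print(repr(ascii_only))
print(repr(masked))
```

Note that the `"ignore"` variant silently discards characters, so the placeholder version may be easier to debug.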



    --
    Steven.
     
    Steven D'Aprano, Jul 4, 2006
    #2

  3. bruce wrote:

    > i've done the s.replace('\xa0','') with no luck.


    let me guess: you wrote

    s.replace("\xa0", "")

    instead of

    s = s.replace("\xa0", "")

    ?
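
The point here is that strings are immutable, so `replace` returns a new string and leaves the original untouched. A small sketch (Python 3 syntax, made-up sample value):

```python
s = "foo\xa0cat"

# The result of this call is thrown away; s still holds the old value.
s.replace("\xa0", "")
unchanged = s

# Rebinding the name keeps the result.
s = s.replace("\xa0", "")

print(repr(unchanged))
print(repr(s))
```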

    </F>
     
    Fredrik Lundh, Jul 4, 2006
    #3
