Removing obscure chars

Discussion in 'ASP General' started by Yobbo, Apr 3, 2007.

  1. Yobbo

    Hi All

    I have an ASP function in place to strip invalid chars out of a data store
    before I create an XML file of this data, but my function doesn't work on a
    certain set of chars.

    As far as I can see these are the following:

    a) trademark char
    b) long hyphen/dash char
    c) smart/curly quotes (both left and right)

    Even though my function is set up as follows:

    Function ReFormatStringForXML(s)
    IF LEN(s) > 0 AND NOT IsNull(s) THEN
    s = Replace(s,"™","&trade;")
    s = Replace(s,"—","-")
    s = Replace(s,"’","&quot;")
    s = Replace(s,"'","&quot;")
    s = Replace(s,"""","&quot;")
    s = Replace(s,"&","&amp;")
    s = Replace(s,"<","&lt;")
    s = Replace(s,">","&gt;")
    END IF
    ReFormatStringForXML = s
    End Function

    These chars still pass by and foul up my XML file.

    I have a feeling that it's down to the fact that my function is looking for
    the HTML equiv rather than the actual char, but I can't possibly get away
    with simply copying and pasting these friggin(!!) chars into my function.
    Surely this is bad practice?

    Does anybody know how I can trap and replace/remove these chars if need be?

    Thanks
     
    Yobbo, Apr 3, 2007
    #1

  2. Gazing into my crystal ball I observed "Yobbo" <>
    writing in news::

    > Hi All
    >
    > I have an ASP function in place to strip invalid chars out of a data
    > store before I create an XML file of this data, but my function
    > doesn't work on a certain set of chars.
    >
    > As far as I can see these are the following:
    >
    > a) trademark char
    > b) long hyphen/dash char
    > c) smart/curly quotes (both left and right)


    I detest these "smart" quotes. Are regular quotes dumb by comparison?

    >
    > Even though my function is set up as follows:
    >
    > Function ReFormatStringForXML(s)
    > IF LEN(s) > 0 AND NOT IsNull(s) THEN
    > s = Replace(s,"™","&trade;")
    > s = Replace(s,"—","-")
    > s = Replace(s,"’","&quot;")
    > s = Replace(s,"'","&quot;")
    > s = Replace(s,"""","&quot;")
    > s = Replace(s,"&","&amp;")
    > s = Replace(s,"<","&lt;")
    > s = Replace(s,">","&gt;")
    > END IF
    > ReFormatStringForXML = s
    > End Function
    >
    > These chars still pass by and foul up my XML file.
    >
    > I have a feeling that it's down to the fact that my function is looking
    > for the HTML equiv rather than the actual char, but I can't possibly
    > get away with simply copying and pasting these friggin(!!) chars into my
    > function. Surely this is bad practice?


    You are putting in the HTML entity; you may need to put in the actual
    character instead, by its character code, for example:
    s = Replace(s, Chr(60), "&lt;")
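    The trademark sign, the long dash and the curly quotes are outside the
    ASCII range, so Chr() is not enough for them. A minimal sketch of the same
    idea using ChrW() and Unicode code points follows. It assumes the data
    store hands the script the real characters as Unicode strings; the function
    name EscapeForXml and the choice of replacements are illustrative only, not
    part of the original function.

```vbscript
' Sketch: match typographic characters by code point, so nothing has to be
' pasted into the source file. EscapeForXml is a hypothetical name.
Function EscapeForXml(s)
    If IsNull(s) Then
        EscapeForXml = ""
        Exit Function
    End If

    ' Escape the ampersand first, so the entities added below are not
    ' escaped a second time.
    s = Replace(s, "&", "&amp;")
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")

    s = Replace(s, ChrW(8482), "&#8482;")  ' U+2122 trade mark sign
    s = Replace(s, ChrW(8212), "-")        ' U+2014 em dash
    s = Replace(s, ChrW(8216), "'")        ' U+2018 left single quote
    s = Replace(s, ChrW(8217), "'")        ' U+2019 right single quote
    s = Replace(s, ChrW(8220), "&quot;")   ' U+201C left double quote
    s = Replace(s, ChrW(8221), "&quot;")   ' U+201D right double quote

    EscapeForXml = s
End Function
```

    The numeric reference &#8482; is used for the trademark sign because
    &trade; is an HTML entity that a plain XML parser does not know about.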

    >
    > Does anybody know how I can trap and replace/remove these chars if
    > need be?
    >
    > Thanks


    HTH

    --
    Adrienne Boswell at Home
    Arbpen Web Site Design Services
    http://www.cavalcade-of-coding.info
    Please respond to the group so others can share
     
    Adrienne Boswell, Apr 4, 2007
    #2

  3. Yobbo wrote on Tue, 3 Apr 2007 18:17:59 +0100:

    > Does anybody know how I can trap and replace/remove these chars if need
    > be?


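    Your function is quite limited. What happens when a character not in your
    list appears? The XML supported entity list is pretty small: only &amp;,
    &lt;, &gt;, &quot; and &apos; are predefined, so an HTML entity such as
    &trade; is not recognised unless a DTD declares it.

    Here's the function I use in my own XML generation code; it's crude but it works: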

    Function XMLEncode(strText)
        ' Loop through the text and replace all non-alphanumeric characters
        ' with an entity or with their numeric character code.
        Dim i, j, strNewText
        strNewText = ""

        For i = 1 To Len(strText)
            j = Asc(Mid(strText, i, 1))

            If j = 10 Then
                ' replace a line feed with an escaped <br>
                strNewText = strNewText & "&lt;br&gt;"
            ElseIf j = 13 Or j = 9 Then
                ' strip carriage returns and tabs
            ElseIf j = 34 Then ' "
                strNewText = strNewText & "&quot;"
            ElseIf j = 39 Then ' '
                strNewText = strNewText & "&apos;"
            ElseIf j = 32 Or j = 45 Or (j >= 48 And j <= 57) Or _
                   (j >= 65 And j <= 90) Or (j >= 97 And j <= 122) Then
                ' space, hyphen, digit or letter: keep as-is
                strNewText = strNewText & Mid(strText, i, 1)
            ElseIf j = 38 Then ' &
                strNewText = strNewText & "&amp;"
            ElseIf j = 60 Then ' <
                strNewText = strNewText & "&lt;"
            ElseIf j = 62 Then ' >
                strNewText = strNewText & "&gt;"
            Else
                ' anything else becomes a numeric character reference
                strNewText = strNewText & "&#" & j & ";"
            End If
        Next

        XMLEncode = strNewText
    End Function


    This checks each character in the string in turn, replaces some with
    entities, and replaces the rest of the non-alphanumeric characters with
    their numeric value. You could easily add a few more entity replacements as
    required. Just watch out for the first couple of branches, where I replace
    line feeds with a <br> and strip out carriage returns and tabs, as that
    might not fit what you want to do with the XML yourself.
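    One caveat worth checking with an Asc()-based loop: Asc() returns the code
    in the current ANSI code page, so on a Windows-1252 system the trademark
    sign would come back as 153, and &#153; in XML refers to a control
    character rather than the trademark sign. A hedged sketch of a helper that
    uses AscW() to get the Unicode value instead (CharRef is a hypothetical
    name, and the sketch ignores characters outside the Basic Multilingual
    Plane):

```vbscript
' Sketch: build a numeric character reference from the Unicode code point.
' CharRef is an illustrative helper, not part of the function above.
Function CharRef(ch)
    Dim code
    code = AscW(ch)
    ' AscW returns a signed 16-bit value, so code points above 32767
    ' come back negative and need shifting into the 0..65535 range.
    If code < 0 Then code = code + 65536
    CharRef = "&#" & code & ";"
End Function
```

    In the loop, the final Else branch could then append CharRef(Mid(strText, i, 1))
    in place of "&#" & j & ";".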

    Dan
     
    Daniel Crichton, Apr 4, 2007
    #3
  4. "Yobbo" <> wrote in message
    news:...
    > Does anybody know how I can trap and replace/remove these chars if need
    > be?
    >
    > Thanks


    If you are creating an XML file can you use a DOMDocument to build it and
    save it?
    That'll ensure correct XML is created.
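    A rough sketch of what that could look like with MSXML, assuming the
    Msxml2.DOMDocument.6.0 ProgID is installed on the server. The function
    name BuildItemXml and the element names "items" and "item" are made up for
    illustration; in an ASP page you would normally call Server.CreateObject
    rather than CreateObject.

```vbscript
' Sketch: let the DOM do the escaping instead of hand-written Replace calls.
' BuildItemXml, "items" and "item" are hypothetical names.
Function BuildItemXml(strValue)
    Dim doc, root, item
    Set doc = CreateObject("Msxml2.DOMDocument.6.0")

    Set root = doc.createElement("items")
    doc.appendChild root

    Set item = doc.createElement("item")
    ' Assign the raw text; markup characters such as & and < are
    ' escaped when the document is serialised.
    item.text = strValue
    root.appendChild item

    ' Return the serialised document; doc.save with a file path
    ' would write it to disk instead.
    BuildItemXml = doc.xml
End Function
```

    Characters such as the trademark sign or curly quotes need no special
    handling here, since they are ordinary text as far as the DOM is concerned.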
     
    Anthony Jones, Apr 5, 2007
    #4
