Will standard C++ allow me to replace a string in a Unicode-encoded text file?

Discussion in 'C++' started by Eric Lilja, Feb 21, 2005.

  1. Eric Lilja

    Eric Lilja Guest

    Hello, I had what I thought was a normal text file and I needed to
    locate a string matching a certain pattern in that file and, if found,
    replace that string. I thought this would be simple, but I had problems
    getting my algorithm to work, and to help me find the solution I
    decided to print each line to the screen as I read it.
    Then, to my surprise, I noticed that there was a space between every
    character as I output the lines to the screen. I opened the file in a
    more competent text editor and it informed me the file was "encoded" in
    U-DOS. What's that, Unicode? Anyway, my question is: can I read and
    write Unicode text files using standard C++, or will I have to resort
    to platform-specific tools to accomplish what I want?

    Thanks for reading and replying

    / Eric
     
    Eric Lilja, Feb 21, 2005
    #1

  2. Jerry Coffin

    Jerry Coffin Guest

    Eric Lilja wrote:
    > [snip] Anyway, my question is: can I read and write Unicode text
    > files using standard C++, or will I have to resort to
    > platform-specific tools to accomplish what I want?


    You should be able to read nearly any sort of file in standard C++. The
    major question is how much work it'll be -- i.e. whether your library
    already has code to handle the encoding used or not.

    "U-DOS" doesn't mean much to me -- to get very far, you'll probably
    want to look at something like a hex-dump of the file to figure out
    what it really contains. Based on your description, it sounds as if it
    _may_ have been written as UCS-2 or UTF-16 Unicode, but it's hard to
    guess. If (nearly) every other byte is 00, one of those is a strong
    possibility. Then again, if quite a few of the odd bytes aren't 00's,
    it might still be UCS-2 (for example) but it's harder to say for sure.

    If the file's truly properly written Unicode, then it's supposed to
    start with a byte-order mark, and based on how that's been written, you
    can pretty much figure out how the rest of the file should be decoded
    as well. Unfortunately, an awful lot of files use (one of the several
    forms of) Unicode encoding but leave out the byte-order mark. In that
    case, you'll have to figure out the encoding on your own -- there are
    heuristics to use (much as I've outlined above), but none of them is
    perfect by any means.
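
    For illustration, here's a minimal sketch of such a heuristic --
    count how many odd-numbered bytes are zero (the file name is just a
    placeholder):

    #include <fstream>
    #include <iostream>

    int main()
    {
        // Mostly-ASCII little-endian UCS-2/UTF-16 text has a zero byte
        // at (nearly) every odd offset.
        std::ifstream file("testfile.txt", std::ios_base::binary);
        char c;
        unsigned long pos = 0, odd = 0, odd_zero = 0;

        while(file.read(&c, 1))
        {
            if(pos++ % 2) // odd offsets: 1, 3, 5, ...
            {
                ++odd;
                if(c == '\0')
                    ++odd_zero;
            }
        }

        // 90% or more zeros makes 16-bit little-endian a strong guess.
        if(odd != 0 && odd_zero * 10 >= odd * 9)
            std::cout << "Probably 16-bit little-endian text\n";
        else
            std::cout << "Probably not 16-bit little-endian text\n";
    }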

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Feb 21, 2005
    #2

  3. Eric Lilja

    Eric Lilja Guest

    "Jerry Coffin" wrote:
    news:...
    > Eric Lilja wrote:
    >> Hello, I had what I thought was normal text-file and I needed to
    >> locate a string matching a certain pattern in that file and, if
    >> found, replace that string. I thought this would be simple but I
    >> had problems getting my algorithm to work and in order to help me
    >> find the solution I decided to print each line to screen as I
    >> read them. Then, to my surprise, I noticed that there was a space
    >> between every character as I outputted the lines to the screen. I
    >> opened the file in a more competent text editor and it informed
    >> me the file was "encoded" in U-DOS. What's that, unicode? Anyway,
    >> my question is, can I read and write unicode text files using
    >> standard C++ or will I have to resort to platform specific tools
    >> in order to accomplish what I want?

    >
    > You should be able to read nearly any sort of file in standard C++. The
    > major question is how much work it'll be -- i.e. whether your library
    > already has code to handle the encoding used or not.
    >
    > "U-DOS" doesn't mean much to me -- to get very far, you'll probably
    > want to look at something like a hex-dump of the file to figure out
    > what it really contains. Based on your description, it sounds as if it
    > _may_ have been written as UCS-2 or UTF-16 Unicode, but it's hard to
    > guess. If (nearly) every other byte is 00, one of those is a strong
    > possibility. Then again, if quite a few of the odd bytes aren't 00's,
    > it might still be UCS-2 (for example) but it's harder to say for sure.
    >
    > If the file's truly properly written Unicode, then it's supposed to
    > start with a byte-order mark, and based on how that's been written, you
    > can pretty much figure out how the rest of the file should be decoded
    > as well. Unfortunately, an awful lot of files use (one of the several
    > forms of) Unicode encoding elsewhere, but leave out the byte-order
    > mark. In that case, you'll have to figure out the encoding on your own
    > -- there are heuristics to us to try to figure out (much as I've
    > outlined above) but none of them is perfect by any means.


    Thanks for your reply, Jerry. The file starts with 0xFF 0xFE, so that
    means UTF-16? I was thinking of opening it in binary mode, reading the
    first two bytes, then starting a loop that reads the file byte by byte
    and adds the first, the third, the fifth byte, etc. to a std::string
    (or a std::vector of chars, maybe). When the loop is done I should have
    the actual text of the file. Then I can look for the pattern I want and
    replace it as needed. Then I will open the file for writing (still in
    binary, of course) and write it out as UTF-16. Sounds like this should
    work?

    / Eric
     
    Eric Lilja, Feb 22, 2005
    #3
  4. Eric Lilja wrote:

    > Thanks for your reply, Jerry. The file starts with 0xFF 0xFE, so
    > that means UTF-16?

    Not necessarily: it indicates UTF-16 or UCS-2. However, the difference
    only matters if you need access to the whole set of Unicode characters:
    UTF-16 can represent characters whose codes require more than 16 bits,
    while UCS-2 cannot (and is thus not an encoding covering all Unicode
    characters).
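
    To make that difference concrete, here is a small sketch (the helper
    names are invented) of how UTF-16 represents a code point above
    0xFFFF as a pair of 16-bit units -- something UCS-2 cannot express:

    #include <cstdio>

    // True if a 16-bit unit is the lead (high) half of a UTF-16
    // surrogate pair; in UCS-2 these values are simply invalid.
    bool is_lead_surrogate(unsigned short u)
    {
        return u >= 0xD800 && u <= 0xDBFF;
    }

    bool is_trail_surrogate(unsigned short u)
    {
        return u >= 0xDC00 && u <= 0xDFFF;
    }

    // Combine a surrogate pair into the full code point (> 0xFFFF).
    unsigned long combine(unsigned short lead, unsigned short trail)
    {
        return 0x10000UL + ((lead - 0xD800UL) << 10) + (trail - 0xDC00UL);
    }

    int main()
    {
        // U+1D11E (musical symbol G clef) as a UTF-16 surrogate pair:
        unsigned short lead = 0xD834, trail = 0xDD1E;
        if(is_lead_surrogate(lead) && is_trail_surrogate(trail))
            std::printf("code point U+%lX\n", combine(lead, trail));
    }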

    > I was thinking of opening it in binary mode, reading the first two
    > bytes, then starting a loop that reads the file byte by byte and
    > adds the first, the third, the fifth byte, etc. to a std::string
    > (or a std::vector of chars, maybe).


    It depends on what you want to do: if your goal is only to process the
    given file, this may work, but you are probably better off using a
    Unicode-enabled editor for that task. If you need to process more
    files of a similar nature, you should use a rather different approach:
    the first two bytes are conventionally considered to be a byte order
    mark if they consist of either FF FE or FE FF. Otherwise, the file
    does not have a byte order mark and you have to figure out the details
    of the encoding differently: for example, XML specifies that a certain
    string should appear early in the file, and this can be used to
    determine the byte ordering. Often files have some form of "magic"
    code to indicate their contents.
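
    A minimal sketch of that byte order mark check (the file name is
    again a placeholder):

    #include <fstream>
    #include <iostream>

    int main()
    {
        std::ifstream file("testfile.txt", std::ios_base::binary);
        unsigned char b0 = 0, b1 = 0;

        file.read(reinterpret_cast<char*>(&b0), 1);
        file.read(reinterpret_cast<char*>(&b1), 1);

        if(b0 == 0xFF && b1 == 0xFE)
            std::cout << "BOM: little-endian 16-bit text\n";
        else if(b0 == 0xFE && b1 == 0xFF)
            std::cout << "BOM: big-endian 16-bit text\n";
        else
            std::cout << "No byte order mark; determine the "
                         "encoding by other means\n";
    }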

    Since you are apparently handling Unicode, you should not use a
    'std::string' but at least a 'std::wstring', to cope with non-ASCII
    characters too: the Unicode encoding does not use the extra space for
    nothing. The zero bytes you are seeing just indicate that you got
    essentially ASCII characters, but there are many other characters
    which require more than seven bits to encode. The usual Unicode
    characters take just two bytes (which happens to be the size of
    'wchar_t' on some platforms), but full coverage of Unicode requires 21
    bits. You should probably at least check the size of 'wchar_t' and
    assume a UCS-2 encoding (you might want to bail out if you detect
    UTF-16 surrogates; I don't remember the details off-hand, but this is
    pretty easy: just look at the documentation of UTF-16).
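
    As a sketch of what that might look like -- reading little-endian
    UCS-2 into a 'std::wstring' instead of throwing the high bytes away
    (the function name is invented; this assumes 'wchar_t' has at least
    16 bits):

    #include <fstream>
    #include <string>

    std::wstring read_ucs2_le(const char *filename)
    {
        std::ifstream file(filename, std::ios_base::binary);
        std::wstring text;
        unsigned char lo, hi;

        file.ignore(2); // skip the byte order mark

        // Each character is two bytes, low byte first.
        while(file.read(reinterpret_cast<char*>(&lo), 1) &&
              file.read(reinterpret_cast<char*>(&hi), 1))
        {
            text += static_cast<wchar_t>(lo | (hi << 8));
        }

        return text;
    }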

    Effectively, you might get away without even touching this whole mess,
    though: if you are using 'std::wifstream' you might get the right
    thing immediately. If not, you can probably set up the locale to do
    the right thing. Unfortunately, the details of the locale setup are
    not part of the standard and depend on the platform, i.e. you have to
    consult your platform's documentation.
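
    For example, something along these lines -- whether it actually
    decodes UTF-16 correctly depends entirely on the platform's locale
    support:

    #include <fstream>
    #include <iostream>
    #include <locale>
    #include <string>

    int main()
    {
        std::wifstream file("testfile.txt", std::ios_base::binary);
        file.imbue(std::locale("")); // the user's preferred locale

        std::wstring line;
        while(std::getline(file, line))
            std::wcout << line << L'\n';
    }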

    > When the loop is done I should have the actual text of the file.
    > Then I can look for the pattern I want and replace it as needed.
    > Then I will open the file for writing (still in binary, of course)
    > and write it out as UTF-16. Sounds like this should work?


    I'd consider it unlikely. You might want to get a text containing e.g.
    Japanese characters to test on...
    --
    <mailto:> <http://www.dietmar-kuehl.de/>
    <http://www.contendix.com> - Software Development & Consulting
     
    Dietmar Kuehl, Feb 22, 2005
    #4
  5. Heinz Ozwirk

    Heinz Ozwirk Guest

    "Eric Lilja" <> schrieb im Newsbeitrag news:cvdu4d$85n$...
    > Thanks for your reply, Jerry. The file starts with 0xFF 0xFE, so
    > that means UTF-16? [snip] Sounds like this should work?


    0xFF, 0xFE looks like the little-endian byte order mark, so it is a
    good guess to assume the file contains 16-bit Unicode text, created on
    (or for) a little-endian machine. If your program runs on such a
    machine, you can use wchar_t/wstring to read and process your file. If
    your program does not run on a little-endian machine, you can still
    use wstring, but you have to swap bytes after reading (and before
    writing). [Actually, wchar_t is not guaranteed to be Unicode, but it
    is very likely to be. If you are very suspicious, you could typedef
    your own unicode and ustring types as wchar_t and wstring.]

    Of course, you can also read and process it as a binary file, but
    simply discarding every other byte is not a good idea.
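
    A sketch of the byte swap mentioned above, applied in place after
    reading (this assumes the wstring holds 16-bit code units):

    #include <string>

    void swap_bytes(std::wstring& s)
    {
        for(std::wstring::size_type i = 0; i < s.length(); ++i)
        {
            unsigned short u = static_cast<unsigned short>(s[i]);
            s[i] = static_cast<wchar_t>((u >> 8) | ((u & 0xFF) << 8));
        }
    }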

    HTH
    Heinz
     
    Heinz Ozwirk, Feb 22, 2005
    #5
  6. Heinz Ozwirk wrote:

    > 0xFF, 0xFE looks like the little-endian byte order mark. [snip]
    > If your program does not run on a little-endian machine, you can
    > still use wstring, but you have to swap bytes after reading (and
    > before writing). [Actually, wchar_t is not guaranteed to be
    > Unicode, but it is very likely to be. If you are very suspicious,
    > you could typedef your own unicode and ustring types as wchar_t
    > and wstring.]

    Actually, the details of the encoding should be entirely independent of
    the architecture and should be handled by the 'std::codecvt<>' facet!
    If you are reading a 'std::wstring' from a 'std::wistream' there should
    be no need to tinker with the bytes at all.
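
    Standard C++ (as of 2005) does not ship a ready-made UTF-16 facet,
    so you would have to write or obtain a suitable 'std::codecvt<>'
    specialization. As a forward-looking sketch, C++11 later added
    'std::codecvt_utf16' (deprecated again in C++17), which does exactly
    this:

    #include <codecvt>
    #include <fstream>
    #include <iostream>
    #include <locale>
    #include <string>

    int main()
    {
        std::wifstream file("testfile.txt", std::ios_base::binary);

        // Install a facet that decodes little-endian UTF-16 and
        // consumes the byte order mark; no manual byte handling.
        file.imbue(std::locale(file.getloc(),
            new std::codecvt_utf16<wchar_t, 0x10FFFF,
                std::codecvt_mode(std::little_endian |
                                  std::consume_header)>));

        std::wstring line;
        while(std::getline(file, line))
            std::wcout << line << L'\n';
    }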
    --
    <mailto:> <http://www.dietmar-kuehl.de/>
    <http://www.contendix.com> - Software Development & Consulting
     
    Dietmar Kuehl, Feb 22, 2005
    #6
  7. On Tue, 22 Feb 2005 01:24:58 +0100, Eric Lilja
    <> wrote:

    > Thanks for your reply, Jerry. The file starts with 0xFF 0xFE, so
    > that means UTF-16? [snip] Sounds like this should work?


    It's more likely to be UCS-2 (UTF-16 is an extension of UCS-2 which
    allows UCS-4 characters to be embedded in a UCS-2 stream). The Byte
    Order Mark is defined to be 0xFEFF, with the character 0xFFFE defined
    as invalid, so that the byte order (big/little endian) can be
    determined. In your case the order must be LSB MSB, so you want all
    even-numbered bytes (assuming standard C array indices starting at
    zero), but a portable implementation ought to check which order the
    BOM indicates rather than assume it.

    You really should check that the other bytes are zero, as well, and give
    some sort of error if not (it's a character not representable in a
    normal string, unless you're on an implementation with 16 bit or more
    bytes); at minimum I would either ignore such a character or convert it
    to an error character ('?' for instance, like my mailer does).

    Or you can do all of your work in UCS-2 (or UCS-4), and thus preserve
    any non-ASCII characters. This will be a bit slower as an
    implementation, but on modern machines still faster than the I/O.

    If you really want portability, look at interpreting UTF-32, UTF-8 and
    UTF-16 as well as UCS-2 (and plain old text), with both big- and
    little-endian representations, and write a generic routine which
    converts any of them to a string (note that a C++ string type can take
    wide characters or longs as its element type). But for your case you
    may only need to handle one or two of the formats.
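
    A sketch of the detection half of such a routine (the names are
    invented, and the actual decoders are left out). Note that the
    four-byte UTF-32 marks have to be tested before the two-byte ones,
    because FF FE is a prefix of FF FE 00 00:

    #include <cstddef>

    enum Encoding { UTF8, UTF16LE, UTF16BE, UTF32LE, UTF32BE, PLAIN };

    Encoding detect(const unsigned char *p, std::size_t n)
    {
        if(n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0 && p[3] == 0)
            return UTF32LE;
        if(n >= 4 && p[0] == 0 && p[1] == 0 && p[2] == 0xFE && p[3] == 0xFF)
            return UTF32BE;
        if(n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
            return UTF8;
        if(n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
            return UTF16LE;
        if(n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
            return UTF16BE;
        return PLAIN; // no byte order mark: fall back to heuristics
    }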

    For further reading, see:

    http://www.unicode.org/faq/

    (and its parent if you want to get into the spec.). Warning: if you're
    like me, you can waste (er, spend) many happy hours reading the spec.
    and forget to do the work <g>...

    Chris C
     
    Chris Croughton, Feb 22, 2005
    #7
  8. Eric Lilja

    Eric Lilja Guest

    "Chris Croughton" wrote:
    > On Tue, 22 Feb 2005 01:24:58 +0100, Eric Lilja
    > <> wrote:
    >
    >> Thanks for your reply, Jerry. The file starts with 0xFF 0xFE, so that
    >> means
    >> utf-16? I was thinking of opening it in binary mode, read the first two
    >> bytes then start a loop that reads from the file byte by byte and adds
    >> the
    >> first, the third, the fifth byte etc to a std::string (or a std::vector
    >> of
    >> chars maybe). When the loop is done I should have the actual text of the
    >> file. Then I can look for the pattern I want and replace it as needed.
    >> Then
    >> I will open the file for writing (still in binary of course) and write
    >> out
    >> as utf-16. Sounds like this should work?

    >
    > It's more likely to be UCS-2 (UTF-16 is an extension to UCS-2 which
    > allows UCS-4 characters to be embedded in a UCS-2 stream). The Byte
    > Order Mark is defined to be 0xFEFF, with the character 0xFFFE defined as
    > invalid, so that the byte order (big/little endian) can be determined.
    > In your case the order must be LSB MSB, so you want all even numbered
    > bytes (assuming standard C array indices starting at zero), but you
    > ought to check for a portable implementation.
    >
    > You really should check that the other bytes are zero, as well, and give
    > some sort of error if not (it's a character not representable in a
    > normal string, unless you're on an implementation with 16 bit or more
    > bytes); at minimum I would either ignore such a character or convert it
    > to an error character ('?' for instance, like my mailer does).
    >
    > Or you can do all of your work in UCS-2 (or UCS-4), and thus preserve
    > any non-ASCII characters. This will be a bit slower as an
    > implementation, but on modern machines still faster than the I/O.
    >
    > If you really want portability, look at interpreting UCS-32, UTF-8 and
    > UTF-16 as well as UCS-2 (and plain old text), with both big- and
    > little-endian representations, and write a generic routine which
    > converts any of them to a string (note that a C++ string type can take
    > wide characters or longs as its element type). But for your case you
    > may only need to do one or two of the formats.
    >
    > For further reading, see:
    >
    > http://www.unicode.org/faq/
    >
    > (and its parent if you want to get into the spec.). Warning: if you're
    > like me, you can waste (er, spend) many happy hours reading the spec.
    > and forget to do the work <g>...
    >
    > Chris C


    Thanks for your replies, everyone. I wrote the following little test
    program that I hope to get working for UCS-2 encoded files where all
    characters are representable in ASCII (i.e., the second byte after the
    byte order mark is \0 for every character in the file). The program
    doesn't work as expected, however: if you look at the function
    read_file, it reads the byte order mark into the contents variable, so
    when I write the new file (where I have replaced some strings), I get
    the byte order mark twice, although the second one has padding. If you
    look at the file in a hex editor you see: FF FE FF 00 FE 00. I can
    easily work around it, but I want to know why read_file() is doing
    what it's doing.

    Here's the complete code:
    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <string>

    using std::cerr;
    using std::cout;
    using std::endl;
    using std::exit;
    using std::ifstream;
    using std::ios_base;
    using std::ofstream;
    using std::size_t;
    using std::string;

    static string read_file(const char *);
    static void find_and_replace(string& s, const string&, const string&);
    static void write_file(const char *, const string&);

    static const char padding = '\0';

    int
    main()
    {
        const string find_what = "foobar";
        const string replace_with = "abcdef";

        string contents = read_file("testfile.txt");

        find_and_replace(contents, find_what, replace_with);

        write_file("outfile.txt", contents);

        return EXIT_SUCCESS;
    }

    static string
    read_file(const char *filename)
    {
        ifstream file(filename, ios_base::binary);

        if(!file)
        {
            cerr << "Error: Failed to open " << filename << endl;

            exit(EXIT_FAILURE);
        }

        char c = '\0';
        string contents;

        /* Note: this appends the byte order mark itself to contents,
           which is why the output file ends up with it twice. */
        file.read(&c, sizeof(c));
        contents += c;
        file.read(&c, sizeof(c));
        contents += c;

        if((unsigned char)contents[0] != 0xFF ||
           (unsigned char)contents[1] != 0xFE)
        {
            cerr << "Error: The file doesn't appear to be a unicode-file."
                 << endl;

            /* std::ifstream's destructor will close the file. */
            exit(EXIT_FAILURE);
        }

        int count = 0;

        while(file.read(&c, sizeof(c)))
        {
            if(!(count++ % 2))
                contents.push_back(c);
            else if(c != padding) /* padding is a static global equal to \0 */
            {
                cerr << "Error: Found a character that is too "
                     << "big to fit into a single byte." << endl;

                /* std::ifstream's destructor will close the file. */
                exit(EXIT_FAILURE);
            }
        }

        /* std::ifstream's destructor will close the file. */
        return contents;
    }

    static void
    find_and_replace(string& s, const string& find_what,
                     const string& replace_with)
    {
        string::size_type start = 0;
        string::size_type offset = 0;
        size_t occurrences = 0;

        while((start = s.find(find_what, offset)) != string::npos)
        {
            s.replace(start, find_what.length(), replace_with);

            /* Very important that we set offset to start + 1 or we
               could go into an infinite loop, finding the same
               occurrence over and over again. */
            offset = start + 1;

            ++occurrences;
        }

        cout << "Replaced " << occurrences << " occurrences." << endl;
    }

    static void
    write_file(const char *filename, const string& contents)
    {
        ofstream file(filename, ios_base::binary);

        /* Written as escapes: plain 0xFF may not fit in a (signed) char. */
        const char byte_order_mark[2] = { '\xFF', '\xFE' };

        file.write(&byte_order_mark[0], sizeof(char));
        file.write(&byte_order_mark[1], sizeof(char));

        for(string::size_type i = 0; i < contents.length(); ++i)
        {
            /* &contents[i], not &contents: we want the characters, not
               the string object itself. */
            file.write(&contents[i], sizeof(char));
            file.write(&padding, sizeof(char));
        }
    }

    Thanks for any replies

    / Eric
     
    Eric Lilja, Feb 22, 2005
    #8
  9. Eric Lilja

    Eric Lilja Guest

    "Eric Lilja" <> wrote in message
    news:cvff9h$hii$...
    > [snip]

    Lol, never mind! I saw that I was reading the byte order mark into the
    contents variable. I thought the reading position was being rewound
    somehow. Anyway, if you have any other comments on the code, please
    share them.

    / Eric
     
    Eric Lilja, Feb 22, 2005
    #9
