Stupid std::codecvt question

Discussion in 'C++' started by wscholine, Jul 2, 2007.

  1. wscholine

    wscholine Guest

    This is with MSVC8, if there's an implementation dependency.

    I have a requirement to read lines from files that might be composed
    of wchar_t (for example, text files written by MS Notepad using
    "Save As Unicode"). I would like to do this:

    typedef std::codecvt<wchar_t, wchar_t, mbstate_t> nullcodecvt;
    ...
    std::wifstream myFile;
    ...
    // somehow associate a nullcodecvt facet with myFile, if it has a Unicode BOM
    ...
    std::wstring wline;
    std::getline(myFile, wline);
    ...
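
The "if it has a Unicode BOM" check in the snippet above could be sketched roughly as follows. This is only an illustration: `bom_kind` and `detect_utf16_bom` are hypothetical names, not anything from the original post, and the sketch only looks at the first two bytes of the file.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical result type for this sketch.
enum class bom_kind { none, utf16_le, utf16_be };

// Reads the first two bytes of the file in binary mode and reports
// whether they form a UTF-16 byte order mark.
// Caveat: a UTF-32LE file also starts with FF FE (followed by 00 00),
// which this two-byte check cannot tell apart from UTF-16LE.
bom_kind detect_utf16_bom(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char bytes[2] = {0, 0};
    in.read(bytes, 2);
    if (in.gcount() != 2) {
        return bom_kind::none;  // missing file or fewer than two bytes
    }
    const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
    const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
    if (b0 == 0xFF && b1 == 0xFE) return bom_kind::utf16_le;
    if (b0 == 0xFE && b1 == 0xFF) return bom_kind::utf16_be;
    return bom_kind::none;
}
```

A caller would run this check first and only then decide which locale, if any, to imbue on the wide stream.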

    What I tried is this:

    // awkward-looking circumlocution seems to be the only way to get a
    // reference to a nullcodecvt
    const nullcodecvt &conv =
        std::use_facet<nullcodecvt>(std::wcin.getloc());
    const std::locale from(std::wcin.getloc(), &conv);
    // the file I'm playing with contains the text of a sonnet, hence the name
    std::wifstream wsonnet;
    wsonnet.imbue(from);
    wsonnet.open(L"sonnet-2");
    // seek past the BOM
    wsonnet.seekg(2, std::ios::beg);
    std::wstring wline;
    while (wsonnet)
    {
        std::getline(wsonnet, wline);
    }

    which does not do the trick. The first time through the loop, wline
    gets the low-order half of the character after the BOM, and is empty
    thereafter.

    Inspecting the data structures with the debugger, I find that wsonnet
    has a member of type std::basic_filebuf<wchar_t,
    std::char_traits<wchar_t> >, and that this member has a member of type
    std::codecvt<wchar_t, char, int> *. The call to
    std::wifstream::imbue() doesn't touch that (unsurprisingly, since it's
    a different type than the codecvt instantiation that I want). However,
    if I manually modify the pointer to point to my nullcodecvt & conv,
    the behavior is what I want: each time through the loop, the
    successive lines get read without being converted.

    FWIW, wsonnet::basic_istream::basic_ios::ios_base does have a
    std::locale * that includes my nullcodecvt in its facets. It doesn't
    affect the behavior of std::getline() though.
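
The debugger observation above matches how wide file streams are specified: the file buffer looks up the std::codecvt<wchar_t, char, std::mbstate_t> facet of its locale and ignores facets of other types. A minimal sketch of that, assuming a standard-conforming library: `tracing_codecvt` and `filebuf_uses_imbued_codecvt` are hypothetical names introduced here. The facet derives from the type the buffer actually looks up and just records that its do_in() was called.

```cpp
#include <cassert>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical tracing facet. Because it derives from
// std::codecvt<wchar_t, char, std::mbstate_t>, it replaces exactly the
// facet that a wide file buffer retrieves from its locale.
class tracing_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit tracing_codecvt(bool* called, std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs), called_(called) {
        assert(called != nullptr);
    }

protected:
    // Record the call, then delegate to the default conversion.
    result do_in(state_type& state,
                 const extern_type* from, const extern_type* from_end,
                 const extern_type*& from_next,
                 intern_type* to, intern_type* to_end,
                 intern_type*& to_next) const override {
        *called_ = true;
        return std::codecvt<wchar_t, char, std::mbstate_t>::do_in(
            state, from, from_end, from_next, to, to_end, to_next);
    }

private:
    bool* called_;
};

// Imbues the tracing facet, reads one line, and reports whether the
// file buffer went through the facet's do_in().
bool filebuf_uses_imbued_codecvt(const char* path, std::wstring& first_line) {
    bool called = false;
    std::wifstream in;
    // The locale takes ownership of the facet (refs == 0).
    in.imbue(std::locale(std::locale::classic(), new tracing_codecvt(&called)));
    in.open(path);
    std::getline(in, first_line);
    return called;
}
```

A facet of any other instantiation, such as the nullcodecvt typedef in the post, can sit in the same locale without the file buffer ever consulting it.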

    Is what I am trying to do just wrong? Or is there something broken
    with the MS implementation of std::wifstream?

    If I'm not totally on the wrong track, is there some less kludgy-
    looking way of getting the facet instantiated?

    Thanks in advance.
    wscholine, Jul 2, 2007
    #1

  2. P.J. Plauger

    P.J. Plauger Guest

    "wscholine" <> wrote in message
    news:...

    > Is what I am trying to do just wrong?


    Yes.

    > Or is there something broken
    > with the MS implementation of std::wifstream?


    No.

    > If I'm not totally on the wrong track, is there some less kludgy-
    > looking way of getting the facet instantiated?


    You need one of the codecvt facets in our code conversion library.
    Just which one depends on details you haven't specified, but I'm
    sure what you need is in there. Or you might get lucky and find
    an open-source codecvt facet that does what you want.
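
As one possible illustration of "a codecvt facet that does what you want": the sketch below uses std::codecvt_utf16 from <codecvt>. That facet is an assumption on my part, not something named in this thread; it was standardized in C++11, after this discussion, and has been deprecated since C++17, so treat it as a sketch rather than a recommendation. `read_utf16le_lines` is a hypothetical helper name.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf16 (C++11; deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Reads a UTF-16LE text file line by line into wide strings.
// The facet derives from std::codecvt<wchar_t, char, std::mbstate_t>,
// so the wide file buffer picks it up after imbue().
// Where wchar_t is 16 bits (e.g. Windows), this facet yields UCS-2 only.
std::vector<std::wstring> read_utf16le_lines(const std::string& path) {
    typedef std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>
        utf16le_facet;

    std::wifstream in;
    // Imbue before opening; the locale owns the facet.
    in.imbue(std::locale(std::locale::classic(), new utf16le_facet));
    // Binary mode keeps the runtime from translating bytes before decoding.
    in.open(path, std::ios::in | std::ios::binary);

    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(in, line)) {
        // Drop a leading BOM (U+FEFF) if it was decoded as a character.
        if (lines.empty() && !line.empty() && line[0] == wchar_t(0xFEFF)) {
            line.erase(0, 1);
        }
        // Notepad writes CR LF line endings; strip the trailing CR.
        if (!line.empty() && line.back() == L'\r') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return lines;
}
```

Testing the stream state in the loop condition, rather than looping on `while (wsonnet)` and calling getline inside, also avoids processing one stale line after the read fails at end of file.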

    > Thanks in advance.


    P.J. Plauger
    Dinkumware, Ltd.
    http://www.dinkumware.com
    P.J. Plauger, Jul 2, 2007
    #2
