Read a file line by line and write each line to a file based on the 5th byte

Discussion in 'C++' started by scad, May 13, 2009.

  1. scad

    scad Guest

    I have a file that can have 5 different values in the 5th byte of the
    line. I want to read each line and write that line to a new file
    based on the 5th byte. The byte ascii values are 240, 241, 242, 243,
    244 and each should go to its own file (File0, file1, etc)

    Can anyone help me with this?

    Scott
     
    scad, May 13, 2009
    #1

  2. scad

    Neelesh Guest

    Re: Read a file line by line and write each line to a file based on the 5th byte

    On May 13, 10:25 pm, scad <> wrote:
    > I have a file that can have 5 different values in the 5th byte of the
    > line.  I want to read each line and write that line to a new file
    > based on the 5th byte.  The byte ascii values are 240, 241, 242, 243,
    > 244 and each should go to its own file (File0, file1, etc)
    >
    > Can anyone help me with this?
    >


    1. Open the file using ifstream and ios::binary
    2. use read member function of ifstream to read first five bytes into
    a character array.
    3. Use ofstream to create the appropriate file. Use "write" member
    function to write the bytes to the file. If you have only five files,
    you can use a switch statement to create an ostream handle for
    appropriate file. If there are many files (one for each byte) then you
    can use an associative array like std::map to hold this mapping.
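A minimal sketch of the switch approach from step 3, assuming the five marker values 240 to 244 select files 0 to 4. The helper name `fileIndexFor` is hypothetical, and opening the actual std::ofstream objects is left to the caller:

```cpp
// Hypothetical helper: maps the marker byte to a file index 0..4,
// or returns -1 for any other value.
int fileIndexFor(unsigned char marker)
{
    switch (marker) {
    case 240: return 0;
    case 241: return 1;
    case 242: return 2;
    case 243: return 3;
    case 244: return 4;
    default:  return -1;   // not one of the expected marker values
    }
}
```

Taking the byte as unsigned char matters: on platforms where plain char is signed, a raw char holding 240 would compare as a negative number and never match.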

    If you come across any problem while coding this, please post the code
    so that exact issues can be solved.
     
    Neelesh, May 13, 2009
    #2

  3. Re: Read a file line by line and write each line to a file based on the 5th byte

    Neelesh wrote:
    > On May 13, 10:25 pm, scad <> wrote:
    >> I have a file that can have 5 different values in the 5th byte of the
    >> line. I want to read each line and write that line to a new file
    >> based on the 5th byte. The byte ascii values are 240, 241, 242, 243,
    >> 244 and each should go to its own file (File0, file1, etc)
    >>
    >> Can anyone help me with this?
    >>

    >
    > 1. Open the file using ifstream and ios::binary
    > 2. use read member function of ifstream to read first five bytes into
    > a character array.
    > 3. Use ofstream to create the appropriate file.


    Where does reading the lines step in?
     
    Juha Nieminen, May 13, 2009
    #3
  4. Re: Read a file line by line and write each line to a file based on the 5th byte

    scad wrote:
    > I have a file that can have 5 different values in the 5th byte of the
    > line. I want to read each line and write that line to a new file
    > based on the 5th byte. The byte ascii values are 240, 241, 242, 243,
    > 244 and each should go to its own file (File0, file1, etc)


    There's really not enough information to give a definite answer.

    Is the input in regular text mode, and thus each "line" terminated
    with a regular newline character? (In other words, would it be enough to
    use std::getline() to read the lines in the input?)

    If the answer is yes, the problem seems so trivial that I have the
    feeling that this actually is not the case. If the answer is indeed no,
    then there's definitely not enough information to answer your question.
     
    Juha Nieminen, May 13, 2009
    #4
  5. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based on the 5th byte

    On May 13, 9:12 pm, Juha Nieminen <> wrote:
    > scad wrote:
    > > I have a file that can have 5 different values in the 5th
    > > byte of the line. I want to read each line and write that
    > > line to a new file based on the 5th byte. The byte ascii
    > > values are 240, 241, 242, 243, 244 and each should go to its
    > > own file (File0, file1, etc)


    > There's really not enough information to give a definite
    > answer.


    That's putting it mildly.

    > Is the input in regular text mode, and thus each "line"
    > terminated with a regular newline character? (In other words,
    > would it be enough to use std::getline() to read the lines in
    > the input?)


    I'd consider that a given, since he speaks of lines. On the
    other hand, he speaks of ASCII codes 240-244, which don't exist.

    > If the answer is yes, the problem seems so trivial that I have
    > the feeling that this actually is not the case. If the answer
    > is indeed no, then there's definitely not enough information
    > to answer your question.


    The implementation is trivial, but first he has to specify the
    problem more clearly: what if a line contains less than 5
    characters? What if the fifth byte isn't one of these values?
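For illustration only, here is one possible shape of the trivial implementation, assuming newline-terminated input read with std::getline and assuming (a guess, since the original post does not say) that lines shorter than five bytes or with any other fifth byte are skipped. The function name `splitByFifthByte` is hypothetical; it writes to caller-supplied streams so that opening the output files stays separate:

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>   // std::istringstream/std::ostringstream, handy for trying it in memory
#include <string>

// Hypothetical: distributes each line of `in` to outs[0..4] according to
// its fifth byte (index 4), values 240..244. Returns the number of lines
// skipped because they were too short or carried another value.
std::size_t splitByFifthByte(std::istream& in, std::ostream* (&outs)[5])
{
    std::size_t const markerIndex = 4;   // fifth byte, 0-based
    std::size_t skipped = 0;
    for (std::string line; std::getline(in, line); ) {
        if (line.size() <= markerIndex) {        // too short: no fifth byte
            ++skipped;
            continue;
        }
        unsigned char const marker = static_cast<unsigned char>(line[markerIndex]);
        if (marker < 240 || marker > 244) {      // unexpected value
            ++skipped;
            continue;
        }
        *outs[marker - 240] << line << '\n';
    }
    return skipped;
}
```

A caller would open five std::ofstream objects (the file names are up to the caller) and pass their addresses in the array; the returned count makes the assumed skip policy visible rather than silent.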

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 14, 2009
    #5
  6. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based on the 5th byte

    On May 13, 9:25 pm, Jeff Schwab <> wrote:
    > Neelesh wrote:


    [...]
    > > 3. Use ofstream to create the appropriate file. Use "write"
    > > member function to write the bytes to the file. If you have
    > > only five files, you can use a switch statement


    > Or a container of file-writer objects, indexed by ID.


    > std::size_t const id_column = 4; // 0-based index, i.e. the fifth byte
    > for (std::string line; getline(input, line);) {
    > writers[line.at(id_column)].write(line);
    > }


    Just curious, but why at() in one case, and [] in the other? (I've
    never found a real use for container<>::at.)

    Basically, until he specifies what is to happen if the line
    is shorter than five bytes or the fifth byte doesn't have one of
    the values mentioned, it's hard to say what you should do.

    And of course, if you're reading with getline, it would seem
    more logical to be writing with <<, not write. The body of the
    loop should probably be something like:

    std::ostream* writer( getWriter( line ) ) ;
    if ( writer != NULL ) {
    *writer << line << '\n' ;
    }

    (Alternatively, you could arrange for getWriter to return a
    "null stream" if the line is too short or doesn't have one of
    the privileged values.)
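The thread never shows getWriter itself, so the following is only a guess at it: a minimal sketch assuming a table of five stream pointers (the global `writers` is hypothetical) and C++11 `nullptr` in place of NULL. It returns a null pointer for short lines and unexpected values, so the loop body above can be used as written:

```cpp
#include <ostream>
#include <sstream>   // std::ostringstream, handy for trying it in memory
#include <string>

// Hypothetical table: writers[k] receives lines whose fifth byte is 240 + k.
// In a real program these would point at five opened std::ofstream objects.
std::ostream* writers[5] = {};

// Returns the stream for this line, or nullptr if the line is shorter than
// five bytes or its fifth byte is not one of 240..244.
std::ostream* getWriter(std::string const& line)
{
    if (line.size() < 5) {
        return nullptr;
    }
    unsigned char const marker = static_cast<unsigned char>(line[4]);
    if (marker < 240 || marker > 244) {
        return nullptr;
    }
    return writers[marker - 240];
}
```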

    > > to create an ostream handle for appropriate file. If there
    > > are many files (one for each byte) then you can use an
    > > associative array like std::map to hold this mapping.


    > Since the expected byte values are consecutive, a
    > bounds-checked array wrapper class would serve nicely. The
    > map would have the benefit of sparsity, but the OP says there
    > are only five acceptable byte values.


    A map would be simpler to evolve.
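To illustrate the evolvability point, a minimal sketch using std::map, assuming each marker byte is registered with a destination stream; the names `Routes` and `routeLine` are hypothetical. A sixth, non-consecutive marker would just be one more entry:

```cpp
#include <map>
#include <ostream>
#include <sstream>   // std::ostringstream, handy for trying it in memory
#include <string>

// Hypothetical routing table: marker byte -> destination stream.
typedef std::map<unsigned char, std::ostream*> Routes;

// Writes `line` to the stream registered for its fifth byte.
// Returns false (and writes nothing) if there is no fifth byte or no route.
bool routeLine(Routes const& routes, std::string const& line)
{
    if (line.size() < 5) {
        return false;
    }
    Routes::const_iterator it =
        routes.find(static_cast<unsigned char>(line[4]));
    if (it == routes.end()) {
        return false;
    }
    *it->second << line << '\n';
    return true;
}
```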

     
    James Kanze, May 14, 2009
    #6
  7. Re: Read a file line by line and write each line to a file based on the 5th byte

    James Kanze wrote:
    > (I've never found a real use for container<>::at.)


    In a situation where speed is irrelevant, why not use at() instead of
    operator[]? If you happen to make a bug, better to get it caught clearly
    than having the program misbehave in strange ways.
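A small illustration of the difference being debated, using only the standard library: std::string::at throws std::out_of_range for an index not less than size(), whereas with operator[] an index greater than size() is undefined behavior. The function names are hypothetical:

```cpp
#include <stdexcept>
#include <string>

// Returns the fifth byte of `line` via at(), which throws std::out_of_range
// if the line has fewer than five characters.
char fifthByte(std::string const& line)
{
    return line.at(4);
}

// Reports whether at() accepted the index, by catching the standard exception.
bool hasFifthByte(std::string const& line)
{
    try {
        static_cast<void>(fifthByte(line));
        return true;
    } catch (std::out_of_range const&) {
        return false;
    }
}
```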
     
    Juha Nieminen, May 15, 2009
    #7
  8. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based on the 5th byte

    On May 14, 4:38 pm, Jeff Schwab <> wrote:
    > James Kanze wrote:
    > > On May 13, 9:25 pm, Jeff Schwab <> wrote:
    > >> Neelesh wrote:


    > > [...]
    > >>> 3. Use ofstream to create the appropriate file. Use "write"
    > >>> member function to write the bytes to the file. If you have
    > >>> only five files, you can use a switch statement


    > >> Or a container of file-writer objects, indexed by ID.


    > >> std::size_t const id_column = 4; // 0-based index, i.e. the fifth byte
    > >> for (std::string line; getline(input, line);) {
    > >> writers[line.at(id_column)].write(line);
    > >> }


    > > Just curious, but why at() in one case, and [] in the other? (I've
    > > never found a real use for container<>::at.)


    > I use at() with standard library containers when I want
    > bounds-checking. (String may be bounds-checked anyway, but I
    > want to be clear.) The container of writers is meant to be a
    > custom container, so [] can perform bounds-checking directly.


    OK. I didn't catch the difference. (Of course, at() guarantees
    that you'll do the wrong thing in case of a bounds error, so I'm
    not sure it's a valid alternative in most cases.)

    > > Basically, until he specifies what is to happen if the line
    > > is shorter than five bytes or the fifth byte doesn't have
    > > one of the values mentioned, it's hard to say what you
    > > should do.


    > Exactly. That's why I'm throwing an exception. You and I
    > have already had this debate, so suffice it to say that we
    > disagree on the best policy here. (Actually, an Andrei-style
    > policy-based input check would probably be preferable to a
    > hard-coded exception.)


    In this particular case, I'm not arguing for or against the
    exception. What I'm arguing is that we don't know what he's
    supposed to do, and that we can't really worry about how to
    write the code until we know this.

    > > And of course, if you're reading with getline, it would seem
    > > more logical to be writing with <<, not write. The body of the
    > > loop should probably be something like:


    > > std::ostream* writer( getWriter( line ) ) ;
    > > if ( writer != NULL ) {
    > > *writer << line << '\n' ;
    > > }


    > Ugh. :( There's just no need for explicit pointers here,
    > especially pointers whose type is hard-coded to be raw, nor is
    > there any need for the macro.


    You'd prefer some sort of Fallible? In this case, I'm dealing
    with objects which have identity (ostream), and I have to deal
    with the possibility of there being no appropriate object. And
    I'm guessing that the correct behavior if the line doesn't have
    at least five bytes, or the fifth byte is some other value than
    the ones specified, is to do nothing. That is, of course, just
    a guess; I suspect that it's probably closer to what is wanted
    than an exception, but without more information, I really don't
    know.

    As for the macro, the only macro in the code is NULL, and that's
    not mine---I wouldn't have defined it as a macro either, if I had
    been designing C. (As soon as I can count on C++0x, I'll
    replace it with nullptr.)

    > Whether to use <<, std::ostream::write, or something else is
    > probably best encapsulated in a policy, as well; hence the
    > writer objects in my example.


    We're talking about a simple, one-off application. Any use of
    policies or templates here is just unnecessary complication.

    > In real code, I'd have used my own custom line type, rather
    > than std::string.


    Probably. I have a type Line that I generally use for such
    things. In which case, of course, I'd use the << (since it's
    typed), without the trailing "<< '\n'" (since the class will
    take care of it).

    > The point of the algorithm is to be an algorithm, not to
    > hard-code a bunch of low-level details that we should be able
    > to vary independently of each other.


    We're not designing a general library.

    > > (Alternatively, you could arrange for getWriter to return a
    > > "null stream" if the line is too short or doesn't have one
    > > of the privileged values.)


    > To each his own, I guess.


    Yes. There are two valid solutions here. In some ways, I
    prefer the null stream version; in others, the version above.
    Given the apparent lack of experience of the original poster, I
    think that the solution using a null pointer is probably easier
    for him to understand and implement. In my own work, I'd
    probably use the null stream (especially since I have one ready
    made in my library).
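A minimal sketch of the null stream alternative, assuming C++11. NullBuffer is a hypothetical std::streambuf subclass whose overflow reports success and discards every character; the getWriterOrNull signature is likewise an assumption, not the library mentioned above:

```cpp
#include <ostream>
#include <sstream>   // std::ostringstream, handy for trying it in memory
#include <streambuf>
#include <string>

// A stream buffer that accepts every character and discards it.
class NullBuffer : public std::streambuf
{
protected:
    int_type overflow(int_type c) override
    {
        return traits_type::not_eof(c);  // report success, store nothing
    }
};

// Hypothetical variant of getWriter that never returns null. Lines that are
// too short, carry a fifth byte outside 240..244, or have no stream
// registered get a shared stream that silently discards what is written.
std::ostream& getWriterOrNull(std::string const& line,
                              std::ostream* (&writers)[5])
{
    static NullBuffer nullBuffer;
    static std::ostream nullStream(&nullBuffer);
    if (line.size() >= 5) {
        unsigned char const marker = static_cast<unsigned char>(line[4]);
        if (marker >= 240 && marker <= 244 && writers[marker - 240] != nullptr) {
            return *writers[marker - 240];
        }
    }
    return nullStream;
}
```

With this, the loop body reduces to `getWriterOrNull(line, writers) << line << '\n';` with no test, at the cost of the "do nothing" policy being hidden inside the lookup.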

    > >> Since the expected byte values are consecutive, a
    > >> bounds-checked array wrapper class would serve nicely. The
    > >> map would have the benefit of sparsity, but the OP says there
    > >> are only five acceptable byte values.


    > > A map would be simpler to evolve.


    > I don't see how.


    But you just said it. What happens if a sixth type comes along,
    and it isn't consecutive?

    Of course, given that there are a maximum of 256 values, the
    easiest solution might simply be:
    std::ostream* files[ UCHAR_MAX + 1 ] ;
    Or tr1::array, if you have that available. (Or in my case,
    Gabi::ArrayOf< std::ostream*, CHAR_MIN, CHAR_MAX + 1 >, so you
    can index directly with the char, without worrying about
    converting it to unsigned char. But unlike C style arrays or
    tr1::array, ArrayOf doesn't support static initialization.)
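A sketch of the plain-array version, assuming C++11; the names `files` and `fileFor` are hypothetical. Note the conversion to unsigned char before indexing, which is exactly what the ArrayOf variant avoids having to write:

```cpp
#include <climits>   // UCHAR_MAX
#include <ostream>
#include <sstream>   // std::ostringstream, handy for trying it in memory
#include <string>

// One slot per possible byte value; static initialization with {} makes
// every slot a null pointer, meaning "no file for this value".
std::ostream* files[UCHAR_MAX + 1] = {};

// Looks up the destination for a line by its fifth byte. The char is
// converted to unsigned char first: where plain char is signed, bytes
// 240..244 would otherwise become negative indexes.
std::ostream* fileFor(std::string const& line)
{
    if (line.size() < 5) {
        return nullptr;
    }
    return files[static_cast<unsigned char>(line[4])];
}
```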

     
    James Kanze, May 15, 2009
    #8
  9. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based on the 5th byte

    On May 15, 8:46 am, Juha Nieminen <> wrote:
    > James Kanze wrote:
    > > (I've never found a real use for container<>::at.)


    > In a situation where speed is irrelevant, why not use at()
    > instead of operator[]?


    Because it does the wrong thing (in most cases, anyway).

    > If you happen to make a bug, better to get it caught clearly
    > than having the program misbehave in strange ways.


    But that's the case with both operators. At least in the
    implementation I use most often, operator[] with a bounds error
    core dumps. Which is what I want if there is an error.

     
    James Kanze, May 15, 2009
    #9
  10. Re: Read a file line by line and write each line to a file based on the 5th byte

    Alf P. Steinbach wrote:
    > I think you should be clear that what you're advocating is not using
    > operator[] instead of at, but using a checking version of operator[].


    Which is precisely what at() does...
     
    Juha Nieminen, May 15, 2009
    #10
  11. Re: Read a file line by line and write each line to a file based on the 5th byte

    * Juha Nieminen:
    > Alf P. Steinbach wrote:
    >> I think you should be clear that what you're advocating is not using
    >> operator[] instead of at, but using a checking version of operator[].

    >
    > Which is precisely what at() does...


    Sorry, no.

    'at' is restricted in how it reports a bounds error, namely via an exception,
    while operator[] is not so restricted. So when we're talking about 'at' we're
    talking by default about the reporting mechanism (exception) mandated by the
    standard. While when we're talking about a checking [], then we're necessarily
    talking about some compiler-specific reporting mechanism, typically a crash.

    And James' point is (presumably) that he prefers an invalid index to trigger assert() or
    always do the crash thing (*nix core dump or Windows JIT debugging), rather than
    throwing an exception -- because an invalid index indicates a bug, unlike e.g.
    the exhaustion of some dynamic resource. Of course, countering that, the
    standard's exception class hierarchy does have an exception class dedicated to
    logic errors. But AFAIK very few regard that hierarchy as a good design...

    However, I think that when the intention is to guarantee crash behavior then it
    shouldn't be guaranteed via some implied usage of proper compiler option and
    restriction to a compiler that supports it.

    And further, the possibility of having [] yield crash behavior through some
    compiler specific means is really not an argument in favor of [], since at least
    in principle the same can be done for 'at'. It's not like compilers are somehow
    prevented from offering non-conforming features such as crashing 'at', and it's
    not like one is relying on the standard when advocating []: one is then relying
    on very much in-practice tool usage, not guaranteed anywhere. So just saying
    that [] is preferable to 'at' is misleading because it compares a /customized/
    version of [] to the default 'at', while the proper comparison is IMHO between
    either /default/ raw indexing [] versus 'at', where 'at' wins handily wrt.
    bug detection, or between /customized/ [] versus customized 'at', where there's
    no difference -- so, wrt. this, 'at' is either better, or the same.

    Finally, one's notational options are not limited to 'at' and [], so playing the
    one against each other is a false dichotomy.

    For example, where bounds checking with guaranteed crash is desired, one might
    use a notation such as 'in(v,i)', or even 'v AT(i)', then *guaranteeing* the
    checking rather than relying on unspecified compiler support and options.
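A minimal sketch of what an 'in(v,i)' helper could look like, assuming C++11 (trailing return type) and choosing std::abort as the guaranteed crash; the name and the diagnostic message are illustrative assumptions. Unlike assert(), the check does not disappear when NDEBUG is defined:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>   // std::vector, used in the usage note

// Checked indexing that never depends on compiler options: an index that is
// not less than c.size() prints a diagnostic and terminates via std::abort.
// Works for const and non-const containers, since Container may deduce as
// a const type and decltype(c[i]) then yields a const reference.
template<typename Container>
auto in(Container& c, std::size_t i) -> decltype(c[i])
{
    if (i >= c.size()) {
        std::fprintf(stderr, "in(): index %zu out of range (size %zu)\n",
                     i, static_cast<std::size_t>(c.size()));
        std::abort();
    }
    return c[i];
}
```

For a std::vector<int> v of size 3, `in(v, 1) = 42;` assigns through the checked reference, while `in(v, 3)` prints a message and aborts.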


    Cheers,

    - Alf

    --
    Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
    No ads, and there is some C++ stuff! :) Just going there is good. Linking
    to it is even better! Thanks in advance!
     
    Alf P. Steinbach, May 15, 2009
    #11
  12. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based on the 5th byte

    On May 15, 12:55 pm, "Alf P. Steinbach" <> wrote:
    > * James Kanze:


    > > On May 15, 8:46 am, Juha Nieminen <> wrote:
    > >> James Kanze wrote:
    > >>> (I've never found a real use for container<>::at.)


    > >> In a situation where speed is irrelevant, why not use at()
    > >> instead of operator[]?


    > > Because it does the wrong thing (in most cases, anyway).


    > >> If you happen to make a bug, better to get it caught clearly
    > >> than having the program misbehave in strange ways.


    > > But that's the case with both operators. At least in the
    > > implementation I use most often, operator[] with a bounds error
    > > core dumps. Which is what I want if there is an error.


    > What if a compiler does not provide a checking operator[]?


    Don't use it. And yes, I know, one doesn't always have that
    option. But G++ and VC++ do behave correctly, which means that
    the two most widely used compilers are OK.

    > I think you should be clear that what you're advocating is not
    > using operator[] instead of at, but using a checking version
    > of operator[].


    The whole purpose of undefined behavior is to allow checking.
    The standard doesn't have any notion of "requiring a crash", and
    exactly what that means depends on the platform---you wouldn't
    want to impose a "core dump" under Windows, for example. So it
    says "undefined behavior", and leaves the rest up to quality of
    implementation.

     
    James Kanze, May 15, 2009
    #12
  13. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based on the 5th byte

    On May 15, 3:21 pm, "Alf P. Steinbach" <> wrote:
    > * Juha Nieminen:


    > > Alf P. Steinbach wrote:
    > >> I think you should be clear that what you're advocating is
    > >> not using operator[] instead of at, but using a checking
    > >> version of operator[].


    > > Which is precisely what at() does...


    > Sorry, no.


    > 'at' is restricted in how it reports a bounds error, namely
    > via an exception, while operator[] is not so restricted. So
    > when we're talking about 'at' we're talking by default about
    > the reporting mechanism (exception) mandated by the standard.
    > While when we're talking about a checking [], then we're
    > necessarily talking about some compiler-specific reporting
    > mechanism, typically a crash.


    > And James' point is (presumably) that he prefers an invalid
    > index to trigger assert() or always do the crash thing (*nix core dump
    > or Windows JIT debugging), rather than throwing an exception
    > -- because an invalid index indicates a bug, unlike e.g. the
    > exhaustion of some dynamic resource. Of course, countering
    > that, the standard's exception class hierarchy does have an
    > exception class dedicated to logic errors. But AFAIK very few
    > regard that hierarchy as a good design...


    Exactly.

    > However, I think that when the intention is to guarantee crash
    > behavior then it shouldn't be guaranteed via some implied
    > usage of proper compiler option and restriction to a compiler
    > that supports it.


    > And further, the possibility of having [] yield crash behavior
    > through some compiler specific means is really not an argument
    > in favor of [], since at least in principle the same can be
    > done for 'at'. It's not like compilers are somehow prevented
    > from offering non-conforming features such as crashing 'at',
    > and it's not like one is relying on the standard when
    > advocating []: one is then relying on very much in-practice
    > tool usage, not guaranteed anywhere. So just saying that [] is
    > preferable to 'at' is misleading because it compares a
    > /customized/ version of [] to the default 'at', while the
    > proper comparison is IMHO between either /default/ raw
    > indexing [] versus 'at', where 'at' wins handily wrt. bug
    > detection, or between /customized/ [] versus customized 'at',
    > where there's no difference -- so, wrt. this, 'at' is either
    > better, or the same.


    A compiler cannot crash if there is a bounds error in at(),
    because the standard says exactly what it should do. There are
    almost certainly programs which depend on it, and there are
    probably a few cases where it is even a reasonable solution,
    even if I've never seen them. (Jeff's use of at earlier in this
    thread might actually be one, if the correct behavior if the
    line is too short is to raise an exception---although more
    likely, even in that case, you'd want some different exception.)

    > Finally, one's notational options are not limited to 'at' and
    > [], so playing the one against each other is a false
    > dichotomy.


    If you're just using std::vector, those are really the only two
    choices. (Unless you're using iterators, in which case, you'll
    normally get the behavior of [].)

    > For example, where bounds checking with guaranteed crash is
    > desired, one might use a notation such as 'in(v,i)', or even
    > 'v AT(i)', then *guaranteeing* the checking rather than
    > relying on unspecified compiler support and options.


    My own experience would suggest that the exception is
    appropriate rarely enough that there really isn't a need to
    support it in the standard. In my pre-standard array classes,
    there were two access functions: operator[] and unsafeAt. The
    first was guaranteed to result in an assertion failure, and the
    second resulted in truly undefined behavior. That was state of
    the art in 1990, more or less. The standard library represents
    state of the art ca. 1970. But we're stuck with it.
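A minimal sketch of such an array class, assuming a fixed size; the name CheckedArray is hypothetical. Note that it uses assert(), so the check in operator[] is only guaranteed while NDEBUG is not defined; a version that must never be disabled would call std::abort directly:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size array in the style described above: operator[]
// is checked by assert, while unsafeAt performs no check at all.
template<typename T, std::size_t N>
class CheckedArray
{
public:
    T& operator[](std::size_t i)
    {
        assert(i < N && "CheckedArray index out of range");
        return data_[i];
    }
    T const& operator[](std::size_t i) const
    {
        assert(i < N && "CheckedArray index out of range");
        return data_[i];
    }

    // Unchecked access: an out-of-range index is undefined behavior.
    T& unsafeAt(std::size_t i) { return data_[i]; }
    T const& unsafeAt(std::size_t i) const { return data_[i]; }

    std::size_t size() const { return N; }

private:
    T data_[N] = {};   // value-initialized elements
};
```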

     
    James Kanze, May 15, 2009
    #13
  14. Re: Read a file line by line and write each line to a file based on the 5th byte

    * James Kanze:
    > On May 15, 3:21 pm, "Alf P. Steinbach" <> wrote:
    >
    >> However, I think that when the intention is to guarantee crash
    >> behavior then it shouldn't be guaranteed via some implied
    >> usage of proper compiler option and restriction to a compiler
    >> that supports it.

    >
    >> And further, the possibility of having [] yield crash behavior
    >> through some compiler specific means is really not an argument
    >> in favor of [], since at least in principle the same can be
    >> done for 'at'. It's not like compilers are somehow prevented
    >> from offering non-conforming features such as crashing 'at',
    >> and it's not like one is relying on the standard when
    >> advocating []: one is then relying on very much in-practice
    >> tool usage, not guaranteed anywhere. So just saying that [] is
    >> preferable to 'at' is misleading because it compares a
    >> /customized/ version of [] to the default 'at', while the
    >> proper comparison is IMHO between either /default/ raw
    >> indexing [] versus 'at', where 'at' wins handily wrt. bug
    >> detection, or between /customized/ [] versus customized 'at',
    >> where there's no difference -- so, wrt. this, 'at' is either
    >> better, or the same.

    >
    > A compiler cannot crash if there is a bounds error in at(),
    > because the standard says exactly what it should do.


    Re-quoting myself from above, "It's not like compilers are somehow prevented
    from offering non-conforming features".

    Consider for example the MSVC treatment of a "throw()" specification...

    In short, when you're in compiler-specific land, that's where you are at. :)


    > There are almost certainly programs which depend on it,


    For those programs don't ask the compiler to use non-compliant crashing 'at'.

    With separate programs, your example, it's simple.

    However, with g++, how do you compile one part of the program with checking
    behavior of [], when some other part is an object file or lib compiled with
    non-checking []? This is a rhetorical question. It's my intention that instead
    of answering the question literally (involving source code changes), you compare
    it to your own argument regarding 'at', and note that for [] it's more serious.

    Hence, given the above consideration, the apparent complete lack of compilers
    that implement the crashing 'at' :), and the fact that not all compilers in
    common use support a range-checking [] (e.g. MSVC 7.1 does not), my suggestion
    is to use an alternative notation, like some indexing routine or macro.

    <example>
    // This program intentionally has Undefined Behavior: arbitrary result or e.g. a crash.
    #include <iostream>
    #include <string>
    #include <stddef.h> // ptrdiff_t
    #include <assert.h>

    typedef ptrdiff_t Size;
    typedef ptrdiff_t Index;

    // Element count as a signed type, so that 0 <= i can be checked directly.
    template< typename C >
    Size nElements( C const& c ) { return c.size(); }

    // Checked indexing notation: c^i asserts that i is in range, then indexes.
    template< typename C >
    typename C::value_type& operator^( C& c, Index i )
    {
        assert( 0 <= i && i < nElements( c ) );
        return c[i];
    }

    template< typename C >
    typename C::value_type const& operator^( C const& c, Index i )
    {
        assert( 0 <= i && i < nElements( c ) );
        return c[i];
    }

    int main()
    {
        using namespace std;
        string const s = "Blah blah...";
        // 43 is out of range: the assert fires unless NDEBUG is defined.
        cout << "'" << (s^43) << "'" << endl;
    }
    </example>

    Hm, I'd prefer @, as (I think) it is in Smalltalk, but no such in C++...

    Also, the % operator has better precedence, but is less mnemonic/readable. And
    there is the problem of a container with a value-producing []. That's a thorny
    one, but it's late, and I leave the thinking to you (there must surely be a
    practical solution, if not TMP auto-magic then just specialization).

    But, anyway, for the novice I just recommend 'at', and I think it's a disservice
    to them to recommend [] (even though it might in practice be the better choice
    for the professional) because it's tool specific and not necessarily available.


    > and there are
    > probably a few cases where it is even a reasonable solution,
    > even if I've never seen them. (Jeff's use of at earlier in this
    > thread might actually be one, if the correct behavior if the
    > line is too short is to raise an exception---although more
    > likely, even in that case, you'd want some different exception.)
    >
    >> Finally, one's notational options are not limited to 'at' and
    >> [], so playing the one against each other is a false
    >> dichotomy.

    >
    > If you're just using std::vector, those are really the only two
    > choices. (Unless you're using iterators, in which case, you'll
    > normally get the behavior of [].)
    >
    >> For example, where bounds checking with guaranteed crash is
    >> desired, one might use a notation such as 'in(v,i)', or even
    >> 'v AT(i)', then *guaranteeing* the checking rather than
    >> relying on unspecified compiler support and options.

    >
    > My own experience would suggest that the exception is
    > appropriate rarely enough that there really isn't a need to
    > support it in the standard. In my pre-standard array classes,
    > there were two access functions: operator[] and unsafeAt. The
    > first was guaranteed to result in an assertion failure, and the
    > second resulted in truly undefined behavior. That was state of
    > the art in 1990, more or less. The standard library represents
    > state of the art ca. 1970. But we're stuck with it.


    I agree with this. :)


    Cheers,

    - Alf

     
    Alf P. Steinbach, May 15, 2009
    #14
  15. Re: Read a file line by line and write each line to a file based on the 5th byte

    * James Kanze:
    > On May 15, 12:55 pm, "Alf P. Steinbach" <> wrote:
    >
    >> What if a compiler does not provide a checking operator[]?

    >
    > Don't use it. And yes, I know, one doesn't always have that
    > option. But G++ and VC++ do behave correctly, which means that
    > the two most widely used compilers are OK.


    For MSVC it depends very much on version, e.g. no such in 7.1.

    For g++ it involves using a special debugging version of the standard library,
    by defining _GLIBCXX_DEBUG.

    Which means it must be applied to all compilation units, or none.

    Anyways, it's tool specific.

    Else-thread I put some concrete flesh on my notational suggestion. Not
    completely fleshed out. But I think enough to convey basic idea. :)


    Cheers & hth.,

    - Alf

     
    Alf P. Steinbach, May 15, 2009
    #15
  16. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based on the 5th byte

    On May 15, 5:10 pm, Jeff Schwab <> wrote:
    > James Kanze wrote:
    > > A compiler cannot crash if there is a bounds error in at(),
    > > because the standard says exactly what it should do. There
    > > are almost certainly programs which depend on it, and there
    > > are probably a few cases where it is even a reasonable
    > > solution, even if I've never seen them. (Jeff's use of at
    > > earlier in this thread might actually be one, if the correct
    > > behavior if the line is too short is to raise an
    > > exception---although more likely, even in that case, you'd
    > > want some different exception.)


    > From the container's perspective, it's exactly the right exception.


    Certainly. That's why having the container throw the exception
    is rarely the right solution:).

    > One abstraction layer up, the line interpreter can easily
    > catch the out-of-bounds exception and throw something
    > meaningful to its own client code.


    Or the line interpreter could check the bounds itself, and throw
    the appropriate exception.

    Logically, I'd consider too short a line or an invalid value in
    the line an input error; I'd expect to see it detected by the
    code which validates the input. In this case, the requirements
    on the input are simple enough that just counting on the
    exception using at() is enough, but typically, there will be
    other aspects to check as well. (I'm probably a bit of a maniac
    about this, but I always validate my input, in every possible
    way.)
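Applied to the original problem, the two approaches discussed here might look something like this. One lets at() detect the short line and translates its exception one layer up. The other validates the line explicitly and then indexes with []. This is a minimal sketch: InputError, fifthByteViaAt and outputIndex are hypothetical names, and the 240..244 range is taken from the original question.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical application-level exception; the name is illustrative only.
struct InputError : std::runtime_error {
    explicit InputError(std::string const& msg) : std::runtime_error(msg) {}
};

// Approach 1: let at() do the bounds check, and translate the container's
// std::out_of_range into something meaningful to the caller.
unsigned char fifthByteViaAt(std::string const& line)
{
    try {
        return static_cast<unsigned char>(line.at(4));
    } catch (std::out_of_range const&) {
        throw InputError("line shorter than 5 bytes");
    }
}

// Approach 2: validate the input explicitly, then index with [].
// Assumes (per the original question) that the 5th byte must be one of
// 240..244; returns the number (0..4) of the output file for the line.
int outputIndex(std::string const& line)
{
    if (line.size() < 5) {
        throw InputError("line shorter than 5 bytes");
    }
    // The length was checked above, so plain [] cannot go out of bounds.
    unsigned char const key = static_cast<unsigned char>(line[4]);
    if (key < 240 || key > 244) {
        throw InputError("unexpected value in 5th byte");
    }
    return key - 240;
}
```

The second version is the one that extends naturally when, as suggested above, there are other aspects of the line to check as well.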

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 16, 2009
    #16
  17. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based onthe 5th byte

    On May 15, 11:44 pm, "Alf P. Steinbach" <> wrote:
    > * James Kanze:
    > > On May 15, 3:21 pm, "Alf P. Steinbach" <> wrote:


    > >> However, I think that when the intention is to guarantee
    > >> crash behavior then it shouldn't be guaranteed via some
    > >> implied usage of proper compiler option and restriction to
    > >> a compiler that supports it.


    > >> And further, the possibility of having [] yield crash
    > >> behavior through some compiler specific means is really not
    > >> an argument in favor of [], since at least in principle the
    > >> same can be done for 'at'. It's not like compilers are
    > >> somehow prevented from offering non-conforming features
    > >> such as crashing 'at', and it's not like one is relying on
    > >> the standard when advocating []: one is then relying on
    > >> very much in-practice tool usage, not guaranteed anywhere.
    > >> So just saying that [] is preferable to 'at' is misleading
    > >> because it compares a /customized/ version of [] to the
    > default 'at', while the proper comparison is IMHO between
    > either /default/ raw indexing [] versus 'at', where 'at'
    > wins handily wrt. bug detection, or between /customized/
    > >> [] versus customized 'at', where there's no difference --
    > >> so, wrt. this, 'at' is either better, or the same.


    > > A compiler cannot crash if there is a bounds error in at(),
    > > because the standard says exactly what it should do.


    > Re-quoting myself from above, "It's not like compilers are
    > somehow prevented from offering non-conforming features".


    Yes, but we're not supposed to talk about them here:).
    Seriously, if conformance isn't an issue, the compiler might not
    have the at() function anyway. (I seem to recall some very
    early implementations which didn't.)

    > Consider for example the MSVC treatment of a "throw()"
    > specification...


    > In short, when you're in compiler-specific land, that's where
    > you are at. :)


    > > There are almost certainly programs which depend on it,


    > For those programs don't ask the compiler to use non-compliant
    > crashing 'at'.


    > With separate programs, your example, it's simple.


    > However, with g++, how do you compile one part of the program
    > with checking behavior of [], when some other part is an
    > object file or lib compiled with non-checking []?


    You don't. The results will core dump.

    In general, you don't compile different parts of the program
    with different options. Some options are harmless (e.g. warning
    levels), but a number of them will break binary compatibility.
    (Personally, I find this horrible, but that's the way it is.)

    > This is a rhetorical question. It's my intention that instead
    > of answering the question literally (involving source code
    > changes), you compare it to your own argument regarding 'at',
    > and note that for [] it's more serious.


    I must be missing something, but I don't see how it's relevant
    for either.

    > Hence, above consideration combined with the apparent complete
    > lack of compilers that implement the crashing 'at' :), plus
    > the fact not all compilers in common use support a
    > range-checking [], e.g. note that MSVC 7.1 does not, my
    > suggestion of using an alternative notation, like some
    > indexing routine or macro.


    Agreed. Fundamentally, you have the choice of using at(), and
    getting a standard defined exception, or using [], doing your
    own bounds checking beforehand, and getting anything you want.
    When the exact exception that at() generates is the appropriate
    behavior (which in my experience, is almost never the case), use
    at(). In all other cases, use [] and your own checking.

    (FWIW: in pre-standard days, when I was designing my own
    containers, I actually started by designing a callback
    mechanism. It aborted by default, but the user could set it to
    do pretty much whatever he wanted: throw an application specific
    exception, return a default value, etc. In the end, I dropped
    it, because it made the interface too heavy---to be really
    useful, you'd need different callbacks for different contexts.)
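The callback idea described above might be sketched roughly as follows. This is an illustration only, not Kanze's actual pre-standard design: CallbackVector, BoundsHandler and abortOnError are hypothetical names, and the handler here is set per object rather than globally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical container wrapper with a user-settable bounds-error
// callback. The handler receives the bad index and the current size, and
// may abort, throw, or return a reference to a fallback value.
template <typename T>
class CallbackVector {
public:
    typedef T const& (*BoundsHandler)(std::size_t index, std::size_t size);

    explicit CallbackVector(std::vector<T> const& v)
        : data_(v), handler_(&abortOnError) {}

    void setBoundsHandler(BoundsHandler h) { handler_ = h; }

    T const& operator[](std::size_t i) const
    {
        if (i >= data_.size()) {
            // Whatever the handler returns is used as the result.
            return handler_(i, data_.size());
        }
        return data_[i];
    }

private:
    // Default policy: abort, as in the design described above.
    static T const& abortOnError(std::size_t, std::size_t)
    {
        std::abort();
    }

    std::vector<T> data_;
    BoundsHandler handler_;
};
```

As the paragraph above points out, even a per-object handler is often too coarse, since the right reaction usually depends on the calling context.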

    > <example>
    > // This program intentionally has Undefined Behavior:
    > // arbitrary result or e.g. a crash.
    > #include <iostream>
    > #include <string>
    > #include <stddef.h> // ptrdiff_t
    > #include <assert.h>
    >
    > typedef ptrdiff_t Size;
    > typedef ptrdiff_t Index;
    >
    > template< typename C >
    > Size nElements( C const& c ) { return c.size(); }
    >
    > template< typename C >
    > typename C::value_type& operator^( C& c, Index i )
    > {
    >     assert( 0 <= i && i < nElements( c ) );
    >     return c[i];
    > }
    >
    > template< typename C >
    > typename C::value_type const& operator^( C const& c, Index i )
    > {
    >     assert( 0 <= i && i < nElements( c ) );
    >     return c[i];
    > }
    >
    > int main()
    > {
    >     using namespace std;
    >     string const s = "Blah blah...";
    >     cout << "'" << (s^43) << "'" << endl;
    > }
    > </example>


    > Hm, I'd prefer @, as (I think) it is in Smalltalk, but no such
    > in C++...


    I'd prefer making it a wrapper class, and using []. But the
    basic idea is sound.
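A wrapper class along those lines might look like the following minimal sketch. It assumes C++11 (decltype and std::declval postdate this thread), and the name Checked is hypothetical. Unlike the operator^ version, checked access still reads as ordinary [] indexing.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper giving a random-access container a bounds-checked
// operator[]. The check is an assert(), so it vanishes when NDEBUG is
// defined. The wrapped container is referenced, not copied.
template <typename C>
class Checked {
public:
    // char& for std::string, char const& for std::string const, etc.
    typedef decltype(std::declval<C&>()[0]) reference;

    explicit Checked(C& c) : c_(c) {}

    reference operator[](typename C::size_type i) const
    {
        assert(i < c_.size());
        return c_[i];
    }

private:
    C& c_;
};
```

Containers whose [] produces a value rather than a reference (the thorny case Alf mentions) are not addressed by this sketch.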

    > Also, the % operator has better precedence, but is less
    > mnemonic/readable. And there is the problem of a container
    > with a value-producing []. That's a thorny one, but it's late,
    > and I leave the thinking to you (there must surely be a
    > practical solution, if not TMP auto-magic then just
    > specialization).


    > But, anyway, for the novice I just recommend 'at', and I think
    > it's a disservice to them to recommend [] (even though it
    > might in practice be the better choice for the professional)
    > because it's tool specific and not necessarily available.


    For the novice, I'd recommend getting a good implementation,
    which crashes. Given that this is the case with the (free)
    up-to-date implementations from Microsoft and g++, there's no
    real reason not to *for learning*. (Professionally, we don't
    always have a choice of compilers we're using. For learning,
    I'd say that you should be using either g++ 4.0 up or the latest
    VC++. Or Comeau, with the Dinkumware library, which is the same
    as the one used in VC++.)

    Not just for reasons of having a [] which crashes. (I'm less
    sure about VC++, but pre-4.0 g++ didn't have fully standard name
    look-up.)

     
    James Kanze, May 16, 2009
    #17
  18. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based onthe 5th byte

    On May 15, 11:50 pm, "Alf P. Steinbach" <> wrote:
    > * James Kanze:


    > > On May 15, 12:55 pm, "Alf P. Steinbach" <> wrote:


    > >> What if a compiler does not provide a checking operator[]?


    > > Don't use it. And yes, I know, one doesn't always have that
    > > option. But G++ and VC++ do behave correctly, which means
    > > that the two most widely used compilers are OK.


    > For MSVC it depends very much on version, e.g. no such in 7.1.


    > For g++ it involves using a special debugging version of the
    > standard library, by defining _GLIBCXX_DEBUG.


    > Which means it must be applied to all compilation units, or
    > none.


    Both compilers require a fistful of options to be in any way
    usable:

    For VC++ (8.0):
    -DNOMINMAX -DGB_EFmtDoesntWork -D_CRT_SECURE_NO_DEPRECATE
    -vmg -GR -Gy -EHs -Zc:forScope,wchar_t -J -nologo -MDd
    -GS- -Zi -w -D_DEBUG

    For g++ (4.0 and up):
    -Wno-missing-field-initializers -fdiagnostics-show-option
    -std=c++98 -pedantic -ffor-scope -fno-gnu-keywords
    -foperator-names -pipe -Wall -W -Wno-sign-compare
    -Wno-deprecated -Wno-non-virtual-dtor -Wpointer-arith
    -Wno-unused -Wno-switch -Wno-missing-braces -Wno-long-long
    -ggdb3 -D_GLIBCXX_CONCEPT_CHECKS -D_GLIBCXX_DEBUG
    -D_GLIBCXX_DEBUG_PEDANTIC

    (Extracted from my makefiles, so it's possible that there is
    some historic crud in there.)

    > Anyways, it's tool specific.


    Handling erroneous code always is.

     
    James Kanze, May 16, 2009
    #18
  19. scad

    James Kanze Guest

    Re: Read a file line by line and write each line to a file based onthe 5th byte

    On May 16, 4:10 pm, "Alf P. Steinbach" <> wrote:
    > * James Kanze:


    [...]
    > > Not just for reasons of having a [] which crashes. (I'm
    > > less sure about VC++, but pre-4.0 g++ didn't have fully
    > > standard name look-up.)


    > The latest version of g++ for Windows is AFAIK 3.4.5.


    Whose last version? :)

    MSys has a 4.3.0, qualified "Testing"; the last stable version
    is 3.4.5, as you say, but a newer version is available. Cygwin
    has 4.3.2 (plus a lot of others); I'm unable to find any
    statement concerning what they consider "stable". I don't know
    about the others.

    But who'd want to use g++ under Windows anyway. Even under
    Cygwin and MSys, I use VC++. (That way, I know exactly what I'm
    getting with regards to the system API. There have been several
    ports of g++ to Windows, using different libraries for the API,
    and some are, to put it mildly, strange.)

     
    James Kanze, May 17, 2009
    #19
  20. Re: Read a file line by line and write each line to a file basedon the 5th byte

    * James Kanze:
    > On May 16, 4:10 pm, "Alf P. Steinbach" <> wrote:
    >> * James Kanze:

    >
    > [...]
    >>> Not just for reasons of having a [] which crashes. (I'm
    >>> less sure about VC++, but pre-4.0 g++ didn't have fully
    >>> standard name look-up.)

    >
    >> The latest version of g++ for Windows is AFAIK 3.4.5.

    >
    > Whose last version? :)
    >
    > MSys has a 4.3.0, qualified "Testing"; the last stable version
    > is 3.4.5, as you say, but a newer version is available. Cygwin
    > has 4.3.2 (plus a lot of others); I'm unable to find any
    > statement concerning what they consider "stable". I don't know
    > about the others.
    >
    > But who'd want to use g++ under Windows anyway.


    Anybody serious about programming.

    It's a good idea to have the code compile with at least two compilers.

    I believe that's even an item in two or more "effective, efficient, effig
    whatever" C++ books.


    Cheers & hth.,

    - Alf

     
    Alf P. Steinbach, May 17, 2009
    #20
Similar Threads
  1. crash.test.dummy
    Replies:
    1
    Views:
    943
    Knute Johnson
    Feb 17, 2006
  2. Deep
    Replies:
    6
    Views:
    497
    Nick Keighley
    Feb 28, 2007
  3. daved170

    read text file byte by byte

    daved170, Dec 12, 2009, in forum: Python
    Replies:
    30
    Views:
    1,860
    Nobody
    Dec 16, 2009
  4. PerlFAQ Server
    Replies:
    0
    Views:
    145
    PerlFAQ Server
    Jan 26, 2011
  5. PerlFAQ Server

    FAQ 6.14 How do I process each word on each line?

    PerlFAQ Server, Apr 8, 2011, in forum: Perl Misc
    Replies:
    0
    Views:
    159
    PerlFAQ Server
    Apr 8, 2011