Question about reading from stream.

Discussion in 'C++' started by Carfield Yim, Mar 25, 2009.

  1. Carfield Yim

    Carfield Yim Guest

    Hi all, we are currently using the following code to read a file that
    another process is continuously appending to (like tail -f), so that we
    can process the new content:

    infile.seekg(_currentFilePointer);
    infile.read(_buffer,_buffer_size) ;
    _bytesLeftInBuffer = infile.gcount() ;

    I suspect this is in fact very inefficient. What is the preferred way
    to read from a growing file? Use getc()? But doesn't that need to loop
    many times?
    Carfield Yim, Mar 25, 2009
    #1
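    A minimal sketch (not from the original post) of the polling loop such a
    snippet typically sits in, assuming infile is a std::ifstream kept open
    between polls, _currentFilePointer counts bytes already consumed, and
    sleep() (POSIX) is available:

        // Poll a growing file: re-read from where we stopped last time.
        for (;;) {
            infile.clear();                     // drop eofbit from the last poll
            infile.seekg(_currentFilePointer);
            infile.read(_buffer, _buffer_size);
            std::streamsize n = infile.gcount();
            if (n > 0) {
                _currentFilePointer += n;
                // ... hand _buffer[0 .. n) to the rest of the program ...
            } else {
                sleep(1);                       // nothing new yet; wait and retry
            }
        }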

  2. Carfield Yim wrote, On 25.3.2009 16:32:
    > Hi all, we are currently using the following code to read a file that
    > another process is continuously appending to (like tail -f), so that we
    > can process the new content:
    >
    > infile.seekg(_currentFilePointer);
    > infile.read(_buffer,_buffer_size) ;
    > _bytesLeftInBuffer = infile.gcount() ;
    >
    > I suspect this is in fact very inefficient. What is the preferred way
    > to read from a growing file? Use getc()? But doesn't that need to loop
    > many times?

    If you are concerned about raw speed, then do not use streams for IO; there
    are many layers of abstraction that make things less than spectacular. That
    said, do you know that IO is the bottleneck of your application? Avoid
    premature optimization.

    --
    VH
    Vaclav Haisman, Mar 25, 2009
    #2

  3. Vaclav Haisman wrote:
    > Carfield Yim wrote, On 25.3.2009 16:32:
    >> [...]

    > If you are concerned about raw speed, then do not use streams for IO; there
    > are many layers of abstraction that make things less than spectacular. That
    > said, do you know that IO is the bottleneck of your application? Avoid
    > premature optimization.


    Those are good suggestions, and we all can agree that to optimize one
    most often needs to measure first. But it does not take a measurement
    to know that IO is a bottleneck. In every application. Hardware is
    slow. And one needs to keep things like IO in mind when devising the
    approach to serialization. Some optimization is not premature, like
    picking quick sort over bubble sort: you don't need measurements for
    that, you can use the measurements people have collected over the years.

    On the flip side, once the sort is abstracted, one algorithm can
    probably be replaced with another easily; so, to the OP: don't integrate
    reading/writing into your code too tightly. Create an abstraction layer
    so you can switch to a different method of serializing once you figure
    out that you need to.

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Mar 25, 2009
    #3
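    A minimal sketch of the kind of abstraction layer suggested above; the
    names (LogSource, readSome, IfstreamLogSource) are made up for
    illustration, not taken from the thread:

        #include <cstddef>
        #include <fstream>

        // Interface the rest of the program talks to.
        class LogSource {
        public:
            virtual ~LogSource() {}
            // Read up to size newly appended bytes into buffer; return the
            // number of bytes actually obtained (0 if nothing new yet).
            virtual std::size_t readSome(char* buffer, std::size_t size) = 0;
        };

        // One implementation keeps the existing iostream code behind the
        // interface; a later one could use open()/read() without touching
        // any callers.
        class IfstreamLogSource : public LogSource {
        public:
            explicit IfstreamLogSource(const char* path)
                : in_(path, std::ios::binary) {}
            virtual std::size_t readSome(char* buffer, std::size_t size) {
                in_.clear();                 // forget a previous end-of-file
                in_.read(buffer, static_cast<std::streamsize>(size));
                return static_cast<std::size_t>(in_.gcount());
            }
        private:
            std::ifstream in_;
        };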
  4. James Kanze

    James Kanze Guest

    On Mar 25, 4:32 pm, Carfield Yim <> wrote:
    > Hi all, we are currently using the following code to read a file
    > that another process is continuously appending to (like tail -f),
    > so that we can process the new content:


    > infile.seekg(_currentFilePointer);
    > infile.read(_buffer,_buffer_size) ;
    > _bytesLeftInBuffer = infile.gcount() ;


    > I suspect this is in fact very inefficient. What is the preferred
    > way to read from a growing file? Use getc()? But doesn't that need
    > to loop many times?


    There isn't really anything to support this in C++. Once a
    filebuf has seen end of file, it stops; end of file is the end.
    If you want something like tail -f, you'll have to use a lower
    level, open and read, under Unix, for example. (I don't know if
    CreateFile/ReadFile will work for this under Windows.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 26, 2009
    #4
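    A minimal sketch of the lower-level Unix approach mentioned above, using
    POSIX open()/read() (not standard C++); the file name is made up, and
    error handling and the poll interval are simplified:

        #include <fcntl.h>      // open
        #include <unistd.h>     // read, close, sleep
        #include <cstdio>       // std::fwrite, std::perror

        int main()
        {
            int fd = open("some.log", O_RDONLY);
            if (fd < 0) { std::perror("open"); return 1; }

            char buffer[4096];
            for (;;) {
                ssize_t n = read(fd, buffer, sizeof buffer);
                if (n > 0) {
                    std::fwrite(buffer, 1, n, stdout);   // process the new bytes
                } else if (n == 0) {
                    sleep(1);    // at the current end of the file: wait for growth
                } else {
                    std::perror("read");
                    break;
                }
            }
            close(fd);
            return 0;
        }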
  5. Carfield Yim

    Carfield Yim Guest

    On Mar 26, 5:41 am, Victor Bazarov <> wrote:
    > [...]
    >
    > On the flip side, once the sort is abstracted, one algorithm can
    > probably be replaced with another easily; so, to the OP: don't integrate
    > reading/writing into your code too tightly. Create an abstraction layer
    > so you can switch to a different method of serializing once you figure
    > out that you need to.

    Thanks. BTW, I am not trying to optimize it yet; I just feel it is kind of
    tedious to seek every time. Of course, maybe that is required.
    Carfield Yim, Mar 26, 2009
    #5
  6. Victor Bazarov wrote, On 25.3.2009 22:41:
    > Vaclav Haisman wrote:
    >> [...]

    >
    > Those are good suggestions, and we all can agree that to optimize one
    > most often needs to measure first. But it does not take a measurement
    > to know that IO is a bottleneck. In every application. Hardware is

    I partially disagree. IO is always slow, that's true. But for many
    applications the time spent doing and waiting for IO is not the majority of
    their run time. Processing of data can take a lot more time than the raw IO
    itself. In such applications IO is not the bottleneck.

    > slow. And one needs to keep things like IO in mind when devising the
    > approach to serialization. Some optimization is not premature, like
    > picking quick sort over bubble sort: you don't need measurements for
    > that, you can use the measurements people have collected over the years.
    >
    > [...]


    --
    VH
    Vaclav Haisman, Mar 26, 2009
    #6
  7. Vaclav Haisman wrote:
    > Victor Bazarov wrote, On 25.3.2009 22:41:
    >> Vaclav Haisman wrote:
    >>> [...]

    >> Those are good suggestions, and we all can agree that to optimize one
    >> most often needs to measure first. But it does not take a measurement
    >> to know that IO is a bottleneck. In every application. Hardware is

    > I partially disagree.


    With what, exactly?

    > IO is always slow, that's true.


    That's it. Period. Slow. When you read data from a file or write data
    to a file, IO is the bottleneck, not conversions (if any), not
    compression (if any), not creation of any other auxiliary objects...

    > But for many
    > applications the time spent doing and waiting for IO is not the majority of
    > their run time.


    No, but what does their overall run time have to do with the fact that
    during reading or writing data the interaction with the device through
    the platform abstractions (isn't that what the streams are?) is the
    slowest part?

    Why do customers care about startup time or the time it takes to load a
    file into the application? They only do it a few times a day. And if
    the application is stable, you don't have to shut it down at all, ever,
    right? But for some reason people still try to make the startup
    quicker, loading of files threaded, and so on. Why? It only takes a
    few minutes? Yes, but those are often the minutes of waiting deducted
    from the lives of our customers, you know.

    > Processing of data can take a lot more time than the raw IO
    > itself. In such applications IO is not the bottleneck.


    In my overall life IO takes really, really tiny portion. It's not a
    bottleneck at all. But then again, nothing is. Breathing, maybe. Or
    thinking, decision making. But we're not talking about overall run of
    the program, are we?

    Sorry, didn't mean to snap (if it appeared to be so).

    >> [..]

    >
    > --
    > VH


    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Mar 26, 2009
    #7
  8. James Kanze

    James Kanze Guest

    On Mar 26, 8:44 pm, Victor Bazarov <> wrote:

    [...]
    > >> Those are good suggestions, and we all can agree that to
    > >> optimize one most often needs to measure first. But it
    > >> does not take a measurement to know that IO is a
    > >> bottleneck. In every application. Hardware is

    > > I partially disagree.


    > With what, exactly?


    > > IO is always slow, that's true.


    > That's it. Period. Slow. When you read data from a file or
    > write data to a file, IO is the bottleneck, not conversions
    > (if any), not compression (if any), not creation of any other
    > auxiliary objects...


    That's usually true, but not always. I've seen cases where
    compression was a "bottleneck" (although this is relative---in
    the final code, more time was spent on compression than on the
    physical writes, but the program was still faster with the
    compression, since it wrote a lot less). And I've seen cases
    where allocations were far more significant than the
    actual writes. So there are exceptions.

    And of course, if you're writing to a pipe under Unix, the
    writes can be very, very fast.

    > > But for many applications the time spent doing and waiting
    > > for IO is not the majority of their run time.


    > No, but what does their overall run time have to do with the
    > fact that during reading or writing data the interaction with
    > the device through the platform abstractions (isn't that what
    > the streams are?) is the slowest part?


    Well, if you eliminate the other causes, IO will end up being
    the slowest remaining part.

    > Why do customers care about startup time or the time it takes
    > to load a file into the application? They only do it a few
    > times a day.


    That depends on the application. I've written servers that run
    for years at a time---startup time isn't significant. But I've
    also written a lot of Unix filters, which are invoked
    interactively (often on a block of text in the editor). In such
    cases, start-up time can be an issue. (If you doubt it, try
    using one written in Java---where loading the classes ensures a
    significantly long start up time.)

    > And if the application is stable, you don't have to shut it
    > down at all, ever, right?


    It depends on the application. What if it's a compiler? Or a
    Unix filter like grep or sed? For that matter, if clients share
    no data directly, there are strong arguments for starting up a
    new instance of a server for each connection; you don't want
    start up time to be too long there, either. (Note that on the
    server I currently work on, the start-up time is several tens of
    seconds---the time to reconstruct all of the persistent data
    structures in memory, resynchronize with the data base, etc.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 27, 2009
    #8
  9. Carfield Yim

    Carfield Yim Guest

    On Mar 27, 1:09 am, Carfield Yim <> wrote:
    > [...]
    >
    > Thanks. BTW, I am not trying to optimize it yet; I just feel it is kind
    > of tedious to seek every time. Of course, maybe that is required.


    In fact, what I really want to ask is whether it is usual to

    seek and read, save the position, then seek and read again

    in order to process a growing file. It looks like this requires moving the
    file cursor again and again.
    Carfield Yim, Mar 27, 2009
    #9
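    One way to avoid the explicit seek, sketched below (not from the thread):
    keep the same stream open and only clear() its end-of-file state before
    the next read, so the get position is already where the previous read
    stopped. As noted earlier in the thread, the standard does not guarantee
    that a filebuf will return data appended after it has reported end of
    file, so whether this picks up new content is implementation-dependent:

        infile.clear();                    // clear eofbit/failbit from the last read
        infile.read(_buffer, _buffer_size);
        _bytesLeftInBuffer = infile.gcount();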
  10. Brian Wood

    Guest

    On Mar 27, 4:03 am, James Kanze <> wrote:
    > [...]
    >
    > It depends on the application.  What if it's a compiler?  Or a
    > Unix filter like grep or sed?  For that matter, if clients share
    > no data directly, there are strong arguments for starting up a
    > new instance of a server for each connection;


    I think the arguments in favor of a long running server are
    stronger than those against it in the case of compilers.
    The design and implementation have to be such that separate
    requests do not interfere with each other. There are some
    steps that you can take that help in that area, but don't
    require anything like a new process and a completely fresh
    start. Besides the basic efficiencies afforded, there's
    a lot of basic information that doesn't change between
    requests. Why rescan/prepare for <vector> billions of
    times when it doesn't change one little bit? It surprises
    me that you question this given what I know of your
    background.

    > you don't want
    > start up time to be too long there, either.


    It has to be done efficiently. Single-run compilers are a
    luxury that is fading. I harp on this, but gcc needs to
    be overhauled twice. First a rewrite in C++ and then a
    rewrite to be an on line compiler. The first phase of
    the on line part could be to simply run once and exit
    after each request. That though would have to be
    replaced by a real server approach that runs like a
    champ. They are so far away from this it isn't even
    funny. As far as I know all they are working on is
    C++0X support. Some of that is important, too, but
    they shouldn't keep ignoring these other matters.
    It may be that gcc has just become a dinosaur that
    can't adapt to the times. They certainly haven't
    done a good job keeping up in some respects. Where's
    the "gcc on line" page like some compilers have? And
    even the compilers that have that type of page
    haven't done much with them in the past ten years.

    I understood this stuff in 1999 so don't think it
    should be a surprise to people now. The internet is
    here to stay. I didn't vote for George W. Bust (I lived
    in Texas when he was governor and I voted for him as
    governor at least once, but by the time he ran for
    President I was on to him. I didn't vote for Barack
    Obama, aka B.O., either.), but one thing Bust got right
    was to encourage people to bring new services on line in
    many of his speeches. The US would be busted even worse
    if not for that consistent encouragement.


    Brian Wood
    Ebenezer Enterprises
    www.webEbenezer.net

    "Trust in the L-rd with all your heart and lean not on
    your own understanding. In all your ways acknowledge
    him and he will direct your paths." Proverbs 3:5,6
    , Mar 27, 2009
    #10
  11. James Kanze

    James Kanze Guest

    On Mar 27, 8:31 pm, wrote:
    > On Mar 27, 4:03 am, James Kanze <> wrote:
    > > On Mar 26, 8:44 pm, Victor Bazarov <> wrote:


    > > [...]
    > > It depends on the application. What if it's a compiler? Or a
    > > Unix filter like grep or sed? For that matter, if clients share
    > > no data directly, there are strong arguments for starting up a
    > > new instance of a server for each connection;


    > I think the arguments in favor of a long running server are
    > stronger than those against it in the case of compilers.
    > The design and implementation have to be such that separate
    > requests do not interfere with each other. There are some
    > steps that you can take that help in that area, but don't
    > require anything like a new process and a completely fresh
    > start. Besides the basic efficiencies afforded, there's
    > a lot of basic information that doesn't change between
    > requests. Why rescan/prepare for <vector> billions of
    > times when it doesn't change one little bit? It surprises
    > me that you question this given what I know of your
    > background.


    The contents of std::vector are data, not code, and don't
    evolve. There's nothing wrong with having it cached somewhere,
    maybe loaded by mmap, but just keeping a server up so that
    compilations won't have to reread it seems a bit overkill.
    (There's also the fact that formally, the effects of compiling
    an include file depend on what macros are defined beforehand.
    Even something like std::vector: the user doesn't have the right
    to define any macros which might conflict, but most
    implementations have two or more different versions, depending
    on the settings of various macros.)

    Anyway, my comments were, largely, based on the way things are,
    rather than how they could be. I've not actually given the idea
    of implementing a compiler as a server much thought, but today's
    compilers are not implemented that way. I certainly don't want
    to imply that things have to be like they are.

    > > you don't want
    > > start up time to be too long there, either.


    > It has to be done efficiently.  Single-run compilers are a
    > luxury that is fading. I harp on this, but gcc needs to
    > be overhauled twice. First a rewrite in C++ and then a
    > rewrite to be an on line compiler. The first phase of
    > the on line part could be to simply run once and exit
    > after each request. That though would have to be
    > replaced by a real server approach that runs like a
    > champ. They are so far away from this it isn't even
    > funny.


    So are most of the other compiler implementers, as far as I
    know.

    I'm not too sure what the server approach would buy us in most
    cases, as opposed, say, to decent pre-compiled headers and
    caching. (If I were implementing a compiler today, it would
    definitely make intensive use of caching; as you say, there's no
    point in reparsing std::vector every time you compile.)

    > As far as I know all they are working on is
    > C++0X support. Some of that is important, too, but
    > they shouldn't keep ignoring these other matters.
    > It may be that gcc has just become a dinosaur that
    > can't adapt to the times. They certainly haven't
    > done a good job keeping up in some respects. Where's
    > the "gcc on line" page like some compilers have?


    The "xxx on line" pages I know of for other compilers are just
    front ends, which run the usual, batch mode compiler. It would
    be trivial for someone to do this with g++---if Comeau wanted,
    for example, I doubt that it would take Greg more than a half a
    day to modify his page so you could use either his compiler or
    g++ (for comparison purposes?).

    I'm certainly in favor of making compilers accessible on-line,
    but that's a different issue. No professional organization
    would use such a compiler, except for test or comparison
    purposes.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 28, 2009
    #11
  12. Brian Wood

    Guest

    On Mar 28, 6:09 am, James Kanze <> wrote:
    > On Mar 27, 8:31 pm, wrote:
    > > [...]
    >
    > The contents of std::vector are data, not code, and don't
    > evolve.  There's nothing wrong with having it cached somewhere,
    > maybe loaded by mmap, but just keeping a server up so that
    > compilations won't have to reread it seems a bit overkill.
    > (There's also the fact that formally, the effects of compiling
    > an include file depend on what macros are defined beforehand.
    > Even something like std::vector: the user doesn't have the right
    > to define any macros which might conflict, but most
    > implementations have two or more different versions, depending
    > on the settings of various macros.)
    >


    In that case, it might make sense to only have the most common
    case cached and reparse the file if someone is doing something
    somewhat unusual. That's how I would probably start to support
    the idea.

    > [...]
    > So are most of the other compiler implementers, as far as I
    > know.


    Well, I think some C++ compilers are written in C++ so they
    would be in better shape (potentially) in my opinion than gcc.
    But probably few if any of them are being rewritten to be
    on line servers. I believe that will change. I read an
    article in the Wall Street Journal about how some companies
    are giving away their software because over the past 18 months
    their software stopped selling. That's not exactly news
    except that the practice of giving away software is spreading
    and companies that have previously been immune to some of the
    market forces are now having to play by the rules that the
    smaller guys accept.

    >
    > I'm not too sure what the server approach would buy us in most
    > cases, as opposed, say, to decent pre-compiled headers and
    > caching.  (If I were implementing a compiler today, it would
    > definitely make intensive use of caching; as you say, there's no
    > point in reparsing std::vector every time you compile.)


    People have mentioned on here that it is a headache to
    manage more than a couple of compilers on your own system.
    Mainly, though, it buys efficiency. A new release is made in
    one place and then anyone may use it without having to
    download and install it. This avoids the case where a user
    accidentally corrupts his installation and then has to
    download and reinstall the compiler. The main advantage
    might be the speed with which new releases and fixes can be
    made available.


    >
    > The "xxx on line" pages I know of for other compilers are just
    > front ends, which run the usual, batch mode compiler.  It would
    > be trivial for someone to do this with g++---if Comeau wanted,
    > for example, I doubt that it would take Greg more than a half a
    > day to modify his page so you could use either his compiler or
    > g++ (for comparison purposes?).


    Someone will probably do it eventually.

    >
    > I'm certainly in favor of making compilers accessible on-line,
    > but that's a different issue.  No professional organization
    > would use such a compiler, except for test or comparison
    > purposes.
    >


    What about using it within their intranet? I think the choice
    comes down to using a service for free on a public site or
    paying for it and using it behind a firewall. Individuals and
    smaller organizations will probably go with the free approach
    and may use some techniques to protect their work.


    Brian Wood
    Ebenezer Enterprises
    www.webEbenezer.net
    , Mar 28, 2009
    #12
  13. Brian Wood

    Guest

    On Mar 28, 12:21 pm, wrote:
    >
    > What about using it within their intranet?   I think the choice
    > comes down to using a service for free on a public site or
    > paying for it and using it behind a firewall.   Individuals and
    > smaller organizations will probably go with the free approach
    > and may use some techniques to protect their work.


    I would like to point out a couple more things. Those that
    can afford to buy the service and use it privately are paying
    a high price for the privacy because they have to pay for
    managing/maintaining it themselves. They have to patch it
    whenever they want to pick up a fix. They better be a very
    well run organization in order to make that work. And say
    Comeau had an on line version of his compiler. How can he
    prevent someone who works for an organization that has bought
    a license from him from copying his software and then trying
    to make some money by making illegal copies of it? It's tough
    to stop. This is a reason why Google and others who have the
    on line versions of programs similar to Microsoft Office are
    in better shape in my opinion than Microsoft going forward.
    Microsoft drags its feet in developing/promoting their on
    line versions.

    In my case, I don't trust most politicians or governments
    to do the right thing by me. So I make the service
    available for free on line, but don't think it's a good
    idea to sell it to someone because it might be illegally
    copied after that. Software piracy/theft was a problem
    before the recent economic problems, so I doubt matters
    will improve and they may get worse. The truth is the
    Chinese, Indian, Russian, etc. governments just don't
    care about an American boy like me. They gotta think
    about helping their people out. If that means
    betraying me or Microsoft, that's OK with them.


    Brian Wood
    Ebenezer Enterprises
    www.webEbenezer.net
    , Mar 28, 2009
    #13
  14. Brian Wood

    Guest

    On Mar 28, 12:21 pm, wrote:
    > [...]
    >
    > > I'm not too sure what the server approach would buy us in most
    > > cases, as opposed, say, to decent pre-compiled headers and
    > > caching.  (If I were implementing a compiler today, it would
    > > definitely make intensive use of caching; as you say, there's no
    > > point in reparsing std::vector every time you compile.)

    >
    > People have mentioned on here that it is a headache to
    > manage more than a couple of compilers on your own system.
    > It buys efficiency though mainly.  A new release is made in
    > one place and then anyone may use it without having to down-
    > load and install it.  This avoids the case where a user
    > accidentally corrupts his installation and then has to
    > download and reinstall the compiler.  The main advantage
    > might be the speed with which new releases and fixes can be
    > made available.
    >


    I felt like there were some other things to list, but couldn't
    think of them at the time. On one of Comeau's pages it says:

    "Evaluating Comeau C++ for your purchase. Some customers ask
    about a test, demo, eval or trial version of Comeau C++. This
    form is the next best thing."

    Well, that form is helpful, but I'm not sure he can use it
    to replace demos if he has a big client that wants more
    than they can tell from the on line form. In that case he
    would have to do something additional to provide them with
    a demo. If he had the compiler fully on line, he would be
    able to totally avoid demo-related expenses.


    Brian Wood
    Ebenezer Enterprises
    www.webEbenezer.net
    , Apr 3, 2009
    #14
