hmm: code bloat?...

Discussion in 'C Programming' started by BGB / cr88192, Jan 5, 2010.

  1. hmm, an actual question of sorts...

    I recently noticed in my project, which is primarily C based, a few bits of
    trivia:
    it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    collector + ...).

    so, ~ 1.094 Mloc of C.

    about 483.5 kloc then is stuff related to 3D and misc (328.5 kloc of which
    are my own code, 155 kloc coming from Quake2, errm, being essentially the
    entire Q2 engine, where the code was combined mostly for my own
    fiddling...).


    mind that the vast majority of this is code I wrote myself, this being
    primarily a single person project...


    so, does something like this seem like there is a good deal of code bloat
    going on?...
    is it better to keep on going like this, or maybe look for parts to shave
    off (even if one doesn't want to remove much?...).

    or do others just keep on going as always before?...

    more so, are there other implications from a codebase having a continual
    tendency to inflate?... (say, if one's code tends to inflate at an average
    rate of, say, 100-150 kloc/yr or so...).



    another observation:
    the ratio between C and headers is notably different than in Quake2, where
    Q2 has an H/C ratio of 0.14, but in my code it is closer to 0.4 or so...

    any thoughts about the ratio of headers and C code, or any "interesting"
    implications from this property?...
    (well, beyond my usage of some automatic header-writing tools, preference
    for small-ish functions, ...).


    ....

    just wondering is all...
    BGB / cr88192, Jan 5, 2010
    #1

  2. bartc Guest

    Re: code bloat?...

    "BGB / cr88192" <> wrote in message
    news:hhuvlg$kg2$...
    > hmm, an actual question of sorts...
    >
    > I recently noticed in my project, which is primarily C based, a few bits
    > of trivia:
    > it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    > 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    > collector + ...).
    >
    > so, ~ 1.094 Mloc of C.
    >
    > about 483.5 kloc then is stuff related to 3D and misc (328.5 kloc of which
    > are my own code, 155 kloc coming from Quake2, errm, being essentially the
    > entire Q2 engine, where the code was combined mostly for my own
    > fiddling...).


    >
    > mind that the vast majority of this is code I wrote myself, this being
    > primarily a single person project...
    >
    >
    > so, does something like this seem like there is a good deal of code bloat
    > going on?...
    > is it better to keep on going like this, or maybe look for parts to shave
    > off (even if one doesn't want to remove much?...).


    For a language project, 600kloc sounds like a lot of code to me, especially
    for a one-man project.

    But then, from bits of code you've posted elsewhere, you seem to like
    complicated ways of doing things...

    (I kept my own projects in line by doing reviews every so often and perhaps
    reorganising/rewriting some subsystem or other, when it seemed about to get
    out of hand, or consolidating it with another.

    Revising to keep the source-code small was also quite fun to do, second only
    to optimising for performance... My largest project, quite an elaborate
    application, was some 150kloc; included inside that (about 20kloc) was a
    bytecode compiler and interpreter...)

    > or do others just keep on going as always before?...


    I would say: redesign and rewrite. Otherwise I don't think you're going to
    make much impact on that 1000kloc.

    (However, if you're writing a commercial application, customers like value
    for money. I had to ship my product on CD (this was some years ago) even
    though it fitted easily on one floppy disk, to make it appear more
    substantial.)

    --
    bartc
    bartc, Jan 5, 2010
    #2

  3. Tom St Denis Guest

    Re: code bloat?...

    On Jan 5, 7:33 am, "bartc" <> wrote:
    > For a language project, 600kloc sounds like a lot of code to me, especially
    > for a one-man project.


    Depends on whether you're maintaining it. I mean I write apps against
    glibc, do I get to include that 100K [or whatever it is] loc in my
    project tally?

    If he's including the Q2 engine and only modifying a few lines here or
    there to suit his particular platform he's hardly maintaining it. To
    him it's just another library he links in.

    His build process should really have two "clean" targets, one that
    just cleans his code (removes objects corresponding to files he
    maintains) and another "cleanest" [or whatever you want to call it]
    that removes all objects. That way, for most of his builds, he's not
    rebuilding static code over and over...

    Unless long build times are impressing his boss...

    > (However, if you're writing a commercial application, customers like value
    > for money. I had to ship my product on CD (this was some years ago) even
    > though it fitted easily on one floppy disk, to make it appear more
    > substantial.)


    Customers for whatever reason like to think their applications are
    doing a lot of thinking. That's why you'll see installers that churn
    with progress bars while not really doing much of anything (I've
    caught a few where the idle time was near 100% and syswait 0% and it
    just sat there). Always good to put technical sounding words in there
    like "optimizing data" ... [damn you Adobe...]

    If you need to fill space, nothing better than 100s of 10MB files full
    of random crap stuffed inside a zip archive renamed .PAK to look
    important :)

    Tom
    Tom St Denis, Jan 5, 2010
    #3
  4. On 5 Jan, 09:10, "BGB / cr88192" <> wrote:

    > hmm, an actual question of sorts...
    >
    > I recently noticed in my project, which is primarily C based, a few bits of
    > trivia:
    > it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    > 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    > collector + ...).
    >
    > so, ~ 1.094 Mloc of C.
    >
    > about 483.5 kloc then is stuff related to 3D and misc (328.5 kloc of which
    > are my own code, 155 kloc coming from Quake2, errm, being essentially the
    > entire Q2 engine, where the code was combined mostly for my own
    > fiddling...).
    >
    > mind that the vast majority of this is code I wrote myself, this being
    > primarily a single person project...


    you've written an MLOC of code...


    > so, does something like this seem like there is a good deal of code bloat
    > going on?...


    how could we tell? Maybe you've got a MLOC of functionality in there.
    I suspect with such a large code base there's *some* fat but without
    looking it's just a guess. What do you think? Have you tried
    refactoring it?


    > is it better to keep on going like this,


    like what? What's wrong with what you are doing? Is it hard to modify
    or debug? Is there a lot of duplicate code?

    > or maybe look for parts to shave
    > off (even if one doesn't want to remove much?...).


    are you talking about removing duplicate code or dead code [good] or
    removing little used functionality [ok] or actually reducing
    functionality [sounds bad]. Do you have users? Do they want
    functionality removed (in my experience- hardly ever!). Do they think
    it's too big?


    > or do others just keep on going as always before?...


    I'm a firm believer (in theory if not in practice!) that there comes a
    point where things should be re-written if they get too creaky.

    You should take a look at refactoring...


    > more so, are there other implications from a codebase having a continual
    > tendency to inflate?... (say, if one's code tends to inflate at an average
    > rate of, say, 100-150 kloc/yr or so...).


    are you adding functionality? Do you refactor?


    > another observation:
    > the ratio between C and headers is notably different than in Quake2, where
    > Q2 has an H/C ratio of 0.14, but in my code it is closer to 0.4 or so...


    never measured this one. My first thought was I'd expect this to be about
    1.0.
    A quick look at some code and... yes around 1.0. Why do you have so
    few header files! And what the hell does Q2 put in its C files! Do
    they use a one file per function coding standard?

    > any thoughts about the ratio of headers and C code, or any "interesting"
    > implications from this property?...
    > (well, beyond my usage of some automatic header-writing tools, preference
    > for small-ish functions, ...).
    >
    > just wondering is all...


    I tend to use a one header and c file per "module" (or class if it's
    C++). Why would you have multiple C files to one header file? I suppose
    a module could require multiple C files to implement it that all
    communicated by one header file. Sounds like such a module needs
    breaking up.
    Nick Keighley, Jan 5, 2010
    #4
  5. On Jan 5, 11:10 am, "BGB / cr88192" <> wrote:
    > hmm, an actual question of sorts...


    I'll try and give you an answer, sort of... :)

    > I recently noticed in my project, which is primarily C based, a few bits of
    > trivia:
    > it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    > 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    > collector + ...).


    600 kLOC for a C VM is too much. I've actually implemented a (large)
    subset of a C compiler + an interpreter + garbage collector in under
    4k LOC, so I know the magnitude of the problem. Perhaps you need a lot
    of code refactoring, which is expected in one man projects.

    Then again, it occurs to me you might not be using the correct tools
    for the job. I'd be interested to hear more details regarding your
    project, actually.

    > about 483.5 kloc then is stuff related to 3D and misc (328.5 kloc of which
    > are my own code, 155 kloc coming from Quake2, errm, being essentially the
    > entire Q2 engine, where the code was combined mostly for my own
    > fiddling...).


    This part sounds fairly typical.

    > so, does something like this seem like there is a good deal of code bloat
    > going on?...


    Definitely sounds bloated (at least the first part).

    > is it better to keep on going like this, or maybe look for parts to shave
    > off (even if one doesn't want to remove much?...).


    Definitely try to refactor. Or even redesign.

    > or do others just keep on going as always before?...


    Nope. I always review and reconsider my design, trying to identify
    problems, limitations, etc. Perhaps not on a regular chronological
    basis, but surely before the shit hits the fan - that would be when I
    find out $(wc -l *.c *.h) -eq OVER NINE THOUSAND. ;-)

    > more so, are there other implications from a codebase having a continual
    > tendency to inflate?... (say, if one's code tends to inflate at an average
    > rate of, say, 100-150 kloc/yr or so...).


    That all depends on the project really. There's no objectivity in pure
    LOC.

    > another observation:
    > the ratio between C and headers is notably different than in Quake2, where
    > Q2 has an H/C ratio of 0.14, but in my code it is closer to 0.4 or so...
    > any thoughts about the ratio of headers and C code, or any "interesting"
    > implications from this property?


    Well, such an increased ratio might mean that there's lots of macros
    or functions etc. in those .h files that can be reused. Which is
    somewhat contradictory to the LOC count for your project (which we
    don't know what it's about, btw. Care to give a hint?).
    Michael Foukarakis, Jan 5, 2010
    #5
  6. Nick Guest

    "BGB / cr88192" <> writes:

    > hmm, an actual question of sorts...
    >
    > I recently noticed in my project, which is primarily C based, a few bits of
    > trivia:
    > it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    > 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    > collector + ...).
    >
    > so, ~ 1.094 Mloc of C.


    That does seem a heck of a lot. The website in my sig is driven by
    36kloc of my own, and a total of 270kloc (but that includes a wiki
    renderer, an HTML templating system, the SQLite database engine and a
    pile of other bits).

    That implements a scripting language to do the boring bits, and the
    data structures to allow that scripting language to do clever things
    (like compute the best path through the graph depending on current
    preferences).

    Four times as much seems a lot for /anything/.

    I know I've a fair chunk of dead code in there as well - it still
    supports legacy data formats and other databases.

    > another observation:
    > the ratio between C and headers is notably different than in Quake2, where
    > Q2 has an H/C ratio of 0.14, but in my code it is closer to 0.4 or so...


    Is that lines in *.c to lines in *.h, or number of .c files and number
    of .h files?
    --
    Online waterways route planner | http://canalplan.eu
    Plan trips, see photos, check facilities | http://canalplan.org.uk
    Nick, Jan 5, 2010
    #6
  7. Re: code bloat?...

    "Francis Glassborow" <> wrote in message
    news:...
    > bartc wrote:
    >> "BGB / cr88192" <> wrote in message
    >> news:hhuvlg$kg2$...
    >>> hmm, an actual question of sorts...
    >>>
    >>> I recently noticed in my project, which is primarily C based, a few bits
    >>> of trivia:
    >>> it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    >>> 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    >>> collector + ...).
    >>>
    >>> so, ~ 1.094 Mloc of C.
    >>>
    >>> about 483.5 kloc then is stuff related to 3D and misc (328.5 kloc of
    >>> which
    >>> are my own code, 155 kloc coming from Quake2, errm, being essentially
    >>> the
    >>> entire Q2 engine, where the code was combined mostly for my own
    >>> fiddling...).

    >>
    >>>
    >>> mind that the vast majority of this is code I wrote myself, this being
    >>> primarily a single person project...
    >>>
    >>>
    >>> so, does something like this seem like there is a good deal of code
    >>> bloat
    >>> going on?...
    >>> is it better to keep on going like this, or maybe look for parts to
    >>> shave
    >>> off (even if one doesn't want to remove much?...).

    >>
    >> For a language project, 600kloc sounds like a lot of code to me,
    >> especially
    >> for a one-man project.
    >>
    >> But then, from bits of code you've posted elsewhere, you seem to like
    >> complicated ways of doing things...
    >>
    >> (I kept my own projects in line by doing reviews every so often and
    >> perhaps
    >> reorganising/rewriting some subsystem or other, when it seemed about to
    >> get
    >> out of hand, or consolidating it with another.
    >>
    >> Revising to keep the source-code small was also quite fun to do, second
    >> only
    >> to optimising for performance... My largest project, quite an elaborate
    >> application, was some 150kloc; included inside that (about 20kloc) was a
    >> bytecode compiler and interpreter...)
    >>
    >>> or do others just keep on going as always before?...

    >>
    >> I would say: redesign and rewrite. Otherwise I don't think you're going
    >> to
    >> make much impact on that 1000kloc.

    >
    > I agree, it is very easy for code to grow and eventually become
    > unmaintainable. I would certainly review the whole of the code base for
    > starters. Though that will take some time given its size I am sure that
    > you will benefit in the long run.
    >


    yeah.

    I noted some while documenting stuff that there are things floating around
    in the code that I have almost never used, for example:
    the 'channels' feature I heard about from a video from Rob Pike, and then
    implemented (I guess I am more used to async communication mechanisms, and
    have a harder time seeing why I would have multiple threads but want them to
    operate lock-step...);
    some stuff for genetic-programming and neural nets, for which thus far I
    have found little use (apart from specialized tools, which is where I
    originally wrote this code);
    ....

    in a few cases, I have implemented APIs which have turned out absurdly
    bulky:
    my class/instance system has around 500 API calls;
    ....

    as well I guess as a general design strategy which leads to bulk:
    writing an actual textual assembler, code generators which produce textual
    ASM;
    ....

    components which, in retrospect, have debatable use:
    such as an x86 interpreter (this thing by itself being ~ 50 kloc);
    ....


    > I would not worry over much about the .h/.c ratio. Well designed code
    > where each function does one thing tends to have rather higher proportion
    > of lines in header files but modern optimising compilers often manage to
    > do link time optimisations so that your large number of functions do not
    > actually result in an equivalent number of function calls in the
    > executable.


    yeah.

    I guess there is lots going on to inflate the amount of stuff present in
    headers.

    it is also notable that given my use of tools, pretty much all prototypes
    end up in headers, rather than just ones manually put there. Q2 seems to be
    a little more lax, and only puts "many" of the prototypes in headers.

    I guess there are a lot of structs, ... as well, since I generally don't put
    structs in C files, ...
    BGB / cr88192, Jan 5, 2010
    #7
  8. "Nick Keighley" <> wrote in message
    news:...
    > On 5 Jan, 09:10, "BGB / cr88192" <> wrote:
    >
    >> hmm, an actual question of sorts...
    >>
    >> I recently noticed in my project, which is primarily C based, a few bits
    >> of
    >> trivia:
    >> it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    >> 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    >> collector + ...).
    >>
    >> so, ~ 1.094 Mloc of C.
    >>
    >> about 483.5 kloc then is stuff related to 3D and misc (328.5 kloc of
    >> which
    >> are my own code, 155 kloc coming from Quake2, errm, being essentially the
    >> entire Q2 engine, where the code was combined mostly for my own
    >> fiddling...).
    >>
    >> mind that the vast majority of this is code I wrote myself, this being
    >> primarily a single person project...

    >
    > you've written an MLOC of code...
    >


    apparently...


    >
    >> so, does something like this seem like there is a good deal of code bloat
    >> going on?...

    >
    > how could we tell? Maybe you've got a MLOC of functionality in there.
    > I suspect with such a large code base there's *some* fat but without
    > looking it's just a guess. What do you think? Have you tried
    > refactoring it?
    >


    I used to do that some, usually trying to trim down the project
    occasionally.
    however, I have not done that in a while, and I suspect there is a bit of a
    pile-up.

    the present state thus being a little over 1 Mloc of code, and around 5GB of
    files (the vast majority being PNG's).


    the last time I did a major clean-up was back when my project was in the
    ~ 300 kloc range, which was, I think, a few years ago.


    >
    >> is it better to keep on going like this,

    >
    > like what? What's wrong with what you are doing? Is it hard to modify
    > or debug? Is there a lot of duplicate code?
    >


    I meant, just keep going on coding stuff, as the codebase gets ever larger.


    >> or maybe look for parts to shave
    >> off (even if one doesn't want to remove much?...).

    >
    > are you talking about removing duplicate code or dead code [good] or
    > removing little used functionality [ok] or actually reducing
    > functionality [sounds bad]. Do you have users? Do they want
    > functionality removed (in my experience- hardly ever!). Do they think
    > it's too big?
    >


    I could do all of these...

    I suspect I may get around 50-100 kloc from dead code removal (or, more
    like, dead component removal, never mind smaller code).

    removing some little-used stuff is also possible.
    removing actually-used components is possible, but I guess I will probably
    not do this.


    >
    >> or do others just keep on going as always before?...

    >
    > I'm a firm believer (in theory if not in practice!) that there comes a
    > point where things should be re-written if they get too creaky.
    >
    > You should take a look at refactoring...
    >


    yeah, I may need to consider this.


    >
    >> more so, are there other implications from a codebase having a continual
    >> tendency to inflate?... (say, if one's code tends to inflate at an
    >> average
    >> rate of, say, 100-150 kloc/yr or so...).

    >
    > are you adding functionality? Do you refactor?
    >


    adding functionality is done a lot;
    refactoring, not so much.

    I had often used the "cellular" approach of splitting components if they got
    too large, and interfacing components with defined APIs and allowed
    dependencies. this keeps things maintainable, but does little to constrain
    size (actually, abstract APIs probably make this issue worse, FWIW).


    >
    >> another observation:
    >> the ratio between C and headers is notably different than in Quake2,
    >> where
    >> Q2 has an H/C ratio of 0.14, but in my code it is closer to 0.4 or so...

    >
    > never measured this one. My first thought was I'd expect this to be about
    > 1.0.
    > A quick look at some code and... yes around 1.0. Why do you have so
    > few header files! And what the hell does Q2 put in its C files! Do
    > they use a one file per function coding standard?
    >


    well, I have header files for what I have them for, mostly structs and
    prototypes.
    I tend to use tools to mine prototypes, mostly because IMO having to go copy
    prototypes to the headers is a hassle.


    Q2 tends to have lots of big hairy functions (where in many cases 50-200
    lines will be used by single functions), and seems to be a little lax about
    which prototypes actually make it to the headers.

    granted, there are a few of these in my codebase, but these tend to be a
    strong minority.


    well, that, and the whole engine has lots of bit twiddling and fairly nasty
    use of pointer arithmetic.

    or, at least by my standards where something like:
    "if(*(float *)((char *)(&array_of_structs[index])+offset) == foo) ..."
    is, IMO, nasty...
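
    For anyone who has not met this idiom: the expression takes the address of
    one array element, steps a byte offset into it, and reads whatever sits
    there as a float. A minimal sketch of that, next to the plain field access
    it replaces; `struct entity` and its fields are invented for illustration
    and are not taken from the Q2 source:

```c
#include <assert.h>
#include <stddef.h>  /* size_t, offsetof */

/* Hypothetical struct, standing in for whatever the engine stores. */
struct entity {
    int   id;
    float health;
    float speed;
};

/* The "nasty" form: reach a float field through a raw byte offset.
 * The caller must pass the offset of a float member, or the read is bogus. */
static float field_by_offset(const struct entity *arr, size_t index, size_t offset)
{
    assert(arr != NULL);
    assert(offset + sizeof(float) <= sizeof(struct entity));
    return *(const float *)((const char *)&arr[index] + offset);
}

/* The plain form, for comparison: name the field directly. */
static float health_of(const struct entity *arr, size_t index)
{
    assert(arr != NULL);
    return arr[index].health;
}
```

    With `offsetof(struct entity, health)` as the offset, both functions return
    the same value; the offset form just hides which field is being read and
    gives the compiler no chance to check the type.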


    >> any thoughts about the ratio of headers and C code, or any "interesting"
    >> implications from this property?...
    >> (well, beyond my usage of some automatic header-writing tools, preference
    >> for small-ish functions, ...).
    >>
    >> just wondering is all...

    >
    > I tend to use a one header and c file per "module" (or class if it's
    > C++). Why would you have multiple C files to one header file? I suppose
    > a module could require multiple C files to implement it that all
    > communicated by one header file. Sounds like such a module needs
    > breaking up.
    >


    ok, both my project and Q2 tend to use more centralized headers with all
    declarations from a "component" (where, in my terminology: component =
    'library' = 'mass of code compiled into a single DLL').

    basically, both my project and Q2 have the practice of creating a single
    header which is included by all source files in a given library (and is
    often the only header included).


    but, actually, it was a measure of C loc vs H loc, FWIW...
    (I have not been counting files here).

    modifying line-counter to also count files, codebase totals:
    1826 C files; 1795 H files.

    interesting...

    calculating: average C file loc is 558.4, average H file loc is 297.6.

    Q2 only:
    189 C files, 78 H files.
    average C file loc is 821 loc, average H file loc is 278.


    my codebase tends to, thus, have a much bigger portion of content in its
    headers than in Q2.

    IOW: relative to C loc, my headers contain, on average, 2.86x more stuff
    than Q2's headers do (0.4 / 0.14).
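
    The Q2 numbers above can be cross-checked with a little arithmetic: files
    times average lines gives the totals, and header lines over C lines gives
    the H/C ratio. A small sketch (the function names are only for
    illustration):

```c
#include <assert.h>

/* Total lines = number of files * average lines per file. */
static double total_loc(int files, double avg_loc)
{
    return files * avg_loc;
}

/* H/C ratio: total header lines divided by total C-source lines. */
static double hc_ratio(int h_files, double h_avg, int c_files, double c_avg)
{
    double c_total = total_loc(c_files, c_avg);
    assert(c_total > 0.0);   /* guard against dividing by zero */
    return total_loc(h_files, h_avg) / c_total;
}
```

    `hc_ratio(78, 278.0, 189, 821.0)` comes out at about 0.14, and
    `total_loc(189, 821.0)` at 155169 lines, in line with the 155 kloc quoted
    for Q2.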

    then again, as noted elsewhere:
    tools mine prototypes, and also pretty much all structs, constant
    declarations, ... go here.

    but, there are lots of possible considerations here...
    BGB / cr88192, Jan 5, 2010
    #8
  9. "Michael Foukarakis" <> wrote in message
    news:...
    On Jan 5, 11:10 am, "BGB / cr88192" <> wrote:
    > hmm, an actual question of sorts...


    I'll try and give you an answer, sort of... :)

    > I recently noticed in my project, which is primarily C based, a few bits
    > of
    > trivia:
    > it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    > 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    > collector + ...).


    <--
    600 kLOC for a C VM is too much. I've actually implemented a (large)
    subset of a C compiler + an interpreter + garbage collector in under
    4k LOC, so I know the magnitude of the problem. Perhaps you need a lot
    of code refactoring, which is expected in one man projects.
    -->

    I am not sure how that is possible, assuming writing things in C...


    <--
    Then again, it occurs to me you might not be using the correct tools
    for the job. I'd be interested to hear more details regarding your
    project, actually.
    -->

    well, most stuff is hand-written C, but a few minor things are tool-written
    (as far as C goes, this is not likely to contribute much to overall LOC).

    most tasks like parsing, ... use hand-written recursive-descent parsing.

    the VM project includes a number of components, and tends to compile the C
    into native code which is run at runtime.

    so, major components:
    a garbage collector;
    an assembler+linker (x86, x86-64);
    a big library giving a dynamic typesystem, P-OO and C/I OO stuff, ...;
    a codegen (RPNIL -> ASM, x86 / x86-64, supports: cdecl, stdcall, SysV/AMD64,
    and Win64 calling conventions);
    a C compiler frontend (C -> RPNIL);
    a Java-ByteCode interpreter;
    an x86 interpreter (simulates userspace-only, POSIX-derived process-model
    and core APIs);
    ....


    > about 483.5 kloc then is stuff related to 3D and misc (328.5 kloc of which
    > are my own code, 155 kloc coming from Quake2, errm, being essentially the
    > entire Q2 engine, where the code was combined mostly for my own
    > fiddling...).


    <--
    This part sounds fairly typical.
    -->

    yeah.
    my part also contains a 3D modeler and skeletal animation tool...


    > so, does something like this seem like there is a good deal of code bloat
    > going on?...


    >Definitely sounds bloated (at least the first part).


    > is it better to keep on going like this, or maybe look for parts to shave
    > off (even if one doesn't want to remove much?...).


    >Definitely try to refactor. Or even redesign.


    ok.

    > or do others just keep on going as always before?...


    <--
    Nope. I always review and reconsider my design, trying to identify
    problems, limitations, etc. Perhaps not on a regular chronological
    basis, but surely before the shit hits the fan - that would be when I
    find out $(wc -l *.c *.h) -eq OVER NINE THOUSAND. ;-)
    -->

    yeah.

    hmm... there was a standard tool for line-counting, and I was off having
    written my own version of this as well...
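
    The standard tool here is `wc -l`, which counts newline characters. A
    home-grown counter that also keeps C and header totals apart might look
    roughly like this. It is a sketch under that assumption, not the actual
    tool used for the figures in this thread, and all names are made up:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count newline characters in a stream, which is what `wc -l` reports. */
static long count_lines(FILE *fp)
{
    long lines = 0;
    int ch;

    assert(fp != NULL);
    while ((ch = getc(fp)) != EOF)
        if (ch == '\n')
            lines++;
    return lines;
}

/* Return 1 if `name` ends with `suffix` (e.g. ".c" or ".h"), else 0. */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Add one file's line count to the C or header total.
 * Returns 0 on success, -1 if the file cannot be opened.
 * Files that are neither .c nor .h are read but not tallied. */
static int tally_file(const char *path, long *c_loc, long *h_loc)
{
    FILE *fp = fopen(path, "r");
    long n;

    if (!fp)
        return -1;
    n = count_lines(fp);
    fclose(fp);

    if (has_suffix(path, ".c"))
        *c_loc += n;
    else if (has_suffix(path, ".h"))
        *h_loc += n;
    return 0;
}
```

    Calling `tally_file` once per path and then dividing the header total by
    the C total gives the H/C figure discussed in this thread.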


    > more so, are there other implications from a codebase having a continual
    > tendency to inflate?... (say, if one's code tends to inflate at an average
    > rate of, say, 100-150 kloc/yr or so...).


    <--
    That all depends on the project really. There's no objectivity in pure
    LOC.
    -->

    yes, ok.


    > another observation:
    > the ratio between C and headers is notably different than in Quake2, where
    > Q2 has an H/C ratio of 0.14, but in my code it is closer to 0.4 or so...
    > any thoughts about the ratio of headers and C code, or any "interesting"
    > implications from this property?


    <--
    Well, such an increased ratio might mean that there's lots of macros
    or functions etc. in those .h files that can be reused. Which is
    somewhat contradictory to the LOC count for your project (which we
    don't know what it's about, btw. Care to give a hint?).
    -->

    macros, not that many...
    I don't like excessive macro use.

    not so many functions either, though I do make some use of "static inlines"
    in a few places.

    also possible is that I have some amount of functions which are
    "one-liners":

    void libTypeFoo(libType *obj)
    { if(obj->iface->foo) obj->iface->foo(obj); }

    mostly as I really don't like code directly messing around with struct
    internals if they don't actually own the struct.
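
    To make the one-liner above concrete, here is a minimal self-contained
    sketch of the declarations it would need. Only `libTypeFoo`, `libType`,
    `iface` and `foo` come from the snippet; the `libTypeIface` table, the
    `foo_calls` counter and `counting_foo` are assumptions added so the example
    compiles:

```c
#include <assert.h>
#include <stddef.h>

typedef struct libType libType;

/* Hypothetical interface table: one function pointer per operation. */
typedef struct {
    void (*foo)(libType *obj);
} libTypeIface;

struct libType {
    const libTypeIface *iface;
    int foo_calls;              /* illustrative state only */
};

/* The wrapper from the post: callers never look inside the struct.
 * The assert is an extra sanity check that the original does not have. */
void libTypeFoo(libType *obj)
{
    assert(obj != NULL && obj->iface != NULL);
    if (obj->iface->foo)
        obj->iface->foo(obj);
}

/* Example implementation plugged into the table. */
static void counting_foo(libType *obj)
{
    obj->foo_calls++;
}

static const libTypeIface counting_iface = { counting_foo };
static const libTypeIface empty_iface    = { NULL };
```

    Callers go through `libTypeFoo(obj)` and never touch `obj->iface`
    themselves; an object whose table leaves `foo` as NULL is simply skipped.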

    ....


    actually, the main project is mostly 3D stuff, and random misc stuff...

    it was partly merged with Q2 as part of a test, where I had noted that Q2
    does lots of things my project has not mastered, and so the partial reason
    for merging the codebases was to "see what happens".


    otherwise, the project doesn't have a whole lot of a particular purpose,
    mostly just myself and idle coding I guess...
    BGB / cr88192, Jan 5, 2010
    #9
  10. bartc Guest

    "BGB / cr88192" <> wrote in message
    news:hi05qg$jjq$...
    >
    > "Michael Foukarakis" <> wrote in message
    > news:...
    > On Jan 5, 11:10 am, "BGB / cr88192" <> wrote:
    >> hmm, an actual question of sorts...

    >
    > I'll try and give you an answer, sort of... :)
    >
    >> I recently noticed in my project, which is primarily C based, a few bits
    >> of
    >> trivia:
    >> it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    >> 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    >> collector + ...).

    >
    > <--
    > 600 kLOC for a C VM is too much. I've actually implemented a (large)
    > subset of a C compiler + an interpreter + garbage collector in under
    > 4k LOC, so I know the magnitude of the problem. Perhaps you need a lot
    > of code refactoring, which is expected in one man projects.
    > -->
    >
    > I am not sure how that is possible, assuming writing things in C...


    That does sound tight. I would have reckoned on several 10Kloc of
    C/C-equivalent code for such a project. Perhaps he missed out a zero.

    > the VM project includes a number of components, and tends to compile the C
    > into native code which is run at runtime.
    >
    > so, major components:
    > a garbage collector;
    > an assembler+linker (x86, x86-64);
    > a big library giving a dynamic typesystem, P-OO and C/I OO stuff, ...;
    > a codegen (RPNIL -> ASM, x86 / x86-64, supports: cdecl, stdcall,
    > SysV/AMD64, and Win64 calling conventions);
    > a C compiler frontend (C -> RPNIL);
    > a Java-ByteCode interpreter;
    > an x86 interpreter (simulates userspace-only, POSIX-derived process-model
    > and core APIs);
    > ...


    This doesn't sound like a single executable. So perhaps you shouldn't just
    combine all the Loc for each.

    --
    Bartc
    bartc, Jan 5, 2010
    #10
  11. "bartc" <> wrote in message
    news:l9N0n.22409$...
    >
    > "BGB / cr88192" <> wrote in message
    > news:hi05qg$jjq$...
    >>
    >> "Michael Foukarakis" <> wrote in message
    >> news:...
    >> On Jan 5, 11:10 am, "BGB / cr88192" <> wrote:
    >>> hmm, an actual question of sorts...

    >>
    >> I'll try and give you an answer, sort of... :)
    >>
    >>> I recently noticed in my project, which is primarily C based, a few bits
    >>> of
    >>> trivia:
    >>> it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    >>> 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    >>> collector + ...).

    >>
    >> <--
    >> 600 kLOC for a C VM is too much. I've actually implemented a (large)
    >> subset of a C compiler + an interpreter + garbage collector in under
    >> 4k LOC, so I know the magnitude of the problem. Perhaps you need a lot
    >> of code refactoring, which is expected in one man projects.
    >> -->
    >>
    >> I am not sure how that is possible, assuming writing things in C...

    >
    > That does sound tight. I would have reckoned on several 10Kloc of
    > C/C-equivalent code for such a project. Perhaps he missed out a zero.
    >


    yep.


    >> the VM project includes a number of components, and tends to compile the
    >> C into native code which is run at runtime.
    >>
    >> so, major components:
    >> a garbage collector;
    >> an assembler+linker (x86, x86-64);
    >> a big library giving a dynamic typesystem, P-OO and C/I OO stuff, ...;
    >> a codegen (RPNIL -> ASM, x86 / x86-64, supports: cdecl, stdcall,
    >> SysV/AMD64, and Win64 calling conventions);
    >> a C compiler frontend (C -> RPNIL);
    >> a Java-ByteCode interpreter;
    >> an x86 interpreter (simulates userspace-only, POSIX-derived process-model
    >> and core APIs);
    >> ...

    >
    > This doesn't sound like a single executable. So perhaps you shouldn't just
    > combine all the Loc for each.
    >


    actually, it is all a bit "fuzzy"...

    there is no clear line of demarcation which components will go into which
    exact EXE's.

    (think of it more like DirectX, with piles of semi-independent DLL's, some
    loosely coupled via "interfaces", others fully independent, and others fully
    dependent, ...).


    all of these are compiled into DLL's, some or all of which may be used by a
    "front-end", but may be used by any number of frontends...


    there are actually around 45 EXE's, many of which are component-specific
    tests (such as to verify that OO stuff still works, ...) or tools. a few of
    which are app's (3D engine, mesh-modeler, skeletal tool, Q2 frontend).

    counting here, there are 17 DLL's (2 of which belong to Q2's renderer). 8
    belong to my VM, 7 to my 3D engine.


    Q2 is in total a small minority of the code, and currently uses a few of the
    other components, and is at present not really used by anything (since much
    of Q2's code is hardly well-organized or generic), and also because the
    "merge" was fairly recent and not really intended to be a "permanent"
    solution (partly due to Q2 using GPL, and otherwise there is little escape
    from this, but my VM is being kept "clean", mostly as I am migrating the
    thing to Public Domain, vs before where it was LGPL).

    ....
    BGB / cr88192, Jan 5, 2010
    #11
  12. BGB / cr88192

    Stefan Ram Guest

    "BGB / cr88192" <> writes:
    >it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);


    The number of lines of code (globally) should not be relevant,
    when code is well organized. When code is well organized, size
    disappears.

    For example, when I write

    #include <stdio.h>
    int main( void ){ printf( "hello, world\n" ); }

    , how many lines are this?

    Two?

    But the I/O library used might also have been written in C.

    Why do you not take the number of lines of the I/O library
    into account?

    Because they are well hidden behind an interface.

    So, when a project is well organized nearly everything is
    well hidden behind an interface, and you never see 1Mloc,
    you see only about what fits on your screen or less.

    You organize your project so that you never see 1 Mlocs.

    A line count, such as »1 Mloc« is nearly meaningless, because
    it is arbitrary what you take into account for this and what
    not. You might name a part of your project »the graph library«
    (say, 189,124 locs) and consider it to be a »separate project«,
    and suddenly, there are 189,124 locs less in your project. So
    such counts bear little information.

    A master once came to an interview and was asked »Do you have
    any experience with large software projects?«. He immediately
    answered »No.« and he was not given the job. What they didn't
    know is: He organized every project in such a way, that to him
    it appeared small. Thus, whatever he did, he never did see a
    »large« project.
    Stefan Ram, Jan 5, 2010
    #12
  13. BGB / cr88192

    Flash Gordon Guest

    BGB / cr88192 wrote:
    > "Nick Keighley" <> wrote in message
    > news:...
    >> On 5 Jan, 09:10, "BGB / cr88192" <> wrote:


    <snip>

    >>> or maybe look for parts to shave
    >>> off (even if one doesn't want to remove much?...).

    >> are you talking about removing duplicate code or dead code [good] or
    >> removing little used functionality [ok] or actually reducing
    >> functionality [sounds bad]. Do you have users? Do they want
    >> functionality removed (in my experience- hardly ever!). Do they think
    >> its too big?

    >
    > I could do all of these...
    >
    > I suspect I may get around 50-100 kloc from dead code removal (or, more
    > like, dead component removal, nevermind smaller code).


    I would start off by doing this. It will mean you have that much less
    code to look at for the next stage...

    The next stage for me would be to identify if there are redundant
    features which were either implemented but never used or implemented and
    used, but have since been superseded and are no longer used (or with
    minimal work could be no longer used). That should get rid of a load
    more code in my opinion.

    > removing some little-used stuff is also possible.


    <snip>

    Then you could look at this, with the other eye on whether making *more*
    use of it elsewhere would be an even better saving!

    That should shrink the code base and make it easier to refactor and
    improve it further.

    At each stage, getting rid of code will make the next stage easier since
    you have less to look at. Then repeat the exercise from the beginning,
    since through all these stages you may turn more code into dead code!
    Also you might make more features obsolete allowing them to be deleted!
    Keep repeating until it does not give you enough benefit to be worth the
    effort of going further.
    --
    Flash Gordon
    Flash Gordon, Jan 6, 2010
    #13
  14. On 5 Jan, 19:30, "BGB / cr88192" <> wrote:
    > "Nick Keighley" <> wrote in message
    > news:...
    > > On 5 Jan, 09:10, "BGB / cr88192" <> wrote:



    > >> I recently noticed in my project, which is primarily C based, a few bits
    > >> of trivia:
    > >> it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    > >> 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    > >> collector + ...).

    >
    > >> so, ~ 1.094 Mloc of C.


    <snip>

    > >> so, does something like this seem like there is a good deal of code bloat
    > >> going on?...

    >
    > > how could we tell? Maybe you've got a MLOC of functionality in there.
    > > I suspect with such a large code base there's *some* fat but without
    > > looking its just a guess. What do you think? Have you tried
    > > refactoring it?

    >
    > I used to do that some, usually trying to trim down the project
    > occasionally.
    > however, I have not done that in a while, and I suspect there is a bit of a
    > pile-up.


    then there almost certainly is some fat (unless you're a really clever
    coder!)

    > the present state thus being a little over 1 Mloc of code, and around 5GB of
    > files (the vast majority being PNG's).
    >
    > last time I did a major clean up was back when my project was in the ~ 300
    > kloc range, was I think a few years ago.
    >
    > >> is it better to keep on going like this,

    >
    > > like what? What's wrong with what you are doing? Is it hard to modify
    > > or debug? Is there a lot of duplicate code?

    >
    > I meant, just keep going on coding stuff, as the codebase gets ever larger.


    I don't really understand the question. Is your code base causing you
    a problem? You are presumably adding code for a reason (unless you
    just like coding) so, er, what was the question again?

    > >> or maybe look for parts to shave
    > >> off (even if one doesn't want to remove much?...).

    >
    > > are you talking about removing duplicate code or dead code [good] or
    > > removing little used functionality [ok] or actually reducing
    > > functionality [sounds bad]. Do you have users? Do they want
    > > functionality removed (in my experience- hardly ever!). Do they think
    > > its too big?

    >
    > I could do all of these...
    >
    > I suspect I may get around 50-100 kloc from dead code removal (or, more
    > like, dead component removal, nevermind smaller code).


    removing real dead code seems a good idea. If it is
    if-it-might-come-in-useful-one-day code (vanishingly unlikely really)
    then you can get it out of your repository.

    > removing some little-used stuff is also possible.


    well that's your call. Little used might be very useful when you need
    it! The US's SDI system was only supposed to be used once (or less)...

    Error handling often doesn't get used very often.

    > removing actually-used components is possible, but I guess I will probably
    > not do this.
    >
    > >> or do others just keep on going as always before?...

    >
    > > I'm a firm believer (in theory if not in practice!) that there comes a
    > > point where things should be re-written if they get too creaky.

    >
    > > You should take a look at refactoring...

    >
    > yeah, I may need to consider this.
    >
    > >> more so, are there other implications from a codebase having a continual
    > >> tendency to inflate?... (say, if ones' code tends to inflate at an
    > >> average rate of, say, 100-150 kloc/yr or so...).

    >
    > > are you adding functionality? Do you refactor?

    >
    > adding functionality is done a lot;
    > refactoring, not so much.


    I don't see the code increasing in size as inherently wrong. I'm more
    interested in *why* it is growing.


    > I had often used the "cellular" approach of splitting components if they got
    > too large, and interfacing components with defined APIs and allowed
    > dependencies.


    all sounds good (though the "cellular" term seems a little odd)


    > this keeps things maintainable, but does little to constrain
    > size


    really? I thought it would help

    > (actually, abstract APIs probably make this issue worse, FWIW).


    oh? I'd have thought good abstractions would keep the code size down.

    > >> another observation:
    > >> the ratio between C and headers is notably different than in Quake2,
    > >> where
    > >> Q2 has an H/C ratio of 0.14, but in my code it is closer to 0.4 or so...


    I misunderstood you here. You were talking header kloc and c file
    kloc. I'd expect there to be a big difference but I've never done any
    measurements (never really cared that much to be honest!)

    <snip>

    > Q2 tends to have lots of big hairy functions (where in many cases 50-200
    > lines will be used by single functions), and seems to be a little lax about
    > which prototypes actually make it to the headers.
    >
    > granted, there are a few of these in my codebase, but these tend to be a
    > strong minority.
    >
    > well, that, and the whole engine has lots of bit twiddling and fairly nasty
    > use of pointer arithmetic.
    >
    > or, at least by my standards where something like:
    > "if(*(float *)((char *)(&array_of_structs[index])+offset) == foo) ..."
    > is, IMO, nasty...


    fairly nasty. But does it really matter? If you need to do stuff like
    that isolate it in a function/module/macro and document it. The rest
    of the code doesn't care how nasty it is.
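
A minimal sketch of that kind of isolation, assuming a made-up `Entity` struct and helper name (neither comes from Q2 or from the project discussed here). The byte-offset access lives in one documented function, and `memcpy` is used in place of the pointer cast so the read does not depend on alignment or aliasing assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct, standing in for the kind of data Q2 indexes by offset. */
typedef struct {
    int   id;
    float health;
    float speed;
} Entity;

/*
 * Return the float stored 'offset' bytes into element 'index' of 'arr'.
 * All the pointer arithmetic is confined to this one function; callers
 * pass an offset obtained from offsetof() and never see the casts.
 */
float entity_field_float(const Entity *arr, size_t index, size_t offset)
{
    float value;

    assert(arr != NULL);
    assert(offset + sizeof value <= sizeof(Entity));

    /* memcpy sidesteps the alignment/aliasing hazards of *(float *)ptr. */
    memcpy(&value, (const char *)&arr[index] + offset, sizeof value);
    return value;
}
```

A caller would then write something like `if (entity_field_float(ents, i, offsetof(Entity, health)) == foo) ...`, which is still offset-driven but no longer spells out the casts at every use.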

    <snip>
    Nick Keighley, Jan 6, 2010
    #14
  15. "Nick Keighley" <> wrote in message
    news:...
    > On 5 Jan, 19:30, "BGB / cr88192" <> wrote:
    >> "Nick Keighley" <> wrote in message
    >> news:...
    >> > On 5 Jan, 09:10, "BGB / cr88192" <> wrote:

    >
    >
    >> >> I recently noticed in my project, which is primarily C based, a few
    >> >> bits
    >> >> of trivia:
    >> >> it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    >> >> 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    >> >> collector + ...).

    >>
    >> >> so, ~ 1.094 Mloc of C.

    >
    > <snip>
    >
    >> >> so, does something like this seem like there is a good deal of code
    >> >> bloat
    >> >> going on?...

    >>
    >> > how could we tell? Maybe you've got a MLOC of functionality in there.
    >> > I suspect with such a large code base there's *some* fat but without
    >> > looking its just a guess. What do you think? Have you tried
    >> > refactoring it?

    >>
    >> I used to do that some, usually trying to trim down the project
    >> occasionally.
    >> however, I have not done that in a while, and I suspect there is a bit of
    >> a
    >> pile-up.

    >
    > then there almost certainly is some fat (unless you're a really clever
    > coder!)
    >


    yeah.
    well, I think there is as well, so no issue there...


    >> > like what? What's wrong with what you are doing? Is it hard to modify
    >> > or debug? Is there a lot of duplicate code?

    >>
    >> I meant, just keep going on coding stuff, as the codebase gets ever
    >> larger.

    >
    > I don't really understand the question. Is your code base causing you
    > a problem? You are presumably adding code for a reason (unless you
    > just like coding) so, er, what was the question again?
    >


    well, usually my reason for adding code is not to deal with problems...

    more often, it is adding code to add "features"...

    sometimes though, the features turn out to be almost entirely useless.

    I once added a partial "precise" mode to my (otherwise conservative) GC, but
    ended up not using it (because precise GC makes using it a lot more
    difficult for not much gain). later on, this is no longer part of the public
    API. internally, some of the code lingers on, and a cleanup of this
    component would likely eliminate this.

    the GC also does ref-counting, which is another one of those "rarely used"
    features, mostly because ref-counting is one of those "all or nothing"
    features, meaning that any code which uses the feature has to be entirely
    ref-counting safe.


    then, elsewhere, there is another precise GC, which was written because I
    realized that it was kind of pointless to have a precise GC on the same heap
    as a conservative one if they end up essentially "splitting the world in
    half" anyways.

    in this case, I had intended this other GC mostly for a particular use, but
    ended up using a different MM strategy for that code instead: allocating
    lots of memory in a temporary heap, and then destroying this heap when done.
    this was done as an attempt to improve both stability and performance.

    though not very generic, it works fairly well for code which produces 10s of
    MB of stuff in a fraction of a second but never needs to refer to it again
    after the task completes. it works much better than using a generic GC for
    this task, since this usage pattern is essentially "abusive" to the GC
    (tending to cause misbehavior and poor performance).
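
A rough sketch of the temporary-heap idea, for illustration only: the `TempHeap` type and function names are invented here and are not the VM's actual API. It is a plain bump allocator over one `malloc`'d block, assuming 16-byte granularity is enough alignment for whatever gets stored:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical temporary heap: one big block, handed out front to back. */
typedef struct {
    unsigned char *base;  /* start of the block, NULL when not initialized */
    size_t         size;  /* total bytes in the block */
    size_t         used;  /* bytes handed out so far */
} TempHeap;

/* Returns 1 on success, 0 if the backing block could not be allocated. */
int temp_heap_init(TempHeap *h, size_t size)
{
    assert(h != NULL);
    h->base = malloc(size);
    h->size = (h->base != NULL) ? size : 0;
    h->used = 0;
    return h->base != NULL;
}

/* Bump-allocate n bytes (rounded up to 16); NULL when the heap is full. */
void *temp_heap_alloc(TempHeap *h, size_t n)
{
    void *p;

    assert(h != NULL);
    if (n > SIZE_MAX - 15)
        return NULL;                      /* rounding would overflow */
    n = (n + 15) & ~(size_t)15;
    if (n > h->size - h->used)
        return NULL;
    p = h->base + h->used;
    h->used += n;
    return p;
}

/* Release everything at once; no per-object free, no GC involvement. */
void temp_heap_destroy(TempHeap *h)
{
    assert(h != NULL);
    free(h->base);
    h->base = NULL;
    h->size = 0;
    h->used = 0;
}
```

The point of the design is that the task allocates freely, and the whole lot goes away in a single `temp_heap_destroy` call when the task completes, so the general-purpose GC never has to scan or collect any of it.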

    and, it is all this fuss over a single set of features.


    >>
    >> I could do all of these...
    >>
    >> I suspect I may get around 50-100 kloc from dead code removal (or, more
    >> like, dead component removal, nevermind smaller code).

    >
    > removing real dead code seems a good idea. If it is
    > if-it-might-come-in-useful-one-day code (vanishingly unlikely really)
    > then you can get it out of your repository.
    >


    well, this has happened sometimes, but it often takes a while for features
    to "become" useful.

    but, yeah, a good start is removing dead components and subsystems, a few of
    which I know to exist.
    (particularly related to my codegen).

    although, I had made a new experimental codegen with a nifty register
    allocator which was never really migrated back into the main codegen (mostly
    as the new codegen worked a bit different, and my old codegen is a big
    tangled mess that has been hacked on a lot).

    it is just that the old codegen has proven a bit difficult to replace.


    >> removing some little-used stuff is also possible.

    >
    > well that's your call. Little used might be very useful when you need
    > it! The US's SDI system was only supposed to be used once (or less)...
    >
    > Error handling often doesn't get used very often.
    >


    ok.


    >>
    >> > are you adding functionality? Do you refactor?

    >>
    >> adding functionality is done a lot;
    >> refactoring, not so much.

    >
    > I don't see the code increasing in size as inherently wrong. I'm more
    > interested in *why* it is growing.
    >


    partly as a result of adding features;
    partly as a result of "generalizing" things;
    ....


    >
    >> I had often used the "cellular" approach of splitting components if they
    >> got
    >> too large, and interfacing components with defined APIs and allowed
    >> dependencies.

    >
    > all sounds good (though the "cellular" term seems a little odd)
    >


    well, because originally, one has a lot of code which is in a single
    component.
    so, all the code shares the same directory, makefile, and naming prefixes,
    ....

    but, then it gets too large, and is split.

    then, often, the library name prefix gets changed, as well as the sub-parts
    being moved into new directories, ...

    say:
    FOO_SubSysA_...
    FOO_SubSysB_...
    FOO_SubSysC_...
    FOO_SubSysD_...
    FOO_SubSysE_...
    FOO_SubSysF_...

    splits into:
    FOO_SubSysA_...
    FOO_SubSysC_...
    FOO_SubSysE_...
    FOO_SubSysF_...

    BAR_SubSysB_...
    BAR_SubSysD_...

    then, maybe some internal patch-up is done to compensate for the change in
    naming, ...
    (this is often done via 'sed' or search/replace).


    so, in a way, it is sort of like mitosis or similar...


    >
    >> this keeps things maintainable, but does little to constrain
    >> size

    >
    > really? I thought it would help
    >


    component splitting very often causes there to be sub-components which are
    larger than the original singular component.

    this is usually the result of "abstracting" one component from another,
    which often adds new code in the form of abstract API wrappers, ...


    >> (actually, abstract APIs probably make this issue worse, FWIW).

    >
    > oh? I'd have thought good abstractions would keep the code size down.
    >


    on the larger scale, probably.
    but at the small scale, it adds a bunch of function calls which often do
    little more than redirect to other functions.

    in a few components, this can end up being a significant part of the overall
    code size (in particular, in one of the larger components, which consists
    almost entirely of exported APIs and relatively little internal logic code).

    most other components contain more of a balance, with most of the size being
    due to logic-code, and a smaller amount of wrapper code usually serving as
    an interface to the outside world.
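
To illustrate the kind of wrapper code meant here, a small sketch with invented names (`BAR_CountNonZero`, `BAR_HasNonZero`, and the internal function are hypothetical, not taken from the actual codebase). The exported functions add lines to the component but contain almost no logic of their own:

```c
#include <assert.h>
#include <stddef.h>

/* Internal implementation: this is where the actual logic lives. */
static size_t bar_internal_count_nonzero(const int *values, size_t count)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < count; i++)
        if (values[i] != 0)
            n++;
    return n;
}

/* Exported wrapper: checks its arguments, then just forwards the call. */
size_t BAR_CountNonZero(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    return bar_internal_count_nonzero(values, count);
}

/* Second exported entry point, again little more than a redirect. */
int BAR_HasNonZero(const int *values, size_t count)
{
    return BAR_CountNonZero(values, count) > 0;
}
```

Here two of the three functions exist only to present an interface, which is roughly how a component made mostly of exported APIs ends up with wrappers as a large share of its line count.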


    >> >> another observation:
    >> >> the ratio between C and headers is notably different than in Quake2,
    >> >> where
    >> >> Q2 has an H/C ratio of 0.14, but in my code it is closer to 0.4 or
    >> >> so...

    >
    > I misunderstood you here. You were talking header kloc and c file
    > kloc. I'd expect there to be a big difference but I've never done any
    > measurements (never really cared that much to be honest!)
    >


    yep.

    well, I measured a lot of stuff, some of which I didn't bother to mention.


    >> or, at least by my standards where something like:
    >> "if(*(float *)((char *)(&array_of_structs[index])+offset) == foo) ..."
    >> is, IMO, nasty...

    >
    > fairly nasty. But does it really matter? If you need to do stuff like
    > that isolate it in a function/module/macro and document it. The rest
    > of the code doesn't care how nasty it is.
    >


    yep, I usually avoid doing stuff like this personally, or if it is done, it
    is wrapped up somewhat...
    Q2 does things like this often as a matter of common practice.

    as well as the good old trick of reading raw data from a file, casting it to
    a struct pointer, and just using this structure directly (although often
    with either endianness-swap functions, or a pre-pass to go over the read-in
    file and pre-swap all the values if needed).


    I often use more explicit read/write value operations, such as reading a
    datum at a time.

    /* read one vertex, a float at a time */
    Foo_Vertex *Foo_ReadVertex(VFILE *fd)
    {
        Foo_Vertex *tmp;
        tmp=Foo_AllocVertex();
        tmp->x=Foo_ReadFloat(fd);
        tmp->y=Foo_ReadFloat(fd);
        tmp->z=Foo_ReadFloat(fd);
        return(tmp);
    }

    /* read one triangle as three vertices, in file order */
    Foo_Triangle *Foo_ReadTriangle(VFILE *fd)
    {
        Foo_Triangle *tmp;
        tmp=Foo_AllocTriangle();
        tmp->v0=Foo_ReadVertex(fd);
        tmp->v1=Foo_ReadVertex(fd);
        tmp->v2=Foo_ReadVertex(fd);
        return(tmp);
    }
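
For the lowest level of such a reader, here is one possible sketch of a datum-at-a-time float read. Everything in it is an assumption for illustration: `ByteReader` is a made-up memory-backed stand-in for a file handle like `VFILE`, the on-disk format is taken to be little-endian IEEE 754 single precision, and the host is assumed to use a 32-bit `float` with the same byte order as its integers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory-backed input stream, standing in for a file handle. */
typedef struct {
    const unsigned char *data;
    size_t               size;
    size_t               pos;
} ByteReader;

/*
 * Assemble a 32-bit little-endian value one byte at a time, so the result
 * is the same whatever the host byte order. Returns 1 on success, 0 if
 * fewer than four bytes remain (nothing is consumed in that case).
 */
int read_u32_le(ByteReader *r, uint32_t *out)
{
    uint32_t v = 0;
    int i;

    assert(r != NULL && out != NULL);
    if (r->size - r->pos < 4)
        return 0;
    for (i = 0; i < 4; i++)
        v |= (uint32_t)r->data[r->pos++] << (8 * i);
    *out = v;
    return 1;
}

/* Read a float by reading its bit pattern and copying it into place. */
int read_float_le(ByteReader *r, float *out)
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t));
    if (!read_u32_le(r, &bits))
        return 0;
    memcpy(out, &bits, sizeof *out);
    return 1;
}
```

Compared with casting a raw buffer to a struct pointer, this costs a few more calls per value, but no swap pass is needed and struct padding never enters into the file format.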

    ....
    BGB / cr88192, Jan 6, 2010
    #15
  16. "Nick" <> wrote in message
    news:...
    > "BGB / cr88192" <> writes:
    >
    >> hmm, an actual question of sorts...
    >>
    >> I recently noticed in my project, which is primarily C based, a few bits
    >> of
    >> trivia:
    >> it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);
    >> 610.5 kloc of this go into a VM project (dynamic C compiler + garbage
    >> collector + ...).
    >>
    >> so, ~ 1.094 Mloc of C.

    >
    > That does seem a heck of a lot. The website in my sig is driven by
    > 36klock of my own, and a total of 270kloc (but that includes a wiki
    > renderer, an HTML templating system, the SQLite database engine and a
    > pile of other bits).
    >
    > That implements a scripting language to do the boring bits, and the
    > data structures to allow that scripting language to do clever things
    > (like compute the best path through the graph depending on current
    > preferences).
    >
    > Four times as much seems a lot for /anything/.
    >
    > I know I've a fair chunk of dead code in there as well - it still
    > supports legacy data formats and other databases.
    >


    yep...

    well, the codebase tends to be fairly large and, per scale, not do quite as
    much as some other smaller codebases I have seen...


    >> another observation:
    >> the ratio between C and headers is notably different than in Quake2,
    >> where
    >> Q2 has an H/C ratio of 0.14, but in my code it is closer to 0.4 or so...

    >
    > Is that lines in *.c to lines in *.h, or number of .c files and number
    > of .h files?


    C vs H loc...

    but, then I can also note that I do a few "unusual" things with headers
    which tend to bloat the total loc in this case (many headers exist which are
    tool-written, but not nearly so much C is tool-written, ...).


    > --
    > Online waterways route planner | http://canalplan.eu
    > Plan trips, see photos, check facilities | http://canalplan.org.uk
    BGB / cr88192, Jan 6, 2010
    #16
  17. "Stefan Ram" <-berlin.de> wrote in message
    news:-berlin.de...
    > "BGB / cr88192" <> writes:
    >>it is in excess of 1Mloc (1 Mloc = 1,000,000 lines of code);

    >
    > The number of lines of code (globally) should not be relevant,
    > when code is well organized. When code is well organized, size
    > disappears.
    >
    > For example, when I write
    >
    > #include <stdio.h>
    > int main( void ){ printf( "hello, world\n" ); }
    >
    > , how many lines are this?
    >
    > Two?
    >


    for this file, yes.
    if this is the only C file in the project, then this is the case.


    > But the I/O library used might also have been written in C.
    >
    > Why do you not take the number of lines of the I/O library
    > into account?
    >


    well, if you mean the IO library in the C runtime (such as MSVCRT), no, this
    is not included in my case.
    granted, I have a VFS system, which is included (since it is part of my
    overall codebase).

    my x86 interpreter also includes a C library, which is included in the total
    count, but is built specifically for the virtualized environment within the
    interpreter (this is about 13 kloc for the C-RTL, and 5 kloc for syscall
    wrappers, ...).


    > Because they are well hidden behind an interface.
    >
    > So, when a project is well organized nearly everything is
    > well hidden behind an interface, and you never see 1Mloc,
    > you see only about what fits on your screen or less.
    >


    granted.

    well, interfaces keep things manageable, but all the code does still exist
    (even if most of it largely just "does something" and is not messed with
    often much beyond this...).

    part of what made me more aware of the mass amounts of code, was going and
    trying to start documenting some of my stuff...


    > You organize your project so that you never see 1 Mlocs.
    >
    > A line count, such as »1 Mloc« is nearly meaningless, because
    > it is arbitrary what you take into account for this and what
    > not. You might name a part of your project »the graph library«
    > (say, 189,124 locs) and consider it to be a »separate project«,
    > and suddenly, there are 189,124 locs less in your project. So
    > such counts bear little information.
    >


    ok.

    this is a thought...
    I tend to include pretty much everything that is not system libraries, and
    tend to minimize use of 3rd-party libs and code (mostly due to the
    possibility of awkward dependency issues, ...).


    > A master once came to an interview and was asked »Do you have
    > any experience with large software projects?«. He immediately
    > answered »No.« and he was not given the job. What they didn't
    > know is: He organized every project in such a way, that to him
    > it appeared small. Thus, whatever he did, he never did see a
    > »large« project.
    >


    interesting...


    well, I guess it can be noted then that, code seems "bigger" when it is less
    cleanly written.

    after all, the Q2 code would "seem" a lot bigger when one tries to work on
    it, mostly because the code is so tangled and nasty (doing one thing in one
    place often leads to hunting for bugs appearing somewhere else, often as a
    result of subtle bit-twiddling, where, FWIW, Q2 has explicit bit-twiddling
    that is done on a project-wide scale).


    my codebase is much larger, but working on most things is a good deal less
    effort, I guess because I do tend to modularize and abstract things a bit
    more...

    I also guess I tend to make much more heavy use of ASCII-based data
    serialization as well (most serialized complex data tends to be ASCII-based,
    ....).

    and, personally, I tend to have an aversion to passing structs between
    components, as I have run into a lot of trouble here in the past, whereas
    Q2 passes structs all over the place (and likes to assume that structs in
    one place are the same as in another, and as in their file formats, really
    likes arrays of structs, ...).

    oddly, I almost always use arrays of struct pointers, and almost never
    arrays of raw structs, whereas Q2 uses lots of arrays of raw structs, ...
    BGB / cr88192, Jan 6, 2010
    #17