statistic of integer values

Discussion in 'C Programming' started by Stefan Wallentowitz, Dec 10, 2005.

  1. Hello everyone!

    I'm searching for a statistic about the value ranges of integers in a set
    of "standard" C programs.
    I'm specifically interested in the average ratio of assignments etc. of
    integers where the value is either of (un)signed 16 bit or 32 bit size.
    Has anybody ever come across something similar?

    Bye,
    Stefan
    Stefan Wallentowitz, Dec 10, 2005
    #1

  2. Stefan Wallentowitz

    Malcolm Guest

    "Stefan Wallentowitz" <> wrote
    >
    > I'm searching for a statistic about the value ranges of integers in a set
    > of "standard" C programs.
    > I'm specifically interested in the average ratio of assignments etc. of
    > integers where the value is either of (un)signed 16 bit or 32 bit size.
    > Has anybody ever come across something similar?
    >

    No. It will be rather tricky to define exactly what you mean.

    For instance, a loop in which a variable, i, ranges between 0 and N-1,
    is an extremely common construct. However, the counting variable usually
    isn't held for more than a few machine cycles. N, on the other hand, may
    well keep its value for the life of the program.

    However if the loop is in a subroutine, variable N may be passed to that
    routine, copied from a "Nemployees" member of a data structure. So does N
    now count as one or two integers?

    Also, because of C semantics, i will actually be set to N at the end of the
    for loop. Probably, however, i will never be accessed outside of the loop.
    So has i taken the value N, or not?
    Malcolm, Dec 10, 2005
    #2

  3. On Sat, 10 Dec 2005 21:25:38 +0000, Malcolm wrote:

    Hi!

    > No. It will be rather tricky to define exactly what you mean.


    Ok, I thought it might be a bit inexact.

    > Also, because of C semantics, i will actually be set to N at the end of
    > the for loop. Probably, however, i will never be accessed outside of the
    > loop. So has i taken the value N, or not?


    Yes, N is one of them, and so is the initialisation of i. In fact I'm
    interested in the ratio of common integer ranges being loaded into
    registers. Several examples:
    a=1, a=b+4, a=0x0feff0c4 etc.
    It is related to a compiler optimization, and it is interesting how many
    instructions are saved on average; e.g. when there's a high occurrence
    rate of 16-bit signed integers, the optimization brings more success.
    But some measurement data would be great.
    I also know that there is a machine dependency, but I envision some
    overall data from a more abstract view of C code.

    Greets,
    Stefan
    Stefan Wallentowitz, Dec 10, 2005
    #3
  4. Stefan Wallentowitz

    Thad Smith Guest

    Stefan Wallentowitz wrote:

    > I'm searching for a statistic about the value ranges of integers in a set
    > of "standard" C programs.


    You will probably have to generate your own.

    > I'm specifically interested in the average ratio of assignments etc. of
    > integers where the value is either of (un)signed 16 bit or 32 bit size.


    You need to be more precise about what you need.
    "ratio of assignments etc. of integers where the value is either of
    (un)signed 16 bit or 32 bit size."

    Does that mean you want to count the number of integer assignments
    executed in which the value has exactly 16 significant bits vs. exactly
    32 significant bits, and compute their ratio?

    Do you want to know how many assignment statements are made to 16-bit
    vs. 32-bit integer types without regard to value and without regard to
    the number of times executed? Or something else?

    --
    Thad
    Thad Smith, Dec 10, 2005
    #4
  5. Stefan Wallentowitz

    Guest

    Stefan Wallentowitz wrote:
    > It is related to a compiler optimization, and it is interesting how many
    > instructions are saved on average; e.g. when there's a high occurrence
    > rate of 16-bit signed integers, the optimization brings more success.
    >


    Not a good idea. The compiler should be consistent or the programmer
    will be surprised later on if the run-time code suddenly overflows on
    65536. If the compiler implements 16bit ints, then that's fine. If it
    is 32bit ints, then that's fine also. But if it is sometimes 32 and
    sometimes 16 that's not a good thing. An example of a compiler where
    this is configurable is the HITECH C compiler for PIC microcontrollers.
    The programmer gets to decide via compiler options or #pragma
    statements whether ints are 16 or 32 bits.
    , Dec 11, 2005
    #5
  6. In article <>,
    <> wrote:
    >Stefan Wallentowitz wrote:
    >> It is related to a compiler optimization, and it is interesting how many
    >> instructions are saved on average; e.g. when there's a high occurrence
    >> rate of 16-bit signed integers, the optimization brings more success.



    >Not a good idea. The compiler should be consistent or the programmer
    >will be surprised later on if the run-time code suddenly overflows on
    >65536. If the compiler implements 16bit ints, then that's fine. If it
    >is 32bit ints, then that's fine also. But if it is sometimes 32 and
    >sometimes 16 that's not a good thing.


    Ummm, it looked to me like the OP wasn't trying to say anything like
    that.

    If I understand correctly, the question is about integral literals
    appearing in source code -- though it should probably instead be about
    integral literals appearing after pre-processing and constant folding.

    The question is about the relative rates at which 16 bit literals
    appear, compared to 32 bit literals. The point, if I understand
    correctly, is not about computing at those widths, but rather about
    the instructions involved in coding the literals into the machine
    code.

    There are architectures which offer several different ways of loading
    the same literal.

    For example, there are architectures which have (say) "Add Quick", an
    instruction which operates on registers of whatever size is specified
    in other parts of the instruction (e.g., ADDQ.W, ADDQ.L) -- but no
    matter what the register size, the value being added is restricted to a
    particular range, such as -128 to +127. For example, ADDQ.L A3,#16
    might be used to increment address register A3 by 16. Such instructions
    could hypothetically crop up quite a bit in loops over arrays.

    The same architecture might have an ADDI.W and ADDI.L (Add a 16 bit
    constant to a 16 bit register; add a 16 bit constant to a 32 bit
    register) and might have a regular ADD instruction in which
    one of the source operand modes was "immediate 32 bit constant".

    When these instructions are available, they can result in smaller
    machine language, which is a benefit for fitting more into
    cache; they may also end up operating faster because of the
    reduced number of cycles required to read the constant from
    memory. There may also turn out to be interesting tradeoffs
    in such architectures, such as it potentially being faster to
    load a byte value in one instruction and sign-extending it to 32
    bits in a second instruction, rather than using a single
    instruction to load a 32 bit constant -- the dual instruction
    version might, for example, hypothetically pipeline better,
    or the "sign extend" instruction might be only a two byte-long
    instruction, with the "load quick" being 4 bytes long, for
    a total of 6 bytes where the architecture would need 8 bytes
    (4 for the instruction, 4 for the constant) to load a 32 bit
    constant.


    Suppose one is working on a compiler for an architecture
    that offers different instructions for the same basic task,
    with some of the variations being shorter or faster for particular
    ranges of literal constants. If one were doing that, then one
    might wonder whether it is worth the effort to put in a bunch
    of compiler optimization work to get the absolute best possible
    memory fit or absolute best possible instruction speed. Such work
    could get particularly messy if the less-memory variations
    turn out to take longer, as one would then have to analyze to see
    whether the surrounding code is such that it is better to
    optimize for space or time. In a tight loop that was likely to
    stay in cache for a while, the answer could depend on the exact
    size of the loop...


    If this is what the OP is trying to talk about, then I would
    make some suggestions:

    1) That possibly this kind of study was undertaken back when RISC
    theory was being developed;

    2) That not just the explicitly coded (or constant folded) literals
    are important, since there can be many implicit literals for
    pointer/address manipulation

    3) That it matters not only how often various sizes of literals appear,
    but also what kind of circumstances they appear in. There might
    be (hypothetically) several 32 bit constants per "for" loop
    (initialization, termination test, increment), but those would tend
    to get loaded into registers, and the literal that might be
    most crucial to the speed of the loop might be (say) the 4 in
    "add #4, %eax" where %eax is being used to index arrays. Or
    depending on the architecture and generated code, it might be the 2
    in the sequence INC R3; MOV R3, R2; SHL R2, #2; LOAD R4, $1BE00C(R2)
    i.e., increment, shift left to get a byte offset, load from that
    byte offset relative to the constant location of a static array...
    --
    I was very young in those days, but I was also rather dim.
    -- Christopher Priest
    Walter Roberson, Dec 11, 2005
    #6
  7. Stefan Wallentowitz

    Guest

    Walter Roberson wrote:
    > In article <>,
    > <> wrote:
    > >Stefan Wallentowitz wrote:
    > >> It is related to a compiler optimization, and it is interesting how many
    > >> instructions are saved on average; e.g. when there's a high occurrence
    > >> rate of 16-bit signed integers, the optimization brings more success.

    >
    >
    > >Not a good idea. The compiler should be consistent or the programmer
    > >will be surprised later on if the run-time code suddenly overflows on
    > >65536. If the compiler implements 16bit ints, then that's fine. If it
    > >is 32bit ints, then that's fine also. But if it is sometimes 32 and
    > >sometimes 16 that's not a good thing.

    >
    > Ummm, it looked to me like that OP wasn't trying to say anything like
    > that.
    >
    > If I understand correctly, the question is about integral literals
    > appearing in source code -- though it should probably instead be about
    > integral literals appearing after pre-processing and constant folding.
    >


    In this case there would be no need to ask about statistics. It is
    always safe to do this because the compiler knows there is ONLY one
    value for each literal - not a range. This is implemented by almost all
    modern compilers. Not only that, if for example on a RISC machine you
    use the literal 1 or 0, the compiler would simply generate code to read
    the one and zero registers.
    , Dec 11, 2005
    #7
  8. In article <>,
    <> wrote:
    >Walter Roberson wrote:
    >> If I understand correctly, the question is about integral literals
    >> appearing in source code -- though it should probably instead be about
    >> integral literals appearing after pre-processing and constant folding.


    >In this case there would be no need to ask about statistics. It is
    >always safe to do this because the compiler knows there is ONLY one
    >value for each literal - not a range.


    There is more than one meaning of "safe" that might apply here.

    There is the "safe" of "Is there a certainty that I can find
    a valid instruction to use for this particular literal, that the
    literal will not overflow the range allowed for the instruction?".
    I suspect that's the meaning of "safe" that you are using.

    There is another meaning, though, when it comes to generating
    instructions: "Is there a certainty that if I chose to generate
    this particular one of the instruction sequences that I could
    generate, that this choice will not result in worse performance
    than could be obtained with a different sequence?" And the answer
    to that is "It depends on the instruction set, upon the
    internal details of multiple dispatch or pipelining, and upon
    the cache characteristics of the surrounding code."


    Also, as I indicated earlier, the point might not be "can it
    safely be done", but rather, "Is it worth putting in the programming
    effort to do this potential optimization? Do small literals occur
    often enough that the savings would be appreciable? Is it worth
    the time to find good algorithms for the boundary cases such as
    the short load taking longer to execute but the extra cost being worth
    it if it allows a tight loop to fit in cache when it would not otherwise
    fit?" And that's where the statistics come in.

    You know the usual refrain about "premature optimization" -- well,
    it looks to me as if the OP is trying to get a feel for whether
    this kind of optimization is premature, or is worthwhile.
    --
    "law -- it's a commodity"
    -- Andrew Ryan (The Globe and Mail, 2005/11/26)
    Walter Roberson, Dec 11, 2005
    #8
  9. On Sun, 11 Dec 2005 03:31:09 +0000, Walter Roberson wrote:

    Hi Walter!

    Many thanks, you got what I was trying to point out.
    > If this is what the OP is trying to talk about, then I would
    > make some suggestions:

    Yes, that's what I had in mind. In fact it is about designing a
    compiler for a simple RISC architecture, where loading a 32-bit value
    requires two instructions, but when the value fits in 16 bits it is
    possible to use only one instruction. It is quite a simple
    "optimization", but in a small presentation I wanted to describe its
    possible impact with such a study.

    > 1) That possibly this kind of study was undertaken back when RISC theory
    > was being developed;
    >
    > 2) That not just the explicitly coded (or constant folded) literals
    > are important, since there can be many implicit literals for
    > pointer/address manipulation

    In general this is true, but in my case only literals are of interest,
    because addresses/pointers are not handled in this way.
    >
    > 3) That it matters not only how often various sizes of literals appear,
    > but also what kind of circumstances they appear in. There might be
    > (hypothetically) several 32 bit constants per "for" loop
    > (initialization, termination test, increment), but those would tend to
    > get loaded into registers, and the literal that might be most crucial to
    > the speed of the loop might be (say) the 4 in "add #4, %eax" where %eax
    > is being used to index arrays. Or depending on the architecture and
    > generated code, it might be the 2 in the sequence INC R3; MOV R3, R2;
    > SHL R2, #2; LOAD R4, $1BE00C(R2) i.e., increment, shift-level to get a
    > byte offset, load from that byte offset relative to the contant location
    > of a static array...

    Yes, I was thinking about some equivalent issues. Perhaps someone has
    done some work like profiling some "reference code" with the exact
    values (or ranges of values) being loaded. Doing it on my own for just a
    small chart doesn't make sense. But perhaps someone reading this has
    been in contact with some related studies and knows a source for raw
    data.

    Thanks so far,
    Stefan
    Stefan Wallentowitz, Dec 11, 2005
    #9
  10. On Sat, 10 Dec 2005 23:35:03 -0800, wrote:

    > In this case there would be no need to ask about statistics. It is
    > always safe to do this because the compiler knows there is ONLY one
    > value for each literal - not a range. This is implemented by almost all
    > modern compilers. Not only that, if for example on a RISC machine you
    > use the literal 1 or 0, the compiler would simply generate code to read
    > the one and zero registers.


    Yes, but creating my own compiler leads me to this question regarding
    the impact of a quite simple optimization.
    I am searching for a statistic of how often a loaded integer (either in
    a register or as part of an immediate instruction etc.), in general,
    is:
    - in the 16-bit domain
    - above it
    (this might be extended to: is zero, 8-bit, and so on)

    Bye, Stefan
    Stefan Wallentowitz, Dec 11, 2005
    #10
  11. On Sun, 11 Dec 2005 08:21:25 +0000, Walter Roberson wrote:

    > Also, as I indicated earlier, the point might not be "can it
    > safely be done", but rather, "Is it worth putting in the programming
    > effort to do this potential optimization? Do small literals occur
    > often enough that the savings would be appreciable? Is it worth
    > the time to find good algorithms for the boundary cases such as
    > the short load taking longer to execute but the extra cost being worth
    > it if it allows a tight loop to fit in cache when it would not otherwise
    > fit?" And that's where the statistics come in.
    >
    > You know the usual refrain about "premature optimization" -- well,
    > it looks to me as if the OP is trying to get a feel for whether
    > this kind of optimization is premature, or is worthwhile.


    Exactly.
    I want to illustrate the possible impact of a simple optimization I made.
    In fact this optimization usually speeds up execution, but a simple
    figure for the optimization ratio, reached in such an easy way, would
    be nice. It isn't anything very scientific, just for getting a feeling
    of the gain.

    Thanks for your elaborated statements,
    Stefan
    Stefan Wallentowitz, Dec 11, 2005
    #11
  12. Stefan Wallentowitz

    Guest

    Stefan Wallentowitz wrote:
    > On Sat, 10 Dec 2005 23:35:03 -0800, wrote:
    >
    > > In this case there would be no need to ask about statistics. It is
    > > always safe to do this because the compiler knows there is ONLY one
    > > value for each literal - not a range. This is implemented by almost all
    > > modern compilers. Not only that, if for example on a RISC machine you
    > > use the literal 1 or 0, the compiler would simply generate code to read
    > > the one and zero registers.

    >
    > Yes, but creating my own compiler leads me to this question regarding
    > the impact of a quite simple optimization.
    > I am searching for a statistic of how often a loaded integer (either in
    > a register or as part of an immediate instruction etc.), in general,
    > is:
    > - in the 16-bit domain
    > - above it
    > (this might be extended to: is zero, 8-bit, and so on)
    >


    For what it's worth, I find the following values very common in code:

    0, 1, 0xFF

    Also, from experience, I find the majority of integer literals to be
    below 100, usually in the range of 0 to 10. The value 1 is common
    enough that most architectures have either increment/decrement
    instructions or a hardwired 1 register. The value 0 is common in a lot
    of data initialisations. The value 255 (0xFF) is very common in
    protocol-handling code, usually used for byte masking.
    , Dec 11, 2005
    #12
  13. On Sun, 11 Dec 2005 08:20:37 -0800, wrote:

    >
    > For what it's worth, I find the following values very common in code:
    >
    > 0, 1, 0xFF
    >
    > Also, from experience, I find the majority of integer literals to be
    > below 100, usually in the range of 0 to 10. The value 1 is common
    > enough that most architectures have either increment/decrement
    > instructions or a hardwired 1 register. The value 0 is common in a lot
    > of data initialisations. The value 255 (0xFF) is very common in
    > protocol-handling code, usually used for byte masking.


    For sure, but a number says more than a thousand words..
    Stefan Wallentowitz, Dec 11, 2005
    #13
  14. Stefan Wallentowitz

    Guest

    Stefan Wallentowitz wrote:
    > On Sun, 11 Dec 2005 08:20:37 -0800, wrote:
    >
    > >
    > > For what it's worth, I find the following values very common in code:
    > >
    > > 0, 1, 0xFF
    > >
    > > Also, from experience, I find the majority of integer literals to be
    > > below 100, usually in the range of 0 to 10. The value 1 is common
    > > enough that most architectures have either increment/decrement
    > > instructions or a hardwired 1 register. The value 0 is common in a lot
    > > of data initialisations. The value 255 (0xFF) is very common in
    > > protocol-handling code, usually used for byte masking.

    >
    > For sure, but a number says more than a thousand words..


    Download lots of source code, preferably large ones like the Linux
    kernel, Apache, Xfree86 and Mozilla. Then find all literals. I did the
    following on my company's source codes:

    grep -r -e "= *[0-9]" SONaccessGateway/ > stat.txt

    This gets all lines containing an equal sign followed by a number.
    Then, if you have Tcl installed, run the following script on the file
    above:

    ================== begin script
    #! /usr/bin/env tclsh
    if {$argc < 1} {
        set input stdin
    } else {
        set input [open $argv "r"]
    }
    array set numbers {}
    while {[eof $input] == 0} {
        set line [gets $input]
        # extract right hand side:
        set line [lindex [split $line "="] 1]
        if {$line != ""} {
            # extract first word:
            set line [split [string trim $line] " "]
            set n [lindex $line 0]
            # get and count the number:
            if {[string is integer -strict $n]} {
                set n [expr $n] ;# normalise the number
                if {[info exists numbers($n)]} {
                    incr numbers($n)
                } else {
                    set numbers($n) 1
                }
            }
        }
    }
    if {$input != "stdin"} {
        close $input
    }
    # now we are done processing, print out stats:
    puts "Literals found:"
    foreach n [lsort -integer [array names numbers]] {
        puts " $n - $numbers($n) times"
    }
    ===================== end script

    Save the script as sumstat.tcl. To run it, you can type the following:

    tclsh sumstat.tcl stat.txt

    where stat.txt is the file we generated earlier. The following are the
    statistics from my employer's source code:

    Literals found:
    -774712364 - 7 times
    -505224220 - 7 times
    -235736076 - 7 times
    -60 - 1 times
    -1 - 3 times
    0 - 1039 times
    1 - 723 times
    2 - 119 times
    3 - 159 times
    4 - 159 times
    5 - 87 times
    6 - 10 times
    7 - 16 times
    8 - 15 times
    9 - 1 times
    10 - 17 times
    11 - 2 times
    12 - 41 times
    13 - 1 times
    14 - 4 times
    15 - 6 times
    16 - 20 times
    17 - 7 times
    20 - 172 times
    23 - 6 times
    24 - 2 times
    25 - 2 times
    27 - 1 times
    28 - 2 times
    30 - 10 times
    31 - 3 times
    32 - 29 times
    36 - 2 times
    38 - 2 times
    39 - 3 times
    40 - 5 times
    42 - 10 times
    44 - 2 times
    48 - 19 times
    50 - 6 times
    54 - 1 times
    56 - 5 times
    57 - 2 times
    60 - 21 times
    64 - 46 times
    67 - 2 times
    71 - 2 times
    78 - 2 times
    80 - 3 times
    85 - 1 times
    90 - 35 times
    96 - 3 times
    100 - 7 times
    112 - 3 times
    120 - 6 times
    127 - 4 times
    128 - 12 times
    144 - 3 times
    160 - 3 times
    164 - 1 times
    167 - 3 times
    168 - 4 times
    169 - 2 times
    176 - 3 times
    192 - 4 times
    200 - 9 times
    208 - 3 times
    210 - 6 times
    211 - 8 times
    212 - 8 times
    224 - 3 times
    226 - 1 times
    253 - 1 times
    255 - 7 times
    256 - 14 times
    271 - 1 times
    299 - 1 times
    300 - 1 times
    384 - 23 times
    400 - 3 times
    420 - 4 times
    493 - 10 times
    500 - 35 times
    513 - 1 times
    600 - 10 times
    636 - 1 times
    647 - 1 times
    648 - 4 times
    691 - 16 times
    730 - 1 times
    800 - 4 times
    802 - 1 times
    995 - 1 times
    1000 - 8 times
    1024 - 15 times
    1200 - 1 times
    1234 - 3 times
    1257 - 1 times
    1500 - 3 times
    1800 - 1 times
    2000 - 1 times
    2020 - 1 times
    2048 - 3 times
    2054 - 2 times
    2345 - 2 times
    3600 - 1 times
    4096 - 3 times
    5000 - 3 times
    10000 - 3 times
    16384 - 9 times
    28800 - 3 times
    32224 - 1 times
    32768 - 6 times
    36000 - 4 times
    65535 - 3 times
    86400 - 2 times
    90000 - 1 times
    123456 - 3 times
    1000000 - 2 times
    50331652 - 2 times
    67452301 - 1 times
    305441741 - 17 times
    592100561 - 17 times
    806429235 - 10 times
    823206451 - 10 times
    839983667 - 10 times
    883674386 - 17 times
    1073741824 - 2 times
    2147483647 - 2 times


    Hope this helps. Of course, I'm assuming you're on a platform that has
    grep and tclsh installed. If not, just use my results. But note that
    the statistics above cover C, C++ and Perl code. I was too lazy to
    separate them.
    , Dec 12, 2005
    #14
  15. Stefan Wallentowitz

    Guest

    wrote:
    > Stefan Wallentowitz wrote:
    > > For sure, but a number says more than a thousand words..

    > <snip>
    > Save the script as sumstat.tcl. To run it, you can type the following:
    >
    > tclsh sumstat.tcl stat.txt
    > <snip>
    > Hope this helps. Of course, I'm assuming you're on a platform that has
    > grep and tclsh installed. If not, just use my results. But note that
    > the statistics above cover C, C++ and Perl code. I was too lazy to
    > separate them.


    I've modified the sumstat.tcl script to further analyse results:

    ============================ begin sumstat.tcl script
    #! /usr/bin/env tclsh
    if {$argc < 1} {
        set input stdin
    } else {
        set input [open $argv "r"]
    }
    array set numbers {}
    while {[eof $input] == 0} {
        set line [gets $input]
        # extract words:
        set line [split $line " "]
        foreach n $line {
            # get and count the number:
            if {[string is integer -strict $n]} {
                set n [expr $n] ;# normalise the number
                if {[info exists numbers($n)]} {
                    incr numbers($n)
                } else {
                    set numbers($n) 1
                }
            }
        }
    }
    if {$input != "stdin"} {
        close $input
    }

    # now we are done processing, print out stats:
    set total 0
    set bits8 0
    set bits16 0
    set bits32 0
    puts "Literals found:"
    foreach n [lsort -integer [array names numbers]] {
        puts " $n - $numbers($n) times"
        if {$n < 256 && $n >= -128} {
            incr bits8 $numbers($n)
        } elseif {$n < 65536 && $n >= -32768} {
            incr bits16 $numbers($n)
        } else {
            incr bits32 $numbers($n)
        }
        incr total $numbers($n)
    }
    set percent8 [format "%0.2f" [expr $bits8*100.0/$total]]
    set percent16 [format "%0.2f" [expr $bits16*100.0/$total]]
    set percent32 [format "%0.2f" [expr $bits32*100.0/$total]]
    puts "\nStatistics:"
    puts " 8bit numbers - $bits8 ($percent8%)"
    puts " 16bit numbers - $bits16 ($percent16%)"
    puts " 32bit numbers - $bits32 ($percent32%)"
    ============================ end script

    Also, before I used: grep -r -e "= [0-9]". I now realise that this only
    captures assignments and not integers used for other purposes, such as
    a = b + 12. I also modified the script to get all numbers in the source
    code instead of just the numbers after the = sign.

    I ran this new script with:

    grep -r -E "[0-9]+" code_dir/ | tclsh sumstat.tcl

    piping the result of grep directly into the script. This simply gets
    all lines containing numbers. The new result is obviously much larger
    than before and is probably too long to post. The following is a
    snipped version of the result:

    Literals found:
    -2147483648 - 5 times
    -976170042 - 2 times
    -823302554 - 1 times
    <snip>
    -1 - 285 times
    0 - 6704 times
    1 - 5408 times
    2 - 3600 times
    3 - 2194 times
    4 - 1735 times
    <snip>
    1936484395 - 1 times
    1936484398 - 1 times
    2147483647 - 2 times

    Statistics:
    8bit numbers - 34984 (83.66%)
    16bit numbers - 6346 (15.18%)
    32bit numbers - 486 (1.16%)
    , Dec 12, 2005
    #15
  16. On Sun, 11 Dec 2005 20:22:47 -0800, wrote:

    > wrote:
    >> Stefan Wallentowitz wrote:
    >> > For sure, but a number says more than a thousand words..

    >> <snip>
    >> Save the script as sumstat.tcl. To run it, you can type the following:
    >>
    >> tclsh sumstat.tcl stat.txt
    >> <snip>
    >> Hope this helps. Of course, I'm assuming you're on a platform that has
    >> grep and tclsh installed. If not, just use my results. But note that
    >> the statistics above cover C, C++ and Perl code. I was too lazy to
    >> separate them.

    >


    Thank you very, very much. I will use this approach and apply it to some
    code specific to embedded systems. Although it doesn't really have a
    profiling character, it gives a good feeling for the subject matter.

    Bye and thanks,
    Stefan
    Stefan Wallentowitz, Dec 12, 2005
    #16
  17. In article <>, "" <> writes:
    >
    > Download lots of source code, preferably large ones like the Linux
    > kernel, Apache, Xfree86 and Mozilla. Then find all literals. I did the
    > following on my company's source codes:
    >
    > grep -r -e "= *[0-9]" SONaccessGateway/ > stat.txt


    As Walter Roberson suggested, it would probably be better to do this
    on the result of preprocessing the source (ie, the output of
    translation phase 4, 5, or 6). That at least will catch literals that
    are expressed using object-like macros, though not necessarily folded
    constants (which could be computed during phase 7, or even not until
    runtime).

    Most implementations have an option to write the result of
    preprocessing to a file.

    It might be possible to get even more accurate results by analyzing
    an intermediate result from a later translation phase, such as the
    assembly-language output available from many implementations. I
    recently used such analysis to check for large auto-class objects in
    a large code base. (Details are implementation-dependent and OT,
    obviously.)

    --
    Michael Wojcik

    Cooperation is just like two pagodas, one hardware and one software.
    -- Wen Jiabao
    Michael Wojcik, Dec 13, 2005
    #17
