C++ programming challenge


Ioannis Vranos

Peter said:
Dear news group,

I have created a small programming challenge for those of you who are
interested in challenging your Standard C++ programming skills. The
challenge is about counting character frequency in large texts,
perhaps useful for spam filtering or classical crypto analysis. You
can read more about it here:

http://blog.p-jansson.com/2009/06/programming-challenge-letter-frequency.html

With kind regards,
Peter Jansson



I think some clarifications should be added to the contest, because the current rules are too vague.


Here are my suggestions; everyone who wishes may try to comply with them.



For Peter Jansson:

1. The code should be compiled with the same C++ compiler. That implies that if C90/95 code is provided, it
should be compilable as C++ code with the relevant casts. It also implies that no C99/C++0x specific features
should be used.

2. The recommended compilation flags are g++ -ansi -pedantic-errors -Wall -O3 -DNDEBUG filename.cc -o foobar.

3. Two timings should be taken: an uncached one (e.g. after a reboot) and an OS-cached one (after some runs).





For programmers (the initial rules first):


1. Your program should be case insensitive when it counts letters.

2. The program should be written using Standard C++98/03 (C90/95 code can usually be converted to valid C++
code easily, by using casts), so that it can easily be compiled and run on various platforms. This also
permits the code submissions to be tested by Peter Jansson with the same compiler (g++ (C++03)@ubuntu).

3. The code must be absolutely portable, that is, it must compile and run on all C++03 compilers, without requiring
the installation of additional software/code. This implies e.g. that no other libraries than the standard
library can be used, but for example OpenMP can be used since it works on compilers supporting it and is
ignored on compilers not supporting it (note: I do not know OpenMP).

4. You must begin timing your code just before the text file is opened and you must end the timing after the
results have been written to standard output. For timing, use the style given below:


#include <ctime> or #include <time.h> (the latter mainly for C-style code)


// ...

using namespace std;

// ...

clock_t time1, time2;



// We start timing.
time1= clock();


// ... Perform the operations


// We "stop" timing.
time2= clock();


// We convert the timing to seconds.
double totalTimeInSeconds= static_cast<double>(time2- time1)/ CLOCKS_PER_SEC;



This works on all platforms (including e.g. Mac OS X), and it will work on Peter Jansson's testing machine
(g++@Ubuntu Linux), with accuracy to fractions of a second.


5. The timing code should be included in the code submitted.
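Put together, the timing style from rule 4 might look like the sketch below. The helper names elapsedSeconds, timeOperation and placeholderWork are hypothetical and not part of the challenge. Keep in mind that clock() reports processor time used by the program, which is not necessarily the same as elapsed wall-clock time.

```cpp
#include <cassert>
#include <ctime>
#include <iostream>

// Hypothetical helper: converts two clock() readings to seconds.
double elapsedSeconds(std::clock_t start, std::clock_t stop)
{
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}

// Hypothetical helper: times an operation passed as a function pointer.
double timeOperation(void (*operation)())
{
    assert(operation != 0);
    const std::clock_t time1 = std::clock();   // start just before the work
    operation();
    const std::clock_t time2 = std::clock();   // stop after the output is written
    return elapsedSeconds(time1, time2);
}

// Stand-in for "open the file, count, print the results".
void placeholderWork()
{
    std::cout << "results would be printed here\n";
}
```

A submission could then report its own timing with something like std::cout << timeOperation(placeholderWork) << " s\n"; at the end of main().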






--
Ioannis A. Vranos

C95 / C++03 Developer

http://www.cpp-software.net
 

ld

I would rely on the time command to time the process. clock is not
very stable/accurate on some systems, especially when run in a virtual
machine (my experience).

BTW, you should add the constraint that the program must explicitly open
and close the input file and not use stdin/cin, because commands like
"cat in_file | ./foobar" or "./foobar < in_file" may use
more efficient system-level code (non-standard C/C++) to read the file
like mapping files to memory (pipe) and bypass the OS caches. Then
your program would mainly copy memory buffers and would rely on
external tools to optimize the reading.
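
A minimal sketch of what that constraint could look like in code, assuming the file name is supplied by the caller. The function name countFileBytes and its signature are hypothetical; the point is only that the program opens and closes the file itself rather than reading std::cin.

```cpp
#include <cassert>
#include <fstream>

// Hypothetical helper: opens `path` itself instead of reading std::cin,
// adds each byte value to counts[0..255], and returns false if the
// file could not be opened.
bool countFileBytes(const char* path, unsigned long counts[256])
{
    assert(path != 0 && counts != 0);

    std::ifstream inputFile(path, std::ios::in | std::ios::binary);
    if (!inputFile)
        return false;

    char c;
    while (inputFile.get(c))   // the loop ends when a read fails
        ++counts[static_cast<unsigned char>(c)];

    inputFile.close();         // explicit close, as the proposed rule asks
    return true;
}
```

Reading one character at a time keeps the sketch short; a real entry would presumably read in larger blocks.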

a+, ld.
 

Jorgen Grahn

Ioannis said:
Thomas said:
Ioannis Vranos wrote:
Thomas J. Gritzan wrote:
Ioannis Vranos wrote:
do
{
    inputFile.read(buffer, sizeof(buffer));

    for(streamsize i= 0; i< inputFile.gcount(); ++i)
        ++characterFrequencies[ buffer[i] ];

}while(not inputFile.eof());

By the way, this "not" instead of "!" is horrible. I would write:



Horrible? "not" is a *built in* keyword, as "and" and "or". They are
more user readable.


You don't have to use it just because it's a keyword. register also is a
keyword, and you shouldn't use it, because it doesn't have any meaning
with current compilers.

About "not" or "!", C++ programmers are more used to the symbol. I
rarely see the use of not/and/or in this newsgroup. It might be more
readable to you, but it's not more readable in general.


I kind of agree -- but it seems from earlier discussions here that
there are a few C++ programmers who use 'not' and friends. I wouldn't
call it "horrible".

/Jorgen
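
For readers who have not met them, here is a small sketch showing that the two spellings are interchangeable in standard C++. The function names are hypothetical and exist only for the comparison.

```cpp
#include <cassert>

// Written with the alternative tokens.
bool keepReadingWords(bool atEnd, bool failed)
{
    return not atEnd and not failed;
}

// Written with the conventional operators.
bool keepReadingSymbols(bool atEnd, bool failed)
{
    return !atEnd && !failed;
}

// Checks all four input combinations; both spellings must agree.
void checkSpellingsAgree()
{
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            assert(keepReadingWords(a != 0, b != 0) ==
                   keepReadingSymbols(a != 0, b != 0));
}
```

Which one reads better is a matter of taste, which is really what the exchange above is about.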
 

Chris M. Thomasson

Jorgen Grahn said:
I think that was his point. (Hint: look at the Newsgroups: and
Subject: headers above.)

The contest states that one can submit C++ or C code. Well, I guess I should
have posted it on the blog instead of here.
 

Ioannis Vranos

Chris said:
The contest states that one can submit C++ or C code. Well, I guess I
should have posted it on the blog instead of here.


Actually I think the challenge terms are wrong. C and C++ are not the same language. They are different languages.

And even code under the common subset may perform differently when compiled with gcc and g++ (two distinct C
and C++ compilers respectively).


It would be the same if Fortran and Pascal were accepted too. We can't compare code produced by different
compilers.




--
Ioannis A. Vranos

C95 / C++03 Developer

http://www.cpp-software.net
 

James Kanze

I assume you are talking about the return value of getchar().

Or std::istream::get()
I do not think "it may be possible to have something other
than EOF when std::ios::eof() returns true".

If std::istream::get() is called before std::ios::eof(), it's
entirely possible (although admittedly not likely with a
"conventional" implementation---eofbit normally only gets set on
successful input when look-ahead is required, which isn't the
case for std::istream::get()).

The important thing to realize is that if std::ios::eof()
returns true, it says nothing about any previous input; all it
means is that the next input will fail (assuming no intervening
seek, etc.). And if it returns false, it's also possible for
the preceding input to have failed (format error, etc.). In
practice, in fact, it makes no sense testing std::ios::eof()
until after you've detected a failure: basically, if
input.fail() && ! input.bad() && ! input.eof(), there was a
format failure. (Regretfully, the reverse isn't true. You can
have input.fail() && input.eof(), and still have a format error,
rather than a true end of file.)
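
The order of tests described above can be sketched as follows. The enum ReadOutcome and the function sumIntegers are hypothetical names used only for illustration, and reading whitespace-separated integers is an assumed example, not something taken from the challenge.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

enum ReadOutcome { EndOfInput, FormatError, HardError };

// Hypothetical helper: reads whitespace-separated integers until an
// extraction fails, then inspects the state flags to guess why.
ReadOutcome sumIntegers(std::istream& input, long& sum)
{
    sum = 0;
    long value;
    while (input >> value)   // test the result of the read itself
        sum += value;

    assert(input.fail());    // the loop only exits after a failure

    if (input.bad())
        return HardError;    // unrecoverable stream error
    if (!input.eof())
        return FormatError;  // fail && !bad && !eof
    return EndOfInput;       // fail && eof: usually, but not certainly, end of data
}
```

With a std::istringstream holding "1 2 3" this reports EndOfInput, and with "1 2 x 4" it reports FormatError. As noted above, the fail-and-eof case can still hide a format error, so the last branch is only a best guess.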
 

James Kanze

Peter Jansson wrote:
I think some clarifications should be added to the contest,
because the current rules are too vague.

1. Your program should be case insensitive when it counts
letters.

Define case insensitive. Does 'é' map to 'E' or 'É'? (And what
about the fact that 'é' is the two byte sequence 0xC3, 0xA9 on
my Linux boxes?)
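
To make the question concrete, here is a sketch of the narrowest reading of "case insensitive": fold only the ASCII letters and ignore every other byte. The function name countAsciiLetters and the 26-slot layout are assumptions for illustration; the challenge rules do not say this is the intended behaviour.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical helper: counts letters case-insensitively, one byte at a time.
// counts[0] is 'a'/'A', counts[25] is 'z'/'Z'. Assumes contiguous a..z codes
// (true for ASCII); bytes that are not ASCII letters, such as the two bytes
// of a UTF-8 'é', are skipped rather than folded.
void countAsciiLetters(const char* text, std::size_t length, unsigned long counts[26])
{
    assert(text != 0 || length == 0);
    for (std::size_t i = 0; i < length; ++i)
    {
        // Convert through unsigned char: passing a negative char value
        // to std::tolower is undefined behaviour.
        const unsigned char byte = static_cast<unsigned char>(text[i]);
        const int lower = std::tolower(byte);
        if (lower >= 'a' && lower <= 'z')
            ++counts[lower - 'a'];
    }
}
```

Under this reading the bytes 0xC3 and 0xA9 are simply not counted, which is one possible answer to the question but clearly not the only one.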
2. The program should be written using Standard C++98/03
(C90/95 code can be usually converted to valid C++ code
easily, by using casts), so that it can easily be compiled and
run on various platforms. This also permits the code
submissions to be tested by Peter Jansson with the same
compiler (g++ (C++03)@ubuntu).

G++ won't compile C++03, or even C++98. At the very least, if
he wants the code to compile with g++, he'll have to ban export.

And of course, he needs to state which version of g++ is to be
used as a reference. Every time we upgrade to a more recent
version, we have to fix a few things.

3. The code must be absolutely portable, that is to compile
and run on all C++03 compilers, without requiring the
installation of additional software/code. This implies e.g.
that no other libraries than the standard library can be used,
but for example OpenMP can be used since it works on compilers
supporting it and is ignored on compilers not supporting it
(note: I do not know OpenMP).

And how do you verify this? (Also, of course, "export" is part
of C++03, but very few compilers implement it.)
 

Jerry Coffin

Thomas said:
[...]
The loop should be:

do {
// ...
} while (f);

instead of
/*...*/ while (!f.eof());

I wonder where this wrong usage of eof comes from. Is it some bad book
or something?

I think this is where general indifference towards checking
for rare errors meets a badly documented and somewhat clumsy
old interface.

I can't really agree. The same problem routinely arises in C where the
interface is clear, straightforward and well documented.
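
For reference, a sketch of a block-reading loop that tests the stream itself, as suggested in the quoted text, rather than eof(). The function name countStreamBytes and the 4096-byte buffer are arbitrary choices for illustration. If a read fails for a reason other than end of file, eof() never becomes true, so a loop conditioned only on !f.eof() would not terminate.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: block-reads `input` and adds byte counts to counts[0..255].
void countStreamBytes(std::istream& input, unsigned long counts[256])
{
    assert(counts != 0);
    char buffer[4096];
    do
    {
        input.read(buffer, sizeof(buffer));
        // A short final block sets failbit and eofbit, but gcount() still
        // reports how many bytes arrived, so they are counted before exit.
        const std::streamsize got = input.gcount();
        for (std::streamsize i = 0; i < got; ++i)
            ++counts[static_cast<unsigned char>(buffer[i])];
    } while (input);   // stop on any failure, not only on eof()
}
```

The same function works on a std::ifstream or, for quick experiments, on a std::istringstream.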
 
