counting words in input

arnuld

I have about 90% of this program written and it runs fine. In its
present implementation, it reads from standard input. I am not able to
complete it because the last part requires reading from a file. All I
know about file streams is that I need to use:
<int main(int argc, char**argv)>
and nothing more than that. I would appreciate it if someone could help me:

/* C++ Primer - 4/e
*
* chapter 11, exercise 11.9
* STATEMENT
* Write a program to count word size of greater than or equal to 4
including printing the list of unique words in the input. Test your
program by running it on program's source file.
*
*/


#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>


/* returns the 2nd argument unchanged when the 1st argument is 1;
otherwise returns the 2nd argument with the 3rd argument appended */
std::string make_plural( size_t ctr,
const std::string &word,
const std::string & ending )
{
return (ctr == 1) ? word : word + ending;
}



bool isShorter( const std::string &s1, const std::string &s2 ) {
return s1.size() < s2.size();
}


bool GT4( const std::string &s )
{
return s.size() >= 4;
}


int main( )
{
std::vector<std::string> svec;
/* input some words */
std::copy( std::istream_iterator<std::string>( std::cin ),
std::istream_iterator<std::string>(), std::back_inserter( svec ) );



/* copy the vector, to be used later for printing */
std::vector<std::string> svec_old( svec );
std::sort( svec.begin(), svec.end() );

/* to eliminate the duplicate words, std::unique first moves the unique
words to the front of the sorted vector and returns an iterator to the
leftover tail; then we use the vector operation ERASE to remove that tail */

std::vector<std::string>::iterator begin_duplicates =
std::unique( svec.begin(), svec.end() );

svec.erase( begin_duplicates, svec.end() );

/* sort the words by size while maintaining the alphabetical order */
std::stable_sort( svec.begin(), svec.end(), isShorter );

std::vector<std::string>::size_type unique_count =
std::count_if( svec.begin(), svec.end(), GT4 );


std::cout << unique_count << " "
<< make_plural( unique_count, "word", "s" )
<< " 4 characters or longer"
<< std::endl;

for( std::vector<std::string>::const_iterator iter = svec_old.begin();
iter != svec_old.end();
++iter )
{
if( GT4( *iter ))
{
std::cout << *iter << std::endl;
}
}

return 0;

}
 
Alf P. Steinbach

* arnuld:
I am able to create the 90% of this program and it runs fine. In its
present implementation, it reads from standard input. I am not able to
complete this program as last part requires to read from a file. All I
know about file-streams is that I need to use:
<int main(int argc, char**argv)>
and nothing more than that. I will appreciate if someone can help me:

/* C++ Primer - 4/e
*
* chapter 11, exercise 11.9
* STATEMENT
* Write a program to count word size of greater than or equal to 4
including printing the list of unique words in the input. Test your
program by running it on program's source file.
*
*/

Assuming your program's executable is 'myprogram', and the program's
source file is 'myprogram.cpp', on a *nix system try

$ ./myprogram <myprogram.cpp

or

$ cat myprogram.cpp | ./myprogram

or on a Windows system

C:\wherever> myprogram <myprogram.cpp

or

C:\wherever> type myprogram.cpp | myprogram

That said, you can do program arguments simply like (disclaimer:
off-the-cuff code, not touched by compiler's hands):

#include <fstream>
#include <iostream>
#include <ostream>
#include <cstdlib> // EXIT_SUCCESS, EXIT_FAILURE

void doThings( std::istream& input )
{
// ...
}

int main( int argc, char* argv[] )
{
using namespace std;

switch( argc )
{
case 1:
{
doThings( cin );
return EXIT_SUCCESS;
}

case 2:
{
ifstream input( argv[1] );
if( !input )
{
cerr << "Unable to open [" << argv[1] << "]." << endl;
return EXIT_FAILURE;
}
else
{
doThings( input );
return EXIT_SUCCESS;
}
}

case default:
{
cerr << "Usage: plingplong [FILENAME]" << endl;
return EXIT_FAILURE;
}
}
}

Cheers, & hth.,

- Alf
 
Juha Nieminen

Alf said:
$ cat myprogram.cpp | ./myprogram

You really shouldn't be teaching useless use of cat.

If you really want to express the input file before the program you
can do it like this:

$ < myprogram.cpp ./myprogram
 
Alf P. Steinbach

* Juha Nieminen:
You really shouldn't be teaching useless use of cat.

If you really want to express the input file before the program you
can do it like this:

$ < myprogram.cpp ./myprogram

Well, the cat is idiomatic and easy to read, whereas the command you
show isn't.

C++ related: the impossibility of writing a copy-input-to-output-exactly
program in a portable way for those systems where it's meaningful.

Cheers, & hth.,

- Alf
 
arnuld

int main( int argc, char* argv[] )
{
using namespace std;

switch( argc )
{
case 1:
{
doThings( cin );
return EXIT_SUCCESS;
}

In C++ Primer 4/e, I read that <argv[0]> is always reserved. That means
argv[] always has at least one element, and if I expect one input file
then there will be 2 elements: argv[0] and argv[1], where argv[1] is
the input file.

So if "case 1" holds, it means there is no input file. Then why
EXIT_SUCCESS ?
 
arnuld

case default:
{
cerr << "Usage: plingplong [FILENAME]" << endl;
return EXIT_FAILURE;
}
}

I am sure you left that "case" just before "default" to teach me a lesson.
Well, it took me 20 min to figure out the error from GCC and it really
taught me a lesson :)
 
arnuld

Assuming your program's executable is 'myprogram', and the program's
source file is 'myprogram.cpp', on a *nix system try

$ ./myprogram <myprogram.cpp
............ [SNIP].............

case default:
{
cerr << "Usage: plingplong [FILENAME]" << endl;
return EXIT_FAILURE;
}
}
}


switch( argc )
{
case 1:
{
std::cerr << "No input file ?\n";
return EXIT_FAILURE;
}
case 2:
{
std::ifstream infile( argv[1] );
if ( !infile )
{
std::cerr << "Can't open file :( \n" << std::endl;
return EXIT_FAILURE;
}
else
{
save_to_vec(infile, svec);
return EXIT_SUCCESS;
}
}
default:
{
std::cerr << "Usage Pling-Plong\n";
return EXIT_FAILURE;
}
}



it always outputs this:

[arnuld@arch programs]$ ls
10.01.cpp 11.09.cpp 11.09.cpp~ 11.09_using-std-input.cpp a.out post.txt
[arnuld@arch programs]$ ./a.out <10.01.cpp
No input file ?
[arnuld@arch programs]$ ./a.out < 10.01.cpp
No input file ?
[arnuld@arch programs]$ ./a.out
No input file ?
[arnuld@arch programs]$ cat 10.01.cpp | ./a.out
No input file ?
[arnuld@arch programs]$
 
Pete Becker

it always outputs this:

[arnuld@arch programs]$ ls
10.01.cpp 11.09.cpp 11.09.cpp~ 11.09_using-std-input.cpp a.out post.txt
[arnuld@arch programs]$ ./a.out <10.01.cpp
No input file ?
[arnuld@arch programs]$ ./a.out < 10.01.cpp
No input file ?
[arnuld@arch programs]$ ./a.out
No input file ?
[arnuld@arch programs]$ cat 10.01.cpp | ./a.out
No input file ?
[arnuld@arch programs]$

As it should. <g> The third example should be obvious: there is no
input whatsoever. The rest all do the same thing: they put data on the
standard input stream. The program doesn't read standard input, though,
so it complains that there's no input file. (The earlier suggested
version, with case 1: would read the standard input stream in that
case).

./a.out 10.01.cpp
 
Daniel T.

arnuld said:
I am able to create the 90% of this program and it runs fine. In its
present implementation, it reads from standard input. I am not able to
complete this program as last part requires to read from a file. All I
know about file-streams is that I need to use:
<int main(int argc, char**argv)>
and nothing more than that. I will appreciate if someone can help me:

To answer your specific question, you would need to do something like
this:

int main( int argc, char** argv ) {
if ( argc < 2 ) {
cout << "format: " << argv[0] << " filename\n";
return -1;
}
ifstream file( argv[1] );

// from here on out, use 'file' instead of 'cin'...

/* C++ Primer - 4/e
*
* chapter 11, exercise 11.9
* STATEMENT
* Write a program to count word size of greater than or equal to 4
including printing the list of unique words in the input. Test your
program by running it on program's source file.
*
*/

I take the above to mean, count the total number of words >= 4
(including duplicates) and list all unique words. So for example:

"this this this" would print 3 words >= 4, unique words: 'this'

I am currently dealing with a lot of text because I am involved with
converting/writing several programs to handle multiple languages right
now with work.

In the real world, such a problem statement would expect you to handle
upper/lower case letters properly, and deal with punctuation. On top of
that, several languages (even strictly Western European ones) use
letters that are not in the ASCII or Latin-1 character sets. You are
likely to get input that is in either UTF-16BE, UTF-16LE, or UTF-8
formats. To solve the problem statement, you would need to know what
input formats you must support... In other words, that deceptively
simple problem statement, has the potential of producing a very
complicated program.
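
To give a feel for the simplest end of that problem, here is a rough sketch that lowercases a word and drops punctuation. It assumes plain ASCII input only and does nothing for UTF-8 or UTF-16 text; the name normalize_word is hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: keep only ASCII letters and digits, lowercased.
// Assumes plain ASCII input; it does not handle UTF-8 or UTF-16 text.
std::string normalize_word( const std::string& raw )
{
    std::string out;
    for ( std::string::size_type i = 0; i < raw.size(); ++i )
    {
        const unsigned char c = static_cast<unsigned char>( raw[i] );
        if ( c < 128 && std::isalnum( c ) )
            out += static_cast<char>( std::tolower( c ) );
    }
    return out;
}
```

With this, "Hello," and "hello" would count as the same word, while a token made only of punctuation becomes an empty string that the caller can skip.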
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>


/* this functions appends the 3rd argument to its 2nd argument if 1st
argument is true */
std::string make_plural( size_t ctr,
const std::string &word,
const std::string & ending )
{
return (ctr == 1) ? word : word + ending;
}



bool isShorter( const std::string &s1, const std::string &s2 ) {
return s1.size() < s2.size();
}


bool GT4( const std::string &s )
{
return s.size() >= 4;
}


int main( )
{
std::vector<std::string> svec;
/* input some words */
std::copy( std::istream_iterator<std::string>( std::cin ),
std::istream_iterator<std::string>(), std::back_inserter( svec ) );



/* copy the vector, to be used later for printing */
std::vector<std::string> svec_old( svec );

Rather than make a copy like above, put the stuff that uses svec in a
separate function. Pass the vector by value and a copy will
automatically be made. (Caveat: I don't think a copy is necessary to
solve the problem though.)

std::sort( svec.begin(), svec.end() );

/* to eliminate th dupilcate words we 1st, rearrange the words by
putting all the duplicate words in the end of vector and then we will
use vector operation ERASE to remove them */

std::vector<std::string>::iterator begin_duplicates =
std::unique( svec.begin(), svec.end() );

svec.erase( begin_duplicates, svec.end() );

The erase-remove idiom is pretty well known. No need to break it up.

svec.erase( unique( svec.begin(), svec.end() ), svec.end() );

/* sort the words by size while maintaining the alphabetical order */
std::stable_sort( svec.begin(), svec.end(), isShorter );

The above is wholly unnecessary.

std::vector<std::string>::size_type unique_count =
std::count_if( svec.begin(), svec.end(), GT4 );

Strictly speaking that should be &GT4 in the above, not putting the '&'
on the function name is deprecated.

std::cout << unique_count << " "
<< make_plural( unique_count, "word", "s" )
<< " 4 characters or longer"
<< std::endl;

If I understand the problem correctly, you are supposed to print a list
of all the unique words in the input. The below prints all the words
with a size of 4+ and prints duplicates...

for( std::vector<std::string>::const_iterator iter = svec_old.begin();
iter != svec_old.end();
++iter )
{
if( GT4( *iter ))
{
std::cout << *iter << std::endl;
}
}

return 0;

}


Here is one of the solutions I came up with. Note especially how easy
the "count_if" line is. "count_if... size_greater_than( 3 )" makes a lot
of sense grammatically. Also note that I didn't make a separate
'make_plural' function. It is not that easy to pluralize a word so I
tend to do it on a case-by-case basis.

struct size_greater_than : unary_function< string, bool >
{
size_t x;
size_greater_than( size_t x ): x( x ) { }
bool operator()( const string& s ) const {
return s.size() > x;
}
};

int main( int argc, char** argv ) {
if ( argc < 2 ) {
cout << "format: " << argv[0] << " filename\n";
return -1;
}
ifstream file( argv[1] );

vector< string > words;
copy( istream_iterator< string >( file ), istream_iterator<string>(),
back_inserter( words ) );

// count and output the total number of words >= 4
size_t count = count_if( words.begin(), words.end(),
size_greater_than( 3 ) );

cout << count << " word" << ( count == 1 ? " " : "s " )
<< "4 characters or longer\n";

// sort and remove duplicates
sort( words.begin(), words.end() );
words.erase( unique( words.begin(), words.end() ), words.end() );

// output unique words
cout << "Unique words: \n";
copy( words.begin(), words.end(),
ostream_iterator<string>( cout, "\n" ) );
}
 
Rolf Magnus

arnuld said:
Assuming your program's executable is 'myprogram', and the program's
source file is 'myprogram.cpp', on a *nix system try

$ ./myprogram <myprogram.cpp
............ [SNIP].............

case default:
{
cerr << "Usage: plingplong [FILENAME]" << endl;
return EXIT_FAILURE;
}
}
}


switch( argc )
{
case 1:
{
std::cerr << "No input file ?\n";
return EXIT_FAILURE;
}
case 2:
{
std::ifstream infile( argv[1] );
if ( !infile )
{
std::cerr << "Can't open file :( \n" << std::endl;
return EXIT_FAILURE;
}
else
{
save_to_vec(infile, svec);
return EXIT_SUCCESS;
}
}
default:
{
std::cerr << "Usage Pling-Plong\n";
return EXIT_FAILURE;
}
}



it always outputs this:

[arnuld@arch programs]$ ls
10.01.cpp 11.09.cpp 11.09.cpp~ 11.09_using-std-input.cpp a.out
post.txt
[arnuld@arch programs]$ ./a.out <10.01.cpp
No input file ?
[arnuld@arch programs]$ ./a.out < 10.01.cpp
No input file ?
[arnuld@arch programs]$ ./a.out
No input file ?
[arnuld@arch programs]$ cat 10.01.cpp | ./a.out
No input file ?
[arnuld@arch programs]$

In all those cases except the third, you are redirecting a file to your
program's standard input stream and giving it no command line argument.
Try: ./a.out 10.01.cpp
 
arnuld

Here is one of the solutions I came up with. Note especially how easy
the "count_if" line is. "count_if... size_greater_than( 3 )" makes a lot
of sense grammatically. Also note that I didn't make a separate
'make_plural' function. It is not that easy to pluralize a word so I
tend to do it on a case-by-case basis.

struct size_greater_than : unary_function< string, bool >
{
size_t x;
size_greater_than( size_t x ): x( x ) { }
bool operator()( const string& s ) const {
return s.size() > x;
}
};

Your solution is much simpler than my version, but I did not understand
the very first part, the struct you created. What exactly is it doing?
 
James Kanze

On Sun, 16 Dec 2007 09:25:43 +0100, Alf P. Steinbach wrote:
int main( int argc, char* argv[] )
{
using namespace std;
switch( argc )
{
case 1:
{
doThings( cin );
return EXIT_SUCCESS;
}

[...]
So if "case 1" holds then it means there is no input file, then why
EXIT_SUCCESS ?

Because he's adopted the usual Unix tradition of using standard
input if no input file has been specified.

More generally, I'd recommend something like:

if ( argc <= 1 ) {
doThings( std::cin ) ;
} else {
for ( int i = 1 ; i < argc ; ++ i ) {
std::ifstream input( argv[ i ] ) ;
if ( input ) {
doThings( input ) ;
} else {
// signal and memorize error...
}
}
}

(Also, of course, if doThings involves writing to a file, you'll
have to either flush() it or close() it at the end, and then test
the status, and have the return code reflect that.)
 
Daniel T.

arnuld said:
your solution is pretty simpler than my version but I did not understand
the very 1st function, the struct you created. What exactly it is doing ?

Look at the site where it is used:

size_t count = count_if( words.begin(), words.end(),
size_greater_than( 3 ) );

The 3rd argument in count_if requires a 'functor'. A functor is anything
that you can call like a function. That means actual functions or
classes/structs that have the operator() defined.

The struct above is just such a beast. For example, the count_if code
probably looks something like this:

template < typename FwIt, typename Fn >
size_t count_if( FwIt first, FwIt last, Fn fn ) {
size_t result = 0;
while ( first != last ) {
if ( fn( *first ) )
++result;
++first;
}
return result;
}

In our case, the compiler will use the template to write something like
this:

size_t count_if( vector<string>::iterator first,
vector<string>::iterator last, size_greater_than fn ) {
size_t result = 0;
while ( first != last ) {
if ( fn( *first ) )
++result;
++first;
}
return result;
}

So it will be treating the 'size_greater_than' object as if it is a
function that takes a string parameter... which is exactly what our
size_greater_than struct deals with.

Read "The C++ Programming Language" section 18.4 "Function Objects" for
a longer explanation. (If I remember right, you do have that book...)
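
For comparison, compilers that support C++11 or later accept a lambda in the same position, which amounts to an unnamed functor written inline. The sketch below assumes such a compiler; the wrapper name count_longer_than is made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Counts the words longer than `limit` characters. The lambda captures
// `limit` much as size_greater_than stores x in its constructor.
std::size_t count_longer_than( const std::vector<std::string>& words,
                               std::size_t limit )
{
    return static_cast<std::size_t>(
        std::count_if( words.begin(), words.end(),
                       [limit]( const std::string& s )
                       { return s.size() > limit; } ) );
}
```

Calling count_longer_than( words, 3 ) is meant to give the same count as the count_if call with size_greater_than( 3 ).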
 
Jerry Coffin

[ ... ]
* Write a program to count word size of greater than or equal to 4
including printing the list of unique words in the input. Test your
program by running it on program's source file.

People have already talked quite a bit about the part you really asked
about, but I thought I'd add a slightly different approach to the task
itself:

#include <iostream>
#include <set>
#include <algorithm>
#include <iterator>
#include <string>

class shorter_than {
size_t x;
public:
shorter_than(size_t c) : x(c) {}

bool operator()(std::string const &s) {
return s.length() < x;
}
};

int main() {
std::set<std::string> words;

std::remove_copy_if(
std::istream_iterator<std::string>(std::cin),
std::istream_iterator<std::string>(),
std::inserter(words, words.begin()),
shorter_than(4));

std::cout << words.size() << " unique words:\n";
std::copy(words.begin(), words.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
return 0;
}

std::remove_copy_if copies one container to another, leaving out any
that meet the specified criteria. std::set only allows one copy of a
specific item to be inserted, so each one is automatically unique,
without explicitly removing duplicates.
 
Daniel T.

Jerry Coffin said:
[ ... ]
* Write a program to count word size of greater than or equal to 4
including printing the list of unique words in the input. Test your
program by running it on program's source file.

People have already talked quite a bit about the part you really asked
about, but I thought I'd add a slightly different approach to the task
itself:

#include <iostream>
#include <set>
#include <algorithm>
#include <string>

class shorter_than {
size_t x;
public:
shorter_than(size_t c) : x(c) {}

bool operator()(std::string const &s) {
return s.length() < x;
}
};

int main() {
std::set<std::string> words;

std::remove_copy_if(
std::istream_iterator<std::string>(std::cin),
std::istream_iterator<std::string>(),
std::inserter(words, words.begin()),
shorter_than(4));

std::cout << words.size() << " unique words:\n";
std::copy(words.begin(), words.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
return 0;
}

std::remove_copy_if copies one container to another, leaving out any
that meet the specified criteria. std::set only allows one copy of a
specific item to be inserted, so each one is automatically unique,
without explicitly removing duplicates.

I thought of the above myself, but I'm not sure if it satisfies the
problem statement...
 
