Break ifstream input into words?

Discussion in 'C++' started by dmurray14, Oct 29, 2006.

  1. dmurray14

    dmurray14 Guest

    Hey guys,

    I'm a C++ newbie here - I've messed with VB, but I mostly stick to web
    languages, so I find C++ to be very confusing at times. Basically, I am
    trying to import a text file, but I want to do it word by word. I am
    confused as to how to do this. Typically, I would think it would make
    sense to try and input the words into strings, but for this application
    I need to use character arrays and pointers. So what's the best way to
    go about this? I know what I need to do - go character by character and
    dump into an array until we get to either a space or some other form of
    punctuation, but I'm having trouble getting this into code. If any of
    you could share some ideas on how to go about this, it would be much
    appreciated! I'm assuming its going to be something like a while loop
    that imports characters while != a space, comma, period, etc, then
    stops when it gets to that. But again - not sure how to do this by
    character - I'm used to strings.

    Thanks a lot, much appreciated!

    dmurray14, Oct 29, 2006

  2. Input the words into strings, and then copy them to the character arrays, or
    allocate space for the pointers and copy to it. Or just pass the c_str()
    of the strings to the functions that take constant C-style strings, if
    Julián Albo, Oct 29, 2006
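As a rough sketch of the "allocate space for the pointers and copy to it" option from the post above: `duplicate_c_string` is a hypothetical helper name, and the caller is assumed to take ownership of the returned buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns a freshly allocated copy of `s` as a C string.
// The caller owns the result and must release it with delete[].
char *duplicate_c_string(const std::string &s)
{
    char *copy = new char[s.size() + 1];          // +1 for the terminating '\0'
    std::memcpy(copy, s.c_str(), s.size() + 1);   // c_str() is '\0'-terminated
    return copy;
}
```

Because the buffer is sized from the string itself, the copy cannot overflow, but every call needs a matching `delete[]`.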

  3. Salt_Peter

    Salt_Peter Guest

    Here is a suggestion. Before trying something like this, get familiar
    with std::string and containers like std::vector. It's considerably
    more difficult and error-prone to deal with char arrays and pointers.
    I'd avoid pointers altogether and leave new/delete allocations to
    ancient history.
    A good book would help, consult this newsgroup for some recommended

    #include <iostream>
    #include <string>
    #include <vector>

    int main()
    {
        std::string s("a short string");

        std::vector< std::string > vs; // a vector of strings
        vs.push_back("another string");
        vs.push_back("the last string");

        // print each element of the vector on its own line
        for(size_t i = 0; i < vs.size(); ++i)
            std::cout << vs[i] << "\n";
        return 0;
    }
    Salt_Peter, Oct 29, 2006
  4. dmurray14

    dmurray14 Guest

    Thanks, I really appreciate it. I would love to use strings, and I
    understand everyone knows it's the better way to go, however the
    guidelines I was given to work with included using character arrays and
    assigning pointers to them. Basically, the program I create needs to
    gather a list of words from an input file and do an analysis of how and
    when they appear. The current plan is to put these words into a linked
    list, alphabetize them, and go from there. However, I'm stuck at
    getting the words separated. Once I can get them into char arrays, I
    should be fine. It's just the breaking down of the file into words that
    I'm having huge problems with. I'm guessing that if I go with the
    character array, the best way to do this will be to go through the file
    character by character, looking for spaces and punctuation and putting
    words into their nodes in a linked list. However, I don't know how to
    do this - I'm not familiar with how to look at each character in a
    file, and then how to create a character array ("word") with the data.

    Again, thanks for all your help, hopefully this makes it a little more
    clear what I'm going for!

    dmurray14, Oct 29, 2006
  5. Dan - what don't you get, it's simple. (massive tongue in cheek here).

    Hope this helps.

    #include <string>
    #include <iostream>
    #include <vector>
    #include <algorithm>

    // returns the C-style string held by a std::string
    const char * GetStr( std::string & i_str )
    {
        return i_str.c_str();
    }

    int main()
    {
        std::string str;
        std::vector<std::string> vec;

        // operator>> reads one whitespace-delimited word at a time
        while ( std::cin >> str )
        {
            vec.push_back( str );
        }

        std::vector<const char *> vec2( vec.size() );

        std::transform( vec.begin(), vec.end(), vec2.begin(), GetStr );

        // & vec2[0] is only valid when vec2 is not empty
        const char ** array = vec2.empty() ? 0 : & vec2[0];
        const int num_array = vec2.size();

        // stuff is in array - don't touch vec *or* vec2 while you use
        // "array" otherwise you could be using dangling pointers

        for ( int i = 0; i < num_array; ++ i )
        {
            std::cout << array[i] << "\n";
        }
    }

    Gianni Mariani, Oct 29, 2006
  6. Jim Langston

    Jim Langston Guest

    If you really have to, just read each word into a std::string, then copy it
    to a c-string.

    std::string MyString;
    MyIStream >> MyString;
    That will get one "word": operator>> skips leading whitespace (newlines
    included) and stops at the next whitespace character.
    If you need to handle the file line by line, one way is to read a line at
    a time, put that into a stringstream, then read the words out.

    std::string Line;
    while ( std::getline( MyIStream, Line ) )
    {
        std::stringstream LineStream;
        LineStream << Line;
        std::string Word;
        while ( LineStream >> Word )
        {
            // Do something with the word contained in the std::string here.
            // Since you need C-style strings...
            char CWord[100]; // renamed so it does not clash with the std::string Word
            strcpy( CWord, Word.c_str() );
            // Now CWord has the word in it, what do you want to do with it?
        }
    }
    Jim Langston, Oct 29, 2006
  7. Adrian

    Adrian Guest

    Here is an example using strings to split out the words. C++ has some nice
    features to make it easy. If you really need char * after then you can
    convert them once you have the separate word. Otherwise there is lots of
    checking involved. If you really want to use char * have a look at the
    standard library function strtok()

    Otherwise this works quite well
    #include <fstream>
    #include <iostream>
    #include <vector>
    #include <string>

    void split(std::string &line, const std::string &separators,
    std::vector<std::string> &words);

    int main()
    {
        std::ifstream in("file.txt");
        std::string line;
        std::vector<std::string> word_list;
        const std::string word_separators(" ,.;:?!");

        // read the file one line at a time and split each line into words
        while(getline(in, line))
        {
            split(line, word_separators, word_list);
        }

        for(std::vector<std::string>::const_iterator i=word_list.begin();
            i!=word_list.end(); ++i)
        {
            std::cout << "Word: [" << *i << "]" << std::endl;
        }
    }

    void split(std::string &line, const std::string &separators,
               std::vector<std::string> &words)
    {
        // find_first_not_of/find_first_of return std::string::npos on failure
        std::string::size_type start = line.find_first_not_of(separators);
        while(start != std::string::npos)
        {
            std::string::size_type stop = line.find_first_of(separators, start);
            if(stop == std::string::npos)
                stop = line.length();
            words.push_back(line.substr(start, stop - start));
            start = line.find_first_not_of(separators, stop);
        }
    }
    Adrian, Oct 29, 2006
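For the strtok() route mentioned in the post above, a minimal sketch might look like the following. `split_in_place` and its separator list are assumptions made for illustration. Note that std::strtok overwrites separators in the buffer with '\0' and keeps hidden static state, so it cannot be used on two lines at once.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: splits `line` in place and stores up to `max_words`
// pointers to the words in `words`. Returns the number of words stored.
// The pointers refer into `line`, so `line` must outlive them.
std::size_t split_in_place(char *line, char *words[], std::size_t max_words)
{
    const char *separators = " ,.;:?!\t\n";
    std::size_t count = 0;
    for (char *tok = std::strtok(line, separators);
         tok != 0 && count < max_words;
         tok = std::strtok(0, separators))   // a null first argument continues the same line
    {
        words[count++] = tok;
    }
    return count;
}
```

The line has to be a writable array (for example one filled by `in.getline(buffer, sizeof buffer)`), not a string literal.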
  8. BTW - someone else is going to tell you to stop "TOP POSTING". Not
    me, I don't mind, but others will, so be ready!
    That sentence makes no sense - you can't assign pointers to a character
    array. It might sound like I'm being pedantic, but it seems like there may
    be some confusion, so you need to ask.
    a) Why a linked list? Why not insert them directly into a std::map?
    b) Exactly what information are you looking for? Positional information
    as well (like line number, location in file, etc.)?
    std::istream "knows" how to do this when you read into a std::string.
    Gianni Mariani, Oct 29, 2006
  9. Jim Langston wrote:
    ****** WARNING ******* - Security hole - right here!
    It can't be stressed enough. Don't do an unbounded copy into any array
    variable, especially one on the stack and even more especially from user
    input. You will be 0wn3d.

    Never use "strcpy" or "sprintf" or the like.
    Gianni Mariani, Oct 30, 2006
  10. dmurray14

    dmurray14 Guest

    Wow, you guys are a huge help! Seriously, this is great.. I really
    appreciate it. Let me respond to Gianni's post:

    I don't know what "TOP POSTING" is, so sorry if I did something wrong.
    OK, sorry if it didn't make sense. Basically I need to store the words
    in a character array inside a node of a linked list.
    Those were the requirements for the assignment. After the words are
    stored, I need to figure out how many times the words appear, which
    should be something that I can handle. Just stuck on getting from the
    ifstream down to words in a node, in a character array.

    Which is why I would love to be able to use strings, but again, that
    wasn't the assignment. Probably on purpose, unfortunately.
    dmurray14, Oct 30, 2006
  11. Daniel T.

    Daniel T. Guest

    Based on the conversation so far, I get the impression that this is a
    homework assignment, hence the limit on what you are allowed to use of
    the language.

    There are a couple of ways you can attack this problem depending on what
    parts of the language you *can* use. For example, you can't use
    std::string, but can you use std::vector? (A vector<char> can make a
    handy string replacement.) Can you write your own classes? (You can hack
    out a small string class of your own.)

    Do this: write the program so it works on a text file that contains only
    one word. Let me see what you end up with and I'll help you extend it
    from there.
    Daniel T., Oct 30, 2006
  12. ma740988

    ma740988 Guest

    Interesting. Assume for the moment that two processors (one card with
    two processors on it) communicate with each other via a struct called
    test. The struct test sits in an area of memory called 'shared memory' that
    both processors see. Details aside, here's my (ignoring the shared
    memory business) current approach to this.

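    # include <cstring>
    # include <iostream>
    # include <sstream>

    const int max = 64; // assumed size; the post does not say what max is

    struct test {
        char input1 [ max ];
        char input2 [ max ];
        char input3 [ max ];
        char input4 [ max ];
    };

    int main()
    {
        double const velocity ( -44.222 ) ;
        std::ostringstream oss;
        oss << " SNR value is " << velocity << std::endl;
        test t;
        // unbounded copy: overflows input1 if the text is max characters or longer
        strcpy ( t.input1, oss.str().c_str() ) ;
        std::cout << t.input1 << std::endl;
    }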

    How would I copy the contents to input1 without the use of strcpy?

    An aside:
    In my application the code - on processor A - is akin to:

    int const address_of_test ( 0x3F000000 );

    test * ptr_test ( 0 );
    ptr_test = ( test *)( address_of_test ) ;
    double const velocity ( -44.222 ) ;
    std::ostringstream oss;
    oss << " SNR value is " << velocity << std::endl;
    strcpy ( ptr_test->input1, oss.str().c_str() ) ;

    NOTE: address_of_test is where the struct is created and both a and b
    have access to said location. I'm thinking placement new would be
    better here, but I need to do some more reading.
    ma740988, Oct 30, 2006
  13. Daniel T.

    Daniel T. Guest

    Do you need to write the linked list?
    Daniel T., Oct 30, 2006
  14. dmurray14

    dmurray14 Guest

    It is in fact work related to a class I'm taking, yes. So far, my plan
    of attack is as follows: use getline to dump each line of the file into
    a character array. Then, I'm going to run through the character array
    looking for spaces, and pull out words every time I get to a space,
    sending them into their own nodes. From there it should be easy to make
    the framework to check to see whether the word has appeared before and
    if so, just add to the count.

    I don't think we're supposed to be making classes, no. The idea is
    likely to stick to the basics so we get the concepts. Hopefully this
    will work out...
    dmurray14, Oct 30, 2006
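The plan in the post above (one node per word, with a count that is bumped on repeats) might be sketched like this. Everything here is an assumption for illustration: the `WordNode` layout, the `WORD_CAPACITY` limit, and the choice to keep the list sorted on insertion instead of sorting afterwards.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t WORD_CAPACITY = 32;  // assumed limit, including '\0'

// Hypothetical node layout: the word lives in a character array inside the node.
struct WordNode
{
    char word[WORD_CAPACITY];
    int count;
    WordNode *next;
};

// Adds `word` in alphabetical order, or increments its count if it is already
// in the list. Returns the (possibly new) head of the list.
WordNode *add_word(WordNode *head, const char *word)
{
    char key[WORD_CAPACITY];
    std::strncpy(key, word, WORD_CAPACITY - 1);
    key[WORD_CAPACITY - 1] = '\0';   // strncpy does not terminate on truncation

    // Walk the "next" links until we reach the first node that is not smaller.
    WordNode **link = &head;
    while (*link != 0 && std::strcmp((*link)->word, key) < 0)
        link = &(*link)->next;

    if (*link != 0 && std::strcmp((*link)->word, key) == 0)
    {
        ++(*link)->count;            // seen before: just count it
        return head;
    }

    WordNode *node = new WordNode;
    std::memcpy(node->word, key, WORD_CAPACITY);
    node->count = 1;
    node->next = *link;
    *link = node;                    // splice the new node in before *link
    return head;
}

// Releases every node in the list.
void free_list(WordNode *head)
{
    while (head != 0)
    {
        WordNode *next = head->next;
        delete head;
        head = next;
    }
}
```

Starting from `WordNode *list = 0;`, each extracted word would be added with `list = add_word(list, word);`.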
  15. Daniel T.

    Daniel T. Guest

    OK, so start with a file that has only one word in it. You need to read
    in the word and output what word it is and that it was used once. Can
    you get that working?

    If you do the above, that will (a) give me an idea of what you are
    allowed to use of the language, and (b) give me an idea of how good you
    are and therefore what you probably need help with.
    Daniel T., Oct 30, 2006
  16. BobR

    BobR Guest

    dmurray14 wrote in message ...
    That part in your post was top-posting. The rest of that post was ok

    We don't like to see the cart before the horse, the answer before the
    question, etc..

    Mr. Steinbach puts it:
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?

    Another thing to mention:
    Trim (delete) anything in a prior post that you are not responding to. Like:
    if you are posting your newly corrected program, delete the old posted
    program. We can simply refer to that post up-thread if we need to review it.
    It wastes bandwidth and needlessly takes up space on our hard drives.

    These are not laws, just courtesy.
    BobR, Oct 30, 2006
  17. BobR

    BobR Guest

    ma740988 wrote in message ...
    Use 'strncpy' or 'std::copy'.
    BobR, Oct 30, 2006
  18. dmurray14

    dmurray14 Guest

    Thanks...seems like it would work but I'm going crazy, it isn't.

    I have it copying into a character array just fine. I can print it out
    and everything is exactly as it should be. Now all that's left to do is
    split the lines into words, and I JUST CAN'T figure it out for the life
    of me! I've tried strcmp, it tells me it needs Char* (which I thought
    was what I was giving it, but apparently no good), and I've tried even
    doing something like this, which doesn't work:

    //cp = "Where in the world is carmen sandiego"

    char char1[] = "W";
    char char2[] = cp[0];

    strcmp(char1, char2);

    Even that won't work. I can't even get it to compare the first letter,
    let alone the whole word! I am now completely lost as to how to break
    these strings apart into words. Please help!

    dmurray14, Oct 30, 2006
  19. Daniel T.

    Daniel T. Guest

    You are having problems because you are trying to solve the problem the
    wrong way. You need to take a "vertical slice" of the problem instead.
    Get it to work when the file only has one word in it first, from
    beginning to end, including loading the word in the linked list (was
    that class provided by your teacher?) Show us the code and we can help
    you from there.
    Daniel T., Oct 30, 2006
  20. BobR

    BobR Guest

    dmurray14 wrote in message ...
    You can compare two chars directly.

    // needs <cstring> and <iostream>
    char cp[] = "Where in the world is carmen sandiego";
    char char1 = 'W'; // note singlequote
    if( cp[0] == char1 ){
        std::cout <<" cp[0] == char1 "<<std::endl;
    } else {
        std::cout <<" cp[0] != char1 "<<std::endl;
    }
    // out: cp[0] == char1

    char cp2[] = "Where in the world is Carmen Sandiego";
    if( std::strcmp( cp, cp2) == 0 ){
        std::cout <<" cp == cp2 "<<std::endl;
    } else {
        std::cout <<" cp != cp2 "<<std::endl;
    }
    // out: cp != cp2

    // compare only the first 5 characters
    int dif = std::strncmp( cp, cp2, 5);
    std::cout <<" strncmp( cp, cp2, 5) ="<<dif<<std::endl;
    // out: strncmp( cp, cp2, 5) =0

    Some other things that may help you (in header <cctype>)
    BobR, Oct 30, 2006