Read in & count characters from a text file

Discussion in 'Java' started by Jay Cee, Aug 4, 2007.

  1. Jay Cee

    Jay Cee Guest

    Hi All,
    Relatively new to Java (ex VB) and could do with some help.
    I need to read a text file character by character (can do),
    and count each character as it appears, i.e.
    "A small sample text file" would have 1-A, 2-s, 2-m, etc., and output
    the results.

    I have a few issues which I cannot seem to solve easily.
    1/
    I thought it would be a good idea to save the characters in a HashMap as
    name-value pairs as they are read: map.put(tempStr, "1").
    I found I had to convert the character to a String before it would save to
    the map. Ideally I would like to save it as a character.

    2/
    Before adding each character to the map
    check first if it already exists
    and if found increment the value portion of the name value pair
    else
    if not found insert into map with value of 1.

    My problem seems to be that I cannot "check the map" to see whether the
    character exists and, if it does exist, how do I get at the value to increment it?

    Here is what I have so far,

    import java.io.*;
    import java.util.*;
    class TextTest
    {
    public static Map map = new HashMap();
    private static TreeMap treeMap;
    public static void main(String[] args) throws IOException
    {

    FileInputStream in = new FileInputStream("textfile.txt");
    int ch;
    int total = 0;
    int count = 1;

    while ((ch = in.read()) != -1)
    {
    total ++;
    //Only way to save the "char" in the map was to convert it to a string.
    String tempStr = (Integer.toString(ch));
    System.out.print((char)ch );

    if (map.containsKey(tempStr))
    {
    //How can I extract the value, increment it and save back to the map?
    map.put(tempStr,"value" );
    }
    else
    {
    //I need to save the integer 1 here in the value part of the map
    map.put(tempStr, "value");
    }
    }
    treeMap = new TreeMap(map); //sort the map
    System.out.println("Total =" + total);
    System.out.print(treeMap);
    }
    }
     
    Jay Cee, Aug 4, 2007
    #1

  3. Jay Cee

    Stefan Ram Guest

    Supersedes: <-berlin.de>

    "Jay Cee" <> writes:
    >My problems seems to be I cannot "check the map" if the
    >character exists and if it does exist how do I get at the value
    >to increment it.


    You might use something like

    class NumericMapUtils
    { public static <D> void addTo /* autovivificate the value to 0 */
    ( final java.util.Map<D,java.lang.Integer> map, final D d, final int i )
    { map.put( d, i +( map.containsKey( d )? map.get( d ): 0 )); }}

    and a sorted map

    java.util.TreeMap<java.lang.Character,java.lang.Integer> map;

    then add each text like

    NumericMapUtils.<java.lang.Character>addTo( map, 'a', 1 );
    NumericMapUtils.<java.lang.Character>addTo( map, 'c', 1 );
    NumericMapUtils.<java.lang.Character>addTo( map, 'n', 1 );
    NumericMapUtils.<java.lang.Character>addTo( map, 'x', 1 );

    (I have not tested this. Possibly, the "<java.lang.Character>"
    type argument can be omitted.)

    Then iterate: »for( final java.lang.Character key: map.keySet() )«

    For files of arbitrary size, use java.math.BigInteger instead
    of java.lang.Integer.
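
    For what it's worth, here is a compilable sketch of the helper and the
    iteration described above. The class CharTally and its tally() method are
    hypothetical names added only for the demonstration. In Java an explicit
    type argument goes before the method name, as in
    NumericMapUtils.<Character>addTo(...), but it is left out here because the
    compiler can infer D from the map.

```java
import java.util.Map;
import java.util.TreeMap;

class NumericMapUtils {
    /** Adds i to the count stored under d, treating a missing key as 0. */
    public static <D> void addTo(final Map<D, Integer> map, final D d, final int i) {
        map.put(d, i + (map.containsKey(d) ? map.get(d) : 0));
    }
}

class CharTally { // hypothetical demo class, not part of the original post
    /** Counts each char of text into a sorted map. */
    static TreeMap<Character, Integer> tally(final String text) {
        final TreeMap<Character, Integer> map = new TreeMap<Character, Integer>();
        for (int k = 0; k < text.length(); k++) {
            NumericMapUtils.addTo(map, text.charAt(k), 1); // D is inferred as Character
        }
        return map;
    }

    public static void main(final String[] args) {
        final Map<Character, Integer> map = tally("A small sample text file");
        for (final Character key : map.keySet()) { // keys come back in sorted order
            System.out.println(key + " = " + map.get(key));
        }
    }
}
```

    Because the map is a TreeMap, the keys are visited in char order, so the
    space character is reported first.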

    Stefan Ram, Aug 4, 2007
    #3
  4. Jay Cee

    Eric Sosman Guest

    Jay Cee wrote:
    > Hi All,
    > Relatively new to java (ex VB) and could do with some help.
    > I need to read a text file character by character (can do),
    > and count each character as it appears, i.e
    > "A small sample text file" would have 1-A , 2-s, 2-m ,etc etc. and output
    > the results.
    >
    > I have a few issues which I cannot seem to solve easily,
    > 1/
    > I thought it would be a good idea to save the characters in a hashmap in
    > name-value pairs as they are read , map.put(tempStr,"1" )
    > I found I had to convert the character to a string before it would save to
    > the map.Ideally I would like to save as a character.


    Maps (all Collections, in fact) deal only with objects,
    so you cannot store primitive values like char in them. But
    you can use a Character object, which expresses your intent
    more directly than a String does.

    Similarly, the mapped values must also be objects. I
    think an Integer would be a better choice than a String; if
    you expect counts greater than two billion use a Long.

    > 2/
    > Before adding each character to the map
    > check first if it already exists
    > and if found increment the value portion of the name value pair
    > else
    > if not found insert into map with value of 1.
    >
    > My problems seems to be I cannot "check the map" if the character exists and
    > if it does exist how do I get at the value to increment it.


    The map has a containsKey() method that tells you whether
    there is or isn't an entry for a key you're interested in.

    If you're using an Integer (or Long) as the counter, you
    can't just increment it: like String, an Integer cannot be
    changed once it's created. Instead, you need to retrieve the
    existing Integer from the map and replace it with a larger one.

    ... and since you need to retrieve the Integer anyhow, the
    containsKey() method doesn't seem worth while: Just ask the map
    for the Integer corresponding to such-and-such a Character. If
    there is one, replace it. If there's not, you'll get a null
    back from the map and this can be your signal to start a new
    counter at unity:

    Character key = Character.valueOf( (char)ch );
    Integer val = (Integer)map.get(key);
    if (val == null)
        val = Integer.valueOf(1);
    else
        val = Integer.valueOf(val.intValue() + 1);
    map.put(key, val);

    Another approach would be to invent your own Counter class
    that looks a lot like an Integer but is mutable: it has methods
    like set() or increment() that change its value. Then the code
    might look like

    Character key = Character.valueOf( (char)ch );
    Counter cnt = (Counter)map.get(key);
    if (cnt == null) {
        cnt = new Counter(); // initial value zero
        map.put(key, cnt);
    }
    cnt.increment();
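
    A minimal sketch of what such a Counter might look like, together with
    the loop that uses it. The method names increment() and get(), and the
    CounterDemo class, are assumptions made for illustration. Note that a
    newly created Counter has to be assigned to cnt before increment() is
    called, otherwise cnt is still null.

```java
import java.util.HashMap;
import java.util.Map;

/** A minimal mutable counter; the name and methods are hypothetical. */
class Counter {
    private int value; // starts at zero

    void increment() { value++; }

    int get() { return value; }

    @Override
    public String toString() { return Integer.toString(value); }
}

class CounterDemo { // hypothetical demo class
    /** Counts each char of text using one mutable Counter per key. */
    static Map<Character, Counter> tally(String text) {
        Map<Character, Counter> map = new HashMap<Character, Counter>();
        for (int k = 0; k < text.length(); k++) {
            Character key = Character.valueOf(text.charAt(k));
            Counter cnt = map.get(key);
            if (cnt == null) {
                cnt = new Counter();
                map.put(key, cnt); // stored once; later hits only mutate it
            }
            cnt.increment();
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(tally("A small sample text file"));
    }
}
```

    The trade-off is one small object per distinct character instead of a
    new Integer per increment.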

    > Here is what I have so far,
    >
    > import java.io.*;
    > import java.util.*;
    > class TextTest
    > {
    > public static Map map = new HashMap();
    > private static TreeMap treeMap;
    > public static void main(String[] args) throws IOException
    > {
    >
    > FileInputStream in = new FileInputStream("textfile.txt");


    A word of warning: This is legal, but may not be what you
    intend. InputStreams are for files made of bytes; Readers are
    for files made of characters. If an InputStream encounters a
    character that has been encoded in several bytes, it will deliver
    those bytes to you individually. If a Reader encounters such a
    thing, it will decode the multi-byte sequence and deliver you
    the single corresponding character.
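
    As a rough sketch of the Reader-based approach: the loop below is the
    same as the original one except that it reads from a Reader, so each
    read() yields one decoded char. The class name ReaderCount and the choice
    of UTF-8 are assumptions; substitute whatever encoding the file really
    uses.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Map;
import java.util.TreeMap;

class ReaderCount { // hypothetical class name
    /** Reads chars until end of input and tallies them in a sorted map. */
    static Map<Character, Integer> count(Reader in) throws IOException {
        Map<Character, Integer> map = new TreeMap<Character, Integer>();
        int ch;
        while ((ch = in.read()) != -1) { // read() returns a char value, or -1 at the end
            Character key = Character.valueOf((char) ch);
            Integer val = map.get(key); // null means "not seen yet"
            map.put(key, Integer.valueOf(val == null ? 1 : val.intValue() + 1));
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file is encoded as UTF-8.
        Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream("textfile.txt"), "UTF-8"));
        try {
            System.out.println(count(in));
        } finally {
            in.close();
        }
    }
}
```

    Passing the Reader in as a parameter also makes the counting easy to try
    on a java.io.StringReader without touching a file.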

    By the way, this sort of code is fine if your objective is
    to learn about Maps and the like. But if your goal is really
    to count char values (or byte values), an array of 65536 (or
    256) ints or longs will be easier:

    counts[ch]++;

    --
    Eric Sosman
     
    Eric Sosman, Aug 4, 2007
    #4
  5. Jay Cee wrote:
    > Hi All,
    > Relatively new to java (ex VB) and could do with some help.
    > I need to read a text file character by character (can do),
    > and count each character as it appears, i.e
    > "A small sample text file" would have 1-A , 2-s, 2-m ,etc etc. and output
    > the results.
    >
    > I have a few issues which I cannot seem to solve easily,
    > 1/
    > I thought it would be a good idea to save the characters in a hashmap in
    > name-value pairs as they are read , map.put(tempStr,"1" )
    > I found I had to convert the character to a string before it would save to
    > the map.Ideally I would like to save as a character.

    ....

    Although it can certainly be done with a map, I might not use one for
    this. There are only 65,536 possible values for a Java char, so why not
    an array?

    char[] counts = new char[Character.MAX_VALUE+1];
    ....
    counts[ch]++;
    ....

    Patricia
     
    Patricia Shanahan, Aug 4, 2007
    #5
  6. Jay Cee

    Jay Cee Guest

    Hi Patricia
    Yours seems the simplest way to go forward with this, but do I have to
    iterate through the array each time I read in a character? This will
    probably be OK for this instance, but if I wanted to do a character count on
    a large document (a book?) surely this would be slower than a HashMap. Is
    there an array that can hold [char, integer]? I will have to do some
    more research.

    Eric, thank you for the explanation and the "A word of warning". I have been
    getting this issue of more than one char being read in and was wondering why.
    I wonder no longer :)

    Stefan, thank you for the swift reply. I will have to do some reading on
    NumericMapUtils and "autovivificate", which is not a word I had come across in
    my life until today!!


    Jay

     
    Jay Cee, Aug 4, 2007
    #6
  7. Jay Cee wrote:
    > Hi Patricia
    > Yours seems the simplest way to go forward with this but do I have to
    > iterate through the array each time I read in a character ? This will
    > probably be ok for this instance but if I wanted to do a character count on
    > a large document(a book?) surely this would be slower than a hashmap. Is
    > there an array that can hold [char,integer] , I will have to do some
    > more research.


    Sorry, I made a mistake making it a char[], which confuses matters.

    You want an array type that is big enough for each element to hold the
    maximum number of instances of any one character you expect to see in
    the input. Since you are using an int for the total, int must be good
    enough:

    int[] counts = new int[Character.MAX_VALUE+1];

    Each character has its very own entry. For example, decimal 65
    corresponds to 'A', so if you see an 'A' in the input, counts[65] would
    increment by one. Use element 65 for 'A' regardless of what has happened
    before.

    The only time you need to iterate through the array is at the end, to
    report the non-zero counts.

    for(int i = 0; i<counts.length; i++){
        if(counts[i] > 0){
            char ch = (char)i;
            System.out.println("character "+ch+" count "+counts[i]);
        }
    }
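
    Put together, the array version might look like the sketch below. The
    class name ArrayCount and the use of a String as input are assumptions
    made to keep the example self-contained; with a file, the same
    counts[ch]++ line would sit inside the read loop.

```java
class ArrayCount { // hypothetical class name
    /** Returns an array in which element c holds how often char c occurs in text. */
    static int[] count(String text) {
        int[] counts = new int[Character.MAX_VALUE + 1]; // 65536 slots, all zero
        for (int k = 0; k < text.length(); k++) {
            counts[text.charAt(k)]++; // the char is promoted to an int index
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = count("A small sample text file");
        // The only full pass over the array happens here, at the end.
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println("character " + (char) i + " count " + counts[i]);
            }
        }
    }
}
```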

    Patricia
     
    Patricia Shanahan, Aug 4, 2007
    #7
  8. Jay Cee

    Roedy Green Guest

    On Sat, 4 Aug 2007 22:09:10 +0100, "Jay Cee" wrote, quoted or
    indirectly quoted someone who said:

    >I thought it would be a good idea to save the characters in a hashmap in
    >name-value pairs as they are read , map.put(tempStr,"1" )


    You would use HashMap<String,Integer>. You have to keep creating new
    Integer objects, each one bigger than the last. It is rather clumsy and
    slow, though probably quite adequate to the task.

    Chances are your file contains some limited set of chars, likely only
    chars 0..255. So instead you could use an int[256] to store the
    counts. You index by character and simply use the ++ operator. It
    is quite a bit simpler. In the worst case, if you have no control over
    the chars, you need an int[65536].
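
    A small sketch of the 256-entry variant. The extra slot at index 256 that
    collects every char outside 0..255 is an assumption added here so that an
    unexpected character cannot cause an ArrayIndexOutOfBoundsException; the
    class name SmallTable is likewise hypothetical.

```java
class SmallTable { // hypothetical class name
    /** Tallies chars 0..255 individually; anything larger is lumped into slot 256. */
    static int[] count(String text) {
        int[] counts = new int[257]; // 256 char slots plus one "other" bucket
        for (int k = 0; k < text.length(); k++) {
            char ch = text.charAt(k);
            counts[ch < 256 ? ch : 256]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = count("A small sample text file");
        for (int i = 0; i < 256; i++) {
            if (counts[i] > 0) {
                System.out.println((char) i + " " + counts[i]);
            }
        }
        System.out.println("other " + counts[256]);
    }
}
```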
    --
    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Aug 5, 2007
    #8
  9. Jay Cee

    Roedy Green Guest

    >Yours seems the simplest way to go forward with this but do I have to
    >iterate through the array each time I read in a character ?

    You index. Most people don't know you can index by chars
    e.g. int x = count[ 'A' ]; is legit java. The char gets promoted to
    the corresponding Unicode int.
    --
    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Aug 5, 2007
    #9
  10. Jay Cee

    cyprian Guest

    On Aug 4, 11:51 pm, Roedy Green wrote:
    > >Yours seems the simplest way to go forward with this but do I have to
    > >iterate through the array each time I read in a character ?

    >
    > You index. Most people don't know you can index by chars
    > e.g. int x = count[ 'A' ]; is legit java. The char gets promoted to
    > the corresponding Unicode int.
    > --
    > Roedy Green Canadian Mind Products
    > The Java Glossary http://mindprod.com


    To do a character count on a text file, try reading it in through a
    buffered reader and calling read() on it. Note that the no-argument
    read() returns the next character (or -1 at the end of input); it is
    read(char[]) that returns the number of characters read. Either way you
    get char values, not code points, so a supplementary character arrives
    as two chars. Then try doing your map thing on it. I was counting some
    words myself recently: http://genericjava.blogspot.com/2007/08/can-i-count-ways-let-me.htm

    On the other hand, you could call readLine() on the buffered reader,
    append each line to a string buffer, and play with the string buffer
    directly (bear in mind that readLine() drops the line terminators). Try
    doing a regexp construct if possible. Use the string buffer as the
    source for your map: walk it char by char, using each character as the
    key and its count as the value.
     
    cyprian, Aug 22, 2007
    #10
  11. Jay Cee

    Roedy Green Guest

    On Wed, 22 Aug 2007 03:45:14 -0700, cyprian wrote, quoted or
    indirectly quoted someone who said:

    >to do a character count on a text file,


    And if all you want is a count of chars in the file, use File.length().
    It will give you the byte count without reading even a single byte.
    That is the same as the char count for files in a single-byte encoding
    such as ASCII or ISO-8859-1, and quite a reasonable measure of
    "bigness" otherwise.
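
    To illustrate where the byte count and the char count part ways, the
    sketch below writes a string to a temporary file as UTF-8 and compares
    File.length() with String.length(). The class name LengthVsChars, the
    helper utf8FileLength() and the choice of UTF-8 are assumptions made for
    the demonstration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LengthVsChars { // hypothetical class name
    /** Writes text as UTF-8 to a temp file and returns File.length(), a byte count. */
    static long utf8FileLength(String text) throws IOException {
        File file = File.createTempFile("length-demo", ".txt");
        try {
            FileOutputStream out = new FileOutputStream(file);
            try {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            } finally {
                out.close();
            }
            return file.length(); // evaluated before the finally block deletes the file
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        String ascii = "plain text";
        String accented = "caf\u00e9"; // the last char needs two bytes in UTF-8
        System.out.println(ascii.length() + " chars, " + utf8FileLength(ascii) + " bytes");
        System.out.println(accented.length() + " chars, " + utf8FileLength(accented) + " bytes");
    }
}
```

    This should print "10 chars, 10 bytes" followed by "4 chars, 5 bytes":
    the two counts agree for pure ASCII and diverge as soon as a character
    needs more than one byte.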
    --
    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Aug 23, 2007
    #11