Read in & count characters from a text file

Jay Cee

Hi All,
I'm relatively new to Java (ex-VB) and could do with some help.
I need to read a text file character by character (can do)
and count each character as it appears, i.e.
"A small sample text file" would have 1-A, 2-s, 2-m, etc., and output
the results.

I have a few issues which I cannot seem to solve easily.
1/
I thought it would be a good idea to save the characters in a HashMap as
name-value pairs as they are read: map.put(tempStr,"1" )
I found I had to convert the character to a string before it would save to
the map. Ideally I would like to save it as a character.

2/
Before adding each character to the map,
check first if it already exists
and, if found, increment the value portion of the name-value pair;
else,
if not found, insert into the map with a value of 1.

My problem seems to be that I cannot "check the map" to see if the character
exists and, if it does exist, how do I get at the value to increment it?

Here is what I have so far,

import java.io.*;
import java.util.*;

class TextTest
{
    public static Map map = new HashMap();
    private static TreeMap treeMap;

    public static void main(String[] args) throws IOException
    {
        FileInputStream in = new FileInputStream("textfile.txt");
        int ch;
        int total = 0;
        int count = 1;

        while ((ch = in.read()) != -1)
        {
            total ++;
            // Only way to save the "char" in the map was to convert it to a string.
            String tempStr = (Integer.toString(ch));
            System.out.print((char)ch );

            if (map.containsKey(tempStr))
            {
                // How can I extract the value, increment it and save back to the map?
                map.put(tempStr,"value" );
            }
            else
            {
                // I need to save the integer 1 here in the value part of the map
                map.put(tempStr, "value");
            }
        }
        treeMap = new TreeMap(map); //sort the map
        System.out.println("Total =" + total);
        System.out.print(treeMap);
    }
}
 
Stefan Ram

Jay Cee said:
My problem seems to be that I cannot "check the map" to see if the
character exists and, if it does exist, how do I get at the value
to increment it?

You might use something like

class NumericMapUtils
{ public static <D> void addTo /* autovivificate the value to 0 */
( final java.util.Map<D,java.lang.Integer> map, final D d, final int i )
{ map.put( d, i +( map.containsKey( d )? map.get( d ): 0 )); }}

and a sorted map

java.util.TreeMap<java.lang.Character,java.lang.Integer> map =
  new java.util.TreeMap<java.lang.Character,java.lang.Integer>();

then add each character like

NumericMapUtils.<java.lang.Character>addTo( map, 'a', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'c', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'n', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'x', 1 );

(I have not tested this. Possibly, the "<java.lang.Character>"
type argument can be omitted.)

Then iterate: »for( final java.lang.Character key: map.keySet() )«

For files of arbitrary size, use java.math.BigInteger instead
of java.lang.Integer.
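
Putting the pieces above together, a minimal self-contained sketch might look like this. The class name CharTally, the tally method, and the sample string are made up for illustration; the helper is the addTo method from above, with the type argument left to inference.

```java
import java.util.Map;
import java.util.TreeMap;

final class CharTally {
    /** Adds i to the count stored under d, treating a missing entry as 0. */
    static <D> void addTo(final Map<D, Integer> map, final D d, final int i) {
        map.put(d, i + (map.containsKey(d) ? map.get(d) : 0));
    }

    /** Counts every char of text; the TreeMap keeps the keys sorted. */
    static TreeMap<Character, Integer> tally(final String text) {
        final TreeMap<Character, Integer> map = new TreeMap<Character, Integer>();
        for (int k = 0; k < text.length(); k++) {
            addTo(map, text.charAt(k), 1); // the char is boxed to a Character
        }
        return map;
    }

    public static void main(String[] args) {
        final TreeMap<Character, Integer> map = tally("A small sample text file");
        for (final Character key : map.keySet()) {
            System.out.println("'" + key + "' " + map.get(key));
        }
    }
}
```

Because the map is a TreeMap, the loop prints the characters in sorted order, starting with the space character.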

 
Eric Sosman

Jay said:
Hi All,
I'm relatively new to Java (ex-VB) and could do with some help.
I need to read a text file character by character (can do)
and count each character as it appears, i.e.
"A small sample text file" would have 1-A, 2-s, 2-m, etc., and output
the results.

I have a few issues which I cannot seem to solve easily.
1/
I thought it would be a good idea to save the characters in a HashMap as
name-value pairs as they are read: map.put(tempStr,"1" )
I found I had to convert the character to a string before it would save to
the map. Ideally I would like to save it as a character.

Maps (all Collections, in fact) deal only with objects,
so you cannot store primitive values like char in them. But
you can use a Character object, which expresses your intent
more directly than a String does.

Similarly, the mapped values must also be objects. I
think an Integer would be a better choice than a String; if
you expect counts greater than two billion, use a Long.

2/
Before adding each character to the map,
check first if it already exists
and, if found, increment the value portion of the name-value pair;
else,
if not found, insert into the map with a value of 1.

My problem seems to be that I cannot "check the map" to see if the character
exists and, if it does exist, how do I get at the value to increment it?

The map has a containsKey() method that tells you whether
there is or isn't an entry for a key you're interested in.

If you're using an Integer (or Long) as the counter, you
can't just increment it: like String, an Integer cannot be
changed once it's created. Instead, you need to retrieve the
existing Integer from the map and replace it with a larger one.

... and since you need to retrieve the Integer anyhow, the
containsKey() method doesn't seem worth while: Just ask the map
for the Integer corresponding to such-and-such a Character. If
there is one, replace it. If there's not, you'll get a null
back from the map and this can be your signal to start a new
counter at unity:

Character key = Character.valueOf( (char)ch );
Integer val = (Integer)map.get(key);
if (val == null)
    val = Integer.valueOf(1);
else
    val = Integer.valueOf(val.intValue() + 1);
map.put(key, val);
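
For reference, the same get-then-replace step can also be written against a generic map, which removes the cast and lets autoboxing create the Integer objects. This is a sketch rather than a drop-in replacement; the class name GetAndReplace and the method countChars are illustrative names, not part of the original program.

```java
import java.util.HashMap;
import java.util.Map;

final class GetAndReplace {
    /** Counts each char of text using a single get() per character. */
    static Map<Character, Integer> countChars(String text) {
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        for (int i = 0; i < text.length(); i++) {
            Character key = Character.valueOf(text.charAt(i));
            Integer val = map.get(key);               // null means "not seen yet"
            map.put(key, val == null ? 1 : val + 1);  // store a new, larger Integer
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(countChars("hello"));
    }
}
```

On Java 8 and later, map.merge(key, 1, Integer::sum) performs the same update in one call.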

Another approach would be to invent your own Counter class
that looks a lot like an Integer but is mutable: it has methods
like set() or increment() that change its value. Then the code
might look like

Character key = Character.valueOf( (char)ch );
Counter cnt = (Counter)map.get(key);
if (cnt == null) {
    cnt = new Counter(); // initial value zero
    map.put(key, cnt);
}
cnt.increment();

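
The Counter class itself is left to the reader above; one possible shape, as a sketch only, is shown here. The increment() name comes from the text, while get(), the CounterDemo class, and its countChars method are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** A mutable int holder; starts at zero. */
final class Counter {
    private int value;

    void increment() { value++; }

    int get() { return value; }

    @Override
    public String toString() { return Integer.toString(value); }
}

final class CounterDemo {
    static Map<Character, Counter> countChars(String text) {
        Map<Character, Counter> map = new HashMap<Character, Counter>();
        for (int i = 0; i < text.length(); i++) {
            Character key = Character.valueOf(text.charAt(i));
            Counter cnt = map.get(key);
            if (cnt == null) {
                cnt = new Counter();   // initial value zero
                map.put(key, cnt);
            }
            cnt.increment();           // the object held by the map changes in place
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(countChars("banana"));
    }
}
```

The trade-off is one extra class in exchange for not allocating a new Integer on every update.
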
Here is what I have so far,

import java.io.*;
import java.util.*;
class TextTest
{
public static Map map = new HashMap();
private static TreeMap treeMap;
public static void main(String[] args) throws IOException
{

FileInputStream in = new FileInputStream("textfile.txt");

A word of warning: This is legal, but may not be what you
intend. InputStreams are for files made of bytes; Readers are
for files made of characters. If an InputStream encounters a
character that has been encoded in several bytes, it will deliver
those bytes to you individually. If a Reader encounters such a
thing, it will decode the multi-byte sequence and deliver you
the single corresponding character.
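
To make the difference concrete, the sketch below feeds the same two UTF-8 bytes (the encoding of 'é') first to an InputStream and then to a Reader. The class and method names are illustrative, and UTF-8 is an assumption here; use whatever encoding your file is actually in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class BytesVersusChars {
    /** Number of values an InputStream delivers before end of input. */
    static int countBytes(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Number of values a Reader delivers for the same data, decoded as UTF-8. */
    static int countChars(byte[] data) throws IOException {
        Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "\u00e9".getBytes(StandardCharsets.UTF_8); // 'é' is two bytes in UTF-8
        System.out.println(countBytes(data) + " bytes, " + countChars(data) + " char");
        // prints "2 bytes, 1 char"
    }
}
```

For a real file, the equivalent would be new InputStreamReader(new FileInputStream("textfile.txt"), StandardCharsets.UTF_8), ideally wrapped in a BufferedReader.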

By the way, this sort of code is fine if your objective is
to learn about Maps and the like. But if your goal is really
to count char values (or byte values), an array of 65536 (or
256) ints or longs will be easier:

counts[ch]++;
 
Patricia Shanahan

Jay said:
Hi All,
I'm relatively new to Java (ex-VB) and could do with some help.
I need to read a text file character by character (can do)
and count each character as it appears, i.e.
"A small sample text file" would have 1-A, 2-s, 2-m, etc., and output
the results.

I have a few issues which I cannot seem to solve easily.
1/
I thought it would be a good idea to save the characters in a HashMap as
name-value pairs as they are read: map.put(tempStr,"1" )
I found I had to convert the character to a string before it would save to
the map. Ideally I would like to save it as a character.
....

Although it can certainly be done with a map, I might not use one for
this. There are only 65,536 possible values for a Java char, so why not
an array?

char[] counts = new char[Character.MAX_VALUE+1];
....
counts[ch]++;
....

Patricia
 
Jay Cee

Hi Patricia
Yours seems the simplest way to go forward with this, but do I have to
iterate through the array each time I read in a character? This will
probably be OK for this instance, but if I wanted to do a character count on
a large document (a book?) surely this would be slower than a HashMap. Is
there an array that can hold [char,integer]? I will have to do some
more research.

Eric, thank you for the explanation and the "A word of warning". I have been
getting this issue of more than 1 char read in and I was wondering why. I
wonder no longer :)

Stefan, thank you for the swift reply. I will have to do some reading on the
NumericMapUtils and autovivificate, which is not a word I have come across in
my life until today!!


Jay


 
Patricia Shanahan

Jay said:
Hi Patricia
Yours seems the simplest way to go forward with this, but do I have to
iterate through the array each time I read in a character? This will
probably be OK for this instance, but if I wanted to do a character count on
a large document (a book?) surely this would be slower than a HashMap. Is
there an array that can hold [char,integer]? I will have to do some
more research.

Sorry, I made a mistake making it a char[], which confuses matters.

You want an array type that is big enough for each element to hold the
maximum number of instances of any one character you expect to see in
the input. Since you are using an int for the total, int must be good
enough:

int[] counts = new int[Character.MAX_VALUE+1];

Each character has its very own entry. For example, decimal 65
corresponds to 'A', so if you see an 'A' in the input, counts[65] would
increment by one. Use element 65 for 'A' regardless of what has happened
before.

The only time you need to iterate through the array is at the end, to
report the non-zero counts.

for(int i = 0; i<counts.length; i++){
    if(counts[i] > 0){
        char ch = (char)i;
        System.out.println("character "+ch+" count "+counts[i]);
    }
}
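
Put together as a complete program, the array approach might look like the sketch below. It assumes the file is UTF-8 and takes its name from the command line; the class name ArrayCount and the count method are illustrative.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ArrayCount {
    /** Reads chars until end of input and returns one counter per possible char value. */
    static int[] count(Reader in) throws IOException {
        int[] counts = new int[Character.MAX_VALUE + 1];
        int ch;
        while ((ch = in.read()) != -1) {
            counts[ch]++;              // direct index, no searching
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] names a UTF-8 text file.
        Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8));
        try {
            int[] counts = count(in);
            for (int i = 0; i < counts.length; i++) {
                if (counts[i] > 0) {
                    System.out.println("character " + (char) i + " count " + counts[i]);
                }
            }
        } finally {
            in.close();
        }
    }
}
```

Each character read costs one array access, so the time per character does not depend on how large the document is.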

Patricia
 
Roedy Green

I thought it would be a good idea to save the characters in a hashmap in
name-value pairs as they are read , map.put(tempStr,"1" )

You would use HashMap<String,Integer>. You have to keep creating new
Integer objects, each one bigger than the last. It is rather clumsy and
slow, though probably quite adequate to the task.

Chances are your file contains some limited set of chars, likely only
chars 0..255. So instead you could use an int[256] to store the
counts. You index by character. You simply use the ++ operator. It
is quite a bit simpler. In the worst case you need an int[65536] if
you have no control over the chars.
 
Roedy Green

Yours seems the simplest way to go forward with this but do I have to
iterate through the array each time I read in a character ?

You index. Most people don't know you can index by chars,
e.g. int x = count[ 'A' ]; is legit Java. The char gets promoted to
the corresponding Unicode int.
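
A small sketch of indexing by char, with illustrative names (IndexByChar, tally) that are not from the thread:

```java
final class IndexByChar {
    /** Returns an array where element c holds how often char c occurs in text. */
    static int[] tally(String text) {
        int[] count = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < text.length(); i++) {
            count[text.charAt(i)]++;   // the char is promoted to int and used as the index
        }
        return count;
    }

    public static void main(String[] args) {
        int[] count = tally("ABBA");
        int x = count['A'];            // same element as count[65]
        System.out.println(x);         // prints 2
    }
}
```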
 
cyprian

Yours seems the simplest way to go forward with this but do I have to
iterate through the array each time I read in a character ?

You index. Most people don't know you can index by chars,
e.g. int x = count[ 'A' ]; is legit Java. The char gets promoted to
the corresponding Unicode int.

To do a character count on a text file, try reading it in through a
stream: wrap the stream in a reader, buffer it, and call read() on the
buffered reader. read() returns the next char as an int (a UTF-16 code
unit, so it is not code-point aware), or -1 at end of input. Then try
doing your map thing on each char. I was counting some words myself
recently: http://genericjava.blogspot.com/2007/08/can-i-count-ways-let-me.htm

On the other hand, you could call readLine() on the buffered reader,
append each line to a string buffer, and play with the string buffer
directly. Try doing a regexp construct if possible. Use the string
buffer as the source for your map: walk it char by char, using each
character as the key and its running count as the value.
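
As a rough sketch of the readLine() variant described above: the class name LineBufferCount and its count method are invented for the example, and StringBuilder is used as the unsynchronized equivalent of StringBuffer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

final class LineBufferCount {
    /** Collects all lines into one buffer, then counts the buffer char by char. */
    static Map<Character, Integer> count(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            text.append(line);         // readLine() strips the line terminator
        }
        Map<Character, Integer> map = new TreeMap<Character, Integer>();
        for (int i = 0; i < text.length(); i++) {
            Character key = text.charAt(i);
            Integer old = map.get(key);
            map.put(key, old == null ? 1 : old + 1);
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(count(new StringReader("ab\nba\n"))); // prints {a=2, b=2}
    }
}
```

Note that this variant never counts line terminators, because readLine() removes them, and it holds the whole text in memory before counting.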
 
Roedy Green

to do a character count on a text file,

And if all you want is a count of chars in the file, use File.length().
It will give you the byte count without reading even a single byte.
That is the same as the char count for files in a single-byte encoding
such as ASCII or Latin-1, and quite a reasonable measure of "bigness".
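
A quick way to see where the byte count and the char count part ways is sketched below. It writes a string to a temporary file as UTF-8 (an assumption chosen for the demo) and reports File.length(); the class and method names are illustrative.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class LengthVersusChars {
    /** Writes text to a temporary file as UTF-8 and returns the file's length in bytes. */
    static long byteLength(String text) throws IOException {
        File file = File.createTempFile("length-demo", ".txt");
        try {
            Files.write(file.toPath(), text.getBytes(StandardCharsets.UTF_8));
            return file.length();
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(byteLength("abc"));        // 3 bytes for 3 chars
        System.out.println(byteLength("caf\u00e9"));  // 5 bytes for 4 chars
    }
}
```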
 
