java.util.Scanner and EOF/EOT issues

sasuke

Hello all.

When developing console-based applications to help beginners learn
Java, I have noticed a peculiar problem with the Scanner class, the
utility class introduced in Java 5 to ease reading from input streams
and processing character streams. It behaves strangely when presented
with an EOF token [CTRL + Z] or an EOT token [CTRL + C] to terminate
the program. When presented with EOF, the program goes into an
infinite loop, and when presented with EOT, the program prints the
prompt many times before the VM process actually terminates. This
doesn't happen when a BufferedReader is used to process the user
input.

To demonstrate, here is some sample code that asks the user for some
input and tells him whether it can be parsed as a number. My
development environment is Java 5 release 12 on a Windows XP SP2 box.

<code>
import java.util.*;
import java.io.*;

public class GGScanTest {

    // Pass an argument to start the Scanner Test and run without
    // any arguments to run the BufferedReader test
    public static void main(final String[] args) {
        if(args.length > 0) {
            System.out.println("Starting java.util.Scanner test");
            new GGScanTest().startTest();
        } else {
            System.out.println("Starting java.io.BufferedReader test");
            new GGScanTest().startBufTest();
        }
    }

    public void startTest() {
        Scanner in = new Scanner(System.in);
        int i = 0;
        while(true) {
            System.out.print("Enter something: ");
            if(in.hasNext()) {
                if(in.hasNextInt()) {
                    i = in.nextInt();
                    System.out.println("You entered a number.");
                } else {
                    in.next();
                    System.out.println("You didn't enter a number.");
                }
            }
        }
    }

    public void startBufTest() {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        int i = 0;
        String str = null;
        while(true) {
            System.out.print("Enter something: ");
            try {
                str = in.readLine();
                if(str == null) System.exit(0);
                i = Integer.parseInt(str);
                System.out.println("You entered a number.");
            } catch(Exception e) {
                System.out.println("You didn't enter a number.");
            }
        }
    }

}
</code>

My guess is that an EOF token or CTRL + C messes with the internal
state of the Scanner, causing it to go haywire. I couldn't find
anything in the Scanner API that could *reset* this corrupt state or
make it EOF/EOT aware. Is this a known issue with a known
workaround/solution?

Comments, explanations most appreciated.

../sasuke
 
Eric Sosman

sasuke said:
[... original post and code snipped; see up-thread ...]

The infinite loop isn't in Scanner, but in your own code.
Your startTest() method contains a `while(true)' loop that has
nothing to terminate it: when the Scanner's hasNext() method
returns false, the loop just keeps on iterating. When you
type ^Z the Scanner gets end-of-file and reports it, but your
code just keeps on checking hasNext(). When you type ^C a
similar thing happens, but the JVM is in the process of shutting
down and will eventually exit and take your loop with it.

Try changing your code so that it does something different
when hasNext() returns false instead of true.
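A minimal sketch of the change Eric suggests, assuming the prompt can be dropped for brevity: hasNext() now controls the loop, so the method returns once the Scanner reports end of input (the CTRL + Z case from the original post) instead of spinning. The class name ScanLoop and the list of messages are hypothetical; collecting the messages instead of printing them just makes the behaviour easy to check against a String instead of a console.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; the names are illustrative, not from the thread.
class ScanLoop {

    // Same classification as startTest(), but the loop ends as soon as
    // hasNext() reports that no further token can be read (end of input).
    static List<String> run(Scanner in) {
        List<String> messages = new ArrayList<>();
        while (in.hasNext()) {
            if (in.hasNextInt()) {
                in.nextInt();
                messages.add("You entered a number.");
            } else {
                in.next(); // discard the non-numeric token
                messages.add("You didn't enter a number.");
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // A String stands in for console input here.
        for (String m : run(new Scanner("12 abc 7"))) {
            System.out.println(m);
        }
    }
}
```

Passing new Scanner(System.in) instead of the String gives the interactive version; the loop then ends when the console signals end of input.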
 
sasuke

Eric said:
[... quoted text snipped; see up-thread ...]

Indeed, thanks a lot for the reply. I was under the impression that no
matter what is thrown at STDIN, it would be treated as a token, i.e.
pressing CTRL + Z would make hasNext() return true and invoking next()
would skip it, but it seems I was wrong.

I stumbled across another problem when using a Scanner. Pressing
RETURN [ENTER] repeatedly seems to have no effect when a Scanner reads
user input from STDIN, whereas with a BufferedReader a single RETURN
without any other input is equivalent to entering an empty string. My
guess is that since the Scanner is based on the concept of delimiters,
pressing RETURN repeatedly has the effect of hasNext() returning false
until the user enters something other than a RETURN. Can this behavior
be altered to make the Scanner behave the same way as the
BufferedReader?

I tried setting the Scanner delimiter to an empty string, but that
results in the following program flow:

Delimiter: **
Enter something:
You didn't enter a number.
Enter something: You didn't enter a number. <- doesn't wait for input
Enter something:
You didn't enter a number.
Enter something: You didn't enter a number.
Enter something: s
You didn't enter a number.
Enter something: You didn't enter a number.
Enter something: You didn't enter a number.
Enter something:

Here is the modified code:

<code>
import java.io.*;
import java.util.*;
import java.util.regex.Pattern;

public class GGScanTest {

    // Pass an argument to start the Scanner Test and run without
    // any arguments to run the BufferedReader test
    public static void main(final String[] args) {
        if(args.length > 0) {
            System.out.println("Starting java.util.Scanner test");
            new GGScanTest().startTest();
        } else {
            System.out.println("Starting java.io.BufferedReader test");
            new GGScanTest().startBufTest();
        }
    }

    public void startTest() {
        Scanner in = new Scanner(System.in);
        in.useDelimiter(Pattern.compile(""));
        System.out.println("Delimiter: *" + in.delimiter() + "*");
        int i = 0;
        while(true) {
            System.out.print("Enter something: ");
            if(in.hasNext()) {
                if(in.hasNextInt()) {
                    i = in.nextInt();
                    System.out.println("You entered a number.");
                } else {
                    in.next();
                    System.out.println("You didn't enter a number.");
                }
            } else {
                System.out.println("Nothing to consume, exit.");
                System.exit(0);
            }
        }
    }

    public void startBufTest() {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        int i = 0;
        String str = null;
        while(true) {
            System.out.print("Enter something: ");
            try {
                str = in.readLine();
                if(str == null) System.exit(0);
                i = Integer.parseInt(str);
                System.out.println("You entered a number.");
            } catch(Exception e) {
                System.out.println("You didn't enter a number.");
            }
        }
    }

}
</code>
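sasuke's session output with the empty delimiter is consistent with each character becoming its own token: an empty pattern matches between every pair of characters, and on Windows the Return key delivers two characters, \r and \n. That would give two "You didn't enter a number." lines per Return, and three for "s" followed by Return. A small sketch to check this without a console; the class and method names are hypothetical, and the input string stands in for what the console delivers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class: collect every token a Scanner produces for a given input.
class EmptyDelimiterDemo {

    static List<String> tokens(String input, String delimiter) {
        Scanner sc = new Scanner(input);
        sc.useDelimiter(delimiter);
        List<String> out = new ArrayList<>();
        while (sc.hasNext()) {
            out.add(sc.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // "s" followed by a Windows line ending, as typed at the console.
        for (String t : tokens("s\r\n", "")) {
            // Print each token's character codes so \r (13) and \n (10) are visible.
            StringBuilder codes = new StringBuilder();
            for (char c : t.toCharArray()) {
                codes.append((int) c).append(' ');
            }
            System.out.println(codes.toString().trim()); // prints 115, then 13, then 10
        }
    }
}
```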
 
sasuke

 sasuke said:
Eric Sosman wrote:
Indeed, thanks a lot for the reply. I was under the assumption that no
matter what is thrown at the STDIN, it will be treated as a token i.e.
pressing CTRL + Z would generate a true for hasNext() and invoking next
() would skip that but it seems I was wrong.
Another problem I stumbled across when using a Scanner. It seems as
though pressing multiple returns [ENTER] has no effect when using a
Scanner for accepting user input via STDIN but when using
BufferedReader, a single empty return without entering anything is
equivalent to a blank string being entered. My guess is that since the
Scanner is based on the concept of delimiters, pressing multiple
returns has the effect of hasNext() returning false until something
other than a return is entered by the user.

[...]

I see "The default whitespace delimiter used by a scanner is as
recognized by Character.isWhitespace," which includes line delimiters.
You might try to specify the DOTALL flag in your pattern:

<http://java.sun.com/javase/6/docs/api/java/util/Scanner.html>

Hello John,

That's what I initially thought, but the `DOTALL' flag still causes
the same behavior.

Delimiter: *.*
Enter something:
You didn't enter a number.
Enter something:
You didn't enter a number.
Enter something: You didn't enter a number.
Enter something:
You didn't enter a number.
Enter something: You didn't enter a number.
Enter something:
You didn't enter a number.
Enter something: You didn't enter a number.
Enter something:
You didn't enter a number.
Enter something: You didn't enter a number.
Enter something: s
You didn't enter a number.
Enter something: You didn't enter a number.
Enter something: You didn't enter a number.
Enter something: ^Z
Nothing to consume, exit.

../sasuke
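On the DOTALL suggestion: without that flag, "." in a java.util.regex pattern does not match line terminators such as \r and \n, so a plain "." delimiter never matches the characters a Return produces. The check below uses only Pattern.compile(String, int) and Matcher.matches(); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical check: which characters does "." match, with and without DOTALL?
class DotAllCheck {

    static boolean dotMatches(char c, int flags) {
        return Pattern.compile(".", flags).matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches('a', 0));               // true
        System.out.println(dotMatches('\n', 0));              // false: line terminator
        System.out.println(dotMatches('\r', 0));              // false: line terminator
        System.out.println(dotMatches('\n', Pattern.DOTALL)); // true
        System.out.println(dotMatches('\r', Pattern.DOTALL)); // true
    }
}
```

A Scanner would get the flag via in.useDelimiter(Pattern.compile(".", Pattern.DOTALL)). Pattern.toString() returns only the regular expression source, not the flags, so the "Delimiter: *.*" line prints the same with or without DOTALL and cannot confirm which variant was actually running.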
 
Andreas Leitgeb

sasuke said:
Delimiter: *.*
Enter something:
You didn't enter a number.
Enter something:
You didn't enter a number.
Enter something: You didn't enter a number.
Enter something:
...

It might be helpful if you also wrote exactly what you entered.

Anyway, "*.*" doesn't look like a valid regular expression to me.

Finally: one line of input may contain more than one token. With the
default delimiter, an input of "a b c" would be three tokens, only one
of which is consumed in each iteration. So it seems natural that
sometimes it doesn't really wait for input after the "Enter something:"
prompt.

By the way, the OS passes your input to your application line by
line, so while you are typing "a ", the Scanner hasn't seen a single
character of your input. Only after you hit the Return key (after
"a b c") does it see these three tokens in a row.
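Andreas's point about several tokens per line can be checked with a Scanner reading from a String that stands in for one submitted line: with the default whitespace delimiter, "a b c" feeds three iterations of a loop that consumes one token per iteration. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo: one line of input, default (whitespace) delimiter.
class TokensPerLine {

    static List<String> tokensOf(String line) {
        Scanner sc = new Scanner(line);
        List<String> out = new ArrayList<>();
        while (sc.hasNext()) {
            out.add(sc.next()); // one token per iteration, as in startTest()
        }
        return out;
    }

    public static void main(String[] args) {
        // The line a user types and submits with Return.
        System.out.println(tokensOf("a b c\n")); // [a, b, c]
    }
}
```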
 
sasuke

[... quoted text snipped; see up-thread ...]

Hello Andreas,

Apologies for not clearing things up properly.

For the program output above, each time I pressed only the RETURN key
with no input whatsoever. The asterisks in *.* actually come from the
source code line:
System.out.println("Delimiter: *" + in.delimiter() + "*");
Thus the actual regexp used in this case is a DOT [any character].

And yes, I am aware that the Java program doesn't receive anything
until I press the RETURN key, unless I have some sort of
getch()/kbhit() [C language] mechanism in place that listens to
keystrokes.

The problem I am facing is that the BufferedReader properly responds
to a RETURN key by passing a blank string to my program, whereas
replicating the same functionality with a Scanner has got me stumped.

Suggestions appreciated.

../sasuke
 
Andreas Leitgeb

sasuke said:
Apologies for not clearing things up properly.

Plus my own apologies for not having looked at your code before ...

For the above given program output, each time I press only the RETURN
key with no input whatsoever. [...]
Thus the actual regexp used in this case is a DOT [any character].

So you gave it exactly the same input each time, and the first few
times it interpreted each line as one token, and then it started to see
two tokens per line?

That's indeed strange.
If it saw two tokens for every line, then it might just be that it
sees the "cr" separately from the "lf" (if you're on Windows).
But that would have to happen equally for every line you enter...
Perhaps you inadvertently typed Ctrl-Enter or Shift-Enter mixed with
plain Enter? (Just a shot in the dark; I don't even know if that would
really cause that effect.)

Anyway, using "." as the delimiter is *very* odd in itself. With that
you'd *never* get any non-empty token.

The Scanner always looks for the next delimiter, and once it has found
it, it deals with the token and its possible interpretations.

From your usage of hasNext() and hasNextInt() and your very odd
delimiter, it looks to me as if you thought the Scanner would try to
scan a number on hasNextInt() and leave the position right after the
last digit, to then be consumed as the next delimiter. But that isn't
so.

If the input was "42*<Return>" and you were using the default
delimiter, then the next token would be "42*" and hasNextInt() would
return false, because "42*" just isn't a legal number.
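The default-delimiter case can be checked directly: hasNextInt() examines the whole next token and does not stop after the last digit. A small sketch with hypothetical helper names, reading from Strings instead of the console:

```java
import java.util.Scanner;

// Hypothetical helpers: what does hasNextInt() see for a given input?
class HasNextIntDemo {

    // true if the next whitespace-delimited token is a legal int
    static boolean startsWithInt(String input) {
        return new Scanner(input).hasNextInt();
    }

    // the next whitespace-delimited token, whatever it contains
    static String firstToken(String input) {
        return new Scanner(input).next();
    }

    public static void main(String[] args) {
        System.out.println(startsWithInt("42*\n"));  // false: the token is "42*"
        System.out.println(firstToken("42*\n"));     // 42*
        System.out.println(startsWithInt("42 *\n")); // true: the token is "42"
    }
}
```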
If the input was again "42*", but with your delimiter set to ".", then
"4" is detected as a delimiter, and you get the empty string that
precedes it as a token. Then "2" is taken as the next delimiter, and
you get the empty string between 4 and 2 as the next token. Ditto for [...]

The problem I am facing is that the BufferedReader properly responds
to a RETURN key by passing in a blank string to my program whereas
replicating the same functionality using Scanner has got me stumped.
Suggestions appreciated.

Why each press of Return can produce a varying number of tokens is
also beyond me. I have no idea what causes that phenomenon.
But maybe I still haven't read your session dump correctly.
If you think so, post it again and insert a marker where the cursor
was each time you hit the Return key. Like:

Enter Line:<Return>
got token... <--- no return pressed here
Enter Line:<Return>
got token... <--- no return pressed here
Enter Line:got token... <--- no return pressed here
 
sasuke

Andreas said:
[... quoted text snipped; see up-thread ...]

Hello Andreas,

Yes, I do know that. To handle that case, I have an else block
accompanying the hasNextInt() check. For example, let's assume that
the hasNext() method returns true for the user input 42*; hasNextInt()
returns false for the same input, and control moves to the else block,
which declares that 42* is not a valid integer.

[... quoted text snipped; see up-thread ...]

I am on a Windows box [XP SP2]; the session dump when using DOT as
the delimiter is:

Starting java.util.Scanner test
Enter something:<RETURN>
You didn't enter a number <- no user interaction
Enter something:<RETURN>
You didn't enter a number <- no user interaction
Enter something: You didn't enter a number <- no user interaction
Enter something: a<RETURN>
You didn't enter a number <- no user interaction
Enter something: You didn't enter a number <- no user interaction
Enter something: You didn't enter a number <- no user interaction
Enter something: ab<RETURN>
You didn't enter a number <- no user interaction
Enter something: You didn't enter a number <- no user interaction
Enter something: You didn't enter a number <- no user interaction
Enter something: You didn't enter a number <- no user interaction
Enter something: <CURRENT-CURSOR-POS>

The session dump when using the default delimiter [any whitespace
char] is as follows:

Enter something:<RETURN>
<RETURN>
<RETURN>
<RETURN>
<RETURN>
<RETURN>
a<RETURN>
You didn't enter a number <- no user interaction
Enter something:<RETURN>
<RETURN>
c<RETURN>
You didn't enter a number <- no user interaction
Enter something: <CURRENT-CURSOR-POS>

I guess I am finally beginning to understand the limitations of the
"delimiter"-based approach used by the Scanner class, since even after
setting the delimiter to NEWLINE, the code doesn't work as expected,
i.e. the same way as the BufferedReader class. Also, as you mentioned,
the difference in newline formats across the three different operating
systems might be one of the sources of confusion here.

Workarounds, comments, suggestions or thoughts most appreciated.

../sasuke
 
Andreas Leitgeb

sasuke said:
I am on a windows box [XP SP2]; the session dump when having DOT as
the delimiter is:
Starting java.util.Scanner test ***
Enter something:<RETURN>
You didn't enter a number <- no user interaction
Enter something:<RETURN>
You didn't enter a number <- no user interaction
Enter something: You didn't enter a number <- no user interaction
***

This is what really strikes me as odd.
The first time you hit <Return> the result is different from the
second time. Perhaps you typed other keys that you didn't think would
become input to the app, but do?

Along with the string "You didn't enter a number", also print the
value of: java.util.Arrays.toString(in.next().getBytes())
(and remove the old separate call to next(), of course).
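Andreas's debugging suggestion wrapped in a small helper; inside startTest() the else branch would print the message together with bytesOf(in.next()). The helper name is hypothetical, and it passes an explicit charset (an assumption; getBytes() with no argument uses the platform default) so that a carriage return shows up as 13 and a line feed as 10.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for the byte dump Andreas suggests.
class TokenDump {

    static String bytesOf(String token) {
        return Arrays.toString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(bytesOf("42"));   // [52, 50]
        System.out.println(bytesOf("\r\n")); // [13, 10]
        System.out.println(bytesOf(""));     // []
    }
}
```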
I guess I am finally beginning to understand the limitations of a
"delimiter" based approach used by the Scanner class since even after
setting the delimiter to NEWLINE,

The magic here is whether your pattern allows repeated newlines or
just one. If your delimiter was "\\n+", then each search for a
delimiter consumes as many newlines as it gets. This does not seem to
be what you're after.

If you use exactly "\\r?\\n|\\r" (no plusses or asterisks) as the
delimiter, then the Scanner *should* work remarkably like
BufferedReader's readLine(). If it still does not, I'd like to see the
byte dump of each token.
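The difference Andreas describes between "\\n+" and a single-terminator delimiter can be seen on a String holding "42", an empty line and "abc". This is a sketch with a hypothetical class name; a String only approximates console input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo comparing the two newline delimiters Andreas describes.
class NewlineDelimiters {

    static List<String> tokens(String input, String delimiter) {
        Scanner sc = new Scanner(input).useDelimiter(delimiter);
        List<String> out = new ArrayList<>();
        while (sc.hasNext()) {
            out.add(sc.next());
        }
        return out;
    }

    public static void main(String[] args) {
        String typed = "42\n\nabc\n"; // "42", an empty line, "abc"
        // "\\n+" swallows the run of newlines, so the empty line disappears.
        System.out.println(tokens(typed, "\\n+"));        // [42, abc]
        // Exactly one line terminator per delimiter: the empty line becomes "".
        System.out.println(tokens(typed, "\\r?\\n|\\r")); // [42, , abc]
        // Same result for Windows line endings.
        System.out.println(tokens("42\r\n\r\nabc\r\n", "\\r?\\n|\\r")); // [42, , abc]
    }
}
```

One caveat worth testing on your own JDK: judging from the Scanner source, a delimiter at the very start of the input appears to be skipped before the first token, so a blank line typed before anything else may not come back as an empty token.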
Workarounds, comments, suggestions or thoughts most appreciated.

If your goal is just to read whole lines, and you want a Scanner, then
you can also have a look at Scanner.nextLine().
You can also create a Scanner on a String, so you could use a
BufferedReader to read a line, then pass the line to a new Scanner
and ask it whether it hasNextInt() ...
This is, however, much less flexible than a plain Scanner.
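A sketch of the nextLine() route, modelled on startBufTest(): hasNextLine() plays the role of the null check on readLine(), an empty line comes back as "", and a second Scanner on the line decides whether it holds a single int (the Scanner-on-a-String idea above). The class and method names are hypothetical, and rejecting lines such as "42 abc" is a choice made here, not something from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical line-oriented version of startTest(), modelled on startBufTest().
class LineScanTest {

    static List<String> classifyLines(Scanner in) {
        List<String> messages = new ArrayList<>();
        while (in.hasNextLine()) {            // false at end of input, like readLine() == null
            String line = in.nextLine();      // "" for an empty line, separator stripped
            Scanner lineScanner = new Scanner(line);
            boolean isNumber = lineScanner.hasNextInt();
            if (isNumber) {
                lineScanner.nextInt();
                isNumber = !lineScanner.hasNext(); // reject lines like "42 abc"
            }
            messages.add(isNumber ? "You entered a number." : "You didn't enter a number.");
        }
        return messages;
    }

    public static void main(String[] args) {
        // Windows-style input: an empty line, "42", then "abc".
        System.out.println(classifyLines(new Scanner("\r\n42\r\nabc\r\n")));
    }
}
```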

Most of the time it is very convenient that the Scanner may read
multiple lines to find a token. E.g. if you have the data in a file,
and the file contains an empty line after every block of ten data
lines, the Scanner with the default delimiter will just read over the
empty lines without any complaint and without returning possibly
unwanted empty tokens for them.
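The blank-line case Andreas describes, as a sketch with made-up data and a hypothetical class name: with the default delimiter, runs of newlines between the blocks are absorbed into the delimiter, so nextInt() never sees an empty token.

```java
import java.util.Scanner;

// Hypothetical example: numbers in "blocks" separated by empty lines.
class BlankLineData {

    static int sum(String data) {
        Scanner sc = new Scanner(data); // default delimiter: runs of whitespace
        int total = 0;
        while (sc.hasNextInt()) {
            total += sc.nextInt();      // blank lines are consumed as part of the delimiter
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("1\n2\n\n3\n4\n\n\n5\n")); // 15
    }
}
```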
 
