Character Parser

Katie

Hi,

I want to create a character parser in Java. I basically want to parse
a text file, removing extra spaces and carriage returns. I've used
stream tokenizers before, but what if I want every character to be a
token, rather than splitting on a delimiter?

Thanks for your time and help
:)
 
Daniel Pitts

In that case, you don't want tokenizing.
You don't even want parsing!
You want to read the data one character at a time.

<http://java.sun.com/j2se/1.5.0/docs/api/java/io/Reader.html>

Look at the method called read(char[])
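To make the read(char[]) suggestion concrete, here is a minimal sketch. It assumes that "removing extra spaces and carriage returns" means collapsing every run of whitespace (spaces, tabs, \r, \n) into a single space; the class name ChunkedCollapser and the 4096-char buffer size are illustrative choices, not part of the Reader API.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: reads a Reader in chunks via read(char[]) and
// collapses every run of whitespace (spaces, tabs, '\r', '\n') into one space.
class ChunkedCollapser {

    static String collapse(Reader in) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[4096];      // chunk size is an arbitrary choice
        boolean lastWasSpace = false;     // carried across chunk boundaries
        int n;
        // read(char[]) returns the number of chars read, or -1 at end of stream
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (Character.isWhitespace(c)) {
                    if (!lastWasSpace) {
                        out.append(' ');
                        lastWasSpace = true;
                    }
                } else {
                    out.append(c);
                    lastWasSpace = false;
                }
            }
        }
        return out.toString();
    }
}
```

Because read(char[]) can split a run of spaces across two chunks, the lastWasSpace flag lives outside the loop. Leading or trailing whitespace becomes a single space rather than being trimmed.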
 
richliu2005

For best performance, you may want to use a java.nio.ByteBuffer. I've
had to read in a 2GB file, and using a BufferedInputStream and a
ByteBuffer was the only viable solution. Other APIs could not handle
such a large file.
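A minimal sketch of the ByteBuffer approach, with one swap: it fills the buffer through a java.nio.channels.FileChannel rather than a BufferedInputStream, which is the usual way to read a file into a ByteBuffer. It assumes an ASCII-compatible single-byte encoding (whitespace is tested byte by byte) and streams the result to an OutputStream instead of building one huge String; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: assumes a single-byte, ASCII-compatible encoding, so that
// whitespace can be recognised byte by byte. Not safe for UTF-16 etc.
class ByteBufferCollapser {

    static void collapse(Path input, OutputStream out) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // arbitrary size
        boolean lastWasSpace = false;                          // survives buffer refills
        try (FileChannel ch = FileChannel.open(input, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                buf.flip();                  // switch from filling to draining
                while (buf.hasRemaining()) {
                    byte b = buf.get();
                    boolean space = b == ' ' || b == '\t' || b == '\r' || b == '\n';
                    if (space) {
                        if (!lastWasSpace) out.write(' ');
                        lastWasSpace = true;
                    } else {
                        out.write(b);
                        lastWasSpace = false;
                    }
                }
                buf.clear();                 // ready for the next read
            }
        }
        out.flush();
    }
}
```

Since the loop writes one byte at a time, pass a BufferedOutputStream (or similar) as out when writing to a file.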

If your file is small (so that a BufferedInputStream/ByteBuffer would not
offer significant gains) and simplicity outweighs performance, then
you can always use one of the replace methods in the String class.
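For the small-file case, String.replaceAll with the regex \s+ (any run of whitespace, including \r and \n; written "\\s+" in Java source) does the whole job in one call. A sketch, assuming the file is UTF-8 and fits comfortably in memory; the helper names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StringReplaceCollapser {

    // Collapses each run of whitespace (including \r and \n) to one space.
    static String collapse(String text) {
        return text.replaceAll("\\s+", " ");
    }

    // Only sensible for small files: the whole file is loaded into memory.
    // Assumes the file is UTF-8 encoded.
    static String collapseFile(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return collapse(new String(bytes, StandardCharsets.UTF_8));
    }
}
```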
 
Alex Hunsley

Daniel said:
> In that case, you don't want tokenizing.
> You don't even want parsing!
> You want to read the data one character at a time.
>
> <http://java.sun.com/j2se/1.5.0/docs/api/java/io/Reader.html>
>
> Look at the method called read(char[])

For efficiency, I suggest using BufferedReader, which works the same way
(but it buffers chunks of data behind the scenes: fewer disk accesses,
so it's faster!)
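A sketch of the BufferedReader variant, assuming the goal is to collapse each run of whitespace into one space. Wrapping the source in a BufferedReader means the one-char-at-a-time read() calls are served from an in-memory buffer; the class name BufferedCollapser is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

class BufferedCollapser {

    // Wraps any Reader in a BufferedReader so that the per-character read()
    // calls below come from a buffer rather than hitting the source each time.
    static void collapse(Reader source, Writer out) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            boolean lastWasSpace = false;
            int c;
            while ((c = in.read()) != -1) {   // -1 signals end of stream
                if (Character.isWhitespace(c)) {
                    if (!lastWasSpace) out.write(' ');
                    lastWasSpace = true;
                } else {
                    out.write(c);
                    lastWasSpace = false;
                }
            }
        }
        out.flush();
    }
}
```

To process a file, pass a FileReader (platform default encoding) or an InputStreamReader with an explicit charset as source. Closing the BufferedReader also closes source.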

lex
 
Alex Hunsley

> For best performance, you may want to use a java.nio.ByteBuffer. I've
> had to read in a 2GB file and using a BufferedInputStream and a
> ByteBuffer was the only viable solution. Other APIs could not handle
> such a large file.

Which other APIs do you mean?
Shouldn't the OP be using a Reader or BufferedReader (designed
for char data) rather than something that reads bytes?
The end effect may be the same, of course...
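The end effect is only the same for single-byte text. A small sketch of where bytes and chars diverge: the UTF-8 encoding of "café" is five bytes, but decoding it through an InputStreamReader with an explicit charset yields four chars. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BytesVersusChars {

    // Counts raw bytes, as a byte-oriented stream sees the data.
    static int countBytes(byte[] data) throws IOException {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) count++;
        }
        return count;
    }

    // Counts chars after decoding the bytes as UTF-8, as a Reader sees them.
    static int countChars(byte[] data) throws IOException {
        int count = 0;
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (in.read() != -1) count++;
        }
        return count;
    }
}
```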

lex
 
Boaz.Jan

> Which other APIs do you mean?
> Shouldn't the OP be using a Reader or BufferedReader (designed
> for char data) rather than something that reads bytes?
> The end effect may be the same, of course...

I had a similar task some time ago: I needed to compare two enormous
files lexicographically, simultaneously.
You can use a CharArrayReader to read from a char[] (you might want to
write an additional method for reading a complete line instead of a
portion of the text).
Now you can break the whole text file into chars.
If you do need buffering, I recommend you study the source code behind
BufferedReader and make your own Reader class that can return a char[]
(I couldn't find one in the J2SE API ... but I haven't invested a lot of
time in it).
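For the line-reading part, the J2SE API already provides a method: wrapping a CharArrayReader in a BufferedReader gives readLine(), which handles \n, \r and \r\n terminators. (BufferedReader also inherits read(char[]) from Reader and has read(char[], int, int), both of which fill a caller-supplied char[].) A minimal sketch; CharArrayLines is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.CharArrayReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CharArrayLines {

    // Splits an in-memory char[] into lines; readLine() strips the
    // terminator ("\n", "\r" or "\r\n") and returns null at the end.
    static List<String> lines(char[] text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new CharArrayReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }
}
```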

To hold your already-parsed text, you can create a StringBuffer; simply
by iterating over the char[], you decide whether or not to append each
given char to the StringBuffer.
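A sketch of this filter-as-you-append idea, taking "extra spaces and carriage returns" literally: every \r is dropped, and a space is skipped when the buffer already ends with one, so runs of spaces collapse even when they span two chunks. filter is a hypothetical helper; the length parameter lets it work on a partially filled array, as left by a read(char[]) call. StringBuilder has the same append, length and charAt methods, so it can be swapped in unchanged when no synchronization is needed.

```java
class CharFilter {

    // Appends each of the first 'length' chars of 'chunk' to 'out', except
    // carriage returns and spaces that would follow a space already in 'out'.
    static void filter(char[] chunk, int length, StringBuffer out) {
        for (int i = 0; i < length; i++) {
            char c = chunk[i];
            if (c == '\r') {
                continue;                             // drop carriage returns
            }
            boolean endsWithSpace = out.length() > 0 && out.charAt(out.length() - 1) == ' ';
            if (c == ' ' && endsWithSpace) {
                continue;                             // skip extra spaces
            }
            out.append(c);
        }
    }
}
```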

http://java.sun.com/j2se/1.5.0/docs/api/java/io/CharArrayReader.html

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuffer.html

A faster mutable sequence of characters for unsynchronized use (just
like StringBuffer, but faster):
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html

And maybe you can find something useful here:
http://java.sun.com/docs/books/tutorial/essential/io/scanning.html
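The scanning tutorial covers java.util.Scanner, whose default delimiter is whitespace. Reading words with next() and rejoining them with single spaces is a compact way to drop extra spaces and line breaks; a sketch, with a hypothetical class name. Scanner swallows IOExceptions (check ioException() if that matters), and leading and trailing whitespace is dropped.

```java
import java.io.Reader;
import java.util.Scanner;

class ScannerCollapser {

    // Scanner's default delimiter matches runs of whitespace, so each next()
    // call returns one word; rejoining with single spaces collapses the rest.
    static String collapse(Reader source) {
        StringBuilder out = new StringBuilder();
        try (Scanner sc = new Scanner(source)) {
            while (sc.hasNext()) {
                if (out.length() > 0) out.append(' ');
                out.append(sc.next());
            }
        }
        return out.toString();
    }
}
```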
 
