Character Parser

Discussion in 'Java' started by Katie, Feb 15, 2007.

  1. Katie

    Katie Guest

    Hi,

    I want to create a character parser in Java. I basically want to parse
    a text file, removing extra spaces and carriage returns. I've used
    stream tokenizers before, but what if I want the token to be every
    character rather than a delimiter?

    Thanks for your time and help
    :)
     
    Katie, Feb 15, 2007
    #1

  2. Daniel Pitts

    Daniel Pitts Guest

    On Feb 15, 1:37 pm, "Katie" <> wrote:
    > Hi,
    >
    > I want to create a character parser in Java. I basically want to parse
    > a text file, removing extra spaces and carriage returns. I've used
    > stream tokenizers before, but what if I want the token to be every
    > character rather than a delimiter?
    >
    > Thanks for your time and help
    > :)


    In that case, you don't want tokenizing.
    You don't even want parsing!
    You want to read the data one character at a time.

    <http://java.sun.com/j2se/1.5.0/docs/api/java/io/Reader.html>

    Look at the method called read(char[])
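
    A minimal sketch of that idea, not a definitive implementation.
    Reader offers both read(), which returns a single character, and
    read(char[]), which fills a buffer; the sketch uses the
    single-character form for clarity. The class name CharCleaner and
    the cleanup rules are assumptions: runs of spaces collapse to one
    space, '\r' is dropped, and '\n' is kept.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class; the name and the cleanup rules are assumptions.
final class CharCleaner {

    // Reads one character at a time, drops '\r', and collapses runs of spaces.
    static String clean(Reader in) throws IOException {
        StringBuilder out = new StringBuilder();
        boolean lastWasSpace = false;
        int c;
        while ((c = in.read()) != -1) {   // read() returns -1 at end of stream
            char ch = (char) c;
            if (ch == '\r') {
                continue;                 // skip carriage returns entirely
            }
            if (ch == ' ') {
                if (lastWasSpace) {
                    continue;             // skip every space after the first in a run
                }
                lastWasSpace = true;
            } else {
                lastWasSpace = false;
            }
            out.append(ch);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for a file-backed Reader in this demo.
        System.out.println(clean(new StringReader("a  b   c\r\nd")));
    }
}
```

    Because clean() accepts any Reader, the same loop works unchanged
    with a FileReader or a BufferedReader.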
     
    Daniel Pitts, Feb 16, 2007
    #2

  3. Katie

    Guest

    For best performance, you may want to use a java.nio.ByteBuffer. I've
    had to read in a 2GB file and using a BufferedInputStream and a
    ByteBuffer was the only viable solution. Other APIs could not handle
    such a large file.

    If your file is small (using a BufferedInputStream/ByteBuffer would not
    offer significant gains) and simplicity outweighs performance, then
    you can always use one of the replace methods in the String class.
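
    For the small-file case, a sketch using String.replace and
    String.replaceAll might look like the following. It assumes the
    whole text is already in memory as a String; the class name
    ReplaceCleaner and the rules (drop '\r', collapse runs of spaces)
    are assumptions made for illustration.

```java
// Hypothetical class name; the cleanup rules below are assumptions.
final class ReplaceCleaner {

    // Removes carriage returns, then collapses runs of spaces into one space.
    static String clean(String text) {
        return text.replace("\r", "")         // literal replacement, no regex
                   .replaceAll(" {2,}", " "); // regex: two or more spaces
    }

    public static void main(String[] args) {
        System.out.println(clean("a  b   c\r\nd"));
    }
}
```

    Removing '\r' first means spaces that were separated only by a
    carriage return are collapsed as well.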


    On Feb 15, 1:37 pm, "Katie" <> wrote:
    > Hi,
    >
    > I want to create a character parser in Java. I basically want to parse
    > a text file, removing extra spaces and carriage returns. I've used
    > stream tokenizers before, but what if I want the token to be every
    > character rather than a delimiter?
    >
    > Thanks for your time and help
    > :)
     
    , Feb 17, 2007
    #3
  4. Alex Hunsley

    Alex Hunsley Guest

    Daniel Pitts wrote:
    > On Feb 15, 1:37 pm, "Katie" <> wrote:
    >> Hi,
    >>
    >> I want to create a character parser in Java. I basically want to parse
    >> a text file, removing extra spaces and carriage returns. I've used
    >> stream tokenizers before, but what if I want the token to be every
    >> character rather than a delimiter?
    >>
    >> Thanks for your time and help
    >> :)

    >
    > In that case, you don't want tokenizing.
    > You don't even want parsing!
    > You want to read the data one character at a time.
    >
    > <http://java.sun.com/j2se/1.5.0/docs/api/java/io/Reader.html>
    >
    > Look at the method called read(char[])


    For efficiency, I suggest using BufferedReader, which is the same deal
    (but it buffers chunks of data behind the scenes - fewer disk accesses,
    so faster!)
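
    A rough sketch of the buffered variant, assuming Java 7 or later
    for try-with-resources. The class and method names are
    hypothetical, and for brevity the only cleanup done here is
    dropping '\r'. FileReader and FileWriter use the platform default
    charset.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical class and method names, chosen for illustration only.
final class BufferedCopy {

    // Streams inPath to outPath one character at a time, dropping '\r'.
    // Returns the number of characters written.
    static long copyWithoutCarriageReturns(String inPath, String outPath)
            throws IOException {
        long written = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(inPath));
             BufferedWriter out = new BufferedWriter(new FileWriter(outPath))) {
            int c;
            while ((c = in.read()) != -1) {   // served from the in-memory buffer
                if (c != '\r') {
                    out.write(c);
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BufferedCopy <input> <output>");
            return;
        }
        long n = copyWithoutCarriageReturns(args[0], args[1]);
        System.out.println(n + " characters written");
    }
}
```

    The loop still calls read() once per character, but most calls are
    answered from the buffer instead of going to the disk.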

    lex
     
    Alex Hunsley, Feb 18, 2007
    #4
  5. Alex Hunsley

    Alex Hunsley Guest

    wrote:
    > For best performance, you may want to use a java.nio.ByteBuffer. I've
    > had to read in a 2GB file and using a BufferedInputStream and a
    > ByteBuffer was the only viable solution. Other APIs could not handle
    > such a large file.


    Which other APIs do you mean?
    Shouldn't the OP be using a Reader or BufferedReader (designed
    for char data) rather than something that reads bytes?
    The end effect may be the same, of course...

    lex
     
    Alex Hunsley, Feb 18, 2007
    #5
  6. Katie

    Guest

    On Feb 18, 12:20 pm, Alex Hunsley <> wrote:
    > wrote:
    > > For best performance, you may want to use a java.nio.ByteBuffer. I've
    > > had to read in a 2GB file and using a BufferedInputStream and a
    > > ByteBuffer was the only viable solution. Other APIs could not handle
    > > such a large file.

    >
    > Which other APIs do you mean?
    > Shouldn't the OP be using a Reader or BufferedReader (designed
    > for char data) rather than something that reads bytes?
    > The end effect may be the same, of course...
    >
    > lex


    I had a similar task to do some time ago:
    I needed to compare two enormous files lexicographically at the same time.
    You can use a CharArrayReader to read a char[] (you might want to add
    a method for reading a complete line instead of a portion
    of the text).
    Now you can break the whole text file into chars.
    If you do need buffering, I recommend you study the source code behind
    BufferedReader and make your own Reader class that can return a char[]
    (I couldn't find one in the Java SE API, but I haven't invested a lot
    of time in it).

    To hold your already parsed text you can create a StringBuffer and,
    while iterating over the char[], decide whether to append the
    given char to the StringBuffer or not.
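
    The last step could be sketched roughly like this. The class name
    CharArrayFilter and the keep/skip rule (drop '\r', skip a space
    that directly follows a kept space) are assumptions; the char[]
    here comes from String.toCharArray() only to keep the demo short.

```java
// Hypothetical class name; the keep/skip rule is an assumption.
final class CharArrayFilter {

    // Iterates over the char[] and appends only the characters to keep.
    static String filter(char[] text) {
        StringBuffer kept = new StringBuffer(text.length);
        for (char ch : text) {
            if (ch == '\r') {
                continue;                       // never keep carriage returns
            }
            int len = kept.length();
            // A space repeats if the last kept character is also a space.
            boolean repeatsSpace =
                    ch == ' ' && len > 0 && kept.charAt(len - 1) == ' ';
            if (!repeatsSpace) {
                kept.append(ch);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("one   two\r\nthree".toCharArray()));
    }
}
```

    Checking the tail of the buffer avoids a separate flag. StringBuilder
    has the same append, length and charAt methods, so it can be swapped
    in when no synchronization is needed.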

    http://java.sun.com/j2se/1.5.0/docs/api/java/io/CharArrayReader.html

    http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuffer.html

    A faster mutable sequence of characters for tasks that don't need
    synchronization (just like StringBuffer, but faster):
    http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html

    And maybe you can find something useful here:
    http://java.sun.com/docs/books/tutorial/essential/io/scanning.html
     
    , Feb 18, 2007
    #6
