Parse a text file with quoted delimiters?

flarosa

Hi,

Is there an easy way to parse a line of text which may contain quoted
instances of the delimiting character?

For example,
1200,Bob's Ties,400 Atwood Avenue
1201,"Mary, Jane and Associates",250 Washington St.

In the 2nd line, I'd want the whole string "Mary, Jane and Associates"
to parse as one token.

The two simple ways I know of parsing text in Java (StringTokenizer
and String.split()) would end up splitting the 2nd line into four tokens
instead of three.
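To make the problem concrete, here is a small sketch (the class name is illustrative) showing that both approaches treat every comma as a delimiter, including the one inside the quotes:

```java
import java.util.StringTokenizer;

public class NaiveSplitDemo {
    // The second example line from the question.
    static final String LINE = "1201,\"Mary, Jane and Associates\",250 Washington St.";

    public static void main(String[] args) {
        // String.split() splits on every comma, quoted or not.
        for (String part : LINE.split(",")) {
            System.out.println("[" + part + "]");
        }
        // StringTokenizer has the same limitation and also counts 4 tokens.
        System.out.println(new StringTokenizer(LINE, ",").countTokens());
    }
}
```

The split loop prints `[1201]`, `["Mary]`, `[ Jane and Associates"]` and `[250 Washington St.]`, so the company name ends up in two pieces with the quotes still attached.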

Thanks,
Frank
 
Ben

Look at the regex API (java.util.regex); it has everything you need.
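One common regex approach, shown here as a sketch rather than a complete CSV parser: split on a comma only when it is followed by an even number of double quotes up to the end of the line (i.e. the comma is not inside a quoted field), then strip the surrounding quotes. It assumes quotes are balanced and that quoted fields contain no escaped ("") quotes; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexCsvSplit {
    // A comma followed by an even number of '"' characters up to end of line
    // lies outside any quoted field. Assumes balanced, unescaped quotes.
    private static final Pattern OUTSIDE_QUOTES_COMMA =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps empty fields, including trailing ones.
        for (String f : OUTSIDE_QUOTES_COMMA.split(line, -1)) {
            // Remove the surrounding quotes from a quoted field.
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1);
            }
            fields.add(f);
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String f : parse("1201,\"Mary, Jane and Associates\",250 Washington St.")) {
            System.out.println(f);
        }
    }
}
```

Because every comma triggers a scan to the end of the line, this is quadratic in the line length; that is harmless for short records but worth knowing for very long ones.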
 
Thomas Fritsch

See <http://java.sun.com/j2se/1.4.2/docs/api/java/io/StreamTokenizer.html>.
With that you can do things like:

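import java.io.StreamTokenizer;
import java.io.StringReader;

String line = ...; // the line to parse
StreamTokenizer tok = new StreamTokenizer(new StringReader(line));
tok.resetSyntax();                  // make every char ordinary: no number parsing, comments or ' quotes
tok.wordChars('\u0000', '\uFFFF');  // every character is part of a word...
tok.whitespaceChars(',', ',');      // ...except the comma, which separates tokens
tok.quoteChar('"');                 // ...and the double quote, which delimits quoted tokens
while (tok.nextToken() != StreamTokenizer.TT_EOF) { // nextToken() declares IOException
    String word = tok.sval;         // set for both plain words and quoted strings
    System.out.println(word);
}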
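A self-contained version of the StreamTokenizer snippet, wrapped in a method that collects the fields (the class and method names are illustrative). Since commas are treated as whitespace here, consecutive commas collapse, so an empty unquoted field such as the middle one in `a,,b` is silently dropped; StreamTokenizer also interprets backslash escapes inside quoted strings.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerCsv {
    // Commas separate fields; double quotes group text containing commas.
    static List<String> tokenize(String line) {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(line));
        tok.resetSyntax();             // start from an all-ordinary syntax table
        tok.wordChars(0, 0xFFFF);      // everything is a word character...
        tok.whitespaceChars(',', ','); // ...except the comma (a separator)
        tok.quoteChar('"');            // ...and the double quote
        List<String> fields = new ArrayList<>();
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                fields.add(tok.sval);  // sval holds both words and quoted strings
            }
        } catch (IOException e) {
            // StringReader will not actually fail, but nextToken() declares it.
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String f : tokenize("1201,\"Mary, Jane and Associates\",250 Washington St.")) {
            System.out.println(f);
        }
    }
}
```

This prints `1201`, `Mary, Jane and Associates` and `250 Washington St.` on separate lines.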
 
Oliver Wong

It looks like you're dealing with CSV (Comma Separated Values) files. If
so, you might want to look for a CSV parsing library rather than reinventing
the wheel by implementing your own.
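To give a feel for what such a library has to handle, here is a minimal hand-rolled sketch of the usual CSV quoting rules (as in RFC 4180): a quoted field may contain commas, and a doubled quote `""` inside a quoted field stands for one literal quote. The class name is illustrative, and the sketch does not handle quoted fields that span several lines, which is one more reason to prefer a real library.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleCsvLine {
    // Parse one CSV record with a small state machine (inside/outside quotes).
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas are kept inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // end of the current field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("1201,\"Mary, Jane and Associates\",250 Washington St.").get(1));
        System.out.println(parse("7,\"Say \"\"hi\"\"\",,end"));
    }
}
```

Unlike the StreamTokenizer version, empty fields survive: the second call yields the four fields `7`, `Say "hi"`, an empty string and `end`.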

- Oliver
 
