How to convert CSV row to Java object?


laredotornado

Hi,

I'm using Java 1.6. Does anyone know of any library or other Java
classes out there that would take a CSV row and convert it into a
Java object that I've created?

Thanks for any advice, - Dave
 

Leonardo Azpurua

Tom Anderson said:

Perhaps you might help me.

Since all the buzz with XML started several years ago, I've been scratching
my head trying to understand what actual advantages it might bring when used
to process files whose data structures are known and agreed upon in advance.
So far, with the exception of being able to boast that your application is
buzzword-compliant, I have found none. On the other hand, I have found
several "cons": depending on a library that is probably less efficient than
a traditional line oriented simple parser, overloading your project with
dependencies plus you have to learn a new formalization for which not many
good input editors exist (so, you must write code to turn source input into
XML, which is also more complex than just concatenating and expanding
strings).

And in this particular case, what is the point in having to learn XML to
write a barely readable definition instead of using Java (or any language),
which you already know and that may achieve the same results with probably
the same writing (and probably less thinking) effort?

I mean, if you have a CSV file, you may just read the lines, split them,
convert the data items that need to be converted and store the individual
values in an object.
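That line of attack fits in a few lines of Java. In this sketch the Person class and the name,age,email column layout are invented for the example, and quoted fields with embedded commas are deliberately not handled:

```java
// Sketch of the read-split-convert approach for a hypothetical
// "name,age,email" row format; does not handle quoted fields.
public class CsvRowExample {

    // Hypothetical target object for one CSV row.
    static class Person {
        final String name;
        final int age;
        final String email;

        Person(String name, int age, String email) {
            this.name = name;
            this.age = age;
            this.email = email;
        }
    }

    // Split one row and convert the fields that need converting.
    static Person fromCsvRow(String row) {
        String[] fields = row.split(",", -1);  // -1 keeps trailing empty fields
        return new Person(fields[0].trim(),
                          Integer.parseInt(fields[1].trim()),
                          fields[2].trim());
    }

    public static void main(String[] args) {
        Person p = fromCsvRow("Dave,42,dave@example.com");
        System.out.println(p.name + " " + p.age + " " + p.email);
    }
}
```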

Isn't all that XMLing a sort of overkill?

Thanks.
 

Lew

Your sarcasm notwithstanding, the sourceforge [sic] Java CSV tool does not

He was not being sarcastic.

read a CSV row into an object -- it only parses the file, but will not
convert each row into an object.

And what sort of object do you suggest it create, hm?

And when you say "it only parses the file", into what does it parse the file,
pray tell?

I note that markspace's link points to many places, not just "the" SourceForge
tool.

Looking at OpenCSV, one of the many sites to which markspace helpfully linked,
I see that it does indeed parse CSV input into an object, and the natural
representation of a CSV at that.

You should look at the information about CSV files on mindprod.com. One thing
Roedy makes clear is that CSV is not simple, precise, or straightforward.
 

Lew

Leonardo said:
Perhaps you might help me.

Since all the buzz with XML started several years ago, I've been scratching
my head trying to understand what actual advantages it might bring when used
to process files whose data structures are known and agreed upon in advance.
So far, with the exception of being able to boast that your application is
buzzword-compliant, I have found none. On the other hand, I have found
several "cons": depending on a library that is probably less efficient than
a traditional line oriented simple parser, overloading your project with
dependencies plus you have to learn a new formalization for which not many
good input editors exist (so, you must write code to turn source input into
XML, which is also more complex than just concatenating and expanding
strings).

The advantages of XML are that it provides a semantically void, human-readable,
precise, and straightforward representation of structured information. Its
associated formalisms provide a ready-made panoply of ways to express
specification and transformation rules. It is less susceptible to alignment
error and such.

And in this particular case, what is the point in having to learn XML to
write a barely readable definition instead of using Java (or any language),
which you already know and that may achieve the same results with probably
the same writing (and probably less thinking) effort?

I think that would be the wrong reason to learn XML. I would suggest rather
learning XML to write eminently readable and unambiguous contracts and
embodiments. Certainly context and content are more readable and more readily
associated in an XML document than a raw CSV file.

I never endorse doing less thinking.

I mean, if you have a CSV file, you may just read the lines, split them,
convert the data items that need to be converted and store the individual
values in an object.

Isn't all that XMLing a sort of overkill?

Depends, but often not. I mean, if you have an XML file, all you have to do
is plop one of the many standard frameworks onto it, let it convert
the items that need to be converted, and store the individual
values pretty much anywhere you want. It also lets you separate concerns
beautifully between, say, parsing and processing of content.
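For instance, with nothing beyond the JDK's own DOM parser the parsing concern can be handed off entirely to the framework. The small <person>/<age> document below is invented for the example:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomExample {

    // Parse a small XML document and extract a typed value;
    // the parser, not our code, worries about escaping and whitespace.
    static int readAge(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc =
            builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        String text = doc.getElementsByTagName("age").item(0).getTextContent();
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            readAge("<person><name>Dave</name><age>42</age></person>"));
    }
}
```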
 

Stefan Ram

Leonardo Azpurua said:
I mean, if you have a CSV file, you may just read the lines, split them,
convert the data items that need to be converted and store the individual
values in an object.

I can give the specification of XML:

http://www.w3.org/TR/xml11/

So, if anyone now would give me the specification of CSV,
we can go on and compare the two.

Otherwise, one can state that an advantage of XML is that
it is specified.
 

markspace

Your sarcasm notwithstanding, the sourceforge Java CSV tool does not
read a CSV row into an object -- it only parses the file, but will not
convert each row into an object.


Sarcasm or no, I feel you've been remiss not to detail the sort of
object you require. There's a rather broad range of possibilities for
constructing Java objects, really too many possibilities for us to guess
what it is you want.

Out of that list I gave you, do you see anything that looks like it
might be useful, even partially? What about such libraries,
specifically, do you find wanting?
 

Jim Janney

Leonardo Azpurua said:
Perhaps you might help me.

Since all the buzz with XML started several years ago, I've been scratching
my head trying to understand what actual advantages it might bring when used
to process files whose data structures are known and agreed upon in advance.
So far, with the exception of being able to boast that your application is
buzzword-compliant, I have found none. On the other hand, I have found
several "cons": depending on a library that is probably less efficient than
a traditional line oriented simple parser, overloading your project with
dependencies plus you have to learn a new formalization for which not many
good input editors exist (so, you must write code to turn source input into
XML, which is also more complex than just concatenating and expanding
strings).

And in this particular case, what is the point in having to learn XML to
write a barely readable definition instead of using Java (or any language),
which you already know and that may achieve the same results with probably
the same writing (and probably less thinking) effort?

I mean, if you have a CSV file, you may just read the lines, split them,
convert the data items that need to be converted and store the individual
values in an object.

Isn't all that XMLing a sort of overkill?

Thanks.

I see two main advantages to XML. First, it allows you to structure
data hierarchically, unlike CSV or properties files which are flat.
Second, there's a huge number of freely available Java libraries to
handle parsing and processing XML. Write a schema and load it into a
schema-aware editor and even editing isn't that bad.

Yes, it's big and bloated and ugly and if I had my 'druthers I'd be
using S-expressions instead, but storage is cheap and processors are
fast and for me the advantages outweigh my personal preferences.
 

Tom Anderson

I can give the specification of XML:

http://www.w3.org/TR/xml11/

So, if anyone now would give me the specification of CSV,
we can go on and compare the two.

http://www.ietf.org/rfc/rfc4180.txt

I don't know how good compliance to this spec is, to put it mildly.
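For what it's worth, the quoting rules that RFC 4180 does pin down (fields may be enclosed in double quotes, and a quote inside a quoted field is doubled) fit in a small hand-rolled splitter. This is a sketch for a single record only, ignoring embedded line breaks:

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180Example {

    // Split one CSV record per RFC 4180: fields may be enclosed in
    // double quotes, and a quote inside a quoted field is doubled ("").
    static List<String> splitRecord(String line) {
        List<String> fields = new ArrayList<String>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"');   // doubled quote: literal "
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // A quoted field containing a comma and an escaped quote.
        System.out.println(splitRecord("a,\"b,\"\"c\"\"\",d"));
    }
}
```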

But then, half the XML out there today uses namespaces, and there is no
coherent spec for namespaces. Yes, there is a spec, but you can't actually
use it in practice without violating the main XML spec.

Otherwise, one can state that an advantage of XML is that
it is specified.

Except that nobody ever gives you an XML file. They give you a file in
some particular application of XML - XHTML or DocBook or WSDL or whatever.
You need a separate spec for that to be able to do anything useful with
it. When someone gives you a CSV, you're in much the same situation.

tom
 

Mike Schilling

Tom Anderson said:
http://www.ietf.org/rfc/rfc4180.txt

I don't know how good compliance to this spec is, to put it mildly.

But then, half the XML out there today uses namespaces, and there is no
coherent spec for namespaces. Yes, there is a spec, but you can't actually
use it in practice without violating the main XML spec.

Huh?
 

Arne Vajhøj

http://www.ietf.org/rfc/rfc4180.txt

I don't know how good compliance to this spec is, to put it mildly.

But then, half the XML out there today uses namespaces, and there is no
coherent spec for namespaces. Yes, there is a spec, but you can't
actually use it in practice without violating the main XML spec.

People seem to be using namespaces in XML accepted by
XML parsers all over the world.

Except that nobody ever gives you an XML file. They give you a file in
some particular application of XML - XHTML or DocBook or WSDL or
whatever. You need a separate spec for that to be able to do anything
useful with it. When someone gives you a CSV, you're in much the same
situation.

Not a very accurate description.

The rules for XML apply to all XML formats.

And there are standards for describing the XML formats.

Arne
 

Arne Vajhøj

Since all the buzz with XML started several years ago, I've been scratching
my head trying to understand what actual advantages it might bring when used
to process files whose data structures are known and agreed upon in advance.
So far, with the exception of being able to boast that your application is
buzzword-compliant, I have found none. On the other hand, I have found
several "cons": depending on a library that is probably less efficient than
a traditional line oriented simple parser, overloading your project with
dependencies plus you have to learn a new formalization for which not many
good input editors exist (so, you must write code to turn source input into
XML, which is also more complex than just concatenating and expanding
strings).

And in this particular case, what is the point in having to learn XML to
write a barely readable definition instead of using Java (or any language),
which you already know and that may achieve the same results with probably
the same writing (and probably less thinking) effort?

I mean, if you have a CSV file, you may just read the lines, split them,
convert the data items that need to be converted and store the individual
values in an object.

Isn't all that XMLing a sort of overkill?

If you:
- only use "rectangular" data
- don't believe in documentation
- don't believe in type safety
then I cannot see any reason not to just use CSV instead of XML.

But a lot of people have needs for data with more advanced
structures than rows x columns, like the ability to document
the format in schema/DTD and the ability to check both
format and data values against the definition (checking
data values requires schema).
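A sketch of that checking with the JDK's built-in javax.xml.validation: the one-element schema and the <age> documents are invented for the example, and a value that is not an xs:int makes validation fail, which a plain CSV split would not catch.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class SchemaCheckExample {

    // Validate a document against an XSD; returns true only if both the
    // structure and the data values (xs:int here) conform.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;   // structure or data value rejected
        }
    }

    public static void main(String[] args) {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='age' type='xs:int'/>"
          + "</xs:schema>";
        System.out.println(isValid(xsd, "<age>42</age>"));        // conforms
        System.out.println(isValid(xsd, "<age>forty-two</age>")); // bad value
    }
}
```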

Arne
 

Mike Schilling

Arne Vajhøj said:
On 28-08-2010 06:12, Tom Anderson wrote:

Not a very accurate description.

The rules for XML apply to all XML formats.

And there are standards for describing the XML formats.

That is, XML provides precise rules for meta-syntax, escaping, whitespace,
line-termination, and character encodings, while CSV does not.
 

Tom Anderson


From section 5 of the namespace spec [1]:

Note that DTD-based validation is not namespace-aware in the following
sense: a DTD constrains the elements and attributes that may appear in a
document by their uninterpreted names, not by (namespace name, local
name) pairs. To validate a document that uses namespaces against a DTD,
the same prefixes must be used in the DTD as in the instance.

Basically, DTD doesn't know about namespaces. It works against the names
as written in the file, not the way they're interpreted via the namespace
mechanism. That means that this document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml">
<xhtml:head>
<xhtml:title></xhtml:title>
</xhtml:head>
<xhtml:body>
</xhtml:body>
</xhtml:html>

is not valid, because of all those "xhtml:" strings.

Basically, if you have a DTD which doesn't hardcode a namespace prefix
(and it would be a bit self-defeating to), you can't use that document
type with a namespace prefix. You can use it with the xmlns= prefixless
declaration, but there's no way to mix and match elements from two
namespaces while staying valid.

I guess that in fact (a) most people use a single namespace for a whole
document, often with a prefixless namespace and (b) virtually nobody uses
DTDs, so this is not as bad as it seems (to me). But I still boggle at the
fact that the namespace guys managed to write a spec that fails so
completely to interoperate with the spec it's built on top of.

tom

[1] http://www.w3.org/TR/REC-xml-names/#ns-using
 

Leonardo Azpurua

Arne Vajhøj said:
If you:
- only use "rectangular" data
- don't believe in documentation
- don't believe in type safety
then I cannot see any reason not to just use CSV instead of XML.

But a lot of people have needs for data with more advanced
structures than rows x columns, like the ability to document
the format in schema/DTD and the ability to check both
format and data values against the definition (checking
data values requires schema).

Hi,

I can't see the need for such a radical dismissal.

I can concede that XML is very useful for many purposes, most of them
related to the non-rectangularity of data.

If I were to publish complex datasets that would be consumed by several
applications, many of which I wouldn't ever just know about, I would
probably use XML.

But that is not the case for roughly 95% of current usages of XML. Most data
transfers are point to point, and most of them are based on a predefined
schema shared by both parties. And most of them are rectangular. So, 95% of
XML usages are overkill due to buzz.

I believe in documentation. I just don't believe that every single piece of
data needs to be documented.

And I believe in type safety, but I don't see what real advantage XML
provides to type safety when compared with a sound method for parsing
delimited text files.

Regards!
 

Mike Schilling

Tom Anderson said:

From section 5 of the namespace spec [1]:

Note that DTD-based validation is not namespace-aware in the following
sense: a DTD constrains the elements and attributes that may appear in a
document by their uninterpreted names, not by (namespace name, local
name) pairs. To validate a document that uses namespaces against a DTD,
the same prefixes must be used in the DTD as in the instance.

Basically, DTD doesn't know about namespaces. It works against the names
as written in the file, not the way they're interpreted via the namespace
mechanism. That means that this document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml">
<xhtml:head>
<xhtml:title></xhtml:title>
</xhtml:head>
<xhtml:body>
</xhtml:body>
</xhtml:html>

is not valid, because of all those "xhtml:" strings.

Basically, if you have a DTD which doesn't hardcode a namespace prefix
(and it would be a bit self-defeating to), you can't use that document
type with a namespace prefix. You can use it with the xmlns= prefixless
declaration, but there's no way to mix and match elements from two
namespaces while staying valid.

I guess that in fact (a) most people use a single namespace for a whole
document, often with a prefixless namespace

This often isn't possible, e.g. with a WSDL that includes XMLSchema
definitions, or any SOAP document, since they contain both SOAP and
user-defined elements.

and (b) virtually nobody uses DTDs, so this is not as bad as it seems (to
me).

That's the big one. DTDs were a botch; otherwise, XML would have been rev'ed
to include namespace-aware DTDs.

But I still boggle at the fact that the namespace guys managed to write a
spec that fails so completely to interoperate with the spec it's built on
top of.

But "fails to interoperate with an optional, obsolescent feature" isn't the
same as "can't actually use it in practice without violating the main XML
spec.".
 

John B. Matthews

Leonardo Azpurua said:
I can't see the need for such a radical dismissal.

I can concede that XML is very useful for many purposes, most of them
related to the non-rectangularity of data.

If I were to publish complex datasets that would be consumed by
several applications, many of which I wouldn't ever just know about,
I would probably use XML.

But that is not the case for roughly 95% of current usages of XML. Most
data transfers are point to point, and most of them are based on a
predefined schema shared by both parties. And most of them are
rectangular. So, 95% of XML usages are overkill due to buzz.

I believe in documentation. I just don't believe that every single
piece of data needs to be documented.

And I believe in type safety, but I don't see what real advantage
XML provides to type safety when compared with a sound method for
parsing delimited text files.

I confess I don't _always_ validate, but it's one more line of defense
against junk sneaking into my database; and I can rely on someone else's
"sound method for parsing."

<http://onjava.com/pub/a/onjava/2004/09/15/schema-validation.html>

Yes, I use CSV, too. :)

<http://www.h2database.com/javadoc/org/h2/tools/Csv.html>
 
