How to convert CSV row to Java object?

Lew · Aug 29, 2010

Leonardo said:
But it is not the case for roughly 95% of current usages of XML. Most data
transfers are Point to Point, and most of them are based on a predefined
schema shared by both parties. And most of them are rectangular. So, 95% of
XML usages are overkill due to buzz.

Bullcrap.

In order to be "overkill", XML would have to be worse than the alternative.
The only thing worse about XML over CSV is that it takes more bandwidth to
transmit. Bandwidth is cheaper than development time, maintenance time,
bugfix time, reliability problems, and upgradability, by all of which latter
measures XML is "underkill" compared to CSV and other formats.

markspace · Aug 29, 2010

I can't see the need for such a radical dismissal.

It's just the normal noises in here.

But it is not the case for roughly 95% of current usages of XML. Most data

97% of all statistics are made up.

Most of the data I work with is hierarchical in nature. HTML/XHTML
certainly is. Most data in a database is relational and therefore can
be thought of as forming tree structures. Serialized objects are often
hierarchical. Config files frequently have sections with variable
levels of configuration per each item. I can't think of a natural
"rectangular" data format off the top of my head. It's all embedded in
some hierarchical data structure.

Luuk · Aug 29, 2010

On 29-08-10 20:34, Lew wrote:

Bullcrap.

In order to be "overkill", XML would have to be worse than the
alternative. The only thing worse about XML over CSV is that it takes
more bandwidth to transmit. Bandwidth is cheaper than development time,
maintenance time, bugfix time, reliability problems, and upgradability,
by all of which latter measures XML is "underkill" compared to CSV and
other formats.

more Bullcrap....

the only thing that saves bandwidth (and development time,
maintenance time, bugfix time, reliability problems, and upgradability)
is proper documentation.

A well documented CSV-file is always better than a not-documented other
solution....

Tom Anderson · Aug 29, 2010

Most data in a database is relational and therefore can be thought of as
forming tree structures.

I tend to think of relational and hierarchical structures as being rather
different. Are you saying this from the point of view that a table is
trivially a hierarchy, the whole being split into rows which are then
split into fields, or something else?

Serialized objects are often hierarchical. Config files frequently have
sections with variable levels of configuration per each item. I can't
think of a natural "rectangular" data format off the top of my head.
It's all embedded in some hierarchical data structure.

Any kind of data where you have a lot of items, each comprising the same
set of fields, or one of a small number of sets of fields, is tabular, and
a natural fit for CSV, as it is for a relational database. A set of users,
of customers, of products, of prices ... pretty much anywhere you'd use a
collection in code, you've got tabular data. If anything, i think a
tabular nature is more common than a tree nature.

tom

Tom Anderson · Aug 29, 2010

Tom Anderson said:

But then, half the XML out there today uses namespaces, and there is no
coherent spec for namespaces. Yes, there is a spec, but you can't
actually use it in practice without violating the main XML spec.

Huh?

Basically, DTD doesn't know about namespaces. It works against the
names as written in the file, not the way they're interpreted via the
namespace mechanism. [...] virtually nobody uses DTDs, so this is not
as bad as it seems (to me).

That's the big one. DTDs were a botch; otherwise, XML would have been
rev'ed to include namespace-aware DTDs.

But i still boggle at the fact that the namespace guys managed to write
a spec that fails so completely to interoperate with the spec it's
built on top of.

But "fails to interoperate with an optional, obsolescent feature" isn't
the same as "can't actually use it in practice without violating the
main XML spec.".

True.

I personally don't accept that DTDs were a botch; they're limited, but i
don't see any inherent problems with them. I certainly don't see that the
profusion of different possible replacements that we suffered with for
years after they went out of fashion was an improvement. Schema is better,
but I don't see how it fixes any fundamental flaws, or indeed any flaws
that couldn't have been fixed with an incremental improvement of DTD.

tom

Lew · Aug 29, 2010

Luuk said:
the only thing that saves bandwidth (and development time,
maintenance time, bugfix time, reliability problems, and upgradability)
is proper documentation.

A well documented CSV-file is always better than a not-documented other
solution....

Very true. There are times when CSV is better, times when XML suits, and
times for some other choice, but all of them benefit from Luuk's point.

Leonardo Azpurua · Aug 30, 2010

Lew said:
Bullcrap.

In order to be "overkill", XML would have to be worse than the
alternative. The only thing worse about XML over CSV is that it takes more
bandwidth to transmit. Bandwidth is cheaper than development time,
maintenance time, bugfix time, reliability problems, and upgradability, by
all of which latter measures XML is "underkill" compared to CSV and other
formats.

Crap is crap.

Crappy XML will induce crap at runtime.

If you have reliable classes to produce/consume CSV flows, the debugging
effort, reliability and upgradability will be exactly the same as you
would achieve using a reliable XML parser/writer.

If you "upgrade" a data flow, it means that you also "upgrade" the code that
produces and the code that consumes that flow. The only requirement to
upgrade CSV is "add new items at the end of each row".

Most XML parsers that I have had to use are "sequential".

Crap tends to be a function of complexity. And XML is inherently more complex
than plain text.

XML is fat. It contains data, usually in human readable format, plus tons of
descriptive information and conformance widgets that have to be read, parsed
and processed along with the data. CSV contains data, usually in human
readable format, and nothing else.

XML consumes bandwidth and *lots* of processing time. And yes, both are
cheap. But the idle time or a user waiting for a process to finish is not
always cheap, and CSV is undisputably faster than anything else.

Regards!

Lew · Aug 30, 2010

Leonardo said:
Crap is crap.

Crappy XML will induce crap at runtime.

So will crappy CSV, or crappy anything else. What are you proving?

If you have reliable classes to produce/consume CSV flows, the debugging
effort, reliability and upgradability will be exactly the same as you
would achieve using a reliable XML parser/writer.

It's a lot easier to have reliable XML parsing - the spec is better
defined and the libraries are abundant, even built in to Java.

If you "upgrade" a data flow, it means that you also "upgrade" the code that
produces and the code that consumes that flow. The only requirement to
upgrade CSV is "add new items at the end of each row".

Baloney. There's a lot more than that involved in changing a CSV
format and you know it. Do try to avoid the blatantly nonsensical,
won't you?

Most XML parsers that I have had to use are "sequential".

Huh?

What in blazes does that mean, and how does that differ from CSV
processing or any other transmission format?

Crap tends to be a function of complexity. And XML is inherently more complex
than plain text.

XML *is* plain text, at least as much as CSV is. And CSV is complex,
to the point where there is no one standard way of doing it.

Crap is not a function of complexity. Crap is a result of
unintelligent tactics. It can happen in simple or complex cases.

As it happens, XML parsing is simple, not complex, or at least can
be. Furthermore, XML and schema design, while no more simple than any
other kind of design, creates benefits of type safety, format
validation, portability, self-documentation and development speed
simply not available from CSV formats.

XML is fat. It contains data, usually in human readable format, plus tons of
descriptive information and conformance widgets that have to be read, parsed
and processed along with the data. CSV contains data, usually in human
readable format, and nothing else.

You consider CSV to be "human readable"? Not by itself, it isn't.
XML has the advantage of being readable within itself.

As for those "conformance widgets", if you remove them what do you use
instead to assure conformance? You don't lose those components, you
just move them into a manual and harder-to-maintain layer.

Or are you suggesting just to abandon conformance enforcement
altogether? That would be a huge advantage, wouldn't it?

XML consumes bandwidth and *lots* of processing time. And yes, both are
cheap. But the idle time or a user waiting for a process to finish is not
always cheap, and CSV is undisputably faster than anything else.

How much "idle time or a user waiting for a process to finish" is
there with XML transmissions - a quarter second? An eighth of a
second? I can always use those gaps to refill my coffee mug.

When you say that "CSV is undisputably [sic] faster than anything
else", you speak only of transmission and CPU time. The statement is
not true. There are formats that consume much less bandwidth than
CSV. The differences usually are too small to weigh against the
advantages provided by XML.

markspace · Aug 31, 2010

I tend to think of relational and hierarchical structures as being
rather different. Are you saying this from the point of view that a
table is trivially a hierarchy, the whole being split into rows which
are then split into fields, or something else?

Something else. Let's see if my idea holds up:

Suppose we have data like those you mentioned, customers and products.
You have a list of customers:

Cust1
Cust2
Cust3

Each customer has presumably some purchases:

Cust1 --+-- Invoice1A
        +-- Invoice1B
        +-- Invoice1C

Each invoice lists various products that were sold:

Cust1 --+-- Invoice1A
              +----- Product_a
        +-- Invoice1B
              +----- Product_b
              +----- Product_c
        +-- Invoice1C
              +----- Product_a
              +----- Product_x

That to me is a tree. If SQL forces a tabular format, that's just an
artifact of SQL; the fact that SQL would retrieve this data as a table
is immaterial. It doesn't change the fact that the data itself is a
tree. (Actually, the data forms a forest once you add multiple
customers, but that's a slightly less common term, so I used "tree"
instead for easy reading.)

Note in my post I did say "relational," trying to imply that there were
joins involved in this scenario. Maybe on that bit I was unclear.

Tom Anderson · Aug 31, 2010

Something else. Let's see if my idea holds up:

Suppose we have data like those you mentioned, customers and products. You
have a list of customers:

Cust1
Cust2
Cust3

Each customer has presumably some purchases:

Cust1 --+-- Invoice1A
        +-- Invoice1B
        +-- Invoice1C

Each invoice lists various products that were sold:

Cust1 --+-- Invoice1A
              +----- Product_a
        +-- Invoice1B
              +----- Product_b
              +----- Product_c
        +-- Invoice1C
              +----- Product_a
              +----- Product_x

That to me is a tree.

It's a tree to me too. You could draw something very similar for
category/product/SKU as well; it might turn into a DAG if you allow
products to be in multiple categories, but it's broadly treelike.

If SQL forces a tabular format, that's just an artifact of SQL; the
fact that SQL would retrieve this data as a table is immaterial. It
doesn't change the fact that the data itself is a tree.

No, the tree is a way of looking at the data, just as a table is. One
might feel one or the other was more natural, or find one or the other
more practical, but they're ultimately just views. I'm not saying you're
wrong about XML being a good fit for this kind of data - if it is
aesthetically or practically beneficial to treat it as a tree, then XML is
a better choice than CSV - just about this being an essential feature of
the data.

Rather, i'd say the structure follows from the access pattern, rather than
the data itself. Your tree structure is useful for browsing order history
or computing account balance. But what if i wanted to ask questions about
inventory levels or who's buying my products, or to order some products to
be picked from a warehouse? Then, i would probably want a tree like:

Product_a
  + Cust1
    + Invoice1A
    + Invoice1C
Product_b
  + Cust1
    + Invoice1B
  + Cust2
    + Invoice2A
Product_c
  + Cust1
    + Invoice1B
Product_x
  + Cust1
    + Invoice1C
  + Cust3
    + Invoice3A
    + Invoice3B

So that for each product, i can quickly find out how much i've sold to
whom, and when, or decide how much needs to be picked, and into what
crates.

The nice thing about a tree model is that it imposes an interpretation on
data. The nice thing about a tabular model is that it doesn't.
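
To make the "views" point concrete, here is a minimal Java sketch that builds both trees from one flat list of rows. The rows are the customer/invoice/product combinations drawn in this thread; the class name SalesViews and the method tree() are made up for illustration, and nothing here is meant as the one right way to do it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: SalesViews, ROWS and tree() are illustrative names only.
final class SalesViews {
    // Flat, tabular rows of (customer, invoice, product), as a table or CSV file would hold them.
    static final List<String[]> ROWS = Arrays.asList(
        new String[] {"Cust1", "Invoice1A", "Product_a"},
        new String[] {"Cust1", "Invoice1B", "Product_b"},
        new String[] {"Cust1", "Invoice1B", "Product_c"},
        new String[] {"Cust1", "Invoice1C", "Product_a"},
        new String[] {"Cust1", "Invoice1C", "Product_x"},
        new String[] {"Cust2", "Invoice2A", "Product_b"},
        new String[] {"Cust3", "Invoice3A", "Product_x"},
        new String[] {"Cust3", "Invoice3B", "Product_x"});

    // Builds a three-level tree; top, middle and leaf are column indexes into each row.
    static Map<String, Map<String, Set<String>>> tree(
            List<String[]> rows, int top, int middle, int leaf) {
        Map<String, Map<String, Set<String>>> result = new TreeMap<>();
        for (String[] r : rows) {
            result.computeIfAbsent(r[top], k -> new TreeMap<>())
                  .computeIfAbsent(r[middle], k -> new TreeSet<>())
                  .add(r[leaf]);
        }
        return result;
    }
}
```

Calling tree(ROWS, 0, 1, 2) gives the customer/invoice/product tree, and tree(ROWS, 2, 0, 1) gives the product/customer/invoice tree, both from the same unchanged rows.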

Note in my post I did say "relational," trying to imply that there were joins
involved in this scenario. Maybe on that bit I was unclear.

The 'relation' in 'relational' doesn't refer to joins - a 'relation' is a
term from predicate logic which means a predicate function taking several
parameters. For example, if we're talking about invoices, the relation
might be:

is_an_invoice(invoice_number, customer_number, invoice_date)

Which might be true of (Invoice1A, Cust1, Tuesday), but false of
(Invoice1A, Cust2, Tuesday). In practice, a relation can be represented as
a set of same-shaped tuples (of all the values satisfying the predicate),
which looks enough like a table that it's what Dr Codd built his model
upon.
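
As a rough Java sketch of that idea, a relation can be held as a set of same-shaped tuples, and the predicate is then just set membership. The class name InvoiceRelation is invented for this example, and the single tuple is the one from the paragraph above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a relation stored as the set of tuples that satisfy the predicate.
final class InvoiceRelation {
    // Each tuple is (invoice_number, customer_number, invoice_date).
    private static final Set<List<String>> TUPLES = new HashSet<>(Arrays.asList(
        Arrays.asList("Invoice1A", "Cust1", "Tuesday")));

    // The predicate is true exactly for the tuples that are members of the set.
    static boolean isAnInvoice(String invoiceNumber, String customerNumber, String invoiceDate) {
        return TUPLES.contains(Arrays.asList(invoiceNumber, customerNumber, invoiceDate));
    }
}
```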

tom

Tom Anderson · Aug 31, 2010

Huh?

What in blazes does that mean, and how does that differ from CSV
processing or any other transmission format?

In the light of the remark about CSV, i interpret it as follows. Consider
this format:

<invoice number="Invoice1A">
  <customer number="Cust1"/>
  <product id="Product_a" qty="1" priceEach="75.00"/>
</invoice>

You write a parser for this. The parser works by getting the invoice
element, then going through its children and dealing with them.
Now it changes to:

<invoice number="Invoice1A">
  <customer number="Cust1"/>
  <product id="Product_a" qty="1" priceEach="75.00"/>
  <payment status="paid"/>
</invoice>

Your parser will now encounter that payment element when it wasn't
expecting one, and may break. If you've been cautious, it won't, but in
many natural ways of writing a parser, it will. For example, if i was
doing StAX, i'd be looping over product elements until i hit an end tag
for invoice, and if i saw something else, i'd probably explode. StAX makes
it *particularly* awkward to skip over unknown elements, because you have
to walk over their innards too. If i was doing DOM and iteration, i'd have
the same problem. If i was doing it with DOM and XPath, it would probably
be okay, but personally, i wouldn't write a parser like that for this
problem. If i was doing SAX, it would depend on what i did in startElement
with unrecognised elements. I'd probably blow up.
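
For what the cautious StAX version might look like, here is a minimal sketch. It collects product ids and skips any other element, including its innards, by counting nesting depth. The class and method names are made up; only the javax.xml.stream calls are real JDK API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: a StAX reader that tolerates elements it does not know about.
final class TolerantInvoiceReader {
    static List<String> productIds(String xml) {
        List<String> ids = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = r.getLocalName();
                if (name.equals("invoice")) {
                    continue; // descend into the invoice and look at its children
                }
                if (name.equals("product")) {
                    ids.add(r.getAttributeValue(null, "id"));
                }
                skipElement(r); // product, customer, payment or anything unknown
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read invoice", e);
        }
        return ids;
    }

    // The reader sits on a START_ELEMENT; consume events up to its matching END_ELEMENT.
    private static void skipElement(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }
}
```

The depth counter is the awkward part referred to above: StAX has no built-in "skip this subtree" call, so the walk over the unknown element's children has to be written by hand.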

Mind you, much the same applies to CSV. In the CSV (actually
pipe-separated values) parsing code in the system i work on now, we check
the number of values in each line, and if it isn't what we expect, we
reject it. Someone else makes the data, and is supposed to tell us if the
format changes, so for us, it's better to scream bloody murder about
changes, so we can point the finger at them, than try to deal with it.
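
A strict check of that kind could look roughly like the sketch below. The class name, the method name and the expected field count are assumptions for illustration, not the code from the system described above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: reject any pipe-separated line with an unexpected number of values.
final class PipeLineChecker {
    static List<String> splitStrict(String line, int expectedFields) {
        // A limit of -1 keeps trailing empty values, so "a|b|" counts as three values.
        String[] fields = line.split("\\|", -1);
        if (fields.length != expectedFields) {
            throw new IllegalArgumentException("expected " + expectedFields
                    + " values but found " + fields.length + ": " + line);
        }
        return Arrays.asList(fields);
    }
}
```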

XML *is* plain text, at least as much as CSV is.

Hmm. Things like XHTML are certainly highly readable, and my little
invoice thing above is on a par with CSV, but have you tried reading a
WSDL file lately? The few bytes of useful information are obscured by a
mountain of namespaces, wrapper elements, and god knows what.

And CSV is complex, to the point where there is no one standard way of
doing it.

That's not complexity, it's lack of standardisation. They're orthogonal.
Text files are bone simple, but they aren't standardised - CR vs LF, word
wrapping, meaning of a trailing space on a line, line break at the end,
etc.

tom

Wojtek · Sep 1, 2010

Tom Anderson wrote :

In the light of the remark about CSV, i interpret it as follows. Consider
this format:

<invoice number="Invoice1A">
  <customer number="Cust1"/>
  <product id="Product_a" qty="1" priceEach="75.00"/>
</invoice>

You write a parser for this. The parser works by getting the invoice element,
then going through its children and dealing with them.
Now it changes to:

<invoice number="Invoice1A">
  <customer number="Cust1"/>
  <product id="Product_a" qty="1" priceEach="75.00"/>
  <payment status="paid"/>
</invoice>

Your parser will now encounter that payment element when it wasn't expecting
one, and may break. If you've been cautious, it won't, but in many natural
ways of writing a parser, it will. For example, if i was doing StAX, i'd be
looping over product elements until i hit an end tag for invoice, and if i
saw something else, i'd probably explode. StAX makes it *particularly*
awkward to skip over unknown elements, because you have to walk over their
innards too. If i was doing DOM and iteration, i'd have the same problem. If
i was doing it with DOM and XPath, it would probably be okay, but personally,
i wouldn't write a parser like that for this problem. If i was doing SAX, it
would depend on what i did in startElement with unrecognised elements. I'd
probably blow up.

Really?

All my SAX parsers would simply ignore the extra element. Since the
element name would not exist in the if-else tree, it would be
ignored.

Besides, this is why I ALWAYS put a version number in the top element:

<invoice number="Invoice1A" version="1">

So that the parser can die right there, or else, if the version is older,
can call the correct code to parse the XML.
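
A minimal sketch of such a handler, under some assumptions: the class name, the SUPPORTED_VERSION constant and the choice to reject a missing version attribute are all invented for this example. Unknown elements simply fall through the if/else chain.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a SAX handler that checks a version attribute and ignores unknown elements.
final class VersionedInvoiceHandler extends DefaultHandler {
    static final int SUPPORTED_VERSION = 1; // assumed newest version this code understands
    final List<String> productIds = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (qName.equals("invoice")) {
            String version = atts.getValue("version");
            if (version == null || Integer.parseInt(version) > SUPPORTED_VERSION) {
                throw new SAXException("unsupported invoice version: " + version);
            }
            // An older version could be dispatched to older parsing code here.
        } else if (qName.equals("product")) {
            productIds.add(atts.getValue("id"));
        }
        // Any other element name matches no branch and is ignored.
    }

    static List<String> parse(String xml) {
        VersionedInvoiceHandler handler = new VersionedInvoiceHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse invoice", e);
        }
        return handler.productIds;
    }
}
```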

Arved Sandstrom · Sep 3, 2010

Lew said:
Your sarcasm notwithstanding, the sourceforge [sic] Java CSV tool
does not

He was not being sarcastic.
[ SNIP ]
Looking at OpenCSV, one of the many sites to which markspace
helpfully linked, I see that it does indeed parse CSV input into an
object, and the natural representation of a CSV at that.

You should look at the information about CSV files on mindprod.com. One
thing Roedy makes clear is that CSV is not simple or precise or
straightforward.

Well, "pure" or simple CSV - where only commas have any special meaning in a
line - is simple. As soon as you start adding any extra rules, including
anything at all to do with quotes or backslashes, then it stops being
simple; implementations that obey one set of rules or the other also stop
being compatible.

The thing is, when discussing delimiter-separated fields, commas are often a
poor choice for many sets of data, and a lot of these varying and somewhat
complicated rules exist precisely because commas are used. Much better just
to select a sensible delimiter.

AHS

Tom Anderson · Sep 4, 2010

Well, "pure" or simple CSV - where only commas have any special meaning
in a line - is simple. As soon as you start adding any extra rules,
including anything at all to do with quotes or backslashes, then it
stops being simple;

I'd say that a format with these rules:

1. Rows are terminated by newlines
2. Within a row, values are separated by commas
3. A backslash followed by some character means that character as part of
a value, not as a syntactic element
3a. A backslash followed by end of file means end of file

Was still very simple. Code to read it looks like:

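import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class SimpleCsv {
    static List<List<String>> readRows(Reader in) throws IOException {
        List<List<String>> rows = new ArrayList<List<String>>();
        List<String> row = new ArrayList<String>();
        StringBuilder buf = new StringBuilder();
        int ch;
        while ((ch = in.read()) != -1) {
            if (ch == '\n') {
                // rule 1: a newline ends the current value and the current row
                row.add(buf.toString());
                buf.setLength(0);
                rows.add(row);
                row = new ArrayList<String>();
            }
            else if (ch == ',') {
                // rule 2: a comma ends the current value
                row.add(buf.toString());
                buf.setLength(0);
            }
            else if (ch == '\\') {
                // rules 3 and 3a: take the next character literally, unless it is end of file
                if ((ch = in.read()) != -1) {
                    buf.append((char)ch);
                }
            }
            else {
                buf.append((char)ch);
            }
        }
        // if your last line is not properly terminated, you will have a nonempty
        // row here; you might like to add that to rows, or you might not
        return rows;
    }
}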

My first cut at that also included a few lines to skip empty rows, and so
make last-line handling robust for free. It's so tempting to add bells and
whistles to something this simple.
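
Since the thread title asks about turning a row into a Java object, here is one hedged sketch of that last step, taking a row as produced by the loop above (a List<String>). The InvoiceLine class and its three columns are hypothetical; the field names are borrowed from the invoice XML example elsewhere in this thread.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical sketch: map one parsed row (productId, qty, priceEach) to an object.
final class InvoiceLine {
    final String productId;
    final int qty;
    final BigDecimal priceEach;

    InvoiceLine(String productId, int qty, BigDecimal priceEach) {
        this.productId = productId;
        this.qty = qty;
        this.priceEach = priceEach;
    }

    // Fails loudly if the row has the wrong shape or a value does not parse as a number.
    static InvoiceLine fromRow(List<String> row) {
        if (row.size() != 3) {
            throw new IllegalArgumentException("expected 3 values but found " + row.size());
        }
        return new InvoiceLine(
                row.get(0),
                Integer.parseInt(row.get(1).trim()),
                new BigDecimal(row.get(2).trim()));
    }
}
```

BigDecimal is used for the price so that "75.00" keeps its exact decimal value; that is a design choice for this sketch, not something the format demands.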

The trouble is that the originators of CSV didn't choose backslash
escaping, they chose quoting, and doomed future generations to a world of
pain. ESR talks about this:

http://www.faqs.org/docs/artu/ch05s02.html
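
For comparison, a sketch of the quoting style for a single line, assuming the common convention that a field may be wrapped in double quotes and that a doubled quote inside a quoted field stands for one literal quote. It needs an extra piece of state (inQuotes) and a lookahead, and this version still does not handle quoted fields that span several lines. The class name is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one line of quote-style CSV into its fields.
final class QuotedCsvLine {
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    buf.append(c);            // commas are ordinary characters inside quotes
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    buf.append('"');          // doubled quote: one literal quote
                    i++;
                } else {
                    inQuotes = false;         // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;              // opening quote
            } else if (c == ',') {
                fields.add(buf.toString());   // unquoted comma ends the field
                buf.setLength(0);
            } else {
                buf.append(c);
            }
        }
        fields.add(buf.toString());           // the last field has no trailing comma
        return fields;
    }
}
```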

implementations that obey one set of rules or the other also stop being
compatible.

That's perhaps the major problem.

The thing is, when discussing delimiter-separated fields, commas are
often a poor choice for many sets of data, and a lot of these varying
and somewhat complicated rules exist precisely because commas are used.
Much better just to select a sensible delimiter.

True. I've always liked tabs, but they aren't very editor-friendly. A
system i talk to at work uses pipes. Some of the obscure ASCII controls
like RS could be good choices as long as you know the path the data is
travelling across is properly 7-bit clean.
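
A small sketch of the control-character idea, assuming ASCII US (0x1F) between values and RS (0x1E) after each row, and assuming the data itself never contains those two characters. The class and method names are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: rows delimited by ASCII control characters instead of commas.
final class AsciiDelimited {
    static final char US = 0x1F; // unit separator, placed between values
    static final char RS = 0x1E; // record separator, placed after each row

    static String encode(List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(String.join(String.valueOf(US), row)).append(RS);
        }
        return out.toString();
    }

    static List<List<String>> decode(String data) {
        List<List<String>> rows = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = data.indexOf(RS, start)) >= 0) {
            // A limit of -1 keeps trailing empty values.
            String[] values = data.substring(start, end).split(String.valueOf(US), -1);
            rows.add(Arrays.asList(values));
            start = end + 1;
        }
        return rows;
    }
}
```

With these delimiters, commas, pipes and quotes in the values need no escaping at all, which is the attraction; the cost is that the file is no longer pleasant to open in an ordinary editor.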

tom

Arne Vajhøj · Sep 6, 2010

I can't see the need for such a radical dismissal.

Mine or your own:

#Since all the buzz with XML started several years ago, I've been scratching
#my head trying to understand what actual advantages it might bring when used
#to process files whose data structures are known and agreed upon in advance.
#So far, with the exception of being able to boast that your application is
#buzzword-compliant, I have found none.

?

I can concede that XML is very useful for many purposes, most of them
related to the non-rectangularity of data.

Given that that is probably around 99% of data, it seems
a rather big point!

But it is not the case for roughly 95% of current usages of XML. Most data
transfers are Point to Point, and most of them are based on a predefined
schema shared by both parties. And most of them are rectangular. So, 95% of
XML usages are overkill due to buzz.

I can assure you that it is not 95% of XML exchange point to point that
is rectangular.

Any competent data person will put in meta information, security
information etc. besides the rectangular pay load data.

I believe in documentation. I just don't believe that every single piece of
data needs to be documented.

Well - today most companies want everything documented and not just some
parts documented - for good reasons - they cannot expect that only the
people who document their work will leave the company (and even though
developers not professional enough to document their
data formats may have problems finding a new job, they can still
walk out in front of a bus).

And I believe in type safeness, but I don't see what real advantage XML
provides to type safeness when compared with a sound method for parsing
delimited text files.

Most parsers do a very poor job at validating data compared to
an XML schema.

If they actually do the same, then it will be awfully expensive
in development.

And besides, as soon as it is cross-organization, having
the receiver's code be the documentation for the format
is a DOA idea.

Arne

Arne Vajhøj · Sep 6, 2010

Huh?

From section 5 of the namespace spec [1]:

Note that DTD-based validation is not namespace-aware in the following
sense: a DTD constrains the elements and attributes that may appear in a
document by their uninterpreted names, not by (namespace name, local
name) pairs. To validate a document that uses namespaces against a DTD,
the same prefixes must be used in the DTD as in the instance.

Basically, DTD doesn't know about namespaces. It works against the names
as written in the file, not the way they're interpreted via the
namespace mechanism. That means that this document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml">
<xhtml:head>
<xhtml:title></xhtml:title>
</xhtml:head>
<xhtml:body>
</xhtml:body>
</xhtml:html>

Is not valid, because of all those "xhtml:" strings.

It is not valid because it is invalid XHTML with a HTML (not XHTML)
DOCTYPE.

Basically, if you have a DTD which doesn't hardcode a namespace prefix
(and it would be a bit self-defeating to), you can't use that document
type with a namespace prefix. You can use it with the xmlns= prefixless
declaration, but there's no way to mix and match elements from two
namespaces while staying valid.

I guess that in fact (a) most people use a single namespace for a whole
document, often with a prefixless namespace and (b) virtually nobody
uses DTDs, so this is not as bad as it seems (to me). But i still boggle
at the fact that the namespace guys managed to write a spec that fails
so completely to interoperate with the spec it's built on top of.

How does the fact that you can not use multiple namespaces together
with a DTD correlate to your claim "Yes, there is a spec, but you can't
actually use it in practice without violating the main XML spec" ?

It is not as if DTD's are required for all XML formats.

Arne

Arne Vajhøj · Sep 6, 2010

On 29-08-10 20:34, Lew wrote:

more Bullcrap....

the only thing that saves bandwidth (and development time,
maintenance time, bugfix time, reliability problems, and upgradability)
is proper documentation.

A well documented CSV-file is always better than a not-documented other
solution....

A well documented X format is better than a not documented Y format.

Which is one of the reasons why XML is best - it has a very well
defined way to make documentation and plenty of tools for validation.

Arne

Arne Vajhøj · Sep 6, 2010

Crap is crap.

Crappy XML will induce crap at runtime.

If you have reliable classes to produce/consume CSV flows, the debugging
effort, reliability and upgradability will be exactly the same as you
would achieve using a reliable XML parser/writer.

Not true.

Java comes out of the box with classes for schema validation,
mapping between XML and Java objects etc.

Those do not exist for CSV in out-of-the-box Java.

If you "upgrade" a data flow, it means that you also "upgrade" the code that
produces and the code that consumes that flow. The only requirement to
upgrade CSV is "add new items at the end of each row".

Most XML parsers that I have had to use are "sequential".

Crap tends to be a function of complexity. And XML is inherently more complex
than plain text.

XML parsing code is usually less complex than CSV parsing code.

XML is fat. It contains data, usually in human readable format, plus tons of
descriptive information and conformance widgets that have to be read, parsed
and processed along with the data. CSV contains data, usually in human
readable format, and nothing else.

XML consumes bandwidth and *lots* of processing time. And yes, both are
cheap. But the idle time or a user waiting for a process to finish is not
always cheap, and CSV is undisputably faster than anything else.

You can also write assembler code that is faster than Java code.

Have you noticed how all companies make their software in
assembler to make it fast?

Or maybe not ...

Arne

Arne Vajhøj · Sep 6, 2010

In the light of the remark about CSV, i interpret it as follows.
Consider this format:

<invoice number="Invoice1A">
  <customer number="Cust1"/>
  <product id="Product_a" qty="1" priceEach="75.00"/>
</invoice>

You write a parser for this. The parser works by getting the invoice
element, then going through its children and dealing with them.
Now it changes to:

<invoice number="Invoice1A">
  <customer number="Cust1"/>
  <product id="Product_a" qty="1" priceEach="75.00"/>
  <payment status="paid"/>
</invoice>

Your parser will now encounter that payment element when it wasn't
expecting one, and may break. If you've been cautious, it won't, but in
many natural ways of writing a parser, it will. For example, if i was
doing StAX, i'd be looping over product elements until i hit an end tag
for invoice, and if i saw something else, i'd probably explode. StAX
makes it *particularly* awkward to skip over unknown elements, because
you have to walk over their innards too. If i was doing DOM and
iteration, i'd have the same problem. If i was doing it with DOM and
XPath, it would probably be okay, but personally, i wouldn't write a
parser like that for this problem. If i was doing SAX, it would depend
on what i did in startElement with unrecognised elements. I'd probably
blow up.

Everything XPath based and most DOM stuff would actually work.

Mind you, much the same applies to CSV. In the CSV (actually
pipe-separated values) parsing code in the system i work on now, we
check the number of values in each line, and if it isn't what we expect,
we reject it. Someone else makes the data, and is supposed to tell us if
the format changes, so for us, it's better to scream bloody murder about
changes, so we can point the finger at them, than try to deal with it.

For XML you just ask the parser to validate against a schema and voila
you get the exception when something has changed.
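
A minimal sketch of that, using the javax.xml.validation classes that ship with the JDK. The schema below is an assumption written for the invoice example in this thread (one customer, one or more products, no payment element); the class name is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: validate an invoice document against an assumed XML Schema.
final class InvoiceSchemaCheck {
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='invoice'><xs:complexType><xs:sequence>"
        + "<xs:element name='customer'><xs:complexType>"
        + "<xs:attribute name='number' type='xs:string' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "<xs:element name='product' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:attribute name='id' type='xs:string' use='required'/>"
        + "<xs:attribute name='qty' type='xs:positiveInteger' use='required'/>"
        + "<xs:attribute name='priceEach' type='xs:decimal' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence>"
        + "<xs:attribute name='number' type='xs:string' use='required'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            // The default error handler throws a SAXException on the first validity error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document does not match the schema or is not well-formed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this schema, the original invoice passes, while the version with the added payment element or a non-numeric qty is rejected without any hand-written checking code.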

Arne

Arne Vajhøj · Sep 6, 2010

It's a tree to me too. You could draw something very similar for
category/product/SKU as well; it might turn into a DAG if you allow
products to be in multiple categories, but it's broadly treelike.

No, the tree is a way of looking at the data, just as a table is. One
might feel one or the other was more natural, or find one or the other
more practical, but they're ultimately just views. I'm not saying you're
wrong about XML being a good fit for this kind of data - if it is
aesthetically or practically beneficial to treat it as a tree, then XML
is a better choice than CSV - just about this being an essential feature
of the data.

But:
* if you program in Java then you will want to have stuff in
an OO model not in a relational model - XML supports that
better
* unlike tables in a database, which are kept together by the
database software, multiple CSV files on disk are not
automatically kept together - a single XML file is

Arne
