Create Excel file streamming

J

Jileshl

I wish to create Excel file, such that it should be able to
incrementally write data to the file system ... and not hold
everything in memory... i.e unlike HSSF POI.

Any Suggestions ?
 
S

Silvio Bierman

I wish to create Excel file, such that it should be able to
incrementally write data to the file system ... and not hold
everything in memory... i.e unlike HSSF POI.

Any Suggestions ?

Excel supports the traditional XLS format and a newer XML based format.
The former is very difficult to stream since it is some form of
Microsfts compound document format which is not very stream-oriented. It
might be possible but I do not know of any available tools that will
support this. To generate this format in a non-streamed fashion you can
use Apache POI but it will build the complete document in memory so you
can not stream it before it has been completely generated.
The XML format is easy to stream, of course. However, older versions of
Excel do not support it.

Silvio Bierman
 
J

Jileshl

How can I use that.... I am using Java.. to create the Excel file, the
problem I am facing is .. I have large data which has to be stored
into memory and den it creates the Excel file...

I over come this memory issue I have to stream the data to the clients
end.....how should I go about it.??
 
J

Jileshl

Excel supports the traditional XLS format and a newer XML based format.
The former is very difficult to stream since it is some form of
Microsfts compound document format which is not very stream-oriented. It
might be possible but I do not know of any available tools that will
support this. To generate this format in a non-streamed fashion you can
use Apache POI but it will build the complete document in memory so you
can not stream it before it has been completely generated.
The XML format is easy to stream, of course. However, older versions of
Excel do not support it.

Silvio Bierman

Any Other Options ???
 
S

Silvio Bierman

Any Other Options ???

There is one option that more or less works. You set the content-type to
"application/vnd.ms-excel" but then simply write plain-text content or
CR+LF separated recrods of TAB separated values.

This is of course streamable but there is no way of telling Excel to
treat data as anything other than plain text. In addition, if Unicode is
an issue this will probably break down very quickly.

We are faced with the same problem. Too many users have too old Excel
versions running for us to use the XML format. We are using POI now and
suffer from high memory loads.

We have created a set of record-streaming interfaces inside our software
because we support many other formats then Excel (SPSS, Triple-S, text
formats, XML formats) so the production code is format agnostic. At the
output level we plug a record streamer implementation into it and the
data comes out. Most of the plugins are actually stream oriented, Excel
is currently the only exception. However, once we decide we can go for
the XML version (which we may do later this year) we just have to
re-implement the Excel plugin and we are done. Perhaps you could device
a similar construct to leave your options open.

Regards,

Silvio
 
J

Jileshl

Thanks Silvio ,

Can you throw more light on "record streamer implementation".. how to
go about to achieve that!!

Are you creating a wrapper around the HSSF POI, or any other similar
POI's or its completely you own???
 
S

Silvio Bierman

Thanks Silvio ,

Can you throw more light on "record streamer implementation".. how to
go about to achieve that!!

Are you creating a wrapper around the HSSF POI, or any other similar
POI's or its completely you own???

The interfaces are completely our own. Names are something like
RecordMetaData, RecordWriter and RecordReader. Additional interfaces are
non-streaming (buffering) interfaces like RecordContainer (off course
extending RecordWriter and being able to produce a RecordReader) that
hold multiple records at once to allow set-operations on records by
allowing keyed/indexed access to records.

RecordMetaData describes the underlying datas logical column structure
and optional formatting parameters/hints. A RecordWriter is a generic
interface that you write records upon so you could imagine a class
ExcelRecordWriter that implements RecordWriter and has a constructor
with an OutputStream parameter. You could also imagine a class like (we
don't have one like this but it is just an example)
ResultSetRecordReader that implements RecordReader and is constructed
from a ResultSet.

May sound a bit vague but doing this has enabled us to completely
separate data generation code from data formatting code and also enabled
us to write many utility conversion and filtering wrappers. Having these
all defined in terms of abstract record-logic interfaces allow us to
combine all this freely regardless of where the data comes from or what
formats it is converted into.

Regards,

Silvio
 
L

Lord Zoltar

There is one option that more or less works. You set the content-type to
  "application/vnd.ms-excel" but then simply write plain-text content or
CR+LF separated recrods of TAB separated values.

This is of course streamable but there is no way of telling Excel to
treat data as anything other than plain text. In addition, if Unicode is
an issue this will probably break down very quickly.

We are faced with the same problem. Too many users have too old Excel
versions running for us to use the XML format. We are using POI now and
suffer from high memory loads.

We have created a set of record-streaming interfaces inside our software
because we support many other formats then Excel (SPSS, Triple-S, text
formats, XML formats) so the production code is format agnostic. At the
output level we plug a record streamer implementation into it and the
data comes out. Most of the plugins are actually stream oriented, Excel
is currently the only exception. However, once we decide we can go for
the XML version (which we may do later this year) we just have to
re-implement the Excel plugin and we are done. Perhaps you could device
a similar construct to leave your options open.

Regards,

Silvio

It's also possible to output HTML tables and Excel will usually read
them fine. This is good if you need basic formatting (such as colour,
font, alignment, etc...) in your "Excel" file. To test it out, save an
HTML table in a file named "test.xls" and open it in Excel.
If you want to go further, save this "test.xls" once it's open in
Excel (you don't actually have to make any changes, just hit "Save")
and open the file in a text editor. You should see, in plain text, all
the extra markup that Excel adds to the file to make it more
"complete".
 
J

Jileshl

That sounds good..
however... wht I am thinking at this moment is .. since I am planning
to re-write my code.... from current HSSF.... to this new thing..
I might allow users to give me the option .. of which EXCEL version
they use and generate accordingly ..


In case of versions from 2003 and XML Generation of Excel file could
be better option.. thus by keeping a track of popular stuff
downloaded.... as soon as the XML download for older Version is
nearing to null I can completely remove my old code..

Well, now for the XML support is concern... Do we have any existing
stuff something like POI on which we can then do the abstraction and
build our own ??

Best Regards,

Jilesh
 
S

Silvio Bierman

That sounds good..
however... wht I am thinking at this moment is .. since I am planning
to re-write my code.... from current HSSF.... to this new thing..
I might allow users to give me the option .. of which EXCEL version
they use and generate accordingly ..


In case of versions from 2003 and XML Generation of Excel file could
be better option.. thus by keeping a track of popular stuff
downloaded.... as soon as the XML download for older Version is
nearing to null I can completely remove my old code..

Well, now for the XML support is concern... Do we have any existing
stuff something like POI on which we can then do the abstraction and
build our own ??

Best Regards,

Jilesh
On Feb 26, 4:44 am, Silvio Bierman <[email protected]>
wrote:

Hello Jilesh,

I am not aware of any tools that would help you generate MS Office
compliant XML. There are specifications of the XML layout available
somewhere on the MS website. I know because I have found them when
looking for them some time ago. I am afraid I can not provide you with
links but you should be able to Google for it.

From what I remember it was quite straightforward. It was no pretty XML
but that may just have been ugly examples using ill-styled name-space
constructs.

Good luck,

Silvio
 
J

Jileshl

Hello Jilesh,

I am not aware of any tools that would help you generate MS Office
compliant XML. There are specifications of the XML layout available
somewhere on the MS website. I know because I have found them when
looking for them some time ago. I am afraid I can not provide you with
links but you should be able to Google for it.

From what I remember it was quite straightforward. It was no pretty XML
but that may just have been ugly examples using ill-styled name-space
constructs.

Good luck,

Silvio

Cool, thanks........


Any Idea, of Generating the SpreadSheets in OpenOffice. ??
 
J

Jileshl

Hello Jilesh,

I am not aware of any tools that would help you generate MS Office
compliant XML. There are specifications of the XML layout available
somewhere on the MS website. I know because I have found them when
looking for them some time ago. I am afraid I can not provide you with
links but you should be able to Google for it.

From what I remember it was quite straightforward. It was no pretty XML
but that may just have been ugly examples using ill-styled name-space
constructs.

Good luck,

Silvio

Hey, I just checked with SpreadSheetML.. i.e creating an XML with the
Ms-EXCEL name-space and specification... it says it supports version
above 2003 but doesn't support new version released for MAC.....
Are you sure you gonna go with this ???
 
S

Silvio Bierman

On Feb 27, 1:24 am, Silvio Bierman <[email protected]>
wrote:

Hey, I just checked with SpreadSheetML.. i.e creating an XML with the
Ms-EXCEL name-space and specification... it says it supports version
above 2003 but doesn't support new version released for MAC.....
Are you sure you gonna go with this ???


Yes, I was aware of this remark but I do not think it is important. Most
Apple users who use MS office use it as a native Windows application
inside a guest Windows OS that runs on a virtual machine created inside
virtualization software that Apple calls Dimensions (if I recall
correctly). If I am not mistaken MS Office is no longer available for
native MacOS and has not been for a couple of years, just like IE.

Regards,

Silvio
 
T

Thomas Kellerer

(e-mail address removed), 26.02.2008 05:42:
Any Other Options ???

Ah yes: create a tab-separated text file with the extension .xls and Excel will happily open it.

Thomas
 
J

Jileshl

Yeah that does sounds good.. but then it has its own quirky behavior
in the sense if the data is not UTF-16LE and tab separated.. excel
will not directly open it.. you have to do allsorts of imports etc..
its messy.. and the biggest thing.. it does not support styles.. it
can be broken anytime on any ms excel version... its extremely wafer
thin logic to make excel work with csv..

Jilesh
 
J

Jileshl

Yeah that does sounds good.. but then it has its own quirky behavior
in the sense if the data is not UTF-16LE and tab separated.. excel
will not directly open it.. you have to do allsorts of imports etc..
its messy.. and the biggest thing.. it does not support styles.. it
can be broken anytime on any ms excel version... its extremely wafer
thin logic to make excel work with csv..

Jilesh

Any Idea with apache HSSFserializer with the Cocoon project..?? can
that be useful ?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,073
Latest member
DarinCeden

Latest Threads

Top