Problem Writing Binary Data Stream To File

bmcdougald · May 17, 2006

I have written a servlet that makes an HTTP connection to our report
repository system and returns data to the calling browser in either
text or binary format. The binary formats returned are either Adobe
PDFs or Excel spreadsheets. This is working well and presents data
correctly to a user's browser and/or initiates an HTTP download session
in the browser, which, again, stores the data correctly as text or
binary.

Now, I want to write a batch-driven back-end Java program (not a servlet)
that will call the report repository via the same method above, but
store the data as a file (text or PDF) in a directory somewhere on the
server. I can get the text data portion of the program to write to a
flat file correctly, but the PDF causes the Adobe Reader to fail when
the file is opened. When I inspect the .pdf file to see the contents,
it is all text, not binary as you would expect.

I used the same method for reading bytes in this program that
I used in my servlet. The only difference is that the
DataOutputStream object pipes to a file instead of the
HttpServletResponse object in the servlet, and I'm not setting any
response headers with MIME types and such.

What am I missing?

Here is my function code:

static void callUrl(String sType, String sSession,
                    String sRptName, String sRid,
                    String sIndexes, String sIPAddr, String sFolder) {

    byte[] buffer = new byte[8192]; // 8k page
    boolean binaryFlag = false;

    BufferedInputStream in;
    BufferedReader ir;

    String sUrl;
    String inString;
    String sFilename = "", sExt = "";

    sUrl = "http://" + sIPAddr +
           "/webaccess/bmc-ctd-wa-cgi.exe?0=report&sid=" + sSession +
           "&rid=" + sRid + "&index=" + sIndexes +
           "&mode=External&errorflowelem=onerrorxml%2Etxt";

    try {
        URL url = new URL(sUrl);
        URLConnection uc = url.openConnection();

        uc.setDoOutput(true);
        uc.setDoInput(true);
        uc.setAllowUserInteraction(false);

        /*
         * Is Report a PDF - set binary flag true
         */
        if (sType.equals("P")) {
            binaryFlag = true;
            sFilename = sRptName + ".pdf";
            sExt = "PDF";
            sUrl += "&18=Txt_P_2_Pdf_D";
        }

        /*
         * Is Report a TXT file - set binary flag false
         */
        if (sType.equals("T")) {
            binaryFlag = false;
            sExt = "TXT";
            sFilename = sRptName + ".TXT";
        }

        sFilename = "x:/users/GS/" + sFolder + "/" + sExt + "/" + sFilename;

        File aFile = new File(sFilename);
        aFile.createNewFile();

        /* Build data stream pipe to output file */
        DataOutputStream myStream = new DataOutputStream(
                new FileOutputStream(aFile));

        if (binaryFlag == true) {
            /*
             * write binary data to output file *** NOT WORKING ***
             */
            in = new BufferedInputStream(url.openStream());

            while (true) {
                int nBytes = in.read(buffer);
                if (nBytes < 0) break; // EOF ?
                myStream.write(buffer, 0, nBytes); // write binary data to file
            }

            in.close();
        } else {
            /*
             * write plain text data to output file *** WORKING ***
             */
            ir = new BufferedReader(
                    new InputStreamReader(url.openStream()));

            while ((inString = ir.readLine()) != null) {
                myStream.writeChars(inString);
            }

            ir.close();
        }

        myStream.flush();
        myStream.close();

    } catch (MalformedURLException malformed) {
        System.out.println("Malformed URL");
        malformed.printStackTrace();
    } catch (IOException ioe) {
        System.out.println("BAD IO");
        ioe.printStackTrace();
    } catch (Exception e) {
        System.out.println("Exception");
        e.printStackTrace();
    }
}

Rhino · May 17, 2006

I haven't looked at your code, I just read the description of your problem.
You may want to consider using iText to compose your PDF. I've used it for a
few things and it creates PDFs that are easily read by Adobe Acrobat and/or
browsers. It's also quite easy to use and well-supported by the developers.
You can find out more at http://www.lowagie.com/iText/. I should mention
that I haven't tried to access the documents created with iText from a
servlet but I'd be very surprised if there was a problem in doing that.

Why re-invent the wheel?

bmcdougald · May 17, 2006

Thanks, I'll look at this product.

However, just to clarify, the datastream coming from the report
repository system is already in either binary PDF or ASCII TXT format.
My standalone program just redirects that datastream into a file,
rather than passing it along to a browser as was the case in my
servlet.

Rhino · May 17, 2006

I just suggested iText because, if you use it, you shouldn't have the
reading problems you're getting now. I can imagine that may be too radical a
solution for you and that you'd prefer to fix the code you posted.
Unfortunately, I don't have enough experience with the kind of thing you're
doing to figure out what's going wrong for you.

Matt Humphrey · May 18, 2006

Nothing jumps out at me as to why this is not working. Aside from the
possibility that sType does not really equal "P", this case raises some
questions for me. How much data is written to the PDF compared with the
size of the real PDF? (Say, if you download from the URL directly and use
Save As.) Do you get exactly the right output file names? What does the
Content-Type header say the PDF is? When you say the PDF looks like text
(which it can), how closely does it resemble the real PDF? Is it totally
different text (and what does it say?), or are some characters corrupt?
Are \r \n preserved correctly?

I would try out your code, but without knowing exactly what your parameters
are I can't really test it. Is the web site publicly accessible? I wonder
if the web server is returning a different result because some
content-acceptance setting or other parameter is not specified.

Cheers,
Matt Humphrey (e-mail address removed) http://www.iviz.com/

EJP · May 18, 2006

I can't see anything wrong with your binary I/O code on a quick
inspection, but what I am really wondering is why do you bother with
having two I/O routines when you could just use the binary version for
both? That way you can debug it with text as well as binary PDF.

iText doesn't have anything to do with this as you already have the PDF.
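EJP's suggestion can be sketched as a single copy routine that treats every response as raw bytes, so text and PDF take the same path. This is a minimal sketch, not the OP's code, and the class and method names are hypothetical. A plain OutputStream (for example a FileOutputStream) is all it needs, so DataOutputStream isn't required. It also avoids two quirks of the posted text branch: BufferedReader.readLine() strips line terminators, and DataOutputStream.writeChars() writes each char as two bytes (high byte first), so that branch does not produce a byte-for-byte copy of what the server sent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StreamCopy {

    /** Copies every byte from in to out unchanged and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // -1 means end of stream
            out.write(buffer, 0, n);            // write only the bytes actually read
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // CRLF text and arbitrary binary bytes both come through untouched.
        byte[] text = "line one\r\nline two\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] binary = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, 0x00, (byte) 0xFF};
        for (byte[] data : new byte[][] {text, binary}) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            long n = copy(new ByteArrayInputStream(data), out);
            System.out.println(n + " bytes, identical: " + Arrays.equals(data, out.toByteArray()));
        }
    }
}
```

On Java 9 and later, InputStream.transferTo(OutputStream) performs the same loop in a single call.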

bmcdougald · May 18, 2006

When I download the file via the Browser/Servlet the PDF is approx 72K.
When I create the .pdf through the standalone, it is approx 116K and
all text.

I do get the right output file names, so the "P" flag is working.

The text in the PDF is formatted correctly. However, when I do a
"type" command on the contents of the 72K file I get the PDF header
followed by a bunch of nonsensical characters, as expected with a
binary file. When I type the 116K version, I see the actual text with
no PDF header.

This portion of the website is not publicly available at the moment.

I wrote a quick program to open up a good .pdf file as an input stream
and write it back out to another file using my binary I/O routine, and
it works fine. The output file is binary, the same size, and opens
perfectly in Adobe.
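The size and "type" checks described above can be automated. The sketch below (hypothetical class name) reports a file's size and whether it begins with the "%PDF-" header that PDF files normally start with; by the description above, the 72K browser download should report true and the 116K text file false. It assumes Java 9 or later for InputStream.readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class PdfHeaderCheck {

    // PDF files normally begin with "%PDF-" followed by a version, e.g. "%PDF-1.4".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if the given leading bytes start with "%PDF-". */
    static boolean startsWithPdfHeader(byte[] head) {
        return head.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Reads only the first few bytes of the file and checks them. */
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[PDF_MAGIC.length];
            int n = in.readNBytes(head, 0, head.length); // returns fewer bytes for short files
            return startsWithPdfHeader(Arrays.copyOf(head, n));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path p = Paths.get(name);
            System.out.println(p + ": " + Files.size(p) + " bytes, PDF header: " + looksLikePdf(p));
        }
    }
}
```

Usage: java PdfHeaderCheck browser.pdf standalone.pdf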

Matt Humphrey · May 18, 2006

This suggests that the web server is giving you different content or somehow
transforming the content. I'm not really up to date on the possibilities
there. When you run your program, can you read the response headers to see
what the server thinks it is sending you? Try getContentEncoding() and
getContentType() and see what they say.
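Matt's suggestion, sketched with the standard HttpURLConnection: the method below (hypothetical name) sends the request and reports the status code, Content-Type, Content-Encoding and Content-Length without reading the body. Content-Encoding is commonly null, and Content-Length comes back as -1 when the server doesn't send one.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class ResponseHeaders {

    /** Summarizes what the server says it is sending, without reading the body. */
    static String describe(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            int status = conn.getResponseCode();            // this call sends the request
            return "status=" + status
                    + " type=" + conn.getContentType()        // e.g. application/pdf or text/plain
                    + " encoding=" + conn.getContentEncoding() // often null
                    + " length=" + conn.getContentLengthLong(); // -1 if not sent
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(args[0]));
    }
}
```

For a genuine PDF response the type should read application/pdf.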

Cheers,

bmcdougald · May 18, 2006

This could be true. In my servlet, I am intercepting the datastream
from the repository system. I then turn around and use the
HttpServletResponse to stream the data down to the browser. I also set
the content-type and header properties of this object accordingly if it
is PDF or plain text.

bmcdougald · May 18, 2006

Content-Type is coming down as text/plain for the PDF, encoding is
null. Don't know why, it's the same URL I'm calling from my servlet.
For grins, maybe I should try using the getContentType from my servlet
and see what it is getting.
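One detail in the posted callUrl() may explain the text/plain response: `URL url = new URL(sUrl);` runs before the `if (sType.equals("P"))` block appends `&18=Txt_P_2_Pdf_D` to sUrl. Java Strings are immutable and the URL was parsed from the earlier value, so the request that actually goes out never carries that parameter. Whether that parameter is what switches the repository to PDF output can't be confirmed from the thread, so treat this as a likely lead rather than a diagnosis. Separately, the code configures `uc` but then reads through `url.openStream()`, which opens a new, unconfigured connection. The sketch below (hypothetical class and method names; the URL layout is copied from the original) builds the complete URL first and reads headers and body from a single connection. Like the original concatenation, it assumes the parameter values are already URL-safe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ReportDownload {

    /** Builds the complete request URL, including the PDF parameter, before any URL object exists. */
    static String buildReportUrl(String ipAddr, String session, String rid,
                                 String indexes, boolean wantPdf) {
        String s = "http://" + ipAddr
                + "/webaccess/bmc-ctd-wa-cgi.exe?0=report&sid=" + session
                + "&rid=" + rid + "&index=" + indexes
                + "&mode=External&errorflowelem=onerrorxml%2Etxt";
        if (wantPdf) {
            s += "&18=Txt_P_2_Pdf_D"; // the parameter the original code appended too late
        }
        return s;
    }

    /** Uses ONE connection: copies the body byte-for-byte, then returns its Content-Type. */
    static String download(String url, Path target) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return conn.getContentType();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical command line: ipAddr session rid indexes P|T outputFile
        String url = buildReportUrl(args[0], args[1], args[2], args[3], args[4].equals("P"));
        System.out.println("Content-Type: " + download(url, Paths.get(args[5])));
    }
}
```

Printing the returned Content-Type is a quick way to confirm whether the server is now sending application/pdf.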

Oliver Wong · May 19, 2006

How does the "actual text" compare to the semantic contents of the PDF
file? Is it gibberish, or is it the text of the PDF, without formatting and
images and all that extra stuff?

I didn't fully follow the OP's original problem, but as part of the HTTP
protocol, the browser specifies what types of content it's capable of
handling, and the web server can customize its output based on that.

So for example, a webserver might check if the browser claims it can
support PNG, and if so, it sends its images as PNG files. If not, it could
on-the-fly convert the PNG file to a JPG file before transmitting it (and
then cache the JPG files to speed up further requests).

It's conceivable that the web server checks whether the browser lists
PDF as one of its accepted types and, if not, actually parses the contents
of the PDF file and converts it to an ASCII file.
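To make the negotiation idea concrete, here is a toy version of the server-side decision described above. It is purely illustrative: nothing in the thread says the report repository works this way, the class name is hypothetical, and real negotiation also weighs q-values, which this sketch ignores.

```java
import java.util.Locale;

public class NegotiationSketch {

    /**
     * Toy server-side choice: serve PDF if the client's Accept header lists
     * application/pdf or a matching wildcard; otherwise fall back to plain text.
     * q-values (e.g. ";q=0") are stripped and ignored.
     */
    static String chooseRepresentation(String acceptHeader) {
        if (acceptHeader == null) {
            return "application/pdf"; // no Accept header: the client accepts any type
        }
        for (String part : acceptHeader.split(",")) {
            String mediaRange = part.split(";")[0].trim().toLowerCase(Locale.ROOT);
            if (mediaRange.equals("application/pdf")
                    || mediaRange.equals("application/*")
                    || mediaRange.equals("*/*")) {
                return "application/pdf";
            }
        }
        return "text/plain";
    }

    public static void main(String[] args) {
        System.out.println(chooseRepresentation("text/html, */*; q=0.2"));
        System.out.println(chooseRepresentation("text/plain"));
    }
}
```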

However, I read something about "running it stand alone" which didn't
make sense to me in this context, so perhaps this isn't what's going on at
all. =)

- Oliver

Matt Humphrey · May 20, 2006

The OP's standalone program simply reads either text or PDF documents from
an existing web server.

I had to look it up, but I think content negotiation is done by having the
browser specify an "Accept" header which contains the desired types. It's a
sophisticated scheme, but I think it can simply be set to "application/pdf"
to get the right result.
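With HttpURLConnection, setting the Accept header is a one-line change, as long as it happens before the request is sent. This is a minimal sketch with a hypothetical class and method name; whether the repository's CGI actually honors Accept is unknown.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class AcceptHeaderRequest {

    /** Opens a connection that asks the server for PDF via the Accept header. */
    static HttpURLConnection openAcceptingPdf(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        // Request headers must be set before the request is sent,
        // i.e. before getResponseCode(), getInputStream(), etc.
        conn.setRequestProperty("Accept", "application/pdf");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = openAcceptingPdf(args[0]);
        System.out.println(conn.getResponseCode() + " " + conn.getContentType());
        conn.disconnect();
    }
}
```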

Actually, what you've described is what I think is happening, especially
since the OP reported that the web server's response to the PDF request has
content type "text/plain". I'm just not experienced with content
negotiation details.

Cheers,

Andy Flowers · May 20, 2006

Have you considered using the Jakarta HttpClient classes?
http://jakarta.apache.org/commons/httpclient/
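The Jakarta Commons HttpClient 3.x line has since been retired in favor of Apache HttpComponents. As a stand-in that needs no extra jar, the sketch below uses the JDK's own java.net.http.HttpClient (Java 11+), which is a different library from the one suggested above. It streams the response body straight to a file, so no text conversion can occur, and returns the Content-Type for checking. The class and method names and the Accept header are assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ReportFetcher {

    /** Writes the raw response body to target and returns the server's Content-Type. */
    static String fetchToFile(String url, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // plain HTTP/1.1, like URLConnection
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/pdf") // optional: ask for PDF explicitly
                .GET()
                .build();
        HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(
                target, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE));
        if (response.statusCode() != 200) {
            // The (error) body has already been written to target at this point.
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.headers().firstValue("Content-Type").orElse("(none)");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchToFile(args[0], Paths.get(args[1])));
    }
}
```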

Cyrillic text from file - set utf8 in cmd, unknown characters output anyway	0	Nov 11, 2022
Writing to file	3	Nov 30, 2011
Issue with passing fetched data to POST form. How can I?	0	Jul 23, 2023
problems reading binary file	14	Jun 9, 2010
Timing problem	4	May 1, 2023
Writing binary data (CString) to file	0	Nov 19, 2008
Collect Excel Data from Website	5	Apr 30, 2022
Write data to file	2	Feb 1, 2010
