Problem Writing Binary Data Stream To File

bmcdougald

I have written a servlet that makes an HTTP connection to our report
repository system and returns data to the calling browser in either
text or binary format. The binary formats returned are either Adobe
PDFs or Excel spreadsheets. This works well: the servlet presents data
correctly in a user's browser and/or initiates an HTTP download session
in the browser, which, again, stores the data correctly as text or
binary.

Now I want to write a batch-driven back-end Java program (not a servlet)
that calls the report repository via the same method, but stores the
data as a file (text or PDF) in a directory somewhere on the server. I
can get the text portion of the program to write a flat file correctly,
but Adobe Reader fails when it opens the PDF. When I inspect the .pdf
file's contents, it is all text, not binary as you would expect.

I used the same method for reading bytes in this function that I used
in my servlet. The only difference is that the DataOutputStream object
pipes to a file instead of the servlet's HttpServletResponse object, and
I'm not setting any response headers with MIME types and such.

What am I missing?


Here is my function code:

static void callUrl(String sType, String sSession,
                    String sRptName, String sRid,
                    String sIndexes, String sIPAddr, String sFolder) {

    byte[] buffer = new byte[8192]; // 8k page
    boolean binaryFlag = false;

    BufferedInputStream in;
    BufferedReader ir;

    String sUrl;
    String inString;
    String sFilename = "", sExt = "";

    sUrl = "http://" + sIPAddr +
           "/webaccess/bmc-ctd-wa-cgi.exe?0=report&sid=" + sSession +
           "&rid=" + sRid + "&index=" + sIndexes +
           "&mode=External&errorflowelem=onerrorxml%2Etxt";

    try {

        URL url = new URL(sUrl);
        URLConnection uc = url.openConnection();

        uc.setDoOutput(true);
        uc.setDoInput(true);
        uc.setAllowUserInteraction(false);

        /*
         * Is Report a PDF - set binary flag true
         */
        if (sType.equals("P")) {
            binaryFlag = true;
            sFilename = sRptName + ".pdf";
            sExt = "PDF";
            sUrl += "&18=Txt_P_2_Pdf_D";
        }

        /*
         * Is Report a TXT file - set binary flag false
         */
        if (sType.equals("T")) {
            binaryFlag = false;
            sExt = "TXT";
            sFilename = sRptName + ".TXT";
        }

        sFilename = "x:/users/GS/" + sFolder + "/" + sExt + "/" + sFilename;

        File aFile = new File(sFilename);
        aFile.createNewFile();

        /* Build data stream pipe to output file */
        DataOutputStream myStream = new DataOutputStream(
                new FileOutputStream(aFile));

        if (binaryFlag) {

            /*
             * write binary data to output file *** NOT WORKING ***
             */
            in = new BufferedInputStream(url.openStream());

            while (true) {
                int nBytes = in.read(buffer);
                if (nBytes < 0) break; // EOF
                myStream.write(buffer, 0, nBytes); // write binary data to file
            }

            in.close();

        } else {

            /*
             * write plain text data to output file *** WORKING ***
             */
            ir = new BufferedReader(
                    new InputStreamReader(url.openStream()));

            while ((inString = ir.readLine()) != null) {
                myStream.writeChars(inString);
            }

            ir.close();
        }

        myStream.flush();
        myStream.close();

    } catch (MalformedURLException malformed) {
        System.out.println("Malformed URL");
        malformed.printStackTrace();
    } catch (IOException ioe) {
        System.out.println("BAD IO");
        ioe.printStackTrace();
    } catch (Exception e) {
        System.out.println("Exception");
        e.printStackTrace();
    }
}
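
One detail in the posted code may explain the text/plain response reported later in this thread: `URL url = new URL(sUrl)` runs before the PDF branch appends `&18=Txt_P_2_Pdf_D` to `sUrl`, so the request actually sent for a PDF never carries that parameter. (That this parameter is what asks the repository for PDF output is an assumption based on its name.) Also, `uc` is configured but never used, because `url.openStream()` opens a second, separate connection. The sketch below builds the complete URL first and reads the body from a single connection; `ReportFetch` and its method names are illustrative, not part of the original program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

final class ReportFetch {

    /** Builds the complete request URL, PDF parameter included, before any URL object is created. */
    static String buildReportUrl(String ipAddr, String session, String rid,
                                 String indexes, boolean pdf) {
        StringBuilder sb = new StringBuilder();
        sb.append("http://").append(ipAddr)
          .append("/webaccess/bmc-ctd-wa-cgi.exe?0=report&sid=").append(session)
          .append("&rid=").append(rid)
          .append("&index=").append(indexes)
          .append("&mode=External&errorflowelem=onerrorxml%2Etxt");
        if (pdf) {
            sb.append("&18=Txt_P_2_Pdf_D"); // appended while the URL is still a string
        }
        return sb.toString();
    }

    /** Copies every byte from in to out unchanged and returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    /** Opens ONE connection to the finished URL and copies its body to out. */
    static long fetch(String fullUrl, OutputStream out) throws IOException {
        URLConnection uc = new URL(fullUrl).openConnection();
        uc.setAllowUserInteraction(false);
        try (InputStream in = uc.getInputStream()) {
            return copy(in, out);
        }
    }
}
```

In `callUrl` itself, the equivalent change would be to move the two `if (sType.equals(...))` blocks above `URL url = new URL(sUrl);` and read from `uc.getInputStream()` instead of `url.openStream()`.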

Rhino

I haven't looked at your code; I just read the description of your problem.
You may want to consider using iText to compose your PDF. I've used it for a
few things, and it creates PDFs that Adobe Acrobat and browsers read easily.
It's also quite easy to use and well supported by its developers. You can
find out more at http://www.lowagie.com/iText/. I should mention that I
haven't tried to access documents created with iText from a servlet, but I'd
be very surprised if that caused a problem.

Why reinvent the wheel?

bmcdougald

Thanks, I'll look at this product.

However, just to clarify: the datastream coming from the report
repository system is already in either binary PDF or ASCII TXT format.
My standalone program just redirects that datastream into a file,
rather than passing it along to a browser as my servlet does.

Rhino

I only suggested iText because, if you use it, you shouldn't have the
reading problems you're getting now. I can imagine that may be too radical a
solution for you, and that you'd prefer to fix the code you posted.
Unfortunately, I don't have enough experience with this kind of thing to
figure out what's going wrong for you.

Matt Humphrey

Nothing jumps out at me as to why this is not working. Aside from suspecting
that sType does not really equal "P", this case raises some questions for me.
How much data is written to the PDF compared with the size of the real PDF
(say, if you download it from the URL directly and use Save As)? Do you get
exactly the right output file names? What does the Content-Type header say
the PDF is? When you say the PDF looks like text (which it can), how closely
does it resemble the real PDF? Is it totally different text (and what does
it say?), or are some characters corrupt? Are \r and \n preserved correctly?

I would try out your code, but without knowing exactly what your parameters
are I can't really test it. Is the web site publicly accessible? I wonder
whether the web server is returning a different result because some other
content acceptor or parameter is not specified.

Cheers,
Matt Humphrey http://www.iviz.com/

EJP

I can't see anything wrong with your binary I/O code on a quick
inspection, but what I'm really wondering is why you bother having two
I/O routines when you could just use the binary version for both. That
way you can debug it with text as well as with the binary PDF.
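
A minimal sketch of EJP's suggestion, with illustrative class and method names: one byte-for-byte routine serves both report types. The second method shows why the posted text branch is not a faithful copy either: `DataOutputStream.writeChars` writes each `char` as two bytes, high byte first, and `readLine()` has already stripped the line terminators.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class SameCopyForBoth {

    /** Byte-for-byte copy; behaves identically for text and PDF responses. */
    static void copyAll(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    /** What the posted text branch does to one line: two bytes per char, no line terminator. */
    static byte[] viaWriteChars(String line) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(bytes);
        data.writeChars(line);
        data.flush();
        return bytes.toByteArray();
    }
}
```

With a single routine, a saved text report should match what a browser download produces byte for byte, which makes it a useful control while debugging the PDF case.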

iText doesn't have anything to do with this as you already have the PDF.

bmcdougald

When I download the file via the browser/servlet, the PDF is approx. 72K.
When I create the .pdf through the standalone program, it is approx. 116K
and all text.

I do get the right output file names, so the "P" flag is working.

The text in the PDF is formatted correctly. However, when I run the
"type" command on the 72K file, I get the PDF header followed by a bunch
of nonsensical characters, as expected with a binary file. When I type
the 116K version, I see the actual text with no PDF header.

This portion of the website is not publicly available at the moment.

I wrote a quick program that opens a good .pdf file as an input stream
and writes it back out to another file using my binary I/O routine, and
it works fine. The output file is binary, the same size, and opens
perfectly in Adobe Reader.
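
The `type` check described above can be automated. A well-formed PDF starts with the ASCII bytes `%PDF-` followed by the version number, so a batch job can look at the first five bytes before trusting a `.pdf` extension. A minimal sketch; `PdfHeaderCheck` is an illustrative name:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class PdfHeaderCheck {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if the given bytes start with the PDF file header "%PDF-". */
    static boolean looksLikePdf(byte[] firstBytes) {
        return firstBytes.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(firstBytes, MAGIC.length), MAGIC);
    }

    /** Reads only the first few bytes of a file and applies the same check. */
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[MAGIC.length];
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) break; // file shorter than the header
                read += n;
            }
            return looksLikePdf(Arrays.copyOf(head, read));
        }
    }
}
```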

Matt Humphrey

This suggests that the web server is giving you different content or is
somehow transforming it. I'm not really up to date on the possibilities
for that. When you run your program, can you read the headers that say
what the server thinks it is sending you? Try getContentEncoding() and
getContentType() and see what they say.
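
A minimal sketch of that check, assuming the report URL is reachable from the batch machine; `ResponseHeaders.describe` is an illustrative helper. Note that `getContentEncoding()` returns the Content-Encoding header (for example `gzip`), not the character set, so `null` is the normal value for an uncompressed response; the media type comes from `getContentType()`.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

final class ResponseHeaders {

    /** Connects to the URL and returns "type=...; encoding=..." from the response headers. */
    static String describe(URL url) throws IOException {
        URLConnection uc = url.openConnection();
        String type = uc.getContentType();         // e.g. "application/pdf" or "text/plain"
        String encoding = uc.getContentEncoding(); // Content-Encoding header; often null
        uc.getInputStream().close();               // release the connection
        return "type=" + type + "; encoding=" + encoding;
    }
}
```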

Cheers,
Matt Humphrey http://www.iviz.com/

bmcdougald

This could be true. In my servlet, I intercept the datastream from the
repository system, then turn around and use the HttpServletResponse to
stream the data down to the browser. I also set the content type and
header properties of that object accordingly, depending on whether it
is PDF or plain text.

bmcdougald

Content-Type is coming down as text/plain for the PDF, and the encoding
is null. I don't know why; it's the same URL I'm calling from my servlet.
For grins, maybe I should call getContentType from my servlet too and
see what it gets.

Oliver Wong

How does the "actual text" compare to the semantic contents of the PDF
file? Is it gibberish, or is it the text of the PDF without the formatting,
images and all that extra stuff?

I didn't fully follow the OP's original problem, but as part of the HTTP
protocol, the browser specifies what types of content it can handle, and
the web server can customize its output based on that.

So, for example, a web server might check whether the browser claims to
support PNG, and if so, send its images as PNG files. If not, it could
convert the PNG file to a JPG file on the fly before transmitting it (and
then cache the JPG files to speed up further requests).

It's conceivable that the web server checks whether the browser reports
PDF as one of its accepted types and, if not, actually parses the contents
of the PDF file and converts it to an ASCII file.
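
To make the hypothesis concrete, here is a toy version of such a server-side rule. It is purely illustrative (nothing in the thread shows the repository does this) and deliberately strict: it only picks PDF when the Accept header names `application/pdf`, ignoring the q-values and wildcards that a real server following the HTTP specification would weigh.

```java
import java.util.Locale;

final class ToyNegotiation {

    /**
     * Hypothetical rule: send PDF only when the client's Accept header names
     * application/pdf; otherwise fall back to a plain-text rendering.
     */
    static String chooseContentType(String acceptHeader) {
        if (acceptHeader == null) {
            return "text/plain";
        }
        for (String part : acceptHeader.split(",")) {
            // Drop parameters such as ";q=0.8" and compare the media type alone.
            String mediaType = part.split(";")[0].trim().toLowerCase(Locale.ROOT);
            if (mediaType.equals("application/pdf")) {
                return "application/pdf";
            }
        }
        return "text/plain";
    }
}
```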

However, I read something about "running it stand alone" which didn't
make sense to me in this context, so perhaps this isn't what's going on at
all. =)

- Oliver

Matt Humphrey

The OP's program reads either text or PDF documents from an existing web
server.

I had to look it up, but I think content negotiation is done by having the
browser send an "Accept" header that lists the desired types. It's a
sophisticated scheme, but I think the header can simply be set to
"application/pdf" to get the right result.
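
A minimal client-side sketch of that idea using `URLConnection.setRequestProperty`, which must be called before the connection is opened. Whether the repository actually negotiates on `Accept` is an assumption; the returned Content-Type shows what the server decided. `AcceptPdf.fetchAsPdf` is an illustrative name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

final class AcceptPdf {

    /** Requests url with "Accept: application/pdf", copies the body to out, returns the Content-Type. */
    static String fetchAsPdf(URL url, OutputStream out) throws IOException {
        URLConnection uc = url.openConnection();
        uc.setRequestProperty("Accept", "application/pdf"); // must precede connecting
        String contentType = uc.getContentType();           // this call connects
        try (InputStream in = uc.getInputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        out.flush();
        return contentType;
    }
}
```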

Actually, what you've described is what I think is happening, especially
since the OP reported that the web server's response to the PDF request has
content type "text/plain". I'm just not experienced with the details of
content negotiation.

Cheers,
Matt Humphrey http://www.iviz.com/

Andy Flowers

Have you considered using the Jakarta HttpClient classes?
http://jakarta.apache.org/commons/httpclient/
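
For comparison, on Java 11 or later the JDK's built-in `java.net.http.HttpClient` (swapped in here; this is not the Jakarta library linked above) can stream the response straight to a file and expose the response headers. A minimal sketch with an illustrative class name; it deletes the file and fails when the server does not label the body as `application/pdf`:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SaveReport {

    /** Downloads uri into target byte for byte and returns target; fails if the body is not labeled PDF. */
    static Path savePdf(URI uri, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // skip the HTTP/2 upgrade attempt
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/pdf")
                .GET()
                .build();
        // Body bytes go to the file unchanged; no Reader/Writer is involved.
        HttpResponse<Path> response = client.send(request,
                HttpResponse.BodyHandlers.ofFile(target,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING));
        String type = response.headers().firstValue("Content-Type").orElse("");
        if (!type.startsWith("application/pdf")) {
            Files.deleteIfExists(target); // do not leave a mislabeled .pdf behind
            throw new IOException("Expected application/pdf but got '" + type + "'");
        }
        return response.body();
    }
}
```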