Extracting data from a document

GTN170777

Hi Guys,

This isn't a problem with my code, but something I would like to add (ASP
VBScript). At the moment I have a form where a user uploads their details,
including a document (DOC, PDF, TXT, DOCX). The document is uploaded to a
folder on the server, with its path stored in the database, and I'm
tracking the user ID through sessions.

What I would like to do after the upload is redirect to a blank page, where
some script extracts the text from the document and inserts it into another
field in the database associated with the user ID. I think this may be called
parsing, but I'm at a complete loss. I don't suppose you guys have any ideas
on this, do you?

I think this would probably make a really neat little extension also...

Look forward to your responses.

G
 
Mike Brind [MVP]

GTN170777 said:
I think this would probably make a really neat little extension also...

Given that Classic ASP is no longer being developed, you are unlikely to get
MS to consider any extensions to the framework. Also, how you obtain the
contents of the file will differ enormously depending on the file type. A
simple text file is easy: you just use the FileSystemObject to gain access to
the text. A PDF is totally different, and there are a number of third-party
components available for working with PDFs. Microsoft haven't even provided a
native way to deal with PDFs in the .NET Framework, which is the technology
they are now devoting all their development time to. You have to dig around
for third-party stuff there too.

We use a number of third-party components for text parsing, and some
conditional code to identify the file type and then choose the component
accordingly. However, they wouldn't be of any use to you, as they are
employed in a Delphi forms app.
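As a rough illustration of the "identify the file type, then choose the component" idea, here is a minimal sketch. It is written in Python rather than VBScript purely to keep it short and runnable, and it is not the Delphi code mentioned above. The names extract_text, extract_plain_text, EXTRACTORS and NEEDS_COMPONENT are made up for this example. Only plain-text files are actually handled; PDF and Word formats are left as placeholders for whatever third-party component you end up choosing.

```python
import os


def extract_plain_text(path):
    """Read a plain-text upload and return its contents as a string."""
    # errors="replace" keeps the read from failing on stray bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read()


# Hypothetical registry: file extension -> function that returns the text.
EXTRACTORS = {
    ".txt": extract_plain_text,
}

# Formats that need a separate component (assumption: none is installed here).
NEEDS_COMPONENT = {".pdf", ".doc", ".docx"}


def extract_text(path):
    """Pick an extractor based on the file extension and run it."""
    extension = os.path.splitext(path)[1].lower()
    if extension in EXTRACTORS:
        return EXTRACTORS[extension](path)
    if extension in NEEDS_COMPONENT:
        raise NotImplementedError(
            extension + " files need a third-party component"
        )
    raise ValueError("Unsupported file type: " + repr(extension))
```

Supporting another format would then mean writing one more extractor function and adding it to the registry, without touching the code that calls extract_text.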
 
Old Pedant

In addition to what Mike Brind said...

You *can* use ASP/VBScript to "script" MS Word and then you can use various
scripted commands within Word to locate specific text, etc.

To say that's a pain in the neck is a gross understatement. The docs for
doing this are poor, the inherent problems manifold. [Perhaps the easiest
way to do this would be to open a document with Word and then ask to do a
"Save as..." to a ".txt" file and then parse the resultant all-text file.]

You'd probably be better off with PDF, thanks to a third-party component
named "AspPDF", but be forewarned that it's not cheap and it also has a
fairly steep learning curve.

You are after one of the holy grails of database developers: The ability to
do "data mining" on non-database, non-text files. And each file type has to
be approached separately, using different tools, it seems. People make good
money producing tools to do this stuff, and generally they don't sell the
tools--they just sell the [expensive] service of doing the data mining for
you.

In short, if you are a newbie programmer, this probably isn't a project you
want to try tackling, yet.
 
Mike Brind [MVP]

Old Pedant said:
In addition to what Mike Brind said...

You *can* use ASP/VBScript to "script" MS Word and then you can use various
scripted commands within Word to locate specific text, etc.

To say that's a pain in the neck is a gross understatement. The docs for
doing this are poor, the inherent problems manifold. [Perhaps the easiest
way to do this would be to open a document with Word and then ask to do a
"Save as..." to a ".txt" file and then parse the resultant all-text file.]

The Delphi bods here use Word as a COM object and cause anything that isn't
a PDF to open in Word. That's ok on a desktop, where the user is able to
dismiss any dialogue or message boxes that might be instantiated, thus
allowing the app to close, but you can imagine what will happen if these
message boxes open on a web server (on Rack #364 in some unmanned room deep
in the bowels of some Data Centre God knows where...). That's one of the
primary reasons MS advise against automating Word in web applications.
 
GTN170777

Thanks for your input, guys. You've made me rethink the idea! I guess for
the project that we're working on it would be a nice add-on. I'm sure the
geniuses at MS will come up with something that makes the process a little
less hair-pulling in a couple of years or so, and that will be the time to
add it. Till then, it's a nice add-on that can wait.

Thanks both...

GTN

 
Bob Barrows [MVP]

GTN170777 said:
Thanks for your input guys, you've made me re think the idea!!!, I
guess for the project that we're working on it would be a nice add
on.... I'm sure the geniuses at MS will come up with something that
makes the process a little less hair pulling in a couple of years or so,

Don't count on it. They've had 30+ yrs now ...
 
