Extracting data from a document

Discussion in 'ASP General' started by GTN170777, Jun 27, 2008.

  1. GTN170777

    GTN170777 Guest

    Hi Guys,

    This isn't a problem with my code, but something I would like to add (ASP
    VBScript). At the moment I have a form where a user uploads their details,
    including a document (Doc, PDF, TXT, Docx). The document is uploaded to a
    folder on the server, its path is stored in the database, and I'm
    tracking the user id through sessions.

    What I would like to do after the upload is redirect to a blank page, where
    a script extracts the text from the document and inserts it into another
    field in the database associated with the user id. I think this may be called
    parsing, but I'm at a complete loss. I don't suppose you guys have any ideas
    on this, do you?

    I think this would probably make a really neat little extension also...

    Look forward to your responses.

    G
     
    GTN170777, Jun 27, 2008
    #1

  2. Mike Brind [MVP]

    Given that Classic ASP is no longer being developed, you are unlikely to get
    MS to consider any extensions to the framework. Also, how you obtain the
    contents of the file differs enormously by file type. A simple text file is
    easy: you just use the FileSystemObject to gain access to the text. A PDF is
    totally different, and there are a number of third party components
    available for messing around with PDFs. Microsoft haven't even provided a
    native way to deal with PDFs in the .NET framework, which is the technology
    they are now devoting all their development time to. You have to dig around
    for third party stuff there too.

    We use a number of third party components for text parsing, and some
    conditional code to identify the filetype, and then choose the component
    accordingly. However, they wouldn't be of any use to you as they are
    employed in a Delphi forms app.
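
    To make the "identify the filetype, then choose the component" idea
    concrete, here is a minimal sketch. It is written in Python purely for
    illustration, not ASP/VBScript, and it is not the code we use. The names
    extract_txt, extract_docx, EXTRACTORS and extract_text are made up for
    the example. It only handles plain text and .docx (which is a ZIP archive
    with the body text in word/document.xml); PDF and old-style .doc are
    deliberately left out because they need third party components.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_txt(path):
    """Plain text: just read the file."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def extract_docx(path):
    """A .docx is a ZIP archive; the body text lives in word/document.xml."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs of one paragraph into a single line.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Hypothetical dispatch table: file extension -> extractor function.
EXTRACTORS = {".txt": extract_txt, ".docx": extract_docx}


def extract_text(path):
    """Pick an extractor by file extension; fail loudly for unsupported types."""
    ext = Path(path).suffix.lower()
    try:
        extractor = EXTRACTORS[ext]
    except KeyError:
        raise ValueError(f"no extractor registered for {ext!r} files") from None
    return extractor(path)
```

    Supporting another format would then mean registering one more function
    in the table, wrapping whichever component you settle on for that type.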

    --
    Mike Brind
    Microsoft MVP - ASP/ASP.NET
     
    Mike Brind [MVP], Jun 27, 2008
    #2

  3. Old Pedant

    Old Pedant Guest

    In addition to what Mike Brind said...

    You *can* use ASP/VBScript to "script" MS Word and then you can use various
    scripted commands within Word to locate specific text, etc.

    To say that's a pain in the neck is a gross understatement. The docs for
    doing this are poor, the inherent problems manifold. [Perhaps the easiest
    way to do this would be to open a document with Word and then ask to do a
    "Save as..." to a ".txt" file and then parse the resultant all-text file.]

    You'd probably be better off with PDF, thanks to a third party component
    named "AspPDF", but be forewarned that it's not cheap and it also has a
    fairly steep learning curve.

    You are after one of the holy grails of database developers: The ability to
    do "data mining" on non-database, non-text files. And each file type has to
    be approached separately, using different tools, it seems. People make good
    money producing tools to do this stuff, and generally they don't sell the
    tools--they just sell the [expensive] service of doing the data mining for
    you.

    In short, if you are a newbie programmer, this probably isn't a project you
    want to try tackling, yet.
     
    Old Pedant, Jun 27, 2008
    #3
  4. Mike Brind [MVP]

    The Delphi bods here use Word as a COM object and cause anything that isn't
    a PDF to open in Word. That's ok on a desktop, where the user is able to
    dismiss any dialogue or message boxes that might be instantiated, thus
    allowing the app to close, but you can imagine what will happen if these
    message boxes open on a web server (on Rack #364 in some unmanned room deep
    in the bowels of some Data Centre God knows where...). That's one of the
    primary reasons MS advise against automating Word in web applications.

    --
    Mike Brind
    Microsoft MVP - ASP/ASP.NET
     
    Mike Brind [MVP], Jun 28, 2008
    #4
  5. GTN170777

    GTN170777 Guest

    Thanks for your input guys, you've made me rethink the idea! I guess for
    the project that we're working on it would be a nice add-on. I'm sure the
    geniuses at MS will come up with something that makes the process a little
    less hair-pulling in a couple of years or so, and that will be the time to
    add it. Till then it's a nice add-on that can wait.

    Thanks both...

    GTN

     
    GTN170777, Jun 28, 2008
    #5
  6. Bob Barrows [MVP]

    GTN170777 wrote:
    > I'm sure the geniuses at MS will come up with something that
    > makes the process a little less hair pulling in a couple of years or
    > so,

    Don't count on it. They've had 30+ yrs now ...
    --
    Microsoft MVP - ASP/ASP.NET
    Please reply to the newsgroup. This email account is my spam trap so I
    don't check it very often. If you must reply off-line, then remove the
    "NO SPAM"
     
    Bob Barrows [MVP], Jun 28, 2008
    #6