Parsing text file with ASP

Discussion in 'ASP General' started by SROSeaner, Sep 26, 2004.

  1. SROSeaner

    SROSeaner Guest

    I have a text file that is the result of using the XMLHTTP object to pull
    back a page of search results from a search engine.

    So I have the entire results page in HTML, and I want to break out each hit
    result from the text file as a separate item and do what I want with each
    one.

    Are there any suggested algorithms or other techniques I could be directed
    to?
     
    SROSeaner, Sep 26, 2004
    #1

  2. What exactly is a "hit result?" As far as what you want to do, it'd all
    depend on what the html looks like and how consistent it remains. Do you
    have control over this remote source? Or is it some other site that can
    change on any given day without any forewarning?

    Ray at home
     
    Ray Costanzo [MVP], Sep 27, 2004
    #2

  3. SROSeaner

    SROSeaner Guest

    Actually, all I really need to do is pull out any text in the HTML that is
    a web site address, that is, anything in the form http://www._____.__ or
    starting with www.

    I think I know how to find that by using InStr and passing it http: (for
    example) as the text to look for, but that will only give me the starting
    point of the address, correct?
     
    SROSeaner, Sep 28, 2004
    #3
  4. Yes, that'd give you the starting point. The best you can do is have your
    code make an educated guess about things when you have no idea what kind of
    data will be thrown at it.

    If the string contains:

    <a href="http://something.com">click me</a>, should it be ignored because
    there's no WWW? Should your code assume that as soon as it finds a ", that
    is the end of the domain? What about a carriage return? What about a <
    character? What about when it's in a sentence in the document, e.g.:

    Most Web site addresses start with http://www.

    Should that be found?

    There are lots of variables to deal with, and all you can really do is hope
    for accuracy.
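
To make the "educated guess" concrete, here is a minimal sketch in Python (not ASP) of the scan described above: find each starting point, then walk forward until a character that probably ends the address. The names `DELIMITERS` and `extract_addresses` and the choice of delimiter characters are assumptions for illustration only. The same loop could be written in VBScript with InStr and Mid.

```python
# Characters treated as the end of an address. This set is an assumption
# for illustration; adjust it to the pages you actually receive.
DELIMITERS = set('"\'<> \t\r\n')


def extract_addresses(html, prefix="http://"):
    """Return each run of text that starts with prefix and ends at a delimiter."""
    found = []
    start = html.find(prefix)  # like InStr, this gives only the starting point
    while start != -1:
        end = start
        # Walk forward until a delimiter or the end of the text.
        while end < len(html) and html[end] not in DELIMITERS:
            end += 1
        found.append(html[start:end])
        start = html.find(prefix, end)  # resume the search after this match
    return found


sample = (
    '<a href="http://something.com">click me</a>\n'
    "Most Web site addresses start with http://www.\n"
)
print(extract_addresses(sample))
```

With this sample the scan also returns the fragment `http://www.` from the sentence, which is exactly the kind of false hit the questions above are about.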

    Ray at work
     
    Ray Costanzo [MVP], Sep 28, 2004
    #4
  5. Patrice

    Patrice Guest

    You have DOM parsers available, but your code will break if the structure
    of the page changes. I would rather use an API or a "service" if one is
    available.
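
As a rough illustration of the parser route, the sketch below uses Python's standard `html.parser` module, which is event-based rather than a true DOM parser, to collect the `href` of every `<a>` tag. The class name `LinkCollector` is hypothetical, and the approach only finds addresses that appear as links, not ones written in plain text.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(
    "<p>Most Web site addresses start with http://www.</p>"
    '<a href="http://something.com">click me</a>'
)
print(collector.links)
```

Because only tag attributes are inspected, the `http://www.` inside the sentence is ignored here, but the code still depends on the results being marked up as ordinary links.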

    Patrice

     
    Patrice, Sep 28, 2004
    #5
  6. SROSeaner

    SROSeaner Guest

    Thanks for your help, guys. I figure I will just have to code it in a way
    that takes care of all the variables in such a situation.
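
One way to cover several of those variables at once is a regular expression. The following is only a sketch in Python under stated assumptions: the pattern, the name `ADDRESS`, and the helper `find_addresses` are illustrative guesses, not a complete URL grammar. VBScript's RegExp object could express a similar pattern.

```python
import re

# Matches text that starts with http://, https://, or www. and runs until
# whitespace, a quote, or an angle bracket. The pattern is an illustrative
# guess and will still pick up trailing punctuation such as a final period.
ADDRESS = re.compile(r"""(?:https?://|www\.)[^\s"'<>]+""", re.IGNORECASE)


def find_addresses(text):
    """Return every substring of text that looks like a web address."""
    return ADDRESS.findall(text)


sample = 'See www.example.com and <a href="http://something.com">click me</a>'
print(find_addresses(sample))
```

The character class after the prefix plays the same role as a hand-written list of end delimiters, so the same judgment calls about quotes, line breaks, and `<` characters still apply.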
     
    SROSeaner, Sep 28, 2004
    #6
