Automating Searches

Discussion in 'Java' started by nowwho@gmail.com, Jan 3, 2007.

  1. Guest

    Hey,

    New to Java! Trying to automate searching Google, Yahoo, MSN, AOL
    and Ask by sending queries to those engines using a Java program and
    storing the returned URLs in a MySQL database. The program will open a
    text file, read the first line as the query, connect to each of the
    search engines, and store the URLs in a table called "Results_Table" which
    has the following columns:

    Search_Eng - This would record the search engine name
    Query - This would record the query text
    Returned_URL - This is the URL that the search engine returned
    URL_Num - This is the position of the URL in that search engine's
    results.

    Is it possible to do this and store the first 100 URLs the query
    returns from each search engine?
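
    For the storage side, this is roughly what I picture - just an
    untested sketch, assuming the MySQL Connector/J driver and made-up
    connection details:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;

        public class ResultsStore {
            public static void main(String[] args) throws Exception {
                // Load the MySQL Connector/J driver.
                Class.forName("com.mysql.jdbc.Driver");
                Connection con = DriverManager.getConnection(
                        "jdbc:mysql://localhost/searches", "user", "password");
                PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO Results_Table"
                        + " (Search_Eng, Query, Returned_URL, URL_Num)"
                        + " VALUES (?, ?, ?, ?)");
                ps.setString(1, "Google");                  // Search_Eng
                ps.setString(2, "example query");           // Query
                ps.setString(3, "http://www.example.com/"); // Returned_URL
                ps.setInt(4, 1);                            // URL_Num (position)
                ps.executeUpdate();
                ps.close();
                con.close();
            }
        }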

    Thanks!
    , Jan 3, 2007
    #1

  2. Andrew Thompson Guest

    wrote:
    ....
    > New to Java! Trying to automate searching Google,


    See the Google search API, but be prepared to pay
    for anything beyond the nominal number of queries
    the Google API permits for free.

    >...Yahoo, MSN, AOL and Ask ...


    Dunno.. Aren't most of them using data from
    Google, in any case?

    >...by sending queries to those engines using a Java program and
    > storing the returned URLs in a MySQL database.


    Why would your users prefer to query your DB rather
    than query Google directly (for up-to-the-moment data)?

    >...The program will open a
    > text file, read the first line as the query, connect to each of the
    > search engines,

    .....
    > Is it possible to do this and store the first 100 URLs the query
    > returns from each search engine?


    Certainly - through whatever public API the search
    engine offers - talk to their tech departments and
    they'll most probably instruct you how to get the
    data as XML (or something else as conveniently
    portable and easily parsable).
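
    For instance, fetching and parsing such an XML response might look
    roughly like this - an untested sketch where the endpoint URL and
    element name are invented, so check the engine's documentation for
    the real ones:

        import java.io.InputStream;
        import java.net.URL;
        import java.net.URLEncoder;
        import javax.xml.parsers.DocumentBuilderFactory;
        import org.w3c.dom.Document;
        import org.w3c.dom.NodeList;

        public class XmlResults {
            public static void main(String[] args) throws Exception {
                // Invented endpoint; substitute the engine's real API URL.
                String q = URLEncoder.encode("java tutorial", "UTF-8");
                URL url = new URL("http://api.example-engine.com/search?q=" + q);
                InputStream in = url.openStream();
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().parse(in);
                // Invented element name; depends on the engine's schema.
                NodeList urls = doc.getElementsByTagName("url");
                for (int i = 0; i < urls.getLength(); i++) {
                    System.out.println(urls.item(i).getTextContent());
                }
                in.close();
            }
        }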

    Andrew T.
    Andrew Thompson, Jan 3, 2007
    #2

  3. Daniel Pitts Guest

    Andrew Thompson wrote:
    ....
    > Certainly - through whatever public API the search
    > engine offers - talk to their tech departments and
    > they'll most probably instruct you how to get the
    > data as XML (or something else as conveniently
    > portable and easily parsable).


    Also, make sure you read the terms of use for all those services.

    Although, I do wonder why you would want to store search results in a
    database. It's not that hard to make a data scraper and just use the
    website directly. But Google DOES give you an API to do it more easily.
    Daniel Pitts, Jan 3, 2007
    #3
  4. "Andrew Thompson" <> wrote in message
    news:...
    >>...Yahoo, MSN, AOL and Ask ...

    >
    > Dunno.. Aren't most of them using data from
    > Google, in any case?


    Um . . . Certainly Yahoo and MSN are not.

    --
    LTP

    :)
    Luc The Perverse, Jan 4, 2007
    #4
  5. Luc The Perverse wrote:
    > "Andrew Thompson" <> wrote in message
    > news:...
    > >>...Yahoo, MSN, AOL and Ask ...

    > >
    > > Dunno.. Aren't most of them using data from
    > > Google, in any case?

    >
    > Um . . . Certainly Yahoo and MSN are not.


    OK - I see lots of hits for MSN bots in my server logs,
    but not one for Yahoo. What does its bot identify itself
    as?

    Andrew T.
    Andrew Thompson, Jan 4, 2007
    #5
  6. Andrew Thompson wrote:
    > Luc The Perverse wrote:
    >> "Andrew Thompson" <> wrote in message
    >> news:...
    >>>> ...Yahoo, MSN, AOL and Ask ...
    >>> Dunno.. Aren't most of them using data from
    >>> Google, in any case?

    >> Um . . . Certainly Yahoo and MSN are not.

    >
    > OK - I see lots of hits for MSN bots in my server logs,
    > but not one for Yahoo. What does its bot identify itself
    > as?
    >
    > Andrew T.
    >

    Look for: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/)
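
    If you want totals per bot, a quick untested sketch along these
    lines will tally them from an access log (msnbot and Googlebot
    being the usual tokens for the other two):

        import java.io.BufferedReader;
        import java.io.FileReader;

        public class BotCount {
            public static void main(String[] args) throws Exception {
                String[] bots = { "Yahoo! Slurp", "msnbot", "Googlebot" };
                int[] counts = new int[bots.length];
                BufferedReader r = new BufferedReader(new FileReader("access.log"));
                String line;
                while ((line = r.readLine()) != null) {
                    for (int i = 0; i < bots.length; i++) {
                        if (line.indexOf(bots[i]) >= 0) counts[i]++;
                    }
                }
                r.close();
                for (int i = 0; i < bots.length; i++) {
                    System.out.println(bots[i] + " - " + counts[i]);
                }
            }
        }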

    --
    TechBookReport Java - http://www.techbookreport.com/JavaIndex.html
    TechBookReport, Jan 4, 2007
    #6
  7. TechBookReport wrote:
    > Andrew Thompson wrote:

    ...
    > > ..I see lots of hits for MSN bots in my server logs,
    > > but not one for Yahoo.

    ....
    > Look for: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/)


    OK - I see them now..

    Yahoo! - 9246
    msn - 21457
    goog - 7638

    I was surprised I did not find them on the first search..
    Must have been something stupid I did.. (shrugs)

    BTW - nice to see you 'about the place' again..
    I think of you whenever somebody asks after books,
    but a quick, very tentative, search failed to lay a URL
    on your site. I'll bookmark it.

    Andrew T.
    Andrew Thompson, Jan 4, 2007
    #7
  8. Daniel Pitts wrote:
    > Although, I do wonder why you would want to store search results in a
    > database. It's not that hard to make a data scraper and just use the
    > website directly. But Google DOES give you an API to do it more easily.


    Yeah, but using that API (at least, using it very much) is expensive. By
    scraping the results after submitting a normal query URL, a) not diving
    too deeply into the results, and b) not doing new queries too often, you
    can probably fly under the radar, and unless you're coming from a
    datacenter somewhere they won't know you from Adam doing manual searches
    in Firefox.

    To top it off, Java makes transparently caching pages (and, as of 1.6,
    handling cookies) easier too. Add in a deliberate request of the
    front page before doing the search query, some random delays, and a
    spoofed user-agent, and I'm guessing the only way Google could figure
    out you weren't just a surfer using Mozilla 4.0 (compatible; MSIE 4.0)
    would be by using a tool like EtherSniffer to analyze your incoming
    requests and discovering that Java sends the HTTP headers in an
    idiosyncratic sequence. And they won't do that unless your IP generates
    an eyebrow-raising amount of traffic.
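
    The cookie and header parts, at least, are only a few lines each.
    A minimal untested sketch using the standard java.net classes (the
    CookieManager is the 1.6 addition mentioned above):

        import java.net.CookieHandler;
        import java.net.CookieManager;
        import java.net.HttpURLConnection;
        import java.net.URL;

        public class BrowserishFetch {
            public static void main(String[] args) throws Exception {
                // Java 1.6: install a process-wide, in-memory cookie store.
                CookieHandler.setDefault(new CookieManager());

                URL url = new URL("http://www.google.com/");
                HttpURLConnection con = (HttpURLConnection) url.openConnection();
                // Replace the default "Java/1.x" User-Agent string.
                con.setRequestProperty("User-Agent",
                        "Mozilla/4.0 (compatible; MSIE 4.0)");
                System.out.println("HTTP " + con.getResponseCode());
                con.disconnect();

                // A random delay before the next request.
                Thread.sleep(1000 + (long) (Math.random() * 4000));
            }
        }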

    And for Google that "eyebrow-raising" threshold is set very high indeed;
    "normal" traffic for Google is millions of searches per day and there
    are frequently dozens per day from each of many individual IP addresses
    as well as untold numbers of one-offs and the like.

    And, of course, as long as you don't generate more traffic faster than
    you could by typing in all those queries manually, I don't see any moral
    qualms with this. At worst it's equivalent to adblocking the sponsored
    links on the results page with a commonly-available Firefox extension.
    All you've done is automate some tedium at your end without having any
    discernible effect at theirs versus not automating the tedium. So unless
    you do believe in victimless crimes or don't believe in the identity of
    indiscernibles ... :)
    John Ersatznom, Jan 4, 2007
    #8
  9. John Ersatznom wrote:
    ....
    > And, of course, as long as you don't generate more traffic faster than
    > you could by typing in all those queries manually, I don't see any moral
    > qualms with this.


    One might also argue that you were free to build
    your own web-crawler, parse the pages it finds
    for the content and links*, store the data in searchable
    form, then rate and rank it according to whatever
    criteria best suit you[1]. * Oh, of course, then
    'repeat for each link', & repeat every 7(?) days.

    Setting up the software and hardware capable of
    achieving that task might cost a lot of money (I
    guess). OTOH, you can pay a fee to someone
    who has already gone to the effort, and has the
    expertise.

    Just because it is technically possible** to rip
    Google off, does not make it right.

    ** + all the other idiotic reasons people generally
    put forward to justify such theft, starting with..
    - 'they don't have a right - it is free data!'. No it isn't -
    the web pages themselves are free, but the search
    engines hope to add value by sorting and filtering.

    Also, Google is no 'monopoly'. As has been pointed
    out in this (AFAIR) thread. You don't like Google's
    prices? Go to the competition..

    [1] And then, can you make it publicly available,
    so I can rip your data, and resell it to my paying
    clients?

    Andrew T.
    Andrew Thompson, Jan 4, 2007
    #9
  10. nowwho Guest

    Hey,
    Thanks for the information so far. I didn't realise there was so much
    legal stuff involved; it's for a once-off educational project. Didn't
    think it would amount to spamming. The program would only be run about
    50 times in total. There is a set number of queries, and a set number
    of results returned. As it's an educational project I never thought of
    the legal side!
    nowwho, Jan 4, 2007
    #10
  11. Chris Uppal Guest

    John Ersatznom wrote:

    > Add in a deliberate request of the
    > front page before doing the search query, some random delays, and a
    > spoofed user-agent, and I'm guessing the only way Google could figure
    > out you weren't just a surfer using Mozilla 4.0 (compatible; MSIE 4.0)
    > would be by using a tool like EtherSniffer to analyze your incoming
    > requests and discovering that Java sends the HTTP headers in an
    > idiosyncratic sequence. And they won't do that unless your IP generates
    > an eyebrow-raising amount of traffic.


    Google can and does apply more intelligence than that.

    The simplest thing to look for is the originating IP address of the request (at
    the TCP/IP level). A suspicious pattern of requests from one IP (e.g. too many
    in one time period), and Google will stop serving queries from that IP address.
    (The originating IP /can/ be spoofed, but not many Java programmers will
    have the necessary skills, and in any case it is hardly worth the effort.) That
    criterion can also give false positives; for instance, if an organisation is
    working behind a NAT and one person from that organisation is detected
    abusing Google's services, the entire organisation will be blocked. Does
    Google care? Why should it?

    Then, too, Google has available /all/ the data which enters its data-centres;
    from low-level fingerprinting of IP packets, up through checking HTTP headers,
    extending all the way to historical and cross-site access patterns (I would be
    very surprised if they didn't use a custom TCP/IP stack implementation for
    their HTTP servers). How much of that information it actually uses (or even
    collects) I don't know -- but I'd guess that it collects most of it, and uses
    as much as it feels it has to in order to prevent abuse.

    And they do actively work to prevent abuse. There are many kinds of possible
    abuse, and I imagine Google work to prevent most of them, but I doubt if there
    are many things they dislike more than people attempting to steal their data.

    -- chris
    Chris Uppal, Jan 4, 2007
    #11
  12. nowwho wrote:
    > Hey,
    > Thanks for the information so far. I didn't realise there was so much
    > legal stuff involved; it's for a once-off educational project.


    You 'ivory tower' types are *so* naive. It's cute. ;-)

    >...Didn't
    > think it would amount to spamming.


    I am not sure I would use that term for it.

    Spamming is generally pushing an advertising
    related message out to people who do not want it.

    This (when done the 'wrong way') simply amounts
    to a bit of theft of the resources of others.

    & for my part, while I might hassle the thieves,
    I'll bludgeon the spammers.

    >...The program would only be run about
    > 50 times in total.


    I think you might be well placed to use the 'legal
    and free' APIs currently offered! Surely even the
    small number of queries Google offers for free
    would cover your requirement?

    (In any case, from what I understand, Google simply
    refuses further requests for the day if the limit
    is struck - no hard feelings, and back tomorrow..)

    >...There is a set number of queries, and a set number
    > of results returned. As it's an educational project I never thought of
    > the legal side!


    Don't forget that there can be a few 'legalities' to the
    educational side of things. Be careful of tripping over
    using someone else's code without proper attribution
    or accreditation.. Plagiarism/academic misconduct.
    There was a classic thread on these groups from
    a chap by the name of RoboCop - he got to find
    out the hard way.

    Andrew T.
    Andrew Thompson, Jan 4, 2007
    #12
  13. nowwho Guest

    Andrew Thompson wrote:
    > I am not sure I would use that term for it.


    Fair enough, computers and technology aren't my main interest of study.


    > I think you might be well placed to use the 'legal
    > and free' APIs currently offered! Surely even the
    > small number of queries Google offers for free
    > would cover your requirement?


    More than likely, but I would still require advice on how to incorporate
    these into a Java program.


    > Don't forget that there can be a few 'legalities' to the
    > educational side of things. Be careful of tripping over
    > using someone else's code without proper attribution
    > or accreditation.. Plagiarism/academic misconduct.
    > There was a classic thread on these groups from
    > a chap by the name of RoboCop - he got to find
    > out the hard way.


    The use of other people's code is allowed; however, ALL work and ALL
    sources of information used in any way for the project have to
    be detailed - we were well warned about the consequences of plagiarism.
    All websites accessed for the project, along with any copyright date,
    must be included, along with the date that the website was accessed,
    etc...
    nowwho, Jan 4, 2007
    #13
  14. NoNickName Guest

    Andrew Thompson wrote:
    > ..


    > BTW - nice to see you 'about the place' again..


    Thanks. Been busy with end of year deadlines recently. Should be around
    a bit more often now though.

    --
    TechBookReport Java - http://www.techbookreport.com/JavaIndex.html
    NoNickName, Jan 5, 2007
    #14
  15. nowwho wrote:
    > Hey,
    > Thanks for the information so far. I didn't realise there was so much
    > legal stuff involved; it's for a once-off educational project. Didn't
    > think it would amount to spamming. The program would only be run about
    > 50 times in total. There is a set number of queries, and a set number
    > of results returned. As it's an educational project I never thought of
    > the legal side!


    It's not spamming -- I don't know what the other guy was smoking when he
    wrote the post you're replying to. There is NO DIFFERENCE discernible to
    Google if you

    a) do 10 searches during the day by typing in a Firefox window while
    doing research or
    b) have your computer do the searches with less/no typing on your part

    Google is being "ripped off" iff you do something like:

    a) use huge amounts of their bandwidth -- well in excess of a normal
    user doing a bit of heavy research, say, generating large numbers of
    searches or delving very deeply into the result set. Fetching 10
    first-pages-of-results, one for each of 10 queries, whether done by one
    mouse click or ten typed-in queries, has little impact on them, and of
    course the one-mouse-click case makes it actually 10 queries instead of
    11 because you mistyped one and had to do it again :)
    or b) use google search results to populate your own rival "search
    engine" site with revenue-generating ads or what-have-you, either by
    scraping google's database or by just putting up a page with a script
    that takes peoples' queries and passes them to google, then takes the
    result page and replaces google's sponsored links with umpteen flashing
    banner ads. Then you're using google's work output to actually compete
    against google, rather than simply using google for research. That makes
    a crucial difference.

    Using code to drive Google lightly and for personal/educational/research
    reasons rather than commercial ones doesn't seem to be evil to me,
    especially if they cannot in practice distinguish it from "normal" use
    anyway, as it isn't producing excessive traffic or being used to compete
    against google in some way.

    In fact, where do you draw the line? Firefox with manually-typed queries
    is OK. Then we have Firefox with an MRU for queries; Firefox with query
    guessing or autocompletion based on your current activities and
    interests; Firefox with a plugin to take the result set too and
    transform it e.g. to show 50 rather than 10 hits or to weed out
    "supplemental results" that are usually MFA sites that really ARE
    ripping off google; Firefox with a plugin to run the query of your
    choice and bookmark the results every few days; ... Firefox with a
    plugin to gradually build up a database of hits for various queries by
    occasionally fetching the nth page of results for one of them, but you
    don't publish these anywhere, just use them personally ...

    I think the two things that mark a transition to being evil are causing
    them excessive traffic and competing with them using their own data in
    some way. (Also generating content-free MFA pages to generate revenue
    via AdSense ads and SEOing them, but that's more using AdSense than
    using the search engine proper, though the SEO will impact the latter
    and pollute the results.)

    I don't see any way to derive some kind of moral law that makes typing
    something morally superior to doing it with one click, and scheduling
    an automatic (infrequent) job or whatever actually sinful.
    There's no inherent virtue in inefficiency, and computers exist to
    enable automating tasks. Hyperlinks automate looking up and finding that
    dusty reference or whatever; librarians may complain that they rot young
    brains but the actual upshot is a gain in productivity, rather than some
    kind of evil decadence setting in.
    John Ersatznom, Jan 5, 2007
    #15
  16. Chris Uppal wrote:
    > And they do actively work to prevent abuse. There are many kinds of possible
    > abuse, and I imagine Google work to prevent most of them, but I doubt if there
    > are many things they dislike more than people attempting to steal their data.


    All of this depends on what constitutes "stealing" their data. Copying
    it and publishing it? Sort of -- it's some kind of infringement but not
    really "theft".

    Merely doing with one mouse click or zero what you'd do anyway with
    twenty keypresses? I don't see how the amount of clacking emanating from
    someone's workstation at location A is in any way relevant to Google as
    long as a) a single user isn't suddenly hogging their resources and b)
    the user is using the results "normally" rather than to compete with
    Google or whatever.

    The red flags that would make them look into their logfiles would be a)
    excessive bandwidth use and b) a Google clone or whatever springing up
    all of a sudden and competing for their revenue streams.

    Personal use of the search results isn't anything they can fault. Nor
    is however a person chooses to generate the requests (so long as they
    aren't excessively frequent), or however they choose to filter and use
    the results, so long as they don't use them commercially.

    I see no logical reason for them to care whether the 3 requests a given
    IP gave them in a given day came from 30 typed characters and 3 mouse
    clicks, 3 mouse clicks, or 0 mouse clicks at the requesting end, as long
    as they don't consider 3 requests in one day from one source to be
    excessive and as long as they aren't using those results in a way that
    competes somehow with Google.

    Unless, of course, the real intent is to enforce terms that let them use
    a business model based on charging ordinary users a premium merely to
    avoid tedium. I hope that isn't their intent; it would violate their
    famous motto. A tiered "typed queries are free, bookmarked are a dime
    each, and cron jobs require a monthly $59.99 subscription fee and
    special account" service where it actually costs them exactly the same
    amount (next to nil) to provide for all three use cases seems not merely
    silly, but tantamount to fraudulent. A tiered "more than xx queries a
    day requires a premium $10/month account" thing with xx in the dozens or
    hundreds might not be considered evil -- after all, generating that many
    queries actually scales up the amount serving you is costing them per
    day. And of course disallowing commercial use of the results (other than
    incidental use, like researching a purchase or a new hire -- meaning
    selling the results themselves in some manner) without a licensing arrangement
    where Google gets a percentage. That's only fair.
    John Ersatznom, Jan 5, 2007
    #16
  17. nowwho Guest

    John Ersatznom wrote:
    > nowwho wrote:
    > > Hey,
    > > Thanks for the information so far. I didn't realise there was so much
    > > legal stuff involved; it's for a once-off educational project. Didn't
    > > think it would amount to spamming. The program would only be run about
    > > 50 times in total. There is a set number of queries, and a set number
    > > of results returned. As it's an educational project I never thought of
    > > the legal side!

    >
    > It's not spamming -- I don't know what the other guy was smoking when he
    > wrote the post you're replying to. There is NO DIFFERENCE discernible to
    > Google if you
    >
    > a) do 10 searches during the day by typing in a Firefox window while
    > doing research or
    > b) have your computer do the searches with less/no typing on your part
    >
    > Google is being "ripped off" iff you do something like:
    >
    > a) use huge amounts of their bandwidth -- well in excess of a normal
    > user doing a bit of heavy research, say, generating large numbers of
    > searches or delving very deeply into the result set. Fetching 10
    > first-pages-of-results, one for each of 10 queries, whether done by one
    > mouse click or ten typed-in queries, has little impact on them, and of
    > course the one-mouse-click case makes it actually 10 queries instead of
    > 11 because you mistyped one and had to do it again :)
    > or b) use google search results to populate your own rival "search
    > engine" site with revenue-generating ads or what-have-you, either by
    > scraping google's database or by just putting up a page with a script
    > that takes peoples' queries and passes them to google, then takes the
    > result page and replaces google's sponsored links with umpteen flashing
    > banner ads. Then you're using google's work output to actually compete
    > against google, rather than simply using google for research. That makes
    > a crucial difference.


    The point of the exercise is to get the returned URLs into an offline
    database. It's an exercise purely to pull back the URLs from the
    different search engines.

    > Using code to drive Google lightly and for personal/educational/research
    > reasons rather than commercial ones doesn't seem to be evil to me,
    > especially if they cannot in practice distinguish it from "normal" use
    > anyway, as it isn't producing excessive traffic or being used to compete
    > against google in some way.


    I don't think it's a question of good or evil; I think people are
    worried that the code could be used for commercial reasons.

    > In fact, where do you draw the line? Firefox with manually-typed queries
    > is OK. Then we have Firefox with an MRU for queries; Firefox with query
    > guessing or autocompletion based on your current activities and
    > interests; Firefox with a plugin to take the result set too and
    > transform it e.g. to show 50 rather than 10 hits or to weed out
    > "supplemental results" that are usually MFA sites that really ARE
    > ripping off google; Firefox with a plugin to run the query of your
    > choice and bookmark the results every few days; ... Firefox with a
    > plugin to gradually build up a database of hits for various queries by
    > occasionally fetching the nth page of results for one of them, but you
    > don't publish these anywhere, just use them personally ...
    >
    > I think the two things that mark a transition to being evil are causing
    > them excessive traffic and competing with them using their own data in
    > some way. (Also generating content-free MFA pages to generate revenue
    > via AdSense ads and SEOing them, but that's more using AdSense than
    > using the search engine proper, though the SEO will impact the latter
    > and pollute the results.)


    This is an educational project, and as computers are not my main area
    of study I don't know what MFA and SEO are. Can this be explained?


    > I don't see any way to derive some kind of moral law that makes typing
    > something morally superior to doing it with one click, and scheduling
    > an automatic (infrequent) job or whatever actually sinful.
    > There's no inherent virtue in inefficiency, and computers exist to
    > enable automating tasks. Hyperlinks automate looking up and finding that
    > dusty reference or whatever; librarians may complain that they rot young
    > brains but the actual upshot is a gain in productivity, rather than some
    > kind of evil decadence setting in.


    Any help with using the Google API or other suggestions would be a
    great help. I also assume that Google's API won't work with the other
    search engines, so would I have to write a different class for each
    search engine?
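
    Something like this rough sketch is what I imagine - all the names
    here are made up by me, not from any real API:

        import java.util.List;

        // One class per engine behind a shared interface.
        public interface SearchEngine {
            String getName(); // e.g. "Google"
            List<String> search(String query, int maxResults) throws Exception;
        }

        // e.g. public class GoogleEngine implements SearchEngine { ... }
        //      public class YahooEngine  implements SearchEngine { ... }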
    nowwho, Jan 5, 2007
    #17
  18. nowwho wrote:
    >>I think you might be well placed to use the 'legal
    >>and free' APIs currently offered! Surely even the
    >>small number of queries Google offers for free
    >>would cover your requirement?

    >
    > The use of other people's code is allowed; however, ALL work and ALL
    > sources of information used in any way for the project have to
    > be detailed - we were well warned about the consequences of plagiarism.
    > All websites accessed for the project, along with any copyright date,
    > must be included, along with the date that the website was accessed,
    > etc...


    Oh what a tangled web we weave... what happened to the days when you
    could just tinker and innovate without fear of lawyers or similar? Hmm?
    Of course, wholesale copying of other people's stuff without permission
    and misattributing it as your own original work is simply bad, but that's
    because it's fraud and misrepresentation, not because it's copying, IMO.
    Wheel-reinventing is supposed to be a bad thing. Let some attorneys get
    involved and soon everyone is expecting you to get their permission to
    copy anything. Then to *use* anything. Then to breathe or take a leak,
    no doubt.

    I think it's worth pointing out that unless you've signed something in
    writing, you aren't in a binding agreement with Google (or anyone else)
    about anything, and only copyright, trademark, and patent law has any
    true legal force, no matter what TOC boilerplate is on whose website.
    Hell, they can't even prove that you *read* it, in any meaningful way,
    even if your IP retrieved the page one day.

    Of course the de facto law in the US isn't so rosy, thanks to a braindead
    court system and a legislature that's long since been ritually auctioned
    with great fanfare biannually to the highest bidder. I'd suggest a saner
    country. Many in Europe and, I think, even Canada actually still have
    sane legal systems, standards for when someone's actually entered into a
    binding contract, standards of evidence to get subpoenas, warrants, and
    judgments, and whatnot. Australia's as bad as the US or worse though. I
    wonder how long it is before individuals have to jurisdiction-shop by
    travel agent and $500 one-way airfare express just to do ordinary
    victimless activities without legal repercussions and $50,000 in bogus
    fines for phantom file sharing someone else on the neighborhood's cable
    company internet service may or may not actually have done...
    John Ersatznom, Jan 6, 2007
    #18
  19. nowwho Guest

    John Ersatznom wrote:
    > Oh what a tangled web we weave... what happened to the days when you
    > could just tinker and innovate without fear of lawyers or similar? Hmm?
    ....


    While the legal information is handy and can (and more than likely will)
    be included in the report, are there any suggestions on how to tackle the
    coding of the problem, or as to where I can look for further
    information?
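
    To restate the plan in code form, this is the outline I am working
    towards - an untested sketch building on the SearchEngine interface
    I sketched earlier in the thread:

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.util.List;

        public class SearchRunner {
            // One query per line; run each query past every engine and
            // record each URL with its engine name, query and position.
            public static void run(List<SearchEngine> engines, String queryFile)
                    throws Exception {
                BufferedReader reader =
                        new BufferedReader(new FileReader(queryFile));
                String query;
                while ((query = reader.readLine()) != null) {
                    for (SearchEngine engine : engines) {
                        List<String> urls = engine.search(query, 100);
                        for (int i = 0; i < urls.size(); i++) {
                            // store(engine.getName(), query, urls.get(i), i + 1)
                            // - i.e. the JDBC insert sketched near the top
                            //   of the thread.
                        }
                    }
                }
                reader.close();
            }
        }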
    nowwho, Jan 6, 2007
    #19
  20. Chris Uppal Guest

    John Ersatznom wrote:

    > > The use of other people's code is allowed; however, ALL work and ALL
    > > sources of information used in any way for the project have to
    > > be detailed - we were well warned about the consequences of plagiarism.
    > > All websites accessed for the project, along with any copyright date,
    > > must be included, along with the date that the website was accessed,
    > > etc...

    >
    > Oh what a tangled web we weave...what happened to the days when you
    > could just tinker and innovate without fear of lawyers or similar?


    I think the OP's problem here is not so much the legality (or otherwise) of
    "borrowing" Google's data, but that this is work in an academic context where
    all sources /must/ be declared for reasons of honesty in scholarship.

    -- chris
    Chris Uppal, Jan 6, 2007
    #20
