finding subdirectories from starting URL

Discussion in 'Java' started by Alan, Nov 12, 2007.

  1. Alan

    Alan Guest

    Alan, Nov 12, 2007
    #1

  2. Alan

    Daniel Pitts Guest

    Alan wrote:
    > I want to find subdirectories from a starting URL. For example,
    > if I start with http://www.someplace.net, I want to be able to find
    > the subdirectories there, e.g.:
    >
    > http://www.someplace.net/documentation/
    > http://www.someplace.net/about/
    > http://www.someplace.net/images/
    >
    > Are there Java methods that facilitate this?
    >
    > Thanks, Alan
    >

    There is no easy way to do that, unless someplace.net gives you a
    listing page. Generally, in order to do that, you have to have either
    direct access to the disk, access to an FTP account on that machine, or
    you have to crawl the web site and parse out the urls.

    Note, this is not a limitation of Java, but simply a result of the way
    http works.

    There are plenty of web-crawling libraries/programs out there; I suggest
    you Google for them.
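    A minimal sketch of the "crawl the web site and parse out the URLs" step, in Java since that is what was asked about. It assumes the page HTML has already been downloaded, uses a crude regular expression instead of a real HTML parser, and the class and method names (LinkHarvester, sameHostLinks, firstLevelPrefixes) are made up for illustration. The start URL is assumed to end with a slash (http://www.someplace.net/) so that relative links resolve cleanly.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: harvests same-host links from one page and guesses "subdirectories". */
final class LinkHarvester {
    // Crude href="..." / href='...' matcher; the capture stops at a quote or a '#' fragment.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Resolves every href against pageUri and keeps the ones on the same host. */
    static List<URI> sameHostLinks(URI pageUri, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI link = pageUri.resolve(m.group(1).trim());
                // mailto:, javascript: and off-site links have a different (or null) host.
                if (pageUri.getHost().equalsIgnoreCase(link.getHost())) {
                    links.add(link);
                }
            } catch (IllegalArgumentException badHref) {
                // Not a valid URI (e.g. an unescaped space); skip it.
            }
        }
        return links;
    }

    /** Collects the first path segment of each link when a '/' follows it, e.g. "/about/". */
    static Set<String> firstLevelPrefixes(List<URI> links) {
        Set<String> prefixes = new TreeSet<>();
        for (URI link : links) {
            String path = link.getPath();
            int slash = (path == null) ? -1 : path.indexOf('/', 1);
            if (slash > 1 && path.startsWith("/")) {
                prefixes.add(path.substring(0, slash + 1));
            }
        }
        return prefixes;
    }
}
```

    Note that this can only report paths some page actually links to: the front page of someplace.net would surface /documentation/, /about/ and /images/ only if it links into them.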

    Good luck,
    Daniel

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    Daniel Pitts, Nov 12, 2007
    #2

  3. Andrew Thompson, Nov 12, 2007
    #3
  4. Alan

    Roedy Green Guest

    On Mon, 12 Nov 2007 00:20:17 -0000, Alan <>
    wrote, quoted or indirectly quoted someone who said :

    >http://www.someplace.net/documentation/
    >http://www.someplace.net/about/
    >http://www.someplace.net/images/
    >
    > Are there Java methods that facilitate this?


    In general, no. It is considered confidential information. Sometimes a
    server will give you a directory listing in HTML if you give it the URL
    of a directory without an index.html file in it.
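    To see whether a particular server hands out such a listing, you can request the directory URL and look at what comes back. The sketch below uses java.net.http.HttpClient (Java 11+); the class name ListingProbe is hypothetical, and the detection is only a heuristic based on default page titles: "Index of /..." from Apache's mod_autoindex and nginx's autoindex, and "Directory Listing For ..." from Tomcat's default servlet. Customized listings, or those from other servers, may not be recognized.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;

/** Hypothetical helper: requests a directory URL and guesses whether the server sent a listing. */
final class ListingProbe {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL) // servers often redirect "/dir" to "/dir/"
            .build();

    /** Returns the listing HTML, or null if the response does not look like an auto-index page. */
    static String fetchListing(URI directoryUri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(directoryUri).GET().build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            return null; // 403/404 usually means listings are off or the path is hidden
        }
        return looksLikeAutoIndex(response.body()) ? response.body() : null;
    }

    /** Heuristic: default titles used by Apache/nginx ("Index of /") and Tomcat listings. */
    static boolean looksLikeAutoIndex(String html) {
        String lower = html.toLowerCase(Locale.ROOT);
        return lower.contains("<title>index of /") || lower.contains("directory listing for");
    }
}
```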
    --
    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Nov 12, 2007
    #4
  5. Daniel Pitts <> writes:

    > Alan wrote:
    >> I want to find subdirectories from a starting URL. For example,
    >> if I start with http://www.someplace.net, I want to be able to find
    >> the subdirectories there, e.g.:
    >>
    >> http://www.someplace.net/documentation/
    >> http://www.someplace.net/about/
    >> http://www.someplace.net/images/
    >>
    >> Are there Java methods that facilitate this?
    >>
    >> Thanks, Alan
    >>

    > There is no easy way to do that, unless someplace.net gives you a
    > listing page. Generally, in order to do that, you have to have either
    > direct access to the disk, access to an FTP account on that machine,
    > or you have to crawl the web site and parse out the urls.
    >
    > Note, this is not a limitation of Java, but simply a result of the way
    > http works.


    In particular, note that /documentation/, /about/, and /images/ may not
    even be directories at all. A content-management system could use them as
    keys into a database of managed documents, for example, or as category
    keywords that are used to dynamically assemble a list of documents in the
    specified category.
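    A small illustration of that point, using the JDK's built-in com.sun.net.httpserver.HttpServer: every URL under /documentation/ is answered by one handler that treats the rest of the path as a lookup key into an in-memory map standing in for a CMS database. No documentation directory exists anywhere on disk. The class name, the article keys and their text are all made up for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical demo: "/documentation/..." URLs answered by code, with no directory behind them. */
final class VirtualPathsDemo {
    // Stand-in for a CMS database: the "subdirectory" names are just lookup keys.
    private static final Map<String, String> ARTICLES = Map.of(
            "setup-guide", "How to install the product.",
            "faq", "Frequently asked questions.");

    /** Starts a server on localhost; pass 0 to let the OS pick a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        // Every request path starting with /documentation/ is routed to this one handler.
        server.createContext("/documentation/", exchange -> {
            String key = exchange.getRequestURI().getPath().substring("/documentation/".length());
            String article = ARTICLES.get(key);
            byte[] body = (article != null ? article : "No such article: " + key)
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(article != null ? 200 : 404, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body); // closing the body stream also ends the exchange
            }
        });
        server.start();
        return server;
    }
}
```

    A crawler would see /documentation/setup-guide and /documentation/faq as ordinary URLs, but there is nothing a directory listing could ever show for them.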

    sherm--

    --
    WV News, Blogging, and Discussion: http://wv-www.com
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Sherman Pendley, Nov 12, 2007
    #5
  6. Alan

    Lew Guest

    Roedy Green wrote:
    > On Mon, 12 Nov 2007 00:20:17 -0000, Alan <>
    > wrote, quoted or indirectly quoted someone who said :
    >
    >> http://www.someplace.net/documentation/
    >> http://www.someplace.net/about/
    >> http://www.someplace.net/images/
    >>
    >> Are there Java methods that facilitate this?

    >
    > In general, no. It is considered confidential information. Sometimes a
    > server will give you a directory listing in HTML if you give it the URL
    > of a directory without an index.html file in it.


    In addition, many directories on the server's hard drive, while they are
    subdirectories of the web document directory, will not be accessible to public
    clients. A classic example is the WEB-INF/ directory tree in Java EE apps,
    but Apache .htaccess can also restrict directories.
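    From the client side, all you can observe about such a path is the HTTP status code. A hedged sketch (hypothetical class PathProbe, Java 11+ HttpClient) sends a HEAD request and maps the status to a rough category. Which code a hidden directory produces depends on the server: Tomcat typically answers requests into WEB-INF with 404, an .htaccess deny usually yields 403, and a 404 cannot distinguish "hidden" from "absent". Some servers reject HEAD outright, which lands in OTHER.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical probe: what a public client can observe about a path it may not see into. */
final class PathProbe {
    enum Visibility { READABLE, FORBIDDEN, NOT_FOUND_OR_HIDDEN, OTHER }

    static Visibility probe(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest head = HttpRequest.newBuilder(uri)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        return classify(client.send(head, HttpResponse.BodyHandlers.discarding()).statusCode());
    }

    /** 404 is ambiguous: the path may not exist, or the server may hide it on purpose. */
    static Visibility classify(int status) {
        if (status >= 200 && status < 300) return Visibility.READABLE;
        if (status == 401 || status == 403) return Visibility.FORBIDDEN;
        if (status == 404) return Visibility.NOT_FOUND_OR_HIDDEN;
        return Visibility.OTHER;
    }
}
```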

    --
    Lew
    Lew, Nov 12, 2007
    #6
  7. Roedy Green wrote:
    >>http://www.someplace.net/documentation/

    ...
    >> Are there Java methods that facilitate this?

    >
    >In general, no. It is considered confidential information. Sometimes a
    >server will give you a directory listing in HTML if you give it the URL
    >of a directory without an index.html file in it.


    An ironic side note is that I'd like my server to *allow*
    automatic 'directory listing' for dirs with no index.html,
    but cannot figure out how to achieve it using the arcane
    control panel thingy the host offers.

    If directory indexing *is* turned on, it is a perhaps tedious
    but mundane task to parse the resulting HTML, looking for
    links to sub-dirs and resources.
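    Once indexing is on, the parsing really is mundane. A sketch of it (hypothetical class ListingParser, Java 16+ for the record type): it pulls double-quoted href values out of the listing, skips Apache's column-sorting links (?C=N;O=D and friends), and keeps only links pointing strictly inside the listed directory, splitting them into sub-directories (trailing slash) and files. The listing URI is assumed to end with a slash, and a regular expression stands in for a proper HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for an auto-generated directory listing page. */
final class ListingParser {
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    record Entries(List<URI> subdirectories, List<URI> files) {}

    /** Splits the listing's links into sub-directories (trailing '/') and files. */
    static Entries parse(URI listingUri, String html) {
        List<URI> dirs = new ArrayList<>();
        List<URI> files = new ArrayList<>();
        String listed = listingUri.getPath();
        String base = listed.endsWith("/") ? listed : listed + "/";
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            if (href.startsWith("?")) continue; // column-sorting links such as "?C=N;O=D"
            URI target;
            try {
                target = listingUri.resolve(href);
            } catch (IllegalArgumentException badHref) {
                continue; // not a valid URI; skip it
            }
            String path = target.getPath();
            // Keep only entries strictly inside this directory: drops "Parent Directory" and off-site links.
            boolean inside = listingUri.getHost().equalsIgnoreCase(target.getHost())
                    && path != null && path.startsWith(base) && !path.equals(base)
                    && !path.contains("..");
            if (!inside) continue;
            (path.endsWith("/") ? dirs : files).add(target);
        }
        return new Entries(dirs, files);
    }
}
```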

    --
    Andrew Thompson
    http://www.athompson.info/andrew/

    Message posted via JavaKB.com
    http://www.javakb.com/Uwe/Forums.aspx/java-general/200711/1
    Andrew Thompson, Nov 12, 2007
    #7
  8. On Nov 12, 5:11 pm, "Andrew Thompson" <u32984@uwe> wrote:
    > Roedy Green wrote:
    > >>http://www.someplace.net/documentation/

    > ..
    > >> Are there Java methods that facilitate this?

    >
    > >In general, no. It is considered confidential information. Sometimes a
    > >server will give you a directory listing in HTML if you give it the URL
    > >of a directory without an index.html file in it.

    >
    > An ironic side note is that I'd like my server to *allow*
    > automatic 'directory listing' for dirs with no index.html,
    > but cannot figure out how to achieve it using the arcane
    > control panel thingy the host offers.


    [snip]

    If it is an IIS web server, you will usually find a checkbox
    in the IIS server properties or, from memory, you can even get
    to it by right-clicking on the virtual directory and editing
    its properties.

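    Tomcat, for example, has the following setting on its default
    servlet in conf/web.xml (excerpt):

    <servlet>
        <servlet-name>default</servlet-name>
        <servlet-class>org.apache.catalina.servlets.DefaultServlet</servlet-class>
        <init-param>
            <param-name>listings</param-name>
            <param-value>true</param-value>
        </init-param>
    </servlet>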

    But I think it's only good whilst developing.

    --
    Chris
    Chris ( Val ), Nov 12, 2007
    #8
  9. Alan

    Alan Guest

    Thanks for the information. I think I shall just follow href
    links instead of finding directories.
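    Following href links means a crawler loop. A minimal breadth-first sketch, assuming the caller supplies the fetching as a java.util.function.Function that returns a page's HTML or null, so the traversal can be shown without committing to any HTTP library; SameHostCrawler and its parameters are hypothetical. It stays on the starting host, remembers every URI it has seen so nothing is fetched twice, and stops after maxPages fetch attempts. It does not check robots.txt or pause between requests, both of which a real crawler should do.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical breadth-first crawler that only follows href links on the starting host. */
final class SameHostCrawler {
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /**
     * @param start    absolute start URI, assumed to end with '/' (e.g. http://www.someplace.net/)
     * @param fetch    returns the page body for a URI, or null if it could not be fetched
     * @param maxPages upper bound on fetch attempts, so a large site cannot run away with us
     * @return every same-host URI discovered, in discovery order
     */
    static Set<URI> crawl(URI start, Function<URI, String> fetch, int maxPages) {
        Set<URI> seen = new LinkedHashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        int fetched = 0;
        while (!queue.isEmpty() && fetched < maxPages) {
            URI page = queue.poll();
            String html = fetch.apply(page);
            fetched++;
            if (html == null) continue; // not HTML, or the fetch failed
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                URI link;
                try {
                    link = page.resolve(m.group(1).trim()).normalize();
                } catch (IllegalArgumentException badHref) {
                    continue; // not a valid URI; skip it
                }
                // seen.add returns false for URIs already queued, which prevents revisits.
                if (start.getHost().equalsIgnoreCase(link.getHost()) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return seen;
    }
}
```

    Wiring it to the network is a matter of passing a fetch function built on HttpClient or URLConnection; keeping fetching separate also makes the traversal easy to test against a small in-memory site.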

    Thanks, Alan
    Alan, Nov 12, 2007
    #9
  10. Alan wrote:
    >Thanks for the information. I think I shall just follow href
    >links instead of finding directories.


    In that case, as Daniel mentioned, search for "web crawler" /
    "web crawling". There have been some interesting discussions
    about crawlers in these groups across the ages. As I vaguely
    recall, there was source posted for one by Mr Omar Khan
    ..ahh yes.
    <http://groups.google.com/group/comp.lang.java.programmer/msg/df4a6f43d57e3e6a>


    But please (please, please) respect the directions of the
    site's robots.txt (if it has one).
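    A deliberately simplified sketch of such a check (hypothetical class RobotsRules): it reads the Disallow lines from groups addressed to "*" or to your crawler's name and treats each one as a path prefix. The real rules, now written up as RFC 9309, also cover Allow lines, longest-match precedence, wildcards, and a group naming your crawler taking precedence over the "*" group; this sketch handles none of those, so a real crawler should use a complete parser. The file itself lives at the site root, e.g. http://www.someplace.net/robots.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, deliberately simplified robots.txt check (prefix Disallow rules only). */
final class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Collects Disallow rules from groups whose User-agent is "*" or equals agentToken. */
    RobotsRules(String robotsTxt, String agentToken) {
        boolean groupApplies = false;
        boolean inRules = false; // true once the current group has started listing rules
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            String line = rawLine.replaceFirst("#.*", "").trim(); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (inRules) { // a User-agent line after rules starts a new group
                    groupApplies = false;
                    inRules = false;
                }
                groupApplies |= value.equals("*") || value.equalsIgnoreCase(agentToken);
            } else if (field.equals("disallow") || field.equals("allow")) {
                inRules = true;
                // Allow lines are ignored; an empty Disallow means "allow everything".
                if (groupApplies && field.equals("disallow") && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** True unless the path starts with one of the collected Disallow prefixes. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```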

    --
    Andrew Thompson
    http://www.athompson.info/andrew/

    Andrew Thompson, Nov 12, 2007
    #10

Similar Threads
  1. Thieum22
    Replies:
    1
    Views:
    677
    Joe Smith
    Aug 6, 2004
  2. Steve Haun
    Replies:
    0
    Views:
    445
    Steve Haun
    Nov 16, 2003
  3. Charlie Dison

    Organizing pages in subdirectories of main directory

    Charlie Dison, Feb 8, 2004, in forum: ASP .Net
    Replies:
    4
    Views:
    387
    Yan-Hong Huang[MSFT]
    Feb 9, 2004
  4. Guest
    Replies:
    5
    Views:
    345
    Kevin Spencer
    Dec 3, 2004
  5. Helen
    Replies:
    7
    Views:
    109
    Helen
    Aug 19, 2003