Attention, hyperlinkers: inference of active text

Discussion in 'Python' started by Cameron Laird, Jun 18, 2004.

  1. I'm looking for ideas, although their expression in executable
    certainly doesn't offend me.

    I do text manipulation. As it happens, I'm in a position to
    "activate" the obvious URI in
        Now is the time for all good men to read http://www.ams.org/
    That's nice. End-users "get it", and are happy I render
    "http://www.ams.org" as a hyperlink. Most of them eventually
    notice the implications for punctuation, that is, that they're
    happier when they write
        Look at http://bamboo.org !
    than
        Look at http://bamboo.org!

    The design breaks down more annoyingly by the time we get to
    the "file" scheme, though. How do the rest of you handle this?
    Do you begin to make end-users quote, as in
        The secret is in "file:\My Download Folder\dont_look.txt".
    ? Is there some other obvious approach? I am confident that
    requiring
        It is on my drive as file:\Program%20Files\Perl\odysseus.exe
    is NOT practical with my clients.
    --

    Cameron Laird
    Business: http://www.Phaseit.net
     
    Cameron Laird, Jun 18, 2004
    #1

  2. > The design breaks down more annoyingly by the time we get to
    > the "file" scheme, though. How do the rest of you handle this?
    > Do you begin to make end-users quote, as in
    > The secret is in "file:\My Download Folder\dont_look.txt".


    Some thoughts:

    1. The quoting certainly seems like a good idea, and one that is
    applicable even if other approaches are also used. Plus,
    it is consistent with how most shells handle this problem.

    2. You can special case common filenames like "Program Files",
    "Documents and Settings", "My Music", etc., (the precise list
    would depend on your environment & usage).

    3. You could conceivably look in the filesystem (or even on the web) to
    check which names/URLs are valid... but I think this could be a bad
    idea because the program's behavior becomes non-deterministic. It might
    confuse users.

    -param

    PS: I've never encountered this problem myself, so this could all be wrong.
     
    Paramjit Oberoi, Jun 18, 2004
    #2

  3. Cameron Laird writes:

    > I'm looking for ideas, although their expression in executable
    > certainly doesn't offend me.
    >
    > I do text manipulation. As it happens, I'm in a position to
    > "activate" the obvious URI in
    > Now is the time for all good men to read http://www.ams.org/
    > That's nice. End-users "get it", and are happy I render
    > "http://www.ams.org" as a hyperlink. Most of them eventually
    > notice the implications for punctuation, that is, that they're
    > happier when they write
    > Look at http://bamboo.org !
    > than
    > Look at http://bamboo.org!
    >
    > The design breaks down more annoyingly by the time we get to
    > the "file" scheme, though. How do the rest of you handle this?
    > Do you begin to make end-users quote, as in
    > The secret is in "file:\My Download Folder\dont_look.txt".
    > ? Is there some other obvious approach? I am confident that
    > requiring
    > It is on my drive as file:\Program%20Files\Perl\odysseus.exe
    > is NOT practical with my clients.


    Can't you get them to write <URL:http://bamboo.org> (or, alternatively,
    <http://bamboo.org> which, although not backed up by an RFC, also ought to do
    the job and is less to type and to remember)?

    Apart from making escaping superfluous, this should also solve all your
    punctuation and linebreak problems robustly. '<' and '>' can't occur in URIs, so
    matching '<(http:|file:|www\.).*?>' or so (and then kicking out '\n\s*') ought
    to work, no?
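
    A minimal Python sketch of that idea, not a tested production pattern.
    The names BRACKETED and find_bracketed_links are made up for this
    example, and the optional "URL:" prefix handling is an assumption about
    what users might type.

```python
import re

# Hypothetical pattern: '<', an optional "URL:" prefix, something that starts
# like a link, then everything up to the next '>'. Because '<' and '>' cannot
# occur inside a URI, the closing '>' is an unambiguous end marker.
BRACKETED = re.compile(r"<(?:URL:)?((?:https?:|file:|www\.)[^<>]*)>", re.IGNORECASE)


def find_bracketed_links(text):
    """Return the delimited links, rejoining any that wrap across lines."""
    links = []
    for match in BRACKETED.finditer(text):
        # A wrapped link picks up a newline plus the next line's indentation;
        # remove exactly that, so spaces inside a file name survive.
        links.append(re.sub(r"\n\s*", "", match.group(1)))
    return links
```

    Trailing punctuation such as "<http://bamboo.org>!" stays outside the
    link because the match stops at '>'.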

    'as
     
    Alexander Schmolck, Jun 18, 2004
    #3
  4. Alexander Schmolck wrote:

    > Can't you get them to write <URL:http://bamboo.org> (or, alternatively
    > <http://bamboo.org> which, although not backed up by a RFC, also ought
    > to do the job and is less to type and to remember).


    Recent URI RFCs say <...> is more common than <URL:...>.

    --
    JanC

    "Be strict when sending and tolerant when receiving."
    RFC 1958 - Architectural Principles of the Internet - section 3.9
     
    JanC, Jun 19, 2004
    #4
  5. I'm pretty sure that this isn't a valid URL:
    file:\I never\used anything\besides windows.txt
    It's something, but it's not a URL.

    For actual HTTP URLs, I would suggest that you have a step in the
    highlighting that considers whether the last part of the URL seems to
    contain plausible characters. Characters from this set are pretty
    unlikely at the end of a URL: ".,!])}'\""
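
    One way this check could look in Python, as a sketch only. The candidate
    pattern (everything up to the next whitespace) and the names
    UNLIKELY_TAIL and find_http_links are assumptions for illustration.

```python
import re

# The characters listed above: valid inside a URL, but rarely its last character.
UNLIKELY_TAIL = ".,!])}'\""

# Assumed candidate rule: an http(s) scheme followed by a run of non-whitespace.
CANDIDATE = re.compile(r"https?://\S+")


def find_http_links(text):
    """Return http(s) links with implausible trailing characters trimmed off."""
    links = []
    for match in CANDIDATE.finditer(text):
        # rstrip removes any run of the listed characters from the right end only.
        links.append(match.group(0).rstrip(UNLIKELY_TAIL))
    return links
```

    The trade-off is that a URL which really does end in ')' or '.' gets
    shortened, so this is a heuristic and not a parser.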

    For these file: faux-URLs, you could again start by parsing the maximum
    number of characters as the URL, then repeatedly check whether the
    current fragment exists on disk. If it doesn't, chop off part of it
    (probably at whitespace) and try again until you get something that
    exists or your string is empty.
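
    A rough Python sketch of that loop. It assumes the caller passes the text
    that follows "file:" on the line; longest_existing_path is a hypothetical
    name, and the exists parameter is there only so the check can be swapped
    out for testing.

```python
import os


def longest_existing_path(text, exists=os.path.exists):
    """Return the longest whitespace-bounded prefix of text that exists, or ''.

    text is assumed to be whatever follows "file:" on the line.
    """
    candidate = text.strip()
    while candidate:
        if exists(candidate):
            return candidate
        # Chop off the last whitespace-separated chunk and try again.
        parts = candidate.rsplit(None, 1)
        if len(parts) < 2:
            return ""  # nothing left to chop
        candidate = parts[0]
    return ""
```

    Note that the result depends on what happens to be on disk, so the same
    sentence can render differently on two machines.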

    If that doesn't work (for instance, you're not in a position to check
    what exists on the user's disk) then you could try a rule where the
    hyperlink portion extends from file: at least to the last \, and if the
    part beyond that is of the form "word word word.ext" then it's included
    too.
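
    That rule might be written as a single regular expression, roughly as
    below. This is a sketch: FILE_LINK and find_file_links are invented
    names, "word" is taken to mean a run of letters, digits, or underscores,
    and two file: links on the same line would be merged into one.

```python
import re

# "file:" through the LAST backslash on the line (greedy .* does not cross
# newlines), optionally followed by a "word word word.ext" style file name.
FILE_LINK = re.compile(r"file:.*\\(?:\w+(?: \w+)*\.\w+)?")


def find_file_links(text):
    """Return file: pseudo-links found by the last-backslash rule."""
    return [match.group(0) for match in FILE_LINK.finditer(text)]
```

    A final path component with no extension is left out of the link, which
    is the price of not checking the disk.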

    Best of luck. This'll probably require a lot of experimentation.

    Jeff

     
    Jeff Epler, Jun 19, 2004
    #5
  6. If I understand your question correctly, you're looking for a way to
    guess what part of an English sentence is a URL. The problem you're
    facing is trailing punctuation characters.

    I.e., these are good:
        Look at http://bamboo.org !
        It is on my drive as file:\Program%20Files\Perl\odysseus.exe
    And these are bad:
        Look at http://bamboo.org!
        The secret is in "file:\My Download Folder\dont_look.txt".

    If you want to make life as easy as possible for your authors, you
    need some good heuristics. You need to guess where the URL starts and
    ends. My terminal emulator (SecureCRT) does a pretty good job of this.
    Nat Friedman's dingus also did this trick a while ago - I can't find it
    easily now, but I think the code might be part of rxvt or Gnome.

    Your other option is to require folks to delimit URLs with something
    like <http://bamboo.org>. This is pretty painless and common, but only
    you can know whether your users will accept it.
     
    Nelson Minar, Jun 19, 2004
    #6
  7. Cameron Laird wrote:

    > End-users "get it", and are happy I render
    > "http://www.ams.org" as a hyperlink. Most of them eventually
    > notice the implications for punctuation, that is, that they're
    > happier when they write


    So who is constructing these sentences, you or the end-users?

    > Look at http://bamboo.org !
    > than
    > Look at http://bamboo.org!


    Any idea *why* they are happier with the first than the second?

    > The design breaks down more annoyingly by the time we get to
    > the "file" scheme, though.


    What design, and in what way is it breaking down?

    > I am confident that requiring
    > It is on my drive as file:\Program%20Files\Perl\odysseus.exe
    > is NOT practical with my clients.


    Any idea why not? The lack of terminal punctuation?
     
    John Seal, Jun 20, 2004
    #7
  8. Cameron Laird wrote:

    > The design breaks down more annoyingly by the time we get to
    > the "file" scheme, though. How do the rest of you handle this?


    The file scheme is no different to http regarding punctuation.
    Personally, I trim characters that are valid in URIs but not likely to
    be at the end, such as '.', from the end of URIs, so that constructs
    like "See http://www.foo.com/index.html." still work. It's a hack but
    the results seem reasonable.

    > It is on my drive as file:\Program Files\Perl\odysseus.exe


    URIs with spaces and backslashes are not valid at all, and will break
    browsers. (Also the example is missing the drive letter.)

    If inputting file names directly is a requirement I would suggest
    having a different format for it that doesn't involve escaping-to-URI,
    for example you could sniff for double-quoted strings starting with
    '[drive letter]:\'.
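
    A small Python sketch of that sniffing step, with the URI escaping done
    by the program instead of the user. QUOTED_PATH, find_quoted_paths, and
    to_file_uri are hypothetical names, and the file:/// form is an
    assumption about what the renderer wants as a link target.

```python
import re
from urllib.parse import quote

# A double-quoted string that starts with "<drive letter>:\" and runs to the
# closing quote on the same line.
QUOTED_PATH = re.compile(r'"([A-Za-z]:\\[^"\r\n]*)"')


def find_quoted_paths(text):
    """Return the Windows paths that appear inside double quotes."""
    return QUOTED_PATH.findall(text)


def to_file_uri(path):
    """Turn a Windows path into a file: URI, percent-escaping spaces etc."""
    # Keep '/' and the drive colon literal; quote() escapes everything else unsafe.
    return "file:///" + quote(path.replace("\\", "/"), safe="/:")
```

    The user types the path as it appears in Explorer, and only the
    generated href carries the %20 escapes.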

    --
    Andrew Clover
    http://www.doxdesk.com/
     
    Andrew Clover, Jun 21, 2004
    #8
