Attention, hyperlinkers: inference of active text

Cameron Laird

I'm looking for ideas, although their expression in executable
certainly doesn't offend me.

I do text manipulation. As it happens, I'm in a position to
"activate" the obvious URI in
Now is the time for all good men to read http://www.ams.org/
That's nice. End-users "get it", and are happy I render
"http://www.ams.org" as a hyperlink. Most of them eventually
notice the implications for punctuation, that is, that they're
happier when they write
Look at http://bamboo.org !
than
Look at http://bamboo.org!

The design breaks down more annoyingly by the time we get to
the "file" scheme, though. How do the rest of you handle this?
Do you begin to make end-users quote, as in
The secret is in "file:\My Download Folder\dont_look.txt".
? Is there some other obvious approach? I am confident that
requiring
It is on my drive as file:\Program%20Files\Perl\odysseus.exe
is NOT practical with my clients.
 
Paramjit Oberoi

Cameron Laird said:
The design breaks down more annoyingly by the time we get to
the "file" scheme, though. How do the rest of you handle this?
Do you begin to make end-users quote, as in
The secret is in "file:\My Download Folder\dont_look.txt".

Some thoughts:

1. The quoting certainly seems like a good idea, and one that is
applicable even if other approaches are also used. Plus,
it is consistent with how most shells handle this problem.

2. You can special case common filenames like "Program Files",
"Documents and Settings", "My Music", etc., (the precise list
would depend on your environment & usage).

3. You could conceivably look in the filesystem (or even on the web) to
check which names/URLs are valid... but I think this could be a bad
idea because the program's behavior becomes non-deterministic. It might
confuse users.
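
Point 2 could be sketched roughly as below. This is only an illustration: the name list, the pattern, and the helper `find_file_links` are assumptions made for the example, and the real list would depend on your environment. Spaces are accepted only inside a known folder name; anywhere else a space ends the link.

```python
import re

# Hypothetical list of folder names that contain spaces; adjust to taste.
KNOWN_SPACED_NAMES = ["Program Files", "Documents and Settings", "My Music"]

_known = "|".join(re.escape(name) for name in KNOWN_SPACED_NAMES)

# After "file:", accept either a whole known name or one non-space character.
FILE_LINK = re.compile(r"file:(?:%s|\S)+" % _known)


def find_file_links(text):
    """Return file: links, allowing spaces only inside known folder names."""
    return FILE_LINK.findall(text)
```

A name that is not on the list is still cut at its first space, so this complements quoting rather than replacing it.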

-param

PS: I've never encountered this problem myself, so this could all be wrong.
 
Alexander Schmolck

Cameron Laird said:
I'm looking for ideas, although their expression in executable
certainly doesn't offend me.

I do text manipulation. As it happens, I'm in a position to
"activate" the obvious URI in
Now is the time for all good men to read http://www.ams.org/
That's nice. End-users "get it", and are happy I render
"http://www.ams.org" as a hyperlink. Most of them eventually
notice the implications for punctuation, that is, that they're
happier when they write
Look at http://bamboo.org !
than
Look at http://bamboo.org!

The design breaks down more annoyingly by the time we get to
the "file" scheme, though. How do the rest of you handle this?
Do you begin to make end-users quote, as in
The secret is in "file:\My Download Folder\dont_look.txt".
? Is there some other obvious approach? I am confident that
requiring
It is on my drive as file:\Program%20Files\Perl\odysseus.exe
is NOT practical with my clients.

Can't you get them to write <URL:http://bamboo.org> (or, alternatively,
<http://bamboo.org>, which, although not backed up by an RFC, also ought to do
the job and is less to type and to remember)?

Apart from making escaping superfluous, this should also solve all your
punctuation and linebreak problems robustly. '<' and '>' can't occur in URIs, so
matching '<(http:|file:|www\.).*?>' or so (and then kicking out '\n\s*') ought
to work, no?
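
A minimal sketch of that matching step, assuming Python's `re` module. The names `DELIMITED` and `find_delimited` are made up for the example, and the scheme list is just the one mentioned above.

```python
import re

# <URL:...> or <...> whose body starts with http:, file: or www.
# '<' and '>' cannot occur in the body, so the closing '>' ends the match.
DELIMITED = re.compile(r"<(?:URL:)?((?:http:|file:|www\.)[^<>]*)>")


def find_delimited(text):
    """Return the delimited URIs in text, with wrapped line breaks removed."""
    # Inside each match, drop a newline together with the whitespace around it.
    return [re.sub(r"\s*\n\s*", "", m.group(1))
            for m in DELIMITED.finditer(text)]
```

Because the body excludes '<' and '>', an ordinary "a < b" in running text is not picked up unless it happens to be followed by one of the listed prefixes.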

'as
 
Jeff Epler

I'm pretty sure that this isn't a valid url:
file:\I never\used anything\besides windows.txt
It's something, but it's not a URL.

For actual HTTP URLs, I would suggest that you have a step in the
highlighting that considers whether the last part of the URL seems to
contain plausible characters. Characters from this set are pretty
unlikely: ".,!])}'\""
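
A rough sketch of that trimming step. The greedy "up to the next whitespace" first pass and the name `find_http_urls` are assumptions for illustration; only the character set comes from the paragraph above.

```python
import re

# Characters that are unlikely to be the last character of a URL.
UNLIKELY_TRAILING = ".,!])}'\""

# Greedy first pass: everything from http:// up to the next whitespace.
CANDIDATE = re.compile(r"http://\S+")


def find_http_urls(text):
    """Return http URLs in text with unlikely trailing characters removed."""
    return [m.group(0).rstrip(UNLIKELY_TRAILING)
            for m in CANDIDATE.finditer(text)]
```

The cost of the heuristic is that a URL which really does end in one of these characters, such as a closing parenthesis, gets truncated.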

For these file: faux-URLs, you could again start by parsing the maximum
number of characters as the URL, then repeatedly check whether the
current fragment exists on disk. If it doesn't, chop off part of it
(probably at whitespace) and try again until you get something that
exists or your string is empty.
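
Something along these lines, as a sketch only. It assumes the text after "file:" has already been isolated as `candidate`; the function name and the `exists` parameter are invented for the example (the parameter defaults to `os.path.exists` and is there so the check can be swapped out).

```python
import os


def longest_existing_path(candidate, exists=os.path.exists):
    """Shrink candidate at whitespace until exists() accepts it.

    Returns the empty string if no prefix is accepted.
    """
    current = candidate.strip()
    while current:
        if exists(current):
            return current
        # Chop off the last whitespace-separated chunk and try again.
        parts = current.rsplit(None, 1)
        current = parts[0] if len(parts) == 2 else ""
    return ""
```

Note that the result depends on what is on the disk at the time, so the same text may be linked differently on different machines.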

If that doesn't work (for instance, you're not in a position to check
what exists on the user's disk) then you could try a rule where the
hyperlink portion extends from file: at least to the last \, and if the
part beyond that is of the form "word word word.ext" then it's included
too.
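
One possible reading of that rule, sketched for a single line of text. The `TAIL` pattern and the name `file_link_span` are assumptions made up for the example, and "word" is taken to mean letters, digits, underscores, and hyphens.

```python
import re

# "word word word.ext": space-separated words ending in a dotted extension.
TAIL = re.compile(r"(?:[\w-]+ )*[\w-]+\.\w+")


def file_link_span(line):
    """Return the file: link found in line, or None if there is none."""
    start = line.find("file:")
    if start == -1:
        return None
    last_sep = line.rfind("\\")
    if last_sep < start:
        return None
    # The link runs at least to the last backslash ...
    end = last_sep + 1
    # ... and also covers a "word word word.ext" tail if one follows.
    tail = TAIL.match(line, end)
    if tail:
        end = tail.end()
    return line[start:end]
```

A stray backslash later on the same line would stretch the link too far, which is the kind of thing the experimentation would have to sort out.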

Best of luck. This'll probably require a lot of experimentation.

Jeff

 
Nelson Minar

If I understand your question correctly, you're looking for a way to
guess what part of an English sentence is a URL. The problem you're
facing is trailing punctuation characters.

I.e., these are good:
Look at http://bamboo.org !
It is on my drive as file:\Program%20Files\Perl\odysseus.exe
And these are bad:
Look at http://bamboo.org!
The secret is in "file:\My Download Folder\dont_look.txt".

If you want to make life as easy as possible for your authors, you
need some good heuristics. You need to guess where the URL starts and
ends. My terminal emulator (SecureCRT) does a pretty good job of this.
Nat Friedman's dingus also did this trick a while ago. I can't find it
easily now, but I think the code might be part of rxvt or Gnome.

Your other option is to require folks to delimit URLs with something
like <http://bamboo.org>. This is pretty painless and common, but only
you can know whether your users will accept it.
 
John Seal

Cameron Laird said:
End-users "get it", and are happy I render
"http://www.ams.org" as a hyperlink. Most of them eventually
notice the implications for punctuation, that is, that they're
happier when they write

So who is constructing these sentences, you or the end-users?

Any idea *why* they are happier with the first than the second?

Cameron Laird said:
The design breaks down more annoyingly by the time we get to
the "file" scheme, though.

What design, and in what way is it breaking down?

Cameron Laird said:
I am confident that requiring
It is on my drive as file:\Program%20Files\Perl\odysseus.exe
is NOT practical with my clients.

Any idea why not? The lack of terminal punctuation?
 
Andrew Clover

Cameron Laird said:
The design breaks down more annoyingly by the time we get to
the "file" scheme, though. How do the rest of you handle this?

The file scheme is no different to http regarding punctuation.
Personally, I trim characters that are valid in URIs but not likely to
be at the end, such as '.', from the end of URIs, so that constructs
like "See http://www.foo.com/index.html." still work. It's a hack but
the results seem reasonable.
It is on my drive as file:\Program Files\Perl\odysseus.exe

URIs with spaces and backslashes are not valid at all, and will break
browsers. (Also the example is missing the drive letter.)

If inputting file names directly is a requirement I would suggest
having a different format for it that doesn't involve escaping-to-URI,
for example you could sniff for double-quoted strings starting with
'[drive letter]:\'.
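
A small sketch of that sniffing approach. The pattern, the helper name `quoted_paths_to_uris`, and the hand-built "file:///" conversion are illustrative assumptions; the percent-escaping uses `urllib.parse.quote` from the standard library.

```python
import re
from urllib.parse import quote

# A double-quoted string that starts with a drive letter, a colon and a
# backslash, and contains no further quote or line break.
QUOTED_PATH = re.compile(r'"([A-Za-z]:\\[^"\n]*)"')


def quoted_paths_to_uris(text):
    """Return (path as typed, file URI) pairs for quoted Windows paths."""
    results = []
    for m in QUOTED_PATH.finditer(text):
        path = m.group(1)
        # Backslashes become slashes; spaces and the like are %-escaped.
        uri = "file:///" + quote(path.replace("\\", "/"), safe="/:")
        results.append((path, uri))
    return results
```

This way the user types an ordinary quoted file name, and the escaping-to-URI happens only when the link is generated.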
 
