Ramza Brown
I am parsing a collection of URLs, some of which seem to contain Chinese, Indian,
and other Unicode characters. My question: how can I filter those out
while still allowing the alphanumeric and punctuation characters that are
typical of a URL or title?
For example, I might get a URL with:
http://????????????-????????
title = ??????
where each ? represents some Unicode character.
I want to filter these out, but still allow the non-alphanumeric characters in a URL such as:
http://www.yahoo.com
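
To make the question concrete, here is a rough sketch in Python of the kind of filter I mean. The function names and the two character sets are my own assumptions: the URL set is the list of characters RFC 3986 permits in a URI, and the title set is simply printable ASCII, which may be too strict or too loose for real titles.

```python
import re

# Assumption: a "typical" URL uses only the characters RFC 3986 allows
# (unreserved + reserved) plus "%" for percent-escapes.
URL_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")

# Assumption: a "typical" title is printable ASCII, space through tilde.
TITLE_CHARS = re.compile(r"[\x20-\x7E]+")


def is_plain_url(url):
    """True if every character in url is in the allowed URL set."""
    return URL_CHARS.fullmatch(url) is not None


def is_plain_title(title):
    """True if every character in title is printable ASCII."""
    return TITLE_CHARS.fullmatch(title) is not None


def filter_entries(entries):
    """Keep only (url, title) pairs that pass both checks."""
    return [(u, t) for u, t in entries
            if is_plain_url(u) and is_plain_title(t)]
```

With this, filter_entries would keep an entry like ("http://www.yahoo.com", "Yahoo!") and drop any entry whose URL or title contains a character outside those sets. An empty URL or title is also dropped, since each pattern requires at least one character.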
--
Berlin Brown
(ramaza3 on freenode)
http://www.newspiritcompany.com
http://www.newspiritcompany.com/newforums
Also check out the alpha version of botverse:
http://www.newspiritcompany.com:8086/universe_home