theAndroidGuy
Hi All,
Is there a standard way or API for converting arbitrary text to
Unicode? I'm downloading an HTML page from a given URL, extracting its
text (the page can be in any language; I'm working specifically with
non-English pages), and then posting that text to Apache Solr for
indexing. Whatever the original content is, I want to convert it to
Unicode before sending it to Solr. I'm sure there must be a standard
way of doing this.

I'd also like to know which encoding web pages typically use. I think
that most of the time it is UTF-8, even for non-English content, but
if it isn't, how do I convert the page to Unicode? Any suggestions
would be appreciated.
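
To make the question concrete, here is a rough sketch of what I have in
mind, written in Python using only the standard library. The function
name `decode_html` and its `header_charset` parameter are made up for
illustration, and the fallback order (HTTP header charset, then a
charset declared in the markup, then UTF-8) is my assumption, not an
established rule:

```python
import re
from typing import Optional


def decode_html(raw: bytes, header_charset: Optional[str] = None) -> str:
    """Decode raw page bytes into a Unicode string (hypothetical helper)."""
    candidates = []
    # 1. Charset from the HTTP Content-Type header, if the caller has one.
    if header_charset:
        candidates.append(header_charset)
    # 2. Charset declared in the markup, e.g. <meta charset="...">,
    #    searched for only in the first 2 KB of the page.
    match = re.search(rb'charset=["\']?([A-Za-z0-9_\-]+)', raw[:2048])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    # 3. Assume UTF-8 when nothing else is declared.
    candidates.append("utf-8")

    for name in candidates:
        try:
            return raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or bytes invalid for it: try the next one.
            continue
    # Last resort: never raises, but undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


# Example: a page served as Shift_JIS and declared in a <meta> tag.
page = '<meta charset="shift_jis"><p>日本語</p>'.encode("shift_jis")
text = decode_html(page)            # Unicode str
utf8_bytes = text.encode("utf-8")   # bytes I would then post for indexing
```

The idea is that everything becomes a Unicode string as early as
possible and is written out as UTF-8 only at the end. I'm not sure
whether guessing the charset this way is reliable enough for real
pages.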
Thanks.