Help with Iconv needed


Marcus Strube

Can someone tell me what it is that I'm getting wrong here with "iconv"?
I either get "IllegalSequence" or "äöüß" are not encoded properly when
using Iconv.conv, while it looks good using backticks. ("IllegalSequence"
right now with the second; "ÄÖü" with the first every time...)

require 'rss/1.0'
require 'rss/2.0'
require 'open-uri'
require "iconv"

#source = "http://www.sueddeutsche.de/app/service/rss/alles/rss.xml"
source = "http://www.welt.de/vermischtes/?service=Rss"

content = ""
open(source) { |s| content = s.read }
rss = RSS::Parser.parse(content, false)

rss.items.each do |item|
  converted = `'#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8`
  puts(Iconv.conv('ISO-8859-1', 'UTF-8', item.title))
  puts " "
end
 

MonkeeSage

Not sure about the error, but I see two issues. First, this is an
error...

`'#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8`

I think you meant to echo the value to the pipe...

`echo -n '#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8`
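
A side note on the pipeline above: interpolating the title into a quoted shell string breaks as soon as a title contains a single quote. Here is a minimal sketch of one alternative, which hands the text to the command on standard input using Open3 from Ruby's standard library. The helper name pipe_through is made up for illustration, and the usage line assumes the iconv command-line tool is on the PATH.

```ruby
require "open3"

# Hypothetical helper: send `text` to an external command's stdin and
# return whatever the command writes to stdout. When the command is given
# as separate arguments, no shell is involved, so quotes inside `text`
# cannot break the command line.
def pipe_through(text, *command)
  output, status = Open3.capture2(*command, stdin_data: text, binmode: true)
  raise "#{command.first} failed with status #{status.exitstatus}" unless status.success?
  output
end
```

Usage would look something like pipe_through(item.title, "iconv", "-c", "-f", "ISO-8859-1", "-t", "UTF-8"). The result comes back as raw bytes, so it may need force_encoding("UTF-8") before printing.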

Second, ISO-8859-1 to UTF-8 doesn't appear to be the proper conversion.
The following string...

Düsseldorf: Prominentengedrängel bei der Bambi-Verleihung

...is encoded as...

"D\303\203\302\274sseldorf: Prominentengedr\303\203\302\244ngel bei der Bambi-Verleihung"

...by iconv from the command prompt. But it should be...

"D\303\274sseldorf: Prominentengedr\303\244ngel bei der Bambi-Verleihung"

I'm not good with encodings and UTF-8, so I can't tell you the
problem. I just know "umlaut u" should be the bytes 0xC3 0xBC
(\303\274) in UTF-8, but that's not what it produces.
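
For what it's worth, the doubled bytes shown above are what you get when text that is already UTF-8 is treated as ISO-8859-1 and converted to UTF-8 a second time. Here is a minimal sketch that reproduces this, using String#encode from Ruby 1.9+ as a stand-in for Iconv (my substitution, not something from the original code). Note also that Iconv.conv takes its arguments as (to, from, str), so Iconv.conv('ISO-8859-1', 'UTF-8', ...) converts from UTF-8 to ISO-8859-1, the opposite direction of the backtick command. The last part of the sketch shows that this direction raises an error on input that is really ISO-8859-1. That would be consistent with the IllegalSequence error, though the thread doesn't confirm which encoding each feed uses.

```ruby
title = "D\u00FCsseldorf"   # UTF-8 string; the u-umlaut is the bytes C3 BC

# What an ISO-8859-1 -> UTF-8 conversion does to input that is already
# UTF-8: each of the two bytes is read as its own Latin-1 character.
doubled = title.dup.force_encoding("ISO-8859-1").encode("UTF-8")

p title.b    # => "D\xC3\xBCsseldorf"          (octal \303\274)
p doubled.b  # => "D\xC3\x83\xC2\xBCsseldorf"  (octal \303\203\302\274)

# Reversing the mistaken step recovers the original text.
repaired = doubled.encode("ISO-8859-1").force_encoding("UTF-8")
p repaired == title  # => true

# Converting *from* UTF-8 fails when the bytes are really ISO-8859-1:
# a lone 0xFC byte is not a valid UTF-8 sequence.
latin1_bytes = "D\xFCsseldorf".b
error = nil
begin
  latin1_bytes.encode("ISO-8859-1", "UTF-8")
rescue Encoding::InvalidByteSequenceError => e
  error = e.class
end
p error  # => Encoding::InvalidByteSequenceError
```

If this is what is happening, the title is probably already UTF-8 by the time the loop sees it, and no conversion is needed for a UTF-8 terminal.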

Regards,
Jordan
 
