slice! invalid byte sequence in UTF-8


Marek Kis

Hello

I have just started my adventures with Ruby and I want to write a simple parser:

require 'net/http'
require 'uri'

if RUBY_VERSION =~ /\A1\.9/
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8
end

url = URI.parse('example url')

response = Net::HTTP.start(url.host, url.port) do |http|
  http.get(url.path)
end

main_page = response.body
links = main_page.slice!(/<table class="regions">.+<\/table>/)


I am getting this error: parser.rb:17:in `slice!': invalid byte sequence in
UTF-8 (ArgumentError)

Could somebody explain to me how to resolve this problem?
None of the solutions I have found work for me.

Regards
 

Iñaki Baz Castillo

2011/3/3 Marek Kis said:
main_page = response.body
links = main_page.slice!(/<table class="regions">.+<\/table>/)

Add this line to inspect the body you received:

puts main_page.inspect

 

Marek Kis

Everything seems to look OK. The only strange thing may be this:

<title>Biura nieruchomo\xB6ci | Agencje nieruchomo\xB6ci</title>

Are the Polish letters in the page the problem?
 

Iñaki Baz Castillo

2011/3/3 Marek Kis said:
Everything seems to look OK. The only strange thing may be this:

<title>Biura nieruchomo\xB6ci | Agencje nieruchomo\xB6ci</title>

Are the Polish letters in the page the problem?

Maybe that page is not encoded in UTF-8.
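
A quick way to test that guess might look like the sketch below. The sample string is only a hypothetical stand-in for the real body, and ISO-8859-2 is an assumption based on the Polish text; the actual page may declare something else in its Content-Type header or meta tag.

```ruby
# Hypothetical sample: the same bytes as in the <title> above,
# tagged as UTF-8 the way the failing script sees them.
body = "nieruchomo\xB6ci".b.force_encoding(Encoding::UTF_8)

# A lone \xB6 byte is not a valid UTF-8 sequence, so this prints false.
puts body.valid_encoding?

# Reinterpret the same bytes as ISO-8859-2 (assumed encoding of the page).
latin2 = body.dup.force_encoding(Encoding::ISO_8859_2)
puts latin2.valid_encoding?

# Transcode to UTF-8 so the text can be printed and matched safely.
utf8 = latin2.encode(Encoding::UTF_8)
puts utf8
```

If the last line prints a readable Polish word, the page is most likely ISO-8859-2 rather than UTF-8.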

 

Marek Kis

I forgot to check the encoding.

I added Encoding.default_external = Encoding::UTF_8 because without it I
got the same error.

Is there any chance that it will work with ISO-8859-2?
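
For what it is worth, one possible approach is sketched below, assuming the page really is ISO-8859-2: label the bytes with their real encoding, transcode them to UTF-8, and only then run the regexp. The `raw` string is a hypothetical stand-in for `response.body`, and the `/m` flag is an addition so that `.` also matches newlines inside the table.

```ruby
# Hypothetical stand-in for response.body: ISO-8859-2 bytes with no
# meaningful encoding tag (binary).
raw = "<title>Biura nieruchomo\xB6ci</title>\n" \
      "<table class=\"regions\">\n<tr><td>x</td></tr>\n</table>\n".b

# Label the bytes as ISO-8859-2 (assumption), then transcode to UTF-8.
main_page = raw.force_encoding(Encoding::ISO_8859_2).encode(Encoding::UTF_8)

# The string is now valid UTF-8, so the regexp no longer raises.
# slice! returns the matched table and removes it from main_page.
links = main_page.slice!(/<table class="regions">.+<\/table>/m)

puts links
puts main_page
```

Setting Encoding.default_external only changes the label Ruby assumes for external data; it does not convert the bytes, which may be why the error persisted.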
 