Mechanize output: strange numbers

Pen Ttt

require 'mechanize'
agent = WWW::Mechanize.new
page = agent.get('http://hd.openv.com/tv_play-hddoc_20090703_7064487.html')


I get the following output:

#<WWW::Mechanize::page::Link
"\347\210\261\345\215\241\346\261\275\350\275\246"
"http://newcar.xcar.com.cn/">
#<WWW::Mechanize::page::Link
"58\347\224\237\346\264\273\345\234\210"
"http://q.58.com/">
#<WWW::Mechanize::page::Link
"\345\207\244\345\207\260\345\233\275\346\227\205"
"http://www.51tour.com/">
#<WWW::Mechanize::page::Link
"\351\236\255\347\211\233\345\243\253"
"http://www.bianews.com/">
#<WWW::Mechanize::page::Link
"\345\244\232\347\216\251\346\270\270\346\210\217\347\275\221"
"http://www.duowan.com/">
#<WWW::Mechanize::page::Link
"\344\271\220\351\200\224\346\227\205\346\270\270\347\275\221"


What is the meaning of
"\347\210\261\345\215\241\346\261\275\350\275\246"?
 
Ryan Davis

With `curl -I $url` you can quickly see that the page is encoded UTF-8. =
If you take off the "-I" you can see the content. In my terminal, it =
displays very prettily:
<div class=3D"img"><a href=3D"tv_show-8210.html" =
target=3D"_hdplay"><img alt=3D"Discovery=E5=85=A8=E7=90=83=E9=A6=96=E9=80=89=
=E7=BB=BF=E4=BD=8F=E5=AE=B6" title=3D"Discovery=E5=85=A8=E7=90=83=E9=A6=96=
=E9=80=89=E7=BB=BF=E4=BD=8F=E5=AE=B6" =
src=3D"http://swf1.openv.tv/programme/dvdprogramme/20100511/20100511_movie=
play_upload_105856370_small.jpg" width=3D"151" height=3D"113" =
/></a></div>

You're probably running your script with the default ASCII encoding. Try
this out:
ruby -KU -rubygems -e 'require "mechanize"; p Mechanize.new.get("http://hd.openv.com/tv_play-hddoc_20090703_7064487.html").links'
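
To answer the original question directly: the backslash sequences are octal escapes for the raw bytes of the link text, and those bytes are UTF-8. The following is a minimal sketch, assuming Ruby 2.0 or later (not the Ruby 1.8 used in this thread); the variable names are illustrative only.

```ruby
# Link text exactly as it appears in the inspect output above:
# each \NNN is one byte written in octal.
escaped = "\347\210\261\345\215\241\346\261\275\350\275\246"

# View the same bytes as raw binary, then reinterpret them as UTF-8.
bytes = escaped.b
text  = bytes.dup.force_encoding(Encoding::UTF_8)

puts bytes.bytesize        # number of raw bytes
puts text.length           # number of characters once decoded
puts text.valid_encoding?  # whether the bytes form well-formed UTF-8
puts text                  # the readable link text
```

Each Chinese character here takes three bytes in UTF-8, which is why twelve escapes collapse into four characters once the bytes are decoded.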
 
Ryan Davis

> My system: Ubuntu 10.04 + Firefox
> Shell terminal: UTF-8
> How do I change irb's default ASCII encoding?

I'm kinda surprised that `irb` doesn't have a -K flag... so read up on
$KCODE.
 
Pen Ttt

irb(main):001:0> $KCODE = "U"
=> "U"
When I add $KCODE = "U", the output is OK.
Do I need to type $KCODE = "U" every time I start irb from the shell?
Can I set it so that the encoding is already U when I open irb?
 
John W Higgins

Good Afternoon,

You need a .irbrc file, which IRB runs automatically for you on startup.
You should be able to put $KCODE = 'U' inside a .irbrc file (note the
leading dot) in your home folder.

John
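
John's suggestion could look like the sketch below. The version guard is an addition, not something from the thread: it assumes that $KCODE only matters on Ruby 1.8, since Ruby 1.9 and later track an encoding per string and no longer use this variable.

```ruby
# ~/.irbrc -- IRB loads this file at startup.
# Assumption: the guard is added here so newer Rubies skip the
# assignment; on Ruby 1.8 it has the same effect as typing it by hand.
if RUBY_VERSION < "1.9"
  $KCODE = "U"
end
```

On Ruby 1.8 the single line $KCODE = "U" is enough, exactly as described in the reply.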
 
