irb and ruby giving different results

Discussion in 'Ruby' started by Nit Khair, Nov 5, 2008.

  1. Nit Khair

    Nit Khair Guest

    in IRB,
    ASCII = (0..255).map{|c| c.chr }
    PRINTABLE = ASCII.grep(/[[:print:]]/)
    PRINTABLE.length
    >>> 191


    However, inside the ruby program PRINTABLE.length only gives 95 !! ???

    #!/opt/local/bin/ruby
    ASCII = (0..255).map{|c| c.chr }
    puts(ASCII.length)
    PRINTABLE = ASCII.grep(/[[:print:]]/)
    puts(PRINTABLE.length)
    # -> 95 instead of 191

    (Using ruby 1.8.7 on OS X 10.5.5, powerpc). Ran both from same Terminal.
    Both use /opt/local/bin/ruby.

    Why this difference? I ran irb with -f (so that irbrc would not be loaded)
    and still got the same result, so it's not some require that is causing
    the difference.

    p.s. sorry for cross-posting from roguelike thread -- this is getting
    lost there.
    --
    Posted via http://www.ruby-forum.com/.
     
    Nit Khair, Nov 5, 2008
    #1

  2. [Note: parts of this message were removed to make it a legal post.]

    This won't help much, but when I executed:

    >
    > ASCII = (0..255).map{|c| c.chr }
    > PRINTABLE = ASCII.grep(/[[:print:]]/)
    > PRINTABLE.length
    > >>> 191

    >


    in irb, I got 95 on my ruby 1.8.6 (i386-mswin32) running on an XP box.

    What were the 191 characters displayed when you computed the PRINTABLE
    expression?

    As a totally random theory, I wonder if [[:print:]] might take into account
    what device is attached to stdout, recognize what your terminal is capable
    of, and use that to decide what is printable.

    It would be quite surprising (and perhaps unfortunate) if that's what's
    going on, but it might explain what you saw.

    A slightly more plausible explanation might be that [[:print:]] alters its
    behavior based on the TERM environment variable. What is ENV["TERM"] in the
    two cases?
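
    One way to compare the two cases is to print TERM, and whether stdout is a
    terminal, from both irb and the script. This is only a diagnostic sketch;
    the variable name term is made up for illustration.

    ```ruby
    # Hypothetical diagnostic: run the same lines in irb and in the script.
    term = ENV["TERM"]   # nil when TERM is not set
    puts "TERM=#{term.inspect}"
    puts "stdout is a tty: #{$stdout.tty?}"
    ```

    If both runs print the same values, the terminal is unlikely to be the
    cause of the difference.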

    That's all I've got. I warned you at the beginning that this wouldn't help
    much.

    --wpd
     
    Patrick Doyle, Nov 5, 2008
    #2

  3. Nit Khair

    Nit Khair Guest

    Patrick Doyle wrote:
    > This won't help much, but when I executed:
    >
    >> ASCII = (0..255).map{|c| c.chr }
    >> PRINTABLE = ASCII.grep(/[[:print:]]/)
    >> PRINTABLE.length
    >> >>> 191
    >
    > in irb, I got 95 on my ruby 1.8.6 (i386-mswin32) running on an XP box.
    >
    > What were the 191 characters displayed when you computed the PRINTABLE
    > expression?
    >
    > A slightly more plausible explanation might be that [[:print:]] alters
    > its behavior based on the TERM environment variable. What is
    > ENV["TERM"] in the two cases?
    >
    > That's all I've got. I warned you at the beginning that this wouldn't
    > help much.
    >
    > --wpd


    I mentioned that I used the same terminal to verify that it was not a
    terminal issue. I tried both with TERM=screen (my usual), then
    xterm-color, xterm-256color and perhaps VT100 and VT200 as well.

    One of the 191 characters, for example, is 165 or "\245", which is
    the code generated by Alt-A on my Mac OS X (powerpc, 10.5.5, darwin).

    (This is when I have *not* enabled "Use alt as meta". If you don't know
    what that is, just ignore it; it's a Mac default.)
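
    For reference, "\245" is an octal escape, and the dump that follows runs
    from " " to "~" and then from "\240" to "\377". A small sketch of the
    arithmetic (the names alt_a, low and high are made up for illustration):

    ```ruby
    # "\245" is an octal escape; convert the digits to a decimal byte value.
    alt_a = "245".to_i(8)
    puts alt_a                                # 165

    # Byte ranges covered by the dump: " ".."~" and "\240".."\377".
    low  = (" ".unpack("C").first.."~".unpack("C").first)   # 32..126
    high = ("240".to_i(8).."377".to_i(8))                   # 160..255

    puts low.to_a.length                      # 95
    puts high.to_a.length                     # 96
    puts low.to_a.length + high.to_a.length   # 191
    ```

    So the 191 is the 95 printable 7-bit ASCII characters plus the 96 byte
    values from 160 to 255.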

    Here's the dump, since you asked:

    irb(main):030:0> PRINTABLE
    [" ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-",
    ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";",
    "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I",
    "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W",
    "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e",
    "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s",
    "t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "~", "\240", "\241",
    "\242", "\243", "\244", "\245", "\246", "\247", "\250", "\251", "\252",
    "\253", "\254", "\255", "\256", "\257", "\260", "\261", "\262", "\263",
    "\264", "\265", "\266", "\267", "\270", "\271", "\272", "\273", "\274",
    "\275", "\276", "\277", "\300", "\301", "\302", "\303", "\304", "\305",
    "\306", "\307", "\310", "\311", "\312", "\313", "\314", "\315", "\316",
    "\317", "\320", "\321", "\322", "\323", "\324", "\325", "\326", "\327",
    "\330", "\331", "\332", "\333", "\334", "\335", "\336", "\337", "\340",
    "\341", "\342", "\343", "\344", "\345", "\346", "\347", "\350", "\351",
    "\352", "\353", "\354", "\355", "\356", "\357", "\360", "\361", "\362",
    "\363", "\364", "\365", "\366", "\367", "\370", "\371", "\372", "\373",
    "\374", "\375", "\376", "\377"]
    --
    Posted via http://www.ruby-forum.com/.
     
    Nit Khair, Nov 5, 2008
    #3
  4. Nit Khair wrote:
    > in IRB,
    > ASCII = (0..255).map{|c| c.chr }
    > PRINTABLE = ASCII.grep(/[[:print:]]/)
    > PRINTABLE.length
    >>>> 191

    >
    > However, inside the ruby program PRINTABLE.length only gives 95 !! ???
    >
    > #!/opt/local/bin/ruby
    > ASCII = (0..255).map{|c| c.chr }
    > puts(ASCII.length)
    > PRINTABLE = ASCII.grep(/[[:print:]]/)
    > puts(PRINTABLE.length)
    > # -> 95 instead of 191
    >
    > (Using ruby 1.8.7 on OS X 10.5.5, powerpc). Ran both from same Terminal.
    > Both use /opt/local/bin/ruby.
    >
    > Why this difference?


    FWIW, I get 95 with irb187 under Ubuntu Dapper.

    Looking at source code, the [[:print:]] character class uses isascii(c)
    && isprint(c)

    man isprint says:

    NOTE
    The details of what characters belong into which class depend on the
    current locale. For example, isupper() will not recognize an A-umlaut
    (Ä) as an uppercase letter in the default C locale.

    So look at what ENV.grep(/^LC/) shows. You could try setting
    ENV['LC_ALL']='C' in irb, or export LC_ALL=C before running it. Or try
    'POSIX' instead of 'C'.

    Finally, be completely sure that your irb is running the right ruby.
    Check RUBY_VERSION within irb.
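
    A small sketch for confirming which interpreter is actually running, to be
    pasted into both irb and the script. It assumes the rbconfig library that
    ships with ruby; the name interpreter is just for illustration.

    ```ruby
    require 'rbconfig'

    # Build the path of the running interpreter from its build configuration.
    interpreter = File.join(RbConfig::CONFIG['bindir'],
                            RbConfig::CONFIG['ruby_install_name'])
    puts "ruby #{RUBY_VERSION} at #{interpreter}"
    ```

    If irb and the script print different paths or versions, they are not
    using the same ruby.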
    --
    Posted via http://www.ruby-forum.com/.
     
    Brian Candler, Nov 5, 2008
    #4
  5. Nit Khair

    Nit Khair Guest

    Brian Candler wrote:
    > FWIW, I get 95 with irb187 under Ubuntu Dapper.
    >
    > Looking at source code, the [[:print:]] character class uses isascii(c)
    > && isprint(c)
    >
    > man isprint says:
    >
    > NOTE
    > The details of what characters belong into which class depend on the
    > current locale. For example, isupper() will not recognize an A-umlaut
    > (Ä) as an uppercase letter in the default C locale.
    >
    > So look at what ENV.grep(/^LC/) shows. You could try setting
    > ENV['LC_ALL']='C' in irb, or export LC_ALL=C before running it. Or try
    > 'POSIX' instead of 'C'.
    >
    > Finally, be completely sure that your irb is running the right ruby.
    > Check RUBY_VERSION within irb.


    1.8.7 in both.

    ENV.grep(/^LC/) shows nothing in both irb and ruby.
    Setting ENV['LC_ALL'] to 'C', 'POSIX' etc. has no effect in either.

    However, "echo $LC_ALL" at my prompt gives en_US.UTF-8.
    So when I did LC_ALL='C', I get only 95 in both ruby and irb.

    Is there any way I can get ruby to also give 191?
    I tried ENV['LC_ALL']='en_US.UTF-8' at the start of my ruby program but it
    had no effect. Anyway, thanks for pointing this out.
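
    A possible reason the assignment has no visible effect: the C library only
    reads the locale variables when setlocale() is called, so changing ENV in
    an already-running interpreter mainly affects the environment that child
    processes inherit. A sketch of that second part, assuming Ruby 1.9.2 or
    later (for RbConfig.ruby and the array form of IO.popen); the variable
    names are illustrative:

    ```ruby
    require "rbconfig"

    previous = ENV["LC_ALL"]
    ENV["LC_ALL"] = "C"

    # Start a child interpreter and ask it what it sees in its environment.
    child_sees = IO.popen([RbConfig.ruby, "-e", 'print ENV["LC_ALL"]']) { |io| io.read }
    puts "child process sees LC_ALL=#{child_sees}"

    # Restore the old value (assigning nil deletes the variable).
    ENV["LC_ALL"] = previous
    ```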
    --
    Posted via http://www.ruby-forum.com/.
     
    Nit Khair, Nov 5, 2008
    #5
  6. Nit Khair wrote:
    > ENV.grep(/^LC/) show nothing in both irb and ruby


    My bad; try

    ENV.select{|k,v| k=~/^LC/}
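
    For the record, the reason grep shows nothing: ENV yields [name, value]
    pairs, and a Regexp never matches an Array, so every pair is rejected.
    A short sketch (via_grep and via_select are illustrative names):

    ```ruby
    # grep tests each [name, value] pair against the pattern; a Regexp does
    # not match an Array, so the result is always empty.
    via_grep = ENV.grep(/^LC/)

    # select with an explicit test on the name works as intended.
    via_select = ENV.select { |name, _value| name =~ /^LC/ }

    puts via_grep.inspect
    puts via_select.inspect
    ```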

    > ENV['LC_ALL']='C' 'POSIX' etc has no effect in both
    >
    > However, "echo $LC_ALL" on my prompt gives en_US.UTF-8.
    > So when i did LC_ALL='C', i get only 95 in both ruby and irb.
    >
    > Is there any way i get can ruby to also give 191 ?


    Perhaps then:

    env LC_ALL=en_US.UTF-8 ruby foo.rb

    Also, looking through source: it seems that ruby doesn't normally call
    setlocale() by itself, but maybe some third-party library which irb is
    invoking is doing this for you. "readline" is a likely candidate. So you
    could try:

    require 'readline'

    in your ruby file. Or check $LOADED_FEATURES in irb and try loading the
    same modules in your ruby file.
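
    To make the comparison easier, something like the following could be run
    in both irb and the script and the two outputs diffed. It is only a
    sketch; the name features is made up for illustration.

    ```ruby
    # Print one loaded feature per line, by file name, in a stable order.
    features = $LOADED_FEATURES.map { |path| File.basename(path) }.sort
    puts features
    puts "readline loaded: #{features.any? { |name| name =~ /\Areadline\./ }}"
    ```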
    --
    Posted via http://www.ruby-forum.com/.
     
    Brian Candler, Nov 5, 2008
    #6
  7. Nit Khair

    Nit Khair Guest

    Brian Candler wrote:
    > Nit Khair wrote:
    >> ENV.grep(/^LC/) show nothing in both irb and ruby

    >
    > My bad; try
    >
    > ENV.select{|k,v| k=~/^LC/}
    >
    >> ENV['LC_ALL']='C' 'POSIX' etc has no effect in both
    >>
    >> However, "echo $LC_ALL" on my prompt gives en_US.UTF-8.
    >> So when i did LC_ALL='C', i get only 95 in both ruby and irb.
    >>
    >> Is there any way i get can ruby to also give 191 ?

    >
    > Perhaps then:
    >
    > env LC_ALL=en_US.UTF-8 ruby foo.rb
    >
    > Also, looking through source: it seems that ruby doesn't normally call
    > setlocale() by itself, but maybe some third-party library which irb is
    > invoking is doing this for you. "readline" is a likely candidate. So you
    > could try:
    >
    > require 'readline'
    >
    > in your ruby file. Or check $LOADED_FEATURES in irb and try loading the
    > same modules in your ruby file.


    Very strange:

    1. > ENV.select{|k,v| k=~/^LC/} gives en_US.UTF-8 in both irb and ruby. I
    get LC_ALL and LC_CTYPE.

    2. > env LC_ALL=en_US.UTF-8 ruby foo.rb
    still gives 95.

    3. I copied $LOADED_FEATURES, and then tried out (I hope I have this
    correct):

    ["enumerator.so", "e2mmap.rb", "irb/init.rb", "irb/workspace.rb",
    "irb/context.rb", "irb/extend-command.rb", "irb/output-method.rb",
    "irb/notifier.rb", "irb/slex.rb", "irb/ruby-token.rb",
    "irb/ruby-lex.rb", "readline.bundle", "irb/input-method.rb",
    "irb/locale.rb", "irb.rb", "irb/completion.rb",
    "irb/ext/save-history.rb", "stringio.bundle", "yaml/error.rb",
    "syck.bundle", "yaml/ypath.rb", "yaml/basenode.rb", "yaml/syck.rb",
    "yaml/tag.rb", "yaml/stream.rb", "yaml/constants.rb", "rational.rb",
    "date/format.rb", "date.rb", "yaml/rubytypes.rb", "yaml/types.rb",
    "yaml.rb"].each do |rr|
      require rr
    end

    I still get 95.
    --
    Posted via http://www.ruby-forum.com/.
     
    Nit Khair, Nov 5, 2008
    #7
  8. Nit Khair wrote:
    > I still get 95.


    Possibly readline isn't calling setlocale until you actually
    invoke/initialise the library.

    Here's an alternative test. Install the RubyInline gem, and then stick
    this in front of your test program:

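    require 'rubygems'
    require 'inline'

    class MyTest
      inline do |builder|
        builder.include '<locale.h>'
        # An empty string tells setlocale() to take the locale from the
        # environment; passing 0 (NULL) would only query the current locale.
        builder.c '
          void set_locale(void) {
            setlocale(LC_ALL, "");
          }'
      end
    end

    MyTest.new.set_locale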

    If that works, you can remove the dependency on the LC_ALL environment
    variable by changing to: setlocale(LC_ALL, "en_US.UTF-8"); or whatever.

    However, this dependence on the C stdlib's half-baked idea of "locale"
    is very hairy. I understand why Ruby doesn't call setlocale() normally -
    it means that at least the normal behaviour is (a) sane, and (b) not
    affected randomly by global environment variable settings.

    To be honest, if you want a character class which always matches 0x20 to
    0x7e and 0xa0 to 0xff, then you might as well just say so directly:

    [\x20-\x7e\xa0-\xff]
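
    A minimal sketch of using that class directly. It builds the byte strings
    with pack rather than Integer#chr so that every string is plain binary
    data, and adds the /n flag so the pattern is byte-oriented; both choices
    are assumptions made for this example, and the names all_bytes and
    printable are illustrative.

    ```ruby
    # All 256 single-byte strings, as binary data.
    all_bytes = (0..255).map { |code| [code].pack("C") }

    # Explicit byte ranges: no dependence on the C locale.
    printable = all_bytes.grep(/[\x20-\x7e\xa0-\xff]/n)
    puts printable.length   # 191
    ```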
    --
    Posted via http://www.ruby-forum.com/.
     
    Brian Candler, Nov 7, 2008
    #8