Querying using HTTP


Peter Bailey

I'm querying a graphics database using http. On my server and on my PC,
I can successfully run a simple script that checks for the existence of
files in different formats—PDF, PNG, and TIFF. But, when my colleagues
run this same script, they get the error I display below. I've literally
copied the entire directory structure of my c:\ruby setup to their PCs,
and they still get this error message. Can someone help me with this?

Thanks,
Peter

C:\Users\hv0797.INTDOM>ruby c:\scripts\checkorca.rb %1

C:/ruby/lib/ruby/1.9.1/net/http.rb:2212:in `error!': 400 "Bad Request"
(Net::HTTPServerException)
from C:/ruby/lib/ruby/1.9.1/net/http.rb:2221:in `value'
from c:/scripts/checkorca.rb:23:in `block in <main>'
from C:/ruby/lib/ruby/1.9.1/net/http.rb:564:in `start'
from C:/ruby/lib/ruby/1.9.1/net/http.rb:453:in `start'
from c:/scripts/checkorca.rb:12:in `<main>'
 

Brian Candler

Peter said:
I'm querying a graphics database using http.

... and ruby 1.9.1, it appears
On my server and on my PC,
I can successfully run a simple script that checks for the existence of
files in different formats—PDF, PNG, and TIFF. But, when my colleagues
run this same script, they get the error I display below.

ruby 1.9's runtime behaviour of anything using String varies depending
on the environment it runs in. It is quite difficult to get it to behave
sanely, and it is such a mess that I stick with ruby 1.8. I think this is
particularly likely to be your problem given that you are handling
binary data.

You could try looking at the body of the 400 response to see if it has
any more detail about what's gone wrong (that is, check Response#body
before Response#value), or you could use wireshark to look at the actual
packets going back and forth. Compare what you see on the working
machine with the non-working one.
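
Something along these lines would let you see the body before #value
raises (the URL and form data here are made up; substitute whatever your
script actually sends):

require 'net/http'
require 'uri'

uri = URI.parse("http://graphics-db.example.com/check")
Net::HTTP.start(uri.host, uri.port) do |http|
  response = http.post(uri.path, "file=drawing01&format=pdf")
  puts "#{response.code} #{response.message}"
  puts response.body    # the server may say why it considered the request bad
  response.value        # still raises Net::HTTPServerException on a 4xx
end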

If you are reading any data from local files on disk before posting, you
could try File.open(name,"rb"). You could also try adding "# encoding:
UTF-8" to the top of your source file, and also running your script
using ruby -Kn script.rb
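
For example (file name invented), something like

data = File.open("drawing.tif", "rb") { |f| f.read }

gives you the raw bytes untouched; on 1.9 the resulting string is tagged
ASCII-8BIT (binary) rather than whatever the locale happens to be.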

If those don't work, then I'd suggest you stick with ruby 1.8 for the
next year or so.
 

Peter Bailey

Brian said:
... and ruby 1.9.1, it appears


ruby 1.9's runtime behaviour of anything using String varies depending
on the environment it runs in. It is quite difficult to get it to behave
sanely, and it is such a mess that I stick with ruby 1.8. I think this is
particularly likely to be your problem given that you are handling
binary data.

You could try looking at the body of the 400 response to see if it has
any more detail about what's gone wrong (that is, check Response#body
before Response#value), or you could use wireshark to look at the actual
packets going back and forth. Compare what you see on the working
machine with the non-working one.

If you are reading any data from local files on disk before posting, you
could try File.open(name,"rb"). You could also try adding "# encoding:
UTF-8" to the top of your source file, and also running your script
using ruby -Kn script.rb

If those don't work, then I'd suggest you stick with ruby 1.8 for the
next year or so.

Thanks, Brian. Well, ironically, I've got 1.9 on my PC, and it works,
and I'm running 1.8.6 on my server, and it works there, too. My
assistant's PC works now, too, using 1.9. I increased her permissions on
the c:\ruby folder on her PC. That seemed to do it. But our other two
colleagues still can't get it to work. We keep getting the error above.
We've got them on 1.9 now. I'm going to try some of your suggestions.
 

James Gray

Brian said:
... and ruby 1.9.1, it appears


ruby 1.9's runtime behaviour of anything using String varies depending
on the environment it runs in.

That's not accurate.

Certain encoding options have default settings relating to the
environment they run in, but none of that matters if you specify the
desired encodings for your source and/or IO objects. These defaults
are provided as conveniences so that simple scripting can fit in
naturally with the rest of the environment.

Removing the defaults would just mean more work for the programmer as
you would be forced to specify all encodings even in situations where
a default makes sense. I also don't think it's bad to say that a
programmer must specify the encoding of data they wish to read. How
in the world can we expect Ruby to get a gets() call right on a
UTF-16LE file without us providing a warning about what the data is?
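
For example, something as small as this is all it takes (the file name
is invented):

File.open("names.txt", "r:UTF-16LE:UTF-8") do |f|
  line = f.gets      # read as UTF-16LE, handed to you transcoded to UTF-8
  p line.encoding    # => #<Encoding:UTF-8>
end
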
It is quite difficult to get it to behave sanely, and it is such a mess
that I stick with ruby 1.8.

Ruby 1.8 had a single global variable that, when set, changed the
behavior of all code in the interpreter, including the stuff I didn't
write. I hope that isn't your idea of a "sane" system.

Ruby 1.9 probably does require us to learn the bare minimum about how
character encodings are handled. It's about time. How many years
have we tried to get by with crossed fingers and a prayer that it
would just work out? Character encodings should have been required
reading long before now.

For those who are ready to learn the basics, the new Pickaxe has a
solid 13-page introduction. It doesn't take long to work through and
it covers the important stuff. If you want to go farther, I've
covered character encoding basics, the Ruby 1.8 system, and the new
1.9 system a bit deeper in a series of posts to my blog:

http://blog.grayproductions.net/articles/understanding_m17n

If those don't work, then I'd suggest you stick with ruby 1.8 for the
next year or so.

For years the Ruby community has begged for more speed and robust
character encoding support. The core team delivered that and much
more this January with a production release that's substantially
faster and has a very powerful new encoding engine. To repay their
monumental efforts, we complain and urge people to stick with Ruby
1.8. We must truly be the most ungrateful lot of bums ever.

For what it's worth, I believe Brian is wrong. I think the best thing
we can do as a community is to move everything to Ruby 1.9 as fast as
possible. If there are barriers to us doing that, we need to find ways
to tear them down. There are a lot more plusses than minuses, I
promise. Ruby 1.9 is ready for us. Come on in, the water is fine!

James Edward Gray II
 

Brian Candler

James said:
Certain encoding options have default settings relating to the
environment they run in, but none of that matters if you specify the
desired encodings for your source and/or IO objects.

Put another way: write extra code to defend against environment
pollution, and hope that you haven't forgotten any places where it is
required.
Ruby 1.8 had a single global variable that, when set, changed the
behavior of all code in the interpreter, including the stuff I didn't
write. I hope that isn't your idea of a "sane" system.

Ruby 1.8 treated strings as sequences of 8-bit bytes unless explicitly
told otherwise. That is sane.

Here is an example of the sort of problems still being caused by String
in 1.9:
http://groups.google.com/group/rack-devel/browse_thread/thread/99628ed37ac5f5b
Ruby 1.9 probably does require us to learn the bare minimum about how
character encodings are handled.

Which is rather difficult if it's not documented. Yes, I know there have
been some third-party efforts, including your own, but I have yet to see
anything which is anywhere near complete.
It's about time. How many years
have we tried to get by with crossed fingers and a prayer that it
would just work out? Character encodings should have been required
reading long before now.

Sure, people who process text need to understand character encodings.
But text is a small subset of data. When you're processing JPEGs or PDFs
or ASN.1 certificates or HTTP POSTs, you just want something that's 8-bit
clean.
For years the Ruby community has begged for more speed and robust
character encoding support. The core team delivered that and much
more this January with a production release that's substantially
faster and has a very powerful new encoding engine.

Nobody's complaining about improved performance. I'm saying there is
still much pain to be had by using 1.9, and advising that people may
wish to avoid the pain until (hopefully) most of it has gone away.
Anyone who has had no pain with 1.9 or libraries which don't work under
1.9 is free to speak up.
To repay their
monumental efforts, we complain and urge people to stick with Ruby
1.8. We must truly be the most ungrateful lot of bums ever.

Who's "we"? Speakly only for myself, I didn't ask for String to be
changed in this way. And does the amount of effort which went in mean
that I am forbidden from saying that I don't like the result?

Regards,

Brian.
 

James Gray

Brian said:
Ruby 1.8 treated strings as sequences of 8-bit bytes unless explicitly
told otherwise. That is sane.

I'm pretty sure you are in the minority with this opinion. You really
like this?

$ ruby -e 'p "Résumé"[0..1]'
"R\303"

How often is that going to be the desired result?
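
Compare that with 1.9, where the same slice works on characters once the
source encoding is declared (just a quick illustration):

# encoding: UTF-8
p "Résumé"[0..1]    # => "Ré"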

There were a lot of complaints about Ruby's encoding support over the
years. A lot. I'm pretty sure if we had all been saying, "Matz, we
love the it's-all-bytes approach," there would be no m17n. That just
wasn't the case though.
Here is an example of the sort of problems still being caused by String
in 1.9:
http://groups.google.com/group/rack-devel/browse_thread/thread/99628ed37ac5f5b

And if we combed the Web for documented problems caused by the Ruby
1.8 system, do you think we would find a few of those? I would be
willing to bet I've seen one encoding-related problem post every
couple of weeks I've been on Ruby Talk.

I just did a quick search in one place users are prone to report
issues with my FasterCSV library and about 47% of all the issues ever
reported were character encoding issues.
Which is rather difficult if it's not documented. Yes, I know there have
been some third-party efforts, including your own, but I have yet to see
anything which is anywhere near complete.

Can you list what's not yet covered in my blog series? I'm aware of
two very small things that I've never once seen used in the wild.
I'll add those, but let's say I feel my current coverage is about 98%
complete. How am I still failing to meet your needs?
Sure, people who process text need to understand character encodings.
But text is a small subset of data. When you're processing JPEGs or PDFs
or ASN.1 certificates or HTTP POSTs, you just want something that's 8-bit
clean.

Ruby 1.9 has an encoding for that too and it's very well documented.
Nobody's complaining about improved performance.

That's a relief. Now we just need to get everyone over to Ruby 1.9
and those issues will be a thing of the past. Thus, it would make me
happy if you stop telling people not to do that.
I'm saying there is still much pain to be had by using 1.9, and
advising that people may wish to avoid the pain until (hopefully)
most of it has gone away.
Anyone who has had no pain with 1.9 or libraries which don't work
under 1.9 is free to speak up.

I am speaking up. That's the point. :)

I had quite a bit of pain when I adapted FasterCSV to be the standard
CSV library. There were two reasons for that. First, the m17n
implementation was still pretty raw and I ran into bugs and
complications. Those are almost completely resolved now. The second
reason was the lack of documentation, so I wrote some from what I had
learned in converting the code.

Now there's a lot less pain.
Who's "we"? Speakly only for myself, I didn't ask for String to be
changed in this way. And does the amount of effort which went in mean
that I am forbidden from saying that I don't like the result?

It means that I think your comments are doing harm to the 1.9
migration and I can't find the good you are doing to balance that.

James Edward Gray II
 

Brian Candler

James said:
I'm pretty sure you are in the minority with this opinion.

Quite possibly :)
You really like this?

$ ruby -e 'p "Résumé"[0..1]'
"R\303"

How often is that going to be the desired result?

Well, if I were extracting the first two bytes from a JPEG header, that
would be exactly what I'd expect. I've very rarely wanted to extract the
first two *characters* from a string. I can think of one example: a
string truncation helper in a web page.

def trunc(string, maxlen=50)
  if string.length > maxlen
    string = string[0,maxlen-3] + "..."
  end
  string
end
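
So, for a trivial example,

trunc("a fairly long page title that will not fit", 20)  # => "a fairly long pag..."

and note that under 1.8 the slice inside it counts bytes, so with
multi-byte input it can chop a character in half.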

I'll certainly agree that's something you'd want to do, and /.{,50}/u is
an ugly way of doing it. In any case, I'm not saying there shouldn't be
any m17n support, or even that tagging strings with encodings is in
itself wrong, as long as the semantic implications are made clear.

The number one bugbear I have is that (unless you take a number of
specific steps to avoid it), program behaviour is inconsistent. You can
run the *same* program with exactly the *same* input data on two
different machines, and they will process it differently, possibly even
crashing in one case. If someone has a problem running your app, it's
now insufficient just to ask what O/S and ruby version they are running
in order to be able to replicate the problem.

Consider an app which is bundled with HTML templates, which the app
reads using File.read(). The templates happen to be written using, say,
UTF-8. It all works fine on my machine, and passes all tests. However it
barfs when run on someone else's machine, because their environment
variables are different.

I think that LC_ALL is a very poor predictor of what encoding a specific
file is in. Ruby doesn't trust it for source files (it uses #encoding
tags instead), so why trust it for data?

Now, if the default external encoding were fixed as (say) UTF-8, that
would be more sane. The default behaviour would then be the same on any
machine where ruby is installed:

- File#gets returns a string with encoding='UTF-8'
- File#read returns a string with encoding='BINARY'

unless explicitly overridden, e.g. when the file is opened. So if these
hypothetical HTML templates are written in ISO-8859-15, you would be
forced to declare this in your program.
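
In that world something like the following would be required (template
name invented), and that's fine by me:

template = File.open("layout.html", "r:ISO-8859-15") { |f| f.read }
p template.encoding   # => #<Encoding:ISO-8859-15>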

In any case, I'm used to having my data treated as binary unless I
explicitly ask otherwise. e.g.

$ echo "ßßß" | wc
1 1 7
$ echo "ßßß" | wc -m
4

[Ubuntu Hardy, default setup with LANG=en_GB.UTF-8]
Can you list what's not yet covered in my blog series?

I've posted a bunch of lists before. Every time I try out some feature,
because it's undocumented, the test turns up more questions than it
answers. Maybe I really should go ahead and document it all, but that
would be a very large project.

Trying out in irb used to be a good way to test ruby, but that's no good
in ruby 1.9 because it's not consistent with script behaviour. For
example:

$ irb19
irb(main):001:0> "foo".encoding
=> #<Encoding:US-ASCII>
irb(main):002:0> /foo/.encoding
=> #<Encoding:US-ASCII>
irb(main):003:0> "fooß".encoding
=> #<Encoding:UTF-8>
irb(main):004:0> /fooß/.encoding
=> #<Encoding:UTF-8>

Now try running this program:

p "foo".encoding
p /foo/.encoding
p "fooß".encoding
p /fooß/.encoding

It barfs on the multi-byte chars. That's reasonable in the absence of
knowledge about the source file, so now add an #encoding line:

#encoding: UTF-8
p "foo".encoding
p /foo/.encoding
p "fooß".encoding
p /fooß/.encoding

and you still get a different answer to IRB. The first string gets an
encoding of UTF-8 instead of US-ASCII; and yet the /foo/ regexp gets an
encoding of US-ASCII in both cases.

This is compounded by the hidden state which remembers whether a
particular string is all 7-bit characters or not. That is, although
"foo" and "fooß" are both marked as having identical encoding UTF-8,
they are actually treated *differently* by the encoding rules. You have
to test using the #ascii_only? method. And yet a regexp literal
apparently follows a different rule. Except when you are in IRB.
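
A quick illustration of that hidden state (run as a script, not in IRB):

# encoding: UTF-8
p "foo".encoding       # => #<Encoding:UTF-8>
p "fooß".encoding      # => #<Encoding:UTF-8>
p "foo".ascii_only?    # => true
p "fooß".ascii_only?   # => false
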
It means that I think your comments are doing harm to the 1.9
migration and I can't find the good you are doing to balance that.

I don't think what I'm saying would stop any library author from
modifying their library to work with 1.9 if they so wish. They have to
make up their own minds.

I believe the worst long-term problems are likely to be C extensions. I
have seen no hints at all for C extension writers on how to handle
strings properly (especially the hidden ascii_only? state) so I believe
these are likely to have obscure bugs for some time.

Regards,

Brian.
 

James Gray

Brian said:
Consider an app which is bundled with HTML templates, which the app
reads using File.read(). The templates happen to be written using, say,
UTF-8. It all works fine on my machine, and passes all tests. However it
barfs when run on someone else's machine, because their environment
variables are different.

But you bundled those files. You know the encoding much better than
Ruby. Is it really too much to ask for?

html = File.read("my_template.html", external_encoding: "UTF-8")

That's more correct than any magic behavior would be and self-documenting
to boot.
- File#read returns a string with encoding='BINARY'

File.binread() was added for exactly this purpose.
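
In other words (file name is just an example):

data = File.binread("photo.jpg")
p data.encoding    # => #<Encoding:ASCII-8BIT>
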
I've posted a bunch of lists before.

Yeah, I've read those. I responded to your last one yesterday telling
you that I had no addressed all the concerns I saw in that one, plus
some more. I'm sure trying to help you.

I guess it's time for a new list of what you're still missing…
Trying out in irb used to be a good way to test ruby, but that's no good
in ruby 1.9 because it's not consistent with script behaviour.

While I agree that IRb may need some more integration, there have
always been some minor differences between how code runs in it and how
it runs in a real Ruby script. I don't think this means 1.9 isn't
ready for the masses.

James Edward Gray II
 

James Gray

Yeah, I've read those. I responded to your last one yesterday
telling you that I had no addressed all the concerns I saw in that
one, plus some more. I'm sure trying to help you.

I meant "had now addressed=85"

James Edward Gray II
 

Brian Candler

James said:
But you bundled those files. You know the encoding much better than
Ruby. Is it really too much to ask for?

html = File.read("my_template.html", external_encoding: "UTF-8")

Sure, *if you remember* everywhere this is needed. If you don't, then
your program will work fine, and pass all your tests, until you run it
somewhere else and it dies.
 

James Gray

Brian said:
Sure, *if you remember* everywhere this is needed. If you don't, then
your program will work fine, and pass all your tests, until you run it
somewhere else and it dies.

Well, I definitely don't think this is the first case of that in Ruby
(or most other languages for that matter). Heck, fork() isn't cross-
platform and I love fork().

James Edward Gray II
 

David Masover

James said:
Well, I definitely don't think this is the first case of that in Ruby
(or most other languages for that matter). Heck, fork() isn't cross-
platform and I love fork().

Indeed, and there are win32-specific things on Windows. Even something as
simple as pathnames isn't universal unless you always use File.join -- or
better yet, Pathname. How often do you do that, instead of just:

open 'foo/bar.txt'

This is a weak example, now that Windows supports / as well as \ as a
directory delimiter, but I think I've made my point. Even Java programs have
platform-specific quirks, and this one is quite avoidable.
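
For the record, the portable spellings aren't exactly painful
(illustrative paths only):

require 'pathname'

path = File.join("foo", "bar.txt")       # => "foo/bar.txt", no separator hard-coded in the source
doc  = Pathname.new("foo") + "bar.txt"   # => #<Pathname:foo/bar.txt>
open(doc.to_s) { |f| f.read }            # works the same wherever Ruby runs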

Given that most other software on a given system (including Perl) will obey a
default encoding, unless it has a specific reason to believe otherwise (like a
byte-order mark), I think it's reasonable for Ruby to do the same. Your
suggestion to default to UTF-8 really only makes sense on English systems
(where encoding is likely to be set to that anyway) -- and even that doesn't
save you from having to specify binary for binary files.

For that matter, if you've written all your tests, and they pass on one
system, and fail on another, your tests are working as designed -- in this
case, exposing a platform-specific bug, either in your program or the
interpreter.
 

Brian Candler

James said:
Well, I definitely don't think this is the first case of that in Ruby
(or most other languages for that matter). Heck, fork() isn't cross-
platform and I love fork().

But it's not a different "platform". Someone could be running exactly
the same version of Ruby under exactly the same operating system and
version, but with different localisation the program will break.

That's how this thread started: the bemused OP wrote

| But, when my colleagues
| run this same script, they get the error I display below. I've literally
| copied the entire directory structure of my c:\ruby setup to their PCs,
 
