Lots of noise about user agent strings

Peter Michaux

The HTTP Accept-Encoding header sent with the request would seem
like the obvious place to start (as that is precisely what it is
for).

I believe that the issue is that IE6 claims it can accept gzip but in
actual fact it cannot due to a decompression bug. This bug may only
apply to files over a certain size. This leads to the use of the user
agent string.
Incredible, and incredibly foolish, as HTTP very explicitly allows
proxies to change the encoding. That is, if a client cannot handle
gzip but the proxy can, the proxy can ask the server for gzip,
decompress it, and send the identity-encoded result to the client.
It could also do this the other way around, though that would be
unlikely to be seen as a good idea. And it could disregard any
client preference for a compressed encoding and only make identity
requests to servers itself.
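
A minimal sketch of that transcoding step, assuming a trivial fetch
helper (fetch_for_client and the flag are hypothetical names; error
handling and streaming are omitted):

    # Sketch: ask upstream for gzip on the client's behalf, then
    # hand the client an identity-encoded body it can handle.
    import gzip
    import urllib.request

    def fetch_for_client(url, client_accepts_gzip):
        req = urllib.request.Request(url)
        # The proxy may advertise gzip regardless of what the client sent.
        req.add_header("Accept-Encoding", "gzip")
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            gzipped = resp.headers.get("Content-Encoding") == "gzip"
        if gzipped and not client_accepts_gzip:
            body = gzip.decompress(body)  # transcode to identity
        return body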

So a proxy may or may not send on the client's UA string or
substitute an alternative (which does not matter, as the UA string
is arbitrary), and it may or may not impose the same encoding
limitations as the client. That makes looking at the UA string at
all in this context extremely foolish; indeed, more foolish than
ignoring q values in the Accept header when content negotiating
HTML/XHTML.
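
For the record, honouring q values is straightforward; a minimal
sketch (the Accept value below is hypothetical, and per RFC 2616 a
missing q defaults to 1.0):

    # Sketch: choose between HTML and XHTML honouring q values,
    # instead of ignoring them.
    def parse_accept(header):
        prefs = {}
        for part in header.split(","):
            fields = part.strip().split(";")
            q = 1.0
            for param in fields[1:]:
                name, _, value = param.strip().partition("=")
                if name == "q":
                    q = float(value)
            prefs[fields[0].strip()] = q
        return prefs

    accept = "application/xhtml+xml,text/html;q=0.9,*/*;q=0.8"  # hypothetical
    prefs = parse_accept(accept)
    best = max(("application/xhtml+xml", "text/html"),
               key=lambda t: prefs.get(t, prefs.get("*/*", 0.0)))
    # -> "application/xhtml+xml" here, since its implicit q is 1.0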

Interesting. I do need to read these documents more.

[snip]

Thanks,
Peter
 
Michael Wojcik

That's dubious. Modem compression has to be real-time, whereas
general-purpose compression is often run out-of-band. When you gzip,
you usually don't do it while sending the data.

Consequently, modem compression (LAP-M with BTLZ, V.44, MNP-5, or
whatever) has to make different trade-offs than general-purpose
compression. The modem compression standards use smaller dictionaries
and windows than the most aggressive general-purpose LZ compressors
(e.g., level-9 gzip).
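
The trade-off is easy to see with zlib, which exposes both the
compression level and the LZ77 window size (a sketch; the sample
data is arbitrary):

    # Sketch: lower levels and smaller windows (wbits) compress worse,
    # roughly the compromise a real-time modem compressor has to make.
    import zlib

    data = b"function f(x) { return x + x; }\n" * 2000  # arbitrary sample
    for level, wbits in [(9, 15), (6, 15), (9, 9)]:
        c = zlib.compressobj(level=level, wbits=wbits)
        size = len(c.compress(data) + c.flush())
        print("level=%d wbits=%d -> %d bytes" % (level, wbits, size))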

For that matter, the modem compression standards I'm familiar with are
not optimized for text; they're general-purpose adaptive entropy encoders.

True, unless you're using a software modem (like the so-called
"Winmodems").
I didn't know modems do this. There must be a standard compression
algorithm to ensure the receiver knows how to decompress.

Yes, since the late 1980s or early 1990s. (I don't recall the exact
history and a trivial Google search didn't turn one up in the first
few hits.) Look up MNP-5 (an early, widely-supported proprietary
protocol), V.42bis (the first ITU standard that included compression),
BTLZ (the version of Lempel-Ziv used in V.42bis), and V.44 (a later
and more aggressive compressor).

Compression for modems came along shortly after decent synchronous
protocols (most notably LAPM, an asymmetric HDLC protocol) were
introduced, getting rid of the async framing overhead and allowing for
decent blocking of data.

Any growth should be negligible. Modem compression protocols have
uncompressed modes.
Hmm. This is quite contrary to the current popular thought about
gzipping JavaScript before sending it over the wire.

Actually, it isn't, if you study the subject in a bit more depth.

First, as I explained above, out-of-band compression with
general-purpose compressors typically will yield better compression
than what a POTS modem will achieve.

Second, many people are not using POTS modems for their connections.
Sometimes they're on uncompressed LANs. Sometimes they're using
high-speed (so-called "broadband", though that's a misnomer)
connections, like cable or DSL or FiOS. I'll admit that I haven't
looked into what kinds of compression are typically done on those
networks, but simply taking a bunch of received wisdom about POTS
modems and assuming it applies everywhere would be foolish.

Finally, precompressing the payload may have other performance
effects, because you produce smaller TCP segments. Besides saving
somewhat on TCP and IP overhead (probably negligible), you may improve
pacing (particularly if the client or server has poorly-written code
that is vulnerable to things like Nagle/Delayed-ACK Interaction),
reduce stack overhead on both ends, etc.
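
For reference, the usual mitigation for that Nagle/Delayed-ACK
interaction on the sending side is TCP_NODELAY (a sketch using the
standard socket API; the endpoint is hypothetical):

    # Sketch: stop Nagle's algorithm from holding small segments back
    # while waiting for the peer's (possibly delayed) ACK.
    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    s.connect(("example.com", 80))  # hypothetical endpoint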
Steve Souders works for Yahoo!'s performance team and has run many
experiments.

There's a huge body of literature on TCP/IP performance. A handful of
experiments by "Yahoo!'s performance team" might give some decent
general guidelines, but they're not much better ground for
generalization than the "modems compress" folklore is.

The real rule is that there is no set of rules that adequately covers
all situations. If you find there's a performance problem for a
particular case, you can investigate that and often improve it; and
your improvements may result in better performance for most or all of
your users. But blanket recommendations like "compress JavaScript" (or
don't) are the litanies of the cargo cults.
 
Jorge

If you were a system administrator and you wanted to send gzipped
JavaScript files to save bandwidth, how would you determine which
browsers could accept gzipped files and which could not?

Looking at the Accept-Encoding header.
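
A minimal sketch of that decision on the server side (the file names
are hypothetical; a real deployment should also emit Vary:
Accept-Encoding so caches keep the variants apart):

    # Sketch: serve a precompressed script only to clients that
    # advertise gzip in Accept-Encoding.
    def choose_js(accept_encoding):
        encodings = [e.split(";")[0].strip().lower()
                     for e in (accept_encoding or "").split(",")]
        if "gzip" in encodings:
            return "script.js.gz", {"Content-Encoding": "gzip",
                                    "Vary": "Accept-Encoding"}
        return "script.js", {"Vary": "Accept-Encoding"}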
 
VK

Looking at the Accept-Encoding header.

In the context of the discussion, I dare to question what principal
difference one sees between altering, say, the (Gecko) about:config
User-Agent string and altering the network.http.accept-encoding
string. If one doesn't trust one chunk of info sent by the agent, why
so much trust in another chunk sent in the same request? ;-)
With reliable stats proving the point of view, please ;-)
 
Jorge

In the context of the discussion, I dare to question what principal
difference one sees between altering, say, the (Gecko) about:config
User-Agent string and altering the network.http.accept-encoding
string. If one doesn't trust one chunk of info sent by the agent, why
so much trust in another chunk sent in the same request? ;-)
With reliable stats proving the point of view, please ;-)

Fool the server in order to receive a .gz that you can't deal with?
Why would you want to?

--Jorge.
 
Lasse Reichstein Nielsen

VK said:
In the context of the discussion, I dare to question what principal
difference one sees between altering, say, the (Gecko) about:config
User-Agent string and altering the network.http.accept-encoding
string. If one doesn't trust one chunk of info sent by the agent, why
so much trust in another chunk sent in the same request? ;-)

Because users have, or have had, reason to fake the User-Agent string
(fake in the sense that it claims to be a user agent that the browser
is not in fact a version of), in order to receive usable content. They
have never had a reason to fake the accept-encoding string (fake in
the sense that it claims to accept an encoding that the browser does
not in fact understand), because there are no web sites where that
would give the client any advantage. By faking the accept-encoding,
they may get something they don't understand, i.e., the page might
stop working. There is no way it can make a broken page start working.
With reliable stats proving the point of view, please ;-)

Nope. You find one page where faking the accept-encoding would help,
or any browser that by default claims to accept an encoding that it
doesn't understand, and I'd consider that there might be reason to not
trust it totally.

The accept-encoding header has, from the start, been meant as a way
for the browser to represent its exact capabilities to the server. The
servers have taken it as such. NO server has said "if you accept gz
compression, then you probably also accept arj compression, so I'll
just use that". That's the kind of faulty feature deduction that web
authors do based on the User-Agent string, which was never officially
meant to specify capabilities (and even where it did, it was used to
exclude those that were not recognized, which got us into this mess).

/L
 
VK

Fool the server in order to receive a .gz that you can't deal with?
Why would you want to?

And why would anyone want to fool the server in order to receive
content the browser cannot deal with?
 
VK

Because users have, or have had, reason to fake the User-Agent string

Correction: not users, but some browser producers. The only other
cases I am aware of involve experienced, programming-savvy users
removing some data from the distributor section of the User-Agent
string in IE. This section is at the rightmost position of the string
and was originally intended to carry the name of the computer seller
with Windows and IE pre-installed - an extra bonus to IE supporters
in IE3/IE4 times. Obviously Microsoft doesn't push this promo option
anymore, but the relevant registry field is still inherited - and some
programs once had a bad habit of adding their own marker in there for
software sniffing and usage-stats collection. This part of the
User-Agent string is always welcome to be checked and removed if
needed. Many PC maintenance programs provide this option, so users do
not have to dig in the registry manually.
(fake in the sense that it claims to be a user agent that the browser
is not in fact a version of), in order to receive usable content. They
have never had a reason to fake the accept-encoding string (fake in
the sense that it claims to accept an encoding that the browser does
not in fact understand), because there are no web sites where that
would give the client any advantage. By faking the accept-encoding,
they may get something they don't understand, i.e., the page might
stop working. There is no way it can make a broken page start working.


Nope. You find one page where faking the accept-encoding would help,
or any browser that by default claims to accept an encoding that it
doesn't understand, and I'd consider that there might be reason to not
trust it totally.

And do we have a site where User-Agent spoofing would help? Like: with
this string, welcome; with that one, go away? I mean within reason, on
a browser no more than 6-7 years old?
The accept-encoding header has, from the start, been meant as a way
for the browser to represent its exact capabilities to the server. The
servers have taken it as such. NO server has said "if you accept gz
compression, then you probably also accept arj compression, so I'll
just use that". That's the kind of faulty feature deduction that web
authors do based on the User-Agent string, which was never officially
meant to specify capabilities (and even where it did, it was used to
exclude those that were not recognized, which got us into this mess).

The Browser Wars were a fight between two: nobody cared about some 3rd
or 4th party. It is not fair to blame developers for not accounting
for some possible neutral 3rd parties that might come someday from
somewhere.
It is not an argument, just a side comment.
 
Jorge

And why would anyone want to fool the server in order to receive
content the browser cannot deal with?

Nope, it's:

Engañar al servidor para recibir un fichero .gz con el que no puedes
hacer nada ?
Por que ibas a querer hacer eso ?

[That is: "Fool the server in order to receive a .gz file you can't do
anything with? Why would you want to do that?"]

8¬)
 
Richard Cornford

Peter said:
I believe that the issue is that IE6 claims it can accept
gzip but in actual fact it cannot due to a decompression bug.

IE 6 absolutely can accept gzip encoding, else that would have been
spotted long ago and be very well known by now.
This bug may only apply to files over a certain size.

Are we in the realm of rumour and folklore, or are there demonstrable
facts behind this assertion? Such as the precise size of the (compressed
or uncompressed) files that are supposed to be a problem, a Microsoft KB
article about it, or a test case created by someone whose analytical
skills run to real cause-and-effect identification?

Beyond my normal cynicism, one of the reasons that I suspect this is BS
is that at work we have a QA department that delights in trying to break
our web applications (which is, after all, their job), and one of the
ways they try to do that is by overwhelming the browser with huge
downloads. The HTTPS test servers are set up to serve gzipped content
when they think they can, and IE 6 is certainly in the set of browsers
used for testing, so not having seen any evidence of this being a
problem suggests that it is not (or the problematic file size is so very
large that there is no real issue).
This leads to the use of the user agent string.
<snip>

But you only have to see that other browsers send default UA headers
that are indistinguishable from that of IE 6 to know that this would be
a poor approach. If you had to make an assumption based on a request
header, I would probably pick on IE's unusual Accept header. How many
other browsers would be willing to accept the set of Microsoft-specific
formats that IE says it would prefer (say, Word and Access documents)?
And even if some other browser said it could handle that content, would
those types come out with the same relative q values as in IE 6's Accept
header? That doesn't entirely solve the problem, because IE's Accept
headers can be modified, but it is better than looking at something that
is known to be deliberately spoofed by other browsers.
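
A sketch of that heuristic (the Microsoft-specific types listed are
examples only, and real IE 6 Accept headers vary with the software
installed; looks_like_ie is a hypothetical helper):

    # Sketch: infer "probably IE" from Microsoft-specific Accept types
    # rather than from the freely spoofed User-Agent string.
    MS_TYPES = ("application/vnd.ms-excel", "application/msword")  # examples

    def looks_like_ie(accept_header):
        return all(t in accept_header for t in MS_TYPES)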

Richard.
 
Richard Cornford

VK said:
In the context of the discussion, I dare to question what principal
difference one sees between altering, say, the (Gecko) about:config
User-Agent string and altering the network.http.accept-encoding
string. If one doesn't trust one chunk of info sent by the agent, why
so much trust in another chunk sent in the same request? ;-)

Who is proposing not trusting the User-Agent header? The HTTP
specification defines it as an arbitrary sequence of characters that
does not even need to be consistent over time, and so as something that
should not be treated as a source of information. And the proposal
being made here is to trust it to be precisely what it is defined as
being: not a source of information.
With reliable stats proving the point of view, please ;-)

"Stats" are not capable of "proving" anything.

Richard.
 
Michael Wojcik

The premises of your argument are false. The problem with user-agent
feature detection has never been that the user-agent string is
"untrustworthy"; there is no trust relationship between the user agent
and the server, so that attribute does not apply. User-agent feature
detection is a broken mechanism because it makes incorrect inferences,
especially false negatives that restrict UAs from receiving content
they're perfectly capable of handling.
Correction: not users, but some browser producers.

A bogus dichotomy. The user agent is an agent of the user.

User-agent values often are set by the user; it doesn't matter what
tool enables them to do so.
The only other cases I am aware of involve experienced,
programming-savvy users removing some data from the distributor
section of the User-Agent string in IE.

It is remotely possible that your experience does not cover the entire
set of applicable cases.
And do we have a site where User-Agent spoofing would help? Like: with
this string, welcome; with that one, go away? I mean within reason, on
a browser no more than 6-7 years old?

GIYF. A trivial search turned up complaints about [1], for example.
Look at the JavaScript used by that page, particularly the computation
of the variables is_ie and is_nav, and how they're used in functions
like displayAll().
The Browser Wars were a fight between two: nobody cared about some 3rd
or 4th party.

Except the people who did, of course. And the people who cared about
standards.
It is not fair to blame developers for not accounting for some
possible neutral 3rd parties that might come someday from somewhere.

Oh yes it is. That's the whole point of standards and
interoperability, and those have always been explicit goals for the
web, just like most other Internet applications.

[1] http://www.trader.ca/search/default.asp?category=1&categoryid=1&CAT=1
 
VK

IE 6 absolutely can accept gzip encoding, else that would have been
spotted long ago and be very well known by now.


Are we in the realm of rumour and folklore, or are there demonstrable
facts behind this assertion? Such as the precise size of the (compressed
or uncompressed) files that are supposed to be a problem, a Microsoft KB
article about it, or a test case created by someone whose analytical
skills run to real cause-and-effect identification?

Beyond my normal cynicism, one of the reasons that I suspect this is BS
is that at work we have a QA department that delights in trying to break
our web applications (which is, after all, their job), and one of the
ways they try to do that is by overwhelming the browser with huge
downloads. The HTTPS test servers are set up to serve gzipped content
when they think they can, and IE 6 is certainly in the set of browsers
used for testing, so not having seen any evidence of this being a
problem suggests that it is not (or the problematic file size is so very
large that there is no real issue).

As a devil's advocate, I would suggest your QA department test IE6
SP1 without the Q837251 patch installed ;-)
That is in reference to http://support.microsoft.com/kb/837251
But if they come back victorious, you may point out that users not
updating their IE or Windows for a year and a half deserve every bit
of the trouble they get as a result.
 
VK

Who is proposing not trusting the User-Agent header? The HTTP
specification defines it as an arbitrary sequence of characters that
does not even need to be consistent over time, and so as something that
should not be treated as a source of information.

Are you positive about it?

Hypertext Transfer Protocol - HTTP/1.1

....
14.43 User-Agent
The User-Agent request-header field contains information about the
user agent originating the request. This is for statistical purposes,
the tracing of protocol violations, and automated recognition of user
agents for the sake of tailoring responses to avoid particular user
agent limitations. User agents SHOULD include this field with
requests. The field can contain multiple product tokens (section 3.8)
and comments identifying the agent and any subproducts which form a
significant part of the user agent. By convention, the product tokens
are listed in order of their significance for identifying the
application.
....
 
Thomas 'PointedEars' Lahn

Jorge said:
Nope, it's:

Engañar al servidor para recibir un fichero .gz con el que no puedes
hacer nada ?
Por que ibas a querer hacer eso ?

My Spanish is not good enough, so I have to use an online translator.
Trying Google Translate, this leads to:

| Mislead the server to receive a file. Gz with which you can not
| do anything?
| Why Ibas to want to do that?

Which IMHO raises the question of whether you confused "receive" and
"send". Unless, of course, the translation is incorrect. However, since
English and Spanish are both Indo-European languages, I would assume the
common root of "recibir" and "receive" to indicate a shared meaning.


PointedEars
 
Thomas 'PointedEars' Lahn

VK said:
Are you positive about it?
Very.

Hypertext Transfer Protocol - HTTP/1.1

...
14.43 User-Agent
The User-Agent request-header field contains information about the
user agent originating the request. This is for statistical purposes,
the tracing of protocol violations, and automated recognition of user
agents for the sake of tailoring responses to avoid particular user
agent limitations. User agents SHOULD include this field with
                               ^^^^^^
                               _not_ MUST
requests. The field can contain multiple product tokens (section 3.8)
                    ^^^
                    _not_ MUST
and comments identifying the agent and any subproducts which form a
significant part of the user agent. By convention, the product tokens
                                    ^^^^^^^^^^^^^
are listed in order of their significance for identifying the
application.
...

And then the HTTP/1.1 grammar states:

| User-Agent = "User-Agent" ":" 1*( product | comment )
| [...]
| product = token ["/" product-version]
| product-version = token
| [...]
| token = 1*<any CHAR except CTLs or separators>
| [...]
| comment = "(" *( ctext | quoted-pair | comment ) ")"
| ctext = <any TEXT excluding "(" and ")">
| TEXT = <any OCTET except CTLs, but including LWS>

See also http://en.wikipedia.org/wiki/User_agent#User_agent_spoofing
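
Read literally, that grammar admits almost anything; a rough sketch of
splitting a User-Agent value into product tokens and comments (not a
full RFC parser - nested comments, which the grammar allows, are not
handled, and the UA string shown is just an example):

    # Sketch: split a User-Agent value into products and comments.
    import re

    ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"  # example
    tokens = re.findall(r"\([^()]*\)|\S+", ua)
    # -> ['Mozilla/4.0', '(compatible; MSIE 6.0; Windows NT 5.1)']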


PointedEars
 
Jorge

My Spanish is not good enough, so I have to use an online translator.
Trying Google Translate, this leads to:

| Mislead the server to receive a file. Gz with which you can not
| do anything?
| Why Ibas to want to do that?

Which IMHO raises the question of whether you confused "receive" and
"send". Unless, of course, the translation is incorrect. However, since
English and Spanish are both Indo-European languages, I would assume the
common root of "recibir" and "receive" to indicate a shared meaning.

Yes, recibir and receive mean the same thing.

You fool the server:
you send the fake Accept-Encoding: gzip request header,
you receive the answer gzipped,
yet you don't know how to deal with gzips,
you are the browser.

HTH, Thomas.

Regards,
--Jorge.
 
Thomas 'PointedEars' Lahn

Jorge said:
Yes, recibir and receive mean the same thing.

You fool the server:
you send the fake Accept-Encoding: gzip request header,
you receive the answer gzipped,
yet you don't know how to deal with gzips,
you are the browser.

HTH, Thomas.

Thanks, now it does. I overlooked "in order (to)" which changes the
meaning; my bad.


PointedEars
 
Joost Diepenmaat

Richard Cornford said:
IE 6 absolutely can accept gzip encoding, else that would have been
spotted long ago and be very well known by now.


Are we in the realm of rumour and folklore, or are there demonstrable
facts behind this assertion? Such as the precise size of the
(compressed or uncompressed) files that are supposed to be a problem,
a Microsoft KB article about it, or a test case created by someone
whose analytical skills run to real cause-and-effect identification?

Now this is hearsay, since I haven't dealt with the problem myself (a
coworker of mine did), but there appears to be some issue with some
versions of IE6 combined with some (Microsoft, IIRC) HTTP proxy server
that does indeed send out Accept-Encoding headers for gzip while
messing up the download (possibly the proxy doesn't pass on the
encoding headers, or maybe it adds Accept-Encoding headers when
they're not reliable; I'm not sure). AFAIK IE6 by itself works fine,
though, and Win XP SP2 or installing IE7 also seems to fix the issue,
even with a proxy server.
 
VK

_not_ MUST

Nor is Accept-Encoding a MUST.

But OK, I see other people have come here to tell how they were doing
careful feature testing - and, I assume, graceful fallback for Mosaic
and Opera 2.x - back in 1998 and before. A few more posts of the kind
and all the bored browser-war combatants will be here. I'm leaving for
now, guys. You won, you're cool - I'm going back to making my money,
you're going back to feature detection, and everyone is happy, right?
 
