How do we use the Referer header to track users?

deepak · Jan 15, 2007

Hi,
How can I track the sites that send traffic to my site, and how can
I get the IP addresses of users who come through referral sites? Can
anybody help me?
Regards,
dinkar

Andrew Thompson · Jan 15, 2007

deepak wrote:
....

How can I track the sites that send traffic to my site, and how can
I get the IP addresses of users who come through referral sites? Can
anybody help me?

Sure. If you have a question remotely connected to
*Java* - I am sure someone will be able to help.

Feel free to drop back by when you do.

Andrew T.

John Ersatznom · Jan 15, 2007

deepak wrote:
"How do we use the Referer header to track users?"

"We" don't. Referer is easy to spoof. And "users" don't like being
tracked without asking them.

What you want to do is probably either evil (such as denying access to
stuff based on source URL) or better done another way (such as with a
session cookie and a server side session such as a "shopping cart", to
persist session state). The one legitimate use I can think of is finding
out which links internal to your site are heavily used and who's
referring you traffic. You'll get semi-reliable statistics on the latter
just by collecting referer[sic] headers and dumping the ones with your
own domain name, then examining the rest, probably as a histogram by
domain name. What you'll probably find is that the top referrer is
Google.
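
The histogram idea above can be sketched roughly as follows. This is only an illustration, assuming the Referer values have already been collected into a list; the class name `ReferrerHistogram`, its methods, and the sample domains are made up for the example.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class ReferrerHistogram {

    /** Returns the lower-cased host of a Referer value, or null if it cannot be parsed. */
    static String hostOf(String referer) {
        if (referer == null || referer.trim().isEmpty()) {
            return null;
        }
        try {
            String host = URI.create(referer.trim()).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null; // malformed (or deliberately mangled) header value
        }
    }

    /** Counts referrers by host, dropping ownDomain and its subdomains. */
    static Map<String, Integer> countExternal(List<String> referers, String ownDomain) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String referer : referers) {
            String host = hostOf(referer);
            if (host == null) {
                continue; // no usable referrer
            }
            if (host.equals(ownDomain) || host.endsWith("." + ownDomain)) {
                continue; // internal link, not an external referrer
            }
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample values are invented for the demonstration.
        List<String> sample = Arrays.asList(
            "http://www.google.com/search?q=java",
            "http://example.com/index.html",
            "http://www.google.com/",
            "http://blog.example.org/post/1",
            null,
            "not a url");
        System.out.println(countExternal(sample, "example.com"));
    }
}
```

Since the header is spoofable, the counts are best read as rough proportions rather than exact figures.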

For internal link-usage, your safest bet is to not rely on spoofable
headers at all, and instead to note sequential page accesses by a single
IP. If 121.102.133.234 accesses page A at 15:11, then page B at 15:12
and page C at 15:14 and A links to B and B links to C, you can guess
that those two links were used once each by a user. This kind of
chaining lets you build up a histogram of usage of your internal links
per month, which may be useful for guiding changes to the site's
navigation design. Anywhere people go to but don't follow internal links
from is somewhere they either jump offsite, find to be a dead end, or
consider to be an actual destination. The second of the three, and maybe
the first, could indicate users are having trouble navigating your
site. Even this data, though unaffected by referrer spoofing, is going
to have wonkiness due to browser caches -- in the example above, the
user may actually have back-buttoned from B to A, then followed a direct
link (if one exists) from A to C. Referrers can be combined with
timestamps and URLs to try to figure out these cases, but ultimately,
your best bet in figuring out your navigation and any needed changes is
to explicitly user-test the site. See http://www.useit.com/alertbox/ and
browse around there for information on user testing sites.
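
As a rough sketch of the chaining described above: sort the hits by time, remember the last page seen per IP, and count a link as used when the previous page links to the current one. The `Hit` class, the `maxGapSeconds` cut-off, and the link map are all assumptions made for the example, not part of any real log format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class LinkUsage {

    /** One page access; field names are illustrative only. */
    static final class Hit {
        final String ip;
        final long seconds; // any monotonically increasing timestamp
        final String page;

        Hit(String ip, long seconds, String page) {
            this.ip = ip;
            this.seconds = seconds;
            this.page = page;
        }
    }

    /**
     * Guesses how often each internal link was followed.
     * links maps a page to the set of pages it links to.
     */
    static Map<String, Integer> countLinkUse(List<Hit> hits,
                                             Map<String, Set<String>> links,
                                             long maxGapSeconds) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingLong((Hit h) -> h.seconds));

        Map<String, Hit> lastByIp = new HashMap<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (Hit hit : sorted) {
            Hit prev = lastByIp.put(hit.ip, hit); // previous hit from the same IP, if any
            if (prev == null) {
                continue;
            }
            boolean recent = hit.seconds - prev.seconds <= maxGapSeconds;
            boolean linked = links.getOrDefault(prev.page, Collections.<String>emptySet())
                                  .contains(hit.page);
            if (recent && linked) {
                counts.merge(prev.page + " -> " + hit.page, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> links = new HashMap<>();
        links.put("A", Collections.singleton("B"));
        links.put("B", Collections.singleton("C"));

        // The example from the post: A at 15:11, B at 15:12, C at 15:14 (seconds of day).
        List<Hit> hits = Arrays.asList(
            new Hit("121.102.133.234", 54660L, "A"),
            new Hit("121.102.133.234", 54720L, "B"),
            new Hit("121.102.133.234", 54840L, "C"));

        System.out.println(countLinkUse(hits, links, 600L));
    }
}
```

This only looks at the immediately preceding page, so it cannot see the back-button case mentioned above.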

Ultimately, the only real use of referrers not either better covered by
something else or better avoided altogether is to figure out what
external sites are referring you how much traffic. Even then, take the
results with a grain of salt because it's easy to spoof the header and
users will often do so, even without technical knowledge, by using
browser-privacy plugins or by using proxies that whitewash their
browsetrails.

Arne Vajhøj · Jan 15, 2007

deepak said:
How can I track the sites that send traffic to my site, and how can
I get the IP addresses of users who come through referral sites? Can
anybody help me?

request.getHeader("Referer")

is the way to get the referrer in JSP and servlets.

Arne
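
A minimal sketch of how that call might be used, written so it runs without a servlet container. The `headerLookup` function is a stand-in for `HttpServletRequest.getHeader`; inside a servlet you could pass `request::getHeader` and `request.getRemoteAddr()`. The class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class VisitInfo {

    /**
     * Summarises where a request came from.
     * headerLookup stands in for HttpServletRequest.getHeader (an assumption made
     * so the example runs on its own); remoteAddr is what getRemoteAddr() would return.
     */
    static String describe(Function<String, String> headerLookup, String remoteAddr) {
        String referer = headerLookup.apply("Referer"); // null when no header was sent
        return "ip=" + remoteAddr + " referer=" + (referer == null ? "(none)" : referer);
    }

    public static void main(String[] args) {
        // A plain map plays the role of the request headers here.
        Map<String, String> headers = new HashMap<>();
        headers.put("Referer", "http://www.google.com/search?q=java");
        System.out.println(describe(headers::get, "121.102.133.234"));
    }
}
```

The address returned by `getRemoteAddr()` is that of the client or of the last proxy that forwarded the request, so it does not always identify the end user.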

Arne Vajhøj · Jan 15, 2007

Andrew said:
Sure. If you have a question remotely connected to
*Java* - I am sure someone will be able to help.

Feel free to drop back by when you do.

It is possible to get it in JSP/Servlet, and per the
discussion some weeks ago, enterprise Java is
actually on topic here.

Arne

Arne Vajhøj · Jan 15, 2007

John said:
deepak wrote:
"How do we use the Referer header to track users?"

"We" don't. Referer is easy to spoof. And "users" don't like being
tracked without asking them.

All web servers using extended logging record it in the access log ...

Arne

Lew · Jan 15, 2007

John said:
"We" don't. Referer is easy to spoof. And "users" don't like being
tracked without asking them.

... proxies that whitewash their browsetrails.

IP addresses are scarcely more reliable for identifying the "real" client,
although they can be useful for the kind of site analysis John described. Even
for that they're not completely reliable; at some sites all external packets
reach the Web servers with the source IP address of the organization's own
security node. The servers not only never see an external IP, they never see a
different one.

- Lew

kingpin+nntp · Jan 15, 2007

Arne said:
All web servers using extended logging record it in the access log ...

Yep, it's a default configuration usually as well. Apache HTTPd
server, the de-facto industry standard, comes to mind immediately.
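
For illustration, here is a sketch of pulling the client address and referrer out of one access-log line, assuming the server writes Apache's "combined" log format. The class name and the sample line are invented, and the pattern is deliberately simple: it does not handle escaped quotes inside a field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogLine {

    // Combined format: host ident user [time] "request" status bytes "referer" "user-agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    /**
     * Returns {clientIp, referer} for one log line, or null if the line does not match.
     * The referer element is null when the server logged "-" (no header sent).
     */
    static String[] ipAndReferer(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String referer = m.group(6);
        return new String[] { m.group(1), "-".equals(referer) ? null : referer };
    }

    public static void main(String[] args) {
        // Sample line made up for the demonstration.
        String line = "121.102.133.234 - - [15/Jan/2007:15:11:02 +0000] "
            + "\"GET /index.html HTTP/1.1\" 200 5120 "
            + "\"http://www.google.com/search?q=java\" \"Mozilla/5.0\"";
        String[] parsed = ipAndReferer(line);
        System.out.println(parsed[0] + " came from " + parsed[1]);
    }
}
```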

It's strange that people are worried about web servers knowing their IP
addresses when they're still willing to trust the information provided
on the web pages therein. Where a response is needed from a remote
host, it's simply not possible (without screwing around with
man-in-the-middle routing games) to hide one's IP address.

Someone else pointed out that one can track requests from a given IP
address, but depending on the reason for tracking, that can prove
problematic when you're dealing with a proxy server that services
multiple users (e.g., some ISPs operate a proxy server to reduce
bandwidth consumption costs, some
corporations/institutions/organizations use a proxy server to control
internet access, etc.).

If you need to differentiate between users, the easiest method is to
require a login and set one cookie containing a session number. On the
server side, the client's IP address should be combined with that
session number as an extra measure of verification.

In short, never trust the information supplied by the user, and require
them to prove themselves with every subsequent request.
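
One way the session-number-plus-IP check might look on the server side is sketched below. It is an in-memory illustration only: the `SessionRegistry` class is hypothetical, and a real application would also need expiry and would normally lean on the container's own session handling.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SessionRegistry {

    private final SecureRandom random = new SecureRandom();
    // Session number -> IP address the session was issued to.
    private final Map<String, String> ipBySession = new ConcurrentHashMap<>();

    /** Call after a successful login; the returned value goes into the cookie. */
    String open(String clientIp) {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        StringBuilder id = new StringBuilder();
        for (byte b : bytes) {
            id.append(String.format("%02x", b)); // two hex digits per byte
        }
        ipBySession.put(id.toString(), clientIp);
        return id.toString();
    }

    /** True only if the session exists and the request comes from the same IP. */
    boolean isValid(String sessionId, String clientIp) {
        if (sessionId == null || clientIp == null) {
            return false; // ConcurrentHashMap rejects null keys
        }
        return clientIp.equals(ipBySession.get(sessionId));
    }

    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry();
        String id = registry.open("121.102.133.234");
        System.out.println(registry.isValid(id, "121.102.133.234")); // same IP
        System.out.println(registry.isValid(id, "10.0.0.1"));        // different IP
    }
}
```

The check is repeated on every request, in line with the "prove themselves with every subsequent request" advice. Users whose address changes mid-session, for example behind a pool of proxies, would be rejected and have to log in again.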

John Ersatznom · Jan 16, 2007

kingpin+nntp said:
If you need to differentiate between users, the easiest method is to
require a login and set one cookie containing a session number. On the
server side, the client's IP address should be combined with that
session number as an extra measure of verification.

Requiring a login (for anything not internal/private) is evil too. If
it's supposed to be public, just set a cookie if one isn't already set
and the user leaves a nice little trail across your site, proxies or no.
Unless of course they've disabled cookies. Usually though you'll get
either no data or accurate data, rather than misleading data. (If you
let users contribute content, of course, you also want a captcha to
confound automated spamming bots. You *especially* don't want a login,
as that will deter people from bothering to contribute, since who can be
bothered to make up and memorize yet another username and password on
top of the six zillion they already have forgotten these days? If you
actually want lots of user participation, requiring logins is a great
way to sabotage those goals, and you'll need a captcha on the
registration form to stop automated spamming anyway, so just put the
captcha on the submission form instead. You can even go the
half-and-half route, providing *optional* registration. Registration has
the benefit to hardcore users that they get to do the captcha only once
but the downside that they have to memorize another login and they don't
know what you will do with the registration data with any certainty.
That may be worth it for really heavy users where a stable password they
regularly use (and so don't forget) is cheaper in their time than even
more frequent captcha solving with a different string to enter each time.)
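
The "set a cookie if one isn't already set" step from the start of this post can be sketched like this. It works on the raw `Cookie` header string so it runs on its own; in a servlet the same logic would sit around `request.getCookies()` and `response.addCookie(...)`. The cookie name `visitor` and the class name are assumptions made for the example.

```java
import java.util.UUID;

class VisitorCookie {

    static final String NAME = "visitor"; // hypothetical cookie name

    /** Returns the visitor ID from a Cookie header such as "a=1; visitor=xyz", or null. */
    static String existingId(String cookieHeader) {
        if (cookieHeader == null) {
            return null; // cookies disabled or first visit
        }
        for (String pair : cookieHeader.split(";")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length == 2 && kv[0].equals(NAME) && !kv[1].isEmpty()) {
                return kv[1];
            }
        }
        return null;
    }

    /** Reuses the existing ID if there is one, otherwise makes up a fresh random one. */
    static String idFor(String cookieHeader) {
        String existing = existingId(cookieHeader);
        return existing != null ? existing : UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(idFor("theme=dark; visitor=abc123")); // returning visitor
        System.out.println(idFor("theme=dark"));                 // new visitor, random ID
    }
}
```

A visitor with cookies disabled gets a new ID on every request, which matches the "no data rather than misleading data" point: such one-hit IDs are easy to filter out.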

kingpin+nntp · Jan 16, 2007

John said:
Requiring a login (for anything not internal/private) is evil too. If
it's supposed to be public, just set a cookie if one isn't already set
and the user leaves a nice little trail across your site, proxies or no.
Unless of course they've disabled cookies. Usually though you'll get
either no data or accurate data, rather than misleading data. (If you
let users contribute content, of course, you also want a captcha to
confound automated spamming bots. You *especially* don't want a login,
as that will deter people from bothering to contribute, since who can be
bothered to make up and memorize yet another username and password on
top of the six zillion they already have forgotten these days? If you
actually want lots of user participation, requiring logins is a great
way to sabotage those goals, and you'll need a captcha on the
registration form to stop automated spamming anyway, so just put the
captcha on the submission form instead. You can even go the
half-and-half route, providing *optional* registration. Registration has
the benefit to hardcore users that they get to do the captcha only once
but the downside that they have to memorize another login and they don't
know what you will do with the registration data with any certainty.
That may be worth it for really heavy users where a stable password they
regularly use (and so don't forget) is cheaper in their time than even
more frequent captcha solving with a different string to enter each time.)

This depends on the application. If contributions from as many users
as possible is the goal, then the optional registration method makes
sense.

Your point about users not knowing what's going to happen to their
information is a valid one because the vast majority of sites either
link to a long and complicated privacy policy that may have been
written in a way that only lawyers can understand, or the privacy
policy is non-existent.

A privacy policy that's short and easy for the majority to understand
(e.g., it fits on the registration page in one short paragraph) can go
a long way. Of course, there will be people who won't trust the
privacy policy to be followed, or will wonder what happens when it gets
changed without them being notified -- these are more difficult
problems I'd be happy to get into (I have a solution for the latter),
but only if someone's interested.

deepak · Jan 17, 2007

Please forward it to me if you have a solution.

kingpin+nntp said:
This depends on the application. If contributions from as many users
as possible is the goal, then the optional registration method makes
sense.

Your point about users not knowing what's going to happen to their
information is a valid one because the vast majority of sites either
link to a long and complicated privacy policy that may have been
written in a way that only lawyers can understand, or the privacy
policy is non-existent.

A privacy policy that's short and easy for the majority to understand
(e.g., it fits on the registration page in one short paragraph) can go
a long way. Of course, there will be people who won't trust the
privacy policy to be followed, or will wonder what happens when it gets
changed without them being notified -- these are more difficult
problems I'd be happy to get into (I have a solution for the latter),
but only if someone's interested.
