How to maintain privacy when using the web
There are several pieces of information that can potentially be used
to identify a user to any web service. These include the cookies a web site
sets, your computer's IP address, and the user agent string your web
browser sends when it visits a web site.
A cookie is basically a small amount of data that a web site stores on
your computer. If your web browser has cookies turned on, then whenever
you visit a web site, the browser checks whether that site set a cookie
on a previous visit. If it has a cookie that hasn't expired, it sends
that cookie along with the request for a page. There are three typical
uses for a cookie: storing preference information, storing a unique
identifier per user, and maintaining session information. Most modern
web browsers allow you to configure settings for cookies on a per-site
basis, so if one site requires cookies to work, but you don't want to
use cookies in general, it's possible to do this.
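The browser's side of this can be sketched in a few lines of Python. This is only a toy model: the class name TinyCookieJar, its methods, and the site name are made up for illustration, and real browsers track much more (paths, security flags, and so on).

```python
import time


class TinyCookieJar:
    """Toy per-site cookie store; real browsers track far more detail."""

    def __init__(self):
        self._cookies = {}  # site -> {name: (value, expiry_timestamp)}

    def store(self, site, name, value, max_age, now=None):
        """Remember a cookie that a site set, valid for max_age seconds."""
        now = time.time() if now is None else now
        self._cookies.setdefault(site, {})[name] = (value, now + max_age)

    def cookie_header(self, site, now=None):
        """Return the Cookie header value for a request to site, or None."""
        now = time.time() if now is None else now
        live = {
            name: value
            for name, (value, expiry) in self._cookies.get(site, {}).items()
            if expiry > now  # expired cookies are never sent
        }
        if not live:
            return None
        return "; ".join(f"{name}={value}" for name, value in live.items())


jar = TinyCookieJar()
jar.store("example.com", "theme", "dark", max_age=3600, now=0)
```

A request to example.com within the hour carries the cookie; a request after it expires, or to any other site, carries nothing.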
Every time you visit a web site that tries to associate a unique ID with
you, it will set a cookie in your browser. If you have cookies turned
off, what essentially happens is that every time you visit the site,
your browser says, "I've never been here before," and the site sends you
a new ID.
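Here is a rough sketch of what the site does, using Python's standard http.cookies module. The cookie name visitor_id and the function handle_request are assumptions for the example, not how any particular site works.

```python
import uuid
from http.cookies import SimpleCookie


def handle_request(cookie_header):
    """Return (visitor_id, cookie_to_set_or_None) for one request.

    cookie_header is the raw Cookie header the browser sent, or None if
    the browser sent no cookies.
    """
    cookies = SimpleCookie(cookie_header or "")
    if "visitor_id" in cookies:
        # Returning browser: the site recognizes the ID it handed out earlier.
        return cookies["visitor_id"].value, None
    # No cookie: as far as the site can tell, this is a first visit.
    new_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply["visitor_id"] = new_id
    return new_id, reply.output(header="").strip()


# Cookies on: the browser keeps the cookie and sends it back next time.
first_id, set_cookie = handle_request(None)
second_id, _ = handle_request(set_cookie)

# Cookies off: the browser sends nothing back, so it gets a fresh ID.
third_id, _ = handle_request(None)
```

With cookies on, first_id and second_id are the same; with cookies off, every visit looks like third_id, a brand-new visitor.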
Unfortunately, if the site also puts your preferences in your cookie,
you can't save any preferences with cookies turned off. If the site
relies on cookies to maintain sessions, you won't be able to use it with
cookies turned off (E2 falls into this category). So to balance privacy
against usability, you'll need to figure out how your browser's cookie
management works.
To maximize privacy, turn off cookies in your browser. Unfortunately,
E2 requires cookies to be turned on. So if you want to both
use E2 and secure your privacy, you'll need to accept cookies on some
sites and not on others.
It's very difficult to surf the web for a long time without running into
the ad which states "Your computer is broadcasting an IP address!" In
order for the computers you're visiting on the web to send data back to
you (like this web page), the computer on the other side needs to know
where to send it, and this is your IP address. For many people, IP
addresses aren't a big deal privacy-wise: every time you dial in to an
ISP over a modem, your IP address changes. If you use DSL or cable
modem, it's still possible that your IP address changes periodically.
There are two ways you can potentially gain some privacy here:
- Use a dynamic IP address. If your IP keeps changing, your
  identity can't be tied to any one IP address. However, the ISP you're
  logging into can still figure out which traffic is coming from your
  account.
- Use a proxy. A proxy is a server that's set up somewhere on the
  net that acts as a middleman for any connections you want to make. Your
  IP address doesn't get sent out, only the proxy's does. If you're using
  a popular enough proxy, there's a lot of traffic coming out of it, and
  odds are it'll be difficult or impossible to separate out your traffic.
  Additionally, if your connection to the proxy is encrypted (for example,
  over HTTPS), your ISP won't be able to tell what you're doing, only that
  you're connecting to a proxy somewhere. You're still in danger if the
  proxy logs what connections you make, but since the point of many of
  these proxies is to give people privacy, many have policies that they
  don't keep track of where users are going.
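For programs rather than browsers, routing traffic through a proxy might look like the following sketch, which uses Python's standard urllib.request. The proxy address is a placeholder, not a real service, and nothing here actually connects to it.

```python
import urllib.request

# Hypothetical proxy address; substitute a proxy you actually trust.
PROXY_URL = "http://proxy.example:8080"


def build_proxied_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
# Calling opener.open(some_url) would now connect to the proxy, so the
# destination server would see the proxy's IP address instead of yours.
```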
Whenever your browser connects to a web site, it tells that server
what it is. Sometimes these descriptions can get pretty specific:
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1)
Gecko/20021104 Chimera/0.6. Not only do browsers send what
version of the browser they're running, but they'll often send what
operating system you're running on, which language it was compiled for,
and various other details. Many proxies will strip these off for you,
so a proxy can be the answer here. Additionally, if you're using the
same user agent as many other people, this can't be used to positively
identify you.
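To see how much a server can read out of that example string, here is a small Python sketch that pulls it apart. The function name split_user_agent is made up for this example, and the parsing is deliberately naive; real user agent strings are messier.

```python
import re

UA = ("Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) "
      "Gecko/20021104 Chimera/0.6")


def split_user_agent(ua):
    """Split a user agent string into product tokens and comment fields."""
    comment = re.search(r"\(([^)]*)\)", ua)
    details = []
    if comment:
        # The parenthesized part holds platform, OS, language, and so on.
        details = [field.strip() for field in comment.group(1).split(";")]
    # Drop the parenthesized part, then collect the name/version tokens.
    products = re.sub(r"\([^)]*\)", " ", ua).split()
    return products, details


products, details = split_user_agent(UA)
```

Here products holds the browser and engine versions, and details holds the platform, the operating system, and the language.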
Some browsers let you change the string that gets sent. If you're using Mozilla, there's a toolbar at http://xulplanet.com/downloads/prefbar/ which allows you to change the user agent. Opera also has a preference that allows you to change the user agent string.
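The same idea applies to scripts. As a minimal sketch with Python's urllib.request, you can replace the default user agent with something generic; the string chosen here is just an assumption for illustration, and no request is actually sent.

```python
import urllib.request

# A deliberately vague user agent, assumed for illustration.
GENERIC_UA = "Mozilla/5.0"


def make_request(url, user_agent=GENERIC_UA):
    """Build a request that announces user_agent instead of the default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = make_request("http://example.com/")
sent = req.get_header("User-agent")  # urllib stores the name as "User-agent"
```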
Other HTTP headers
Last-modified
This particular method is a bit more difficult to detect. Whenever a web browser requests a page that it already has in its cache, it checks whether the cached copy is still current. It does this using the Last-Modified header: the server sends this date along with the page, and the browser sends the same date back (in an If-Modified-Since header) the next time it requests that page. The web server can send an arbitrary date instead of the real one, thereby planting a value that identifies that particular web browser when it returns. Since this header is required to be a date, the web server has much less flexibility in assigning unique identifiers than it has with cookies, but most people will have no way of detecting that such a thing is happening.
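A rough sketch of how a server might hide an ID in that date, in Python. The base date, the function names, and the scheme of counting seconds from the base date are all assumptions made for this example; they are not taken from any real implementation.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Arbitrary base date for this sketch; an ID is stored as seconds after it.
BASE = datetime(2000, 1, 1, tzinfo=timezone.utc)


def id_to_last_modified(visitor_id):
    """Encode a numeric ID as an ordinary-looking Last-Modified date."""
    return format_datetime(BASE + timedelta(seconds=visitor_id), usegmt=True)


def id_from_if_modified_since(header_value):
    """Recover the ID from the date the browser echoes back."""
    when = parsedate_to_datetime(header_value)
    return int((when - BASE).total_seconds())


header = id_to_last_modified(123456)           # sent with the page
recovered = id_from_if_modified_since(header)  # read from the next request
```

The header looks like any other date, which is why this is hard to spot, but the server gets its number back unchanged.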
As far as I know, the only people using this technique are people trying to demonstrate this as a potential problem (see, for example, http://zork.net/~mbp/meantime/). There are two possible ways to avoid this problem:
- Turn off the cache. This potentially uses more bandwidth, and since you wouldn't be able to backtrack without re-retrieving web pages, you'll make it easier for the server operator to reconstruct your path through the web site.
- Use a proxy that strips off cache information. I don't know how common such proxies are.
Referrers
Referrers (or referers, as they're spelled in the HTTP specification) get sent to a web server whenever you follow a link to a page: your browser tells the server the URL of the page you came from. These can be used to determine the path that you took through a web site. They're not a very reliable signal, though: any time you click on the back button, your browser usually has the previous page cached, so the web server can't track your every move, only the times you load a new page.
It's possible to get some browsers to not send this information. Additionally, some proxies will strip this information out. If you're ultra-paranoid, you can type every URL in the location bar, since those URLs are assumed not to have referrers.
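To make the path-tracking concrete, here is a sketch of what a server operator could do with a log of referers from one visitor. The function name trace_path and the log format are invented for the example. Note that when a referer doesn't match the last page requested, the operator can only guess that the visitor got back to it some other way, such as the back button.

```python
def trace_path(log):
    """Rebuild a visitor's path from (referer, url) pairs in arrival order.

    referer is None when the browser sent no Referer header.
    """
    path = []
    for referer, url in log:
        # If the referer isn't the page we saw last, the visitor reached it
        # without a new request (for example, from the browser's cache).
        if referer is not None and (not path or path[-1] != referer):
            path.append(referer)
        path.append(url)
    return path


# Typed "/" by hand, followed a link to "/a", went back, then followed "/b".
log = [(None, "/"), ("/", "/a"), ("/", "/b")]
path = trace_path(log)
```

If the browser never sends a referer, every entry in the log has None in the first position, and all the operator has left is the bare order of requests.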
Thanks to Jetifi for reminding me about the ability to change UA strings.