This article explores how much anonymity really exists online, and how anonymity is reduced by everyday technologies used in Internet communication.
Many people expect their actions online to be far removed from their physical identity, which often leads them to behave in ways they would never dare if their name were attached. But how well founded is this belief in online anonymity?
Sadly, there is no such thing as online anonymity per se. Without special technical measures, anonymity on the Internet should be deemed non-existent. Every Internet user leaves a long trail of data behind, much of which can be directly and cheaply connected with their identity. It is necessary to understand the technologies involved to get a clear and true picture of the state of online anonymity:
Every communication on the Internet – such as surfing to a website or making a VoIP call – involves data being split into smaller packets that are then delivered over a vastly complex network of routers – computers that pass each packet on from one to the next until it reaches its final destination. Here’s an example path for an information packet that travels through the Internet, each line referring to one computer that passed the data on to the next one:
The various routes data can take on the Internet are determined by both the sender and the recipient of the data, as well as any system in between that is responsible for passing on the data – and the routes will constantly change. For this to work, every packet of data must come with a sender address and a recipient address that uniquely identify the computers that are talking to each other. This address is called an Internet Protocol Address – or just IP-Address – and is simply a number (18.104.22.168 is an example of such an address from the path shown above).
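To make the role of the two addresses concrete, here is a minimal sketch of an IPv4 packet header built and read back in Python. The addresses are reserved documentation addresses chosen for illustration, the function names are made up for this example, and the checksum is left at zero for brevity, so this is not a packet that could actually be sent.

```python
import ipaddress
import struct


def build_ipv4_header(src: str, dst: str, payload_len: int = 0) -> bytes:
    """Pack a minimal 20-byte IPv4 header (checksum left at zero for brevity)."""
    version_ihl = (4 << 4) | 5           # IPv4, header length of five 32-bit words
    total_length = 20 + payload_len
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,    # version/IHL, type of service, total length
        0, 0,                            # identification, flags and fragment offset
        64, 6, 0,                        # time to live, protocol (6 = TCP), checksum
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )


def read_addresses(header: bytes) -> tuple[str, str]:
    """Extract sender and recipient, as any system handling the packet could."""
    src, dst = struct.unpack("!4s4s", header[12:20])
    return str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst))


# Illustrative addresses only (reserved for documentation).
header = build_ipv4_header("192.0.2.10", "198.51.100.7")
print(read_addresses(header))
```

The point of the sketch is that both addresses sit at a fixed position in every packet, so no special effort is needed to read them.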
While these numbers look innocent, they are directly related to the computer of the Internet user. When accessing the Internet, the ISP (Internet Service Provider) of the user will assign a unique IP-Address to the user’s computer and store this information in a database from which it can be retrieved with a subpoena, or even resold to data traders and marketers.
But even third parties have information about IP-Addresses that they have gathered in various ways, often making it possible to pinpoint an IP-Address to a single street address using only publicly available data (the GeoIP service at Maxmind.com shows what anybody can know about you right now, based on your IP-Address alone).
Each of the computers involved – be it sender, recipient or any of the routers in between – sees the IP-Addresses of the parties communicating with each other and, unless the traffic is encrypted, even what data is transferred.
Be aware that dynamic IP-Address assignment, as it is offered by many ISPs, does not change the anonymity impact of IP-Addresses at all. At best, more data needs to be stored and analyzed to achieve an attribution of Internet communication to a user.
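A rough sketch of why dynamic assignment changes so little: as long as the ISP keeps a record of which subscriber held which address at which time, attribution is a simple lookup. The lease log, subscriber names and function name below are entirely hypothetical.

```python
from datetime import datetime

# Hypothetical ISP lease log: (ip, subscriber, lease_start, lease_end)
LEASES = [
    ("203.0.113.5", "subscriber-A",
     datetime(2011, 11, 11, 8, 0), datetime(2011, 11, 11, 12, 0)),
    ("203.0.113.5", "subscriber-B",
     datetime(2011, 11, 11, 12, 0), datetime(2011, 11, 11, 20, 0)),
]


def subscriber_for(ip, when):
    """Return the subscriber who held `ip` at time `when`, or None."""
    for lease_ip, subscriber, start, end in LEASES:
        if lease_ip == ip and start <= when < end:
            return subscriber
    return None


print(subscriber_for("203.0.113.5", datetime(2011, 11, 11, 11, 11)))
```

The same address maps to different subscribers at different times, but a timestamp is all that is needed to tell them apart.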
Cookies, ETag, etc.
By now, every Internet user should have heard about “cookies”. These are little pieces of data that a website can place on the visitor’s computer, and that will be sent back to the website when the user visits it again. They allow the website to connect multiple visits together as coming from a single computer. This sounds innocent enough, however:
But even with cookies disabled in the user’s browser, there are numerous similar techniques that are not as easy to block. These include, but are not limited to, ETags (a method to optimize loading speed), flash cookies (Local Shared Objects used by Adobe Flash) and HTML5 (which allows the storage of data in the browser in multiple ways). Combined, these are used to create “Zombie Cookies” which are exceedingly hard to remove from a computer.
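How a plain cookie links visits together can be sketched in a few lines. This is a simplified model of a website's server side, not any particular site's implementation; the cookie name visitor_id and the function handle_request are invented for the example.

```python
import uuid
from http.cookies import SimpleCookie

visits = {}  # visitor id -> list of pages requested


def handle_request(path, cookie_header=""):
    """Record a page view and return (visitor_id, Set-Cookie value or None)."""
    cookie = SimpleCookie(cookie_header)
    if "visitor_id" in cookie:
        # Returning browser: it sent back the identifier it was given earlier.
        visitor_id = cookie["visitor_id"].value
        set_cookie = None
    else:
        # First visit: hand out a fresh random identifier.
        visitor_id = uuid.uuid4().hex
        out = SimpleCookie()
        out["visitor_id"] = visitor_id
        set_cookie = out["visitor_id"].OutputString()
    visits.setdefault(visitor_id, []).append(path)
    return visitor_id, set_cookie
```

On the first request the server answers with a Set-Cookie value; when the browser returns that value in its Cookie header on later requests, all page views end up in the same list.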
All anonymity ends when users fill out forms on the web, be it the signup form for a website, the order form of a shop or even the search terms put into a search engine. Most users enter true or close-to-true data when asked for it.
From there on, the data can be associated with the IP-Address used and cookies stored on the computer. Depending on the policies of the websites in question, the data can then be shared with other websites and associated with even more cookies and IP-Addresses, forming comprehensive profiles of browsing habits and identity attributes that are almost impossible to remove from the long-living databases of data traders and marketers.
Frequently, form data later becomes the subject of subpoenas, with authorities compiling in-depth reports on searches made on the Internet and websites used.
Search terms in particular easily find their way through the Internet, and the reason for that is the “Referer”. Every time a user clicks on a link on a webpage, the newly opened webpage is sent the address of the previous one. When the previous page was a list of search results, that address usually contains the search terms, so the website learns what the user searched for.
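As a small illustration, this is roughly what a website could do with the Referer it receives. The search engine address and the parameter name q are assumptions for the example; real search engines differ in how they encode queries.

```python
from urllib.parse import parse_qs, urlparse


def search_terms_from_referer(referer, param="q"):
    """Return the search terms embedded in a Referer URL, or None."""
    query = parse_qs(urlparse(referer).query)
    values = query.get(param)
    return values[0] if values else None


# Hypothetical Referer header value sent by the browser.
referer = "https://search.example/search?q=symptoms+of+flu&lang=en"
print(search_terms_from_referer(referer))
```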
In combination with cookies and IP-Addresses, this informs social networks about the majority of content their members consume, even when none of the websurfing involved any of their own websites directly.
Some websites will generate new page addresses (URLs) for every new visitor. When these pages are then bookmarked or shared with others, website operators can both recognize repeat visitors and gain insight into who shared their pages with whom – giving them easy access to the user’s social relations.
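A minimal sketch of the idea, under the assumption that the site simply appends a random token to each address it hands out. The domain, the parameter name t and both function names are hypothetical.

```python
import secrets

issued = {}  # token -> visitor the URL was originally generated for
shares = []  # (original visitor, visitor who later opened the same URL)


def personalized_url(page, visitor):
    """Generate a page address that is unique to this visitor."""
    token = secrets.token_urlsafe(8)
    issued[token] = visitor
    return f"https://site.example/{page}?t={token}"


def record_open(url, visitor):
    """Look up who the URL was issued to and note when someone else opens it."""
    token = url.rsplit("?t=", 1)[1]
    owner = issued.get(token)
    if owner is not None and owner != visitor:
        shares.append((owner, visitor))
    return owner
```

If the address generated for one visitor is later opened by another, the operator has learned that the two are connected, without either of them having said so.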
History & Cache
When loading webpages, the user’s browser will first check whether it has a locally stored (cached) version from a previous visit, so that the websurfing experience can be sped up. Because the browser's requests reveal what it already has stored, websites can connect visits together and recognize repeat visitors.
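The ETag mechanism mentioned earlier shows how caching can be turned into tracking. In the sketch below, a hypothetical server hands every new browser a unique ETag for the same file; the browser later presents that value in its If-None-Match request header when revalidating its cached copy. This is a deliberately abusive policy for illustration, not how ETags are normally assigned.

```python
import uuid

seen = {}  # etag -> number of requests made with it


def serve(if_none_match=None):
    """Return (HTTP status, ETag) for a request to a cached resource."""
    if if_none_match in seen:
        # The browser revalidates its cached copy and thereby identifies itself.
        seen[if_none_match] += 1
        return 304, if_none_match
    # Unknown browser: issue a unique ETag instead of a content-based one.
    etag = f'"{uuid.uuid4().hex}"'
    seen[etag] = 1
    return 200, etag
```

No cookie is involved at any point, so blocking cookies alone does not prevent this kind of recognition.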
Another technique to track users on the Internet is to determine their browser’s fingerprint. Due to minute differences between installations, many browsers exhibit unique behavior that websites can test for. This makes it possible to identify repeat visitors and even to track them without relying on IP-Addresses or storing identifying data in their browsers (like cookies).
To see how unique your browser is, check out this page by the Electronic Frontier Foundation: Panopticlick
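In its simplest form, a fingerprint is just a hash over whatever attributes the browser reveals. The attribute names and values below are invented for illustration; real fingerprinting draws on many more signals.

```python
import hashlib
import json


def fingerprint(attributes):
    """Derive a short, stable identifier from a dict of browser attributes."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute set reported by one browser.
browser = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1280x800x24",
    "timezone_offset": -60,
    "fonts": ["Arial", "Courier New", "Verdana"],
}
print(fingerprint(browser))
```

The same installation yields the same identifier on every visit, while a small difference such as one additional font yields a different one.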
Tracking methods are not limited to websurfing. As an example of other technologies that have anonymity implications, email shall be briefly examined.
Unbeknownst to most users, emails carry information that consists not just of the email addresses of the parties, but also of the IP-Address of the sending user.
Received: from [22.214.171.124] by freemail.com via HTTP; Fri, 11 Nov 2011 11:11:11 PST
The above shows one of the “headers” included in an email. The “Received” headers include the full path an email traveled from mail-server to mail-server, usually including the original IP-Address of the sender’s computer (22.214.171.124 in this case). In addition to this, other headers exist that uniquely identify the message, the mail program used, or the conversation the mail refers to. All of this information is visible to any router on the path of the email.
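Reading this information out of a message takes very little effort. The sketch below uses Python's standard email module on a message that reuses the Received header shown above; the From, To and Subject lines are made up, and the bracket pattern is only one of several forms in which addresses appear in real headers.

```python
import re
from email import message_from_string

# The Received header is the one shown above; the other headers are invented.
RAW = (
    "Received: from [22.214.171.124] by freemail.com via HTTP; "
    "Fri, 11 Nov 2011 11:11:11 PST\n"
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: hello\n"
    "\n"
    "Hi Bob\n"
)

IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def sender_ips(raw_message):
    """Collect bracketed IP-Addresses from all Received headers of a message."""
    msg = message_from_string(raw_message)
    ips = []
    for received in msg.get_all("Received", []):
        ips.extend(IP_IN_BRACKETS.findall(received))
    return ips


print(sender_ips(RAW))
```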
This is especially interesting to operators of free webmail services that attract a lot of users. The correspondence of their users allows the operators to temporarily attribute email addresses (and often names) to the IP-Addresses that were used in sending, creating a very precise database of user information that does not have to rely on the cooperation of other parties.
Putting it all together
This article could only present a shallow overview of the many methods and technologies that compromise user privacy and anonymity on the Internet. When combined with each other and utilized by specialized parties, they constitute powerful means not only to reduce anonymity on the Internet to nothing, but also to spread information about users between networks of actors.
The depth of this threat becomes apparent when several of these technologies are combined and the generated data is mined. Just using cookies, referers, IP-Addresses and mail headers, most users can be identified during most past and future connections to the Internet, essentially reducing the IP-Address to a unique identifier that is directly associated with the user’s name – without having to resort to subpoenas or data held only by the user’s ISP. Numerous parties with that kind of information exist, most of them unknown to the public.
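What such a combination might look like can be sketched with two tiny, entirely hypothetical data sets: sightings of an email address together with an IP-Address (as a webmail operator would have them) and an ordinary web server log. The one-hour matching window is an arbitrary assumption.

```python
from datetime import datetime, timedelta

# Hypothetical records: (email address, ip, time the address was seen with that ip)
mail_sightings = [
    ("alice@example.org", "203.0.113.5", datetime(2011, 11, 11, 11, 11)),
]

# Hypothetical web server log: (ip, time, page requested)
web_log = [
    ("203.0.113.5", datetime(2011, 11, 11, 11, 30), "/forum/post/42"),
    ("203.0.113.5", datetime(2011, 11, 12, 9, 0), "/other"),
]


def attribute_visits(mail_sightings, web_log, window=timedelta(hours=1)):
    """Attach an email address to page views from the same IP close in time."""
    attributed = []
    for ip, when, url in web_log:
        for address, mail_ip, seen_at in mail_sightings:
            if ip == mail_ip and abs(when - seen_at) <= window:
                attributed.append((address, url))
    return attributed


print(attribute_visits(mail_sightings, web_log))
```

The time window matters because of dynamic address assignment: the later request from the same IP-Address is not attributed, since the address may have changed hands by then.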
There is very little reporting about the methods used by data traders and marketers, or how they compile vast databases of user information and make it available to paying customers. This is only natural, because methods to protect privacy and anonymity online exist, as we will explore in Part V of this series. Too much attention on the factual non-existence of anonymity online would only result in users choosing to take the protection of their privacy back into their own hands, instead of misguidedly trusting it to the Internet itself.
Only one conclusion can remain: The Internet provides no anonymity whatsoever, unless defensive technologies are employed by individual users.
Things to come…
In the first part of this series the theoretical aspects of what anonymity is are explored.
Here we apply the theory of anonymity to offline interaction.
Some lessons have been learned that can help to improve anonymity in general, both online and offline.