Andrew Leahey

Privacy Policies for the Modern Web

Traditionally, privacy policies have answered three general questions: (1) what user data is collected, (2) how that data is collected, and (3) how that data is stored or used; in general, how that data is treated. A privacy policy is a simple enough thing to draft when the entity directing the drafting is the sole party collecting and storing data. On the modern web, however, that is almost never the case. Even the most basic websites will, for the most part, use a content delivery network (CDN) to serve images and other large files, and embed code to track usage data. CDNs and analytics companies will then have access to non-personally identifiable information: internet protocol (IP) addresses, browser and operating system information, display information and, in the case of analytics companies, referrer uniform resource locators (URLs). More and more ecommerce sites are using third-party payment processors rather than taking payments on-site and assuming the burden of security maintenance and all of the accompanying risk. In short, as more services are outsourced, the number of entities gaining access to some form of user information grows with the complexity of the website. Indeed, with the advent of third-party authentication services such as Facebook Authentication, Google OAuth and LinkedIn OAuth, user data is only becoming further removed from the control of the entity owning and operating even a relatively simple website.[1] This raises the question of how best to disclose the answers to the three questions above.

What User Data is Collected

There are known-knowns, known-unknowns, and unknown-unknowns[2]. The data that is collected by the website owner is a known-known – you either do or do not offer a "Contact Us" form that collects and stores a user's name, email address and comment. Without further investigation, the information obtained from a user choosing to use Facebook Authentication is a known-unknown – you are aware that user registration and login information is collected by a third party, but you don't know how that information is being stored. Unknown-unknown information is all of the information[3], potentially collected by a third party, whose storage and use are difficult or impossible to ascertain. For example, an analytics company that offers free user analytics services may be using the aggregated information to improve its other offerings. Its own privacy policy may only obliquely reference what aggregated information is used for and, as such, a privacy policy cannot in good conscience be drafted in a way that makes representations or warranties as to what information is collected by that third party.

How Data is Collected

The method of collecting data is another area in which the complexity of a website significantly affects the scope of the privacy concerns. The obvious first-level collection schemes are the user-driven and user-chosen methods: contact forms, user registration, payment processing, email listserv subscriptions, etc. Second-level collection schemes are those that can be ascertained by examining the website and its source: analytics and tracking scripts, calls to offsite-hosted images, cookies, etc. Third-level collection schemes, which are the most difficult to know and thus to disclose in a policy, are all those methods that are neither user-driven nor evident in the source of a website. For instance, an examination of the source of a website may indicate that a call is being made to a third party for a tracking script, but the source cannot give guidance as to whether that third party is running data analytics on its own server logs to form a more complete picture of a user, or tracking the user across multiple websites.[4] How information is collected is therefore more difficult to disclose, as the owner of the website can realistically indicate only how they think information is collected, what collected information they are privy to, and the identities of some of the entities they have chosen to collect information.
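The second-level examination described above can be partly automated. What follows is a minimal sketch in Python, not a complete audit tool: it scans a page's static markup for tags that fetch a resource on load and reports every host other than the site's own. The class name, function name, sample markup and example.com hosts are all hypothetical. It assumes that only script, img, iframe and link tags matter, and it will not see resources that a script loads at run time or anything a third party does on its own servers.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ThirdPartyHostFinder(HTMLParser):
    """Collects hosts, other than the site's own, referenced by a page's markup."""

    # Assumption: these tags cause the browser to fetch a resource on page load.
    FETCHING_TAGS = {"script", "img", "iframe", "link"}
    URL_ATTRS = {"src", "href"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.own_host = urlparse(page_url).hostname
        self.third_party_hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.FETCHING_TAGS:
            return
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                # Resolve relative and protocol-relative URLs against the page URL.
                host = urlparse(urljoin(self.page_url, value)).hostname
                if host and host != self.own_host:
                    self.third_party_hosts.add(host)


def find_third_party_hosts(page_url, html):
    """Return a sorted list of third-party hosts found in the given markup."""
    finder = ThirdPartyHostFinder(page_url)
    finder.feed(html)
    return sorted(finder.third_party_hosts)


# Hypothetical page source used only for illustration.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="https://analytics.example.net/track.js"></script>
</head><body>
<img src="//cdn.example.org/logo.png">
<a href="https://elsewhere.example.com/">A plain link, not fetched on load</a>
</body></html>
"""

hosts = find_third_party_hosts("https://www.example.com/", SAMPLE_HTML)
print(hosts)
```

A list produced this way is a starting point for disclosure only. It identifies who receives a request from the visitor's browser, not what that party then does with it.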

How Data is Stored or Used  

So-called "Right to be Forgotten" European Union laws aside, if there is one immutable fact about the internet, it is that anything that is on it remains on it. In the early days of the web, a privacy policy could give users an accurate picture of how long information collected about them would remain; a quick call down to IT was all that was required. On the modern web, a visit to a website is more akin to tossing a stone into a pond. A rudimentary privacy policy can describe the splash and a well-drafted policy can predict some of the ripples, but no policy will be able to describe the effect on the shoreline. To abandon the metaphor, a privacy policy can be drafted to outline how long the website owner intends to hold on to user information and how long third-party services tied to the website claim to retain user information, but it can never tell a user with any accuracy how long it will be before their visit to or use of the site is "forgotten."


1. To the extent that you can, disclose.

The above may serve to discourage an individual from bothering to draft a policy at all, but that is not the intent. Information privacy was hardly in the public discourse twenty years ago. The first discussions of privacy mostly centered on personal health information, later turning to contact information with the advent of do-not-call registries, and to financial information as identity theft became more common. Moving forward, the trend line would appear to point squarely in the direction of privacy becoming more important and more relevant -- the European Union and the State of California have already adopted laws mandating as much.

As such, and not surprisingly, the solution to privacy policies lagging behind the increasing complexity of the web is … increased disclosure. To the extent that you know what information is collected, stored and used, disclose it. To the extent that you know what information MIGHT be collected, stored and used, disclose it. Where you aren't certain of anything, disclose the entities that you have contracted with to provide services that might be collecting information, and link to their privacy policies. Don't work with entities whose privacy policies fail to give at least some modicum of explanation as to how information is collected, stored and used.

2. Offer an up to date list.

In your privacy policy, designate a Data Controller and allow users to contact that person to obtain an updated list of all the companies (third-party service providers, mail carriers, hosting services, IT companies, communications companies, analytics companies, advertisers, etc.) that may be processing user data. Maintain the list and, where possible, post it and link to it in your privacy policy.
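One way to keep such a list maintainable is to store it as structured data and generate the posted version from it. The following Python sketch is illustrative only: the Processor record, its three fields, the render_processor_list helper, and the company names and URLs are all hypothetical placeholders, and a real list may need more detail than is shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """One company that may be processing user data (hypothetical record shape)."""
    name: str
    role: str
    policy_url: str


def render_processor_list(processors):
    """Return a plain-text list, sorted by name, suitable for posting with the policy."""
    lines = []
    for p in sorted(processors, key=lambda p: p.name.lower()):
        lines.append(f"{p.name} ({p.role}): {p.policy_url}")
    return "\n".join(lines)


# Placeholder entries; the names and URLs are not real services.
PROCESSORS = [
    Processor("Example Payments", "payment processing",
              "https://payments.example.com/privacy"),
    Processor("Example Analytics", "usage analytics",
              "https://analytics.example.net/privacy"),
]

print(render_processor_list(PROCESSORS))
```

Keeping the list in one place means the posted page and the copy the Data Controller sends to users come from the same source and cannot drift apart.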

3. Update your policy frequently.

The privacy policy of a website cannot be thought of as a set-it-and-forget-it static page -- just as the rest of your website is evolving, so too must your policy. At regular intervals, have your policy reviewed and updated to reflect features added to or removed from your site, third parties contracted with or released, and changes to third parties' policies.

4. Demand more.

Think of yourself as a steward of your users' information. When you choose a new product, contract with a new party, or implement a new feature, call on the entity you are working with to provide you with answers to the aforementioned three questions: what information is collected, how that information is collected, and how that information is used. It is more than just a nice thing to do for your users; it helps ensure that, as privacy becomes more and more of a hot-button issue, you remain ahead of the curve and need not fear having to fix a problem that could have been prevented through good management.


[1] Indeed, a website owner need only wish to save user preferences with slightly more granularity and permanence than a traditional "cookie" can provide in order to want to make use of a third-party authentication service. The availability of free authentication services through Facebook, Google and LinkedIn, among others, creates a situation in which the entities seeking the most cost-effective and simple method of permitting users to have an account on their website are the ones that need to draft the more sophisticated privacy policy.


[3] Generally speaking this will be non-personally identifiable information – IP addresses, regional location data and the like.

[4] The concern is that, at some point, a sufficient quantity of non-personally identifiable information becomes personally identifiable information: with enough individual data points, the number of individuals who fit all of those points eventually drops to one.
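The narrowing effect described in note 4 can be illustrated with a small Python sketch. The visitor records, attribute names and matching_count helper are invented for illustration and are not drawn from any real data set. Each additional data point an observer learns reduces the number of records that fit.

```python
def matching_count(records, known_attributes):
    """Count the records consistent with every attribute an observer knows."""
    return sum(
        all(record.get(key) == value for key, value in known_attributes.items())
        for record in records
    )


# Invented, non-personally identifiable records for four hypothetical visitors.
VISITORS = [
    {"region": "NJ", "browser": "Firefox", "os": "Linux", "screen": "1920x1080"},
    {"region": "NJ", "browser": "Firefox", "os": "Windows", "screen": "1920x1080"},
    {"region": "NJ", "browser": "Chrome", "os": "Windows", "screen": "1366x768"},
    {"region": "NY", "browser": "Firefox", "os": "Linux", "screen": "1920x1080"},
]

# Add one known data point at a time and record how many visitors still fit.
observed = {}
counts = []
for key, value in [("region", "NJ"), ("browser", "Firefox"), ("os", "Linux")]:
    observed[key] = value
    counts.append(matching_count(VISITORS, observed))

print(counts)  # [3, 2, 1]
```

No single attribute here identifies anyone, yet three of them together single out one record.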