What is PII?


With data privacy becoming an increasingly hot topic as major data breaches make headlines around the globe, the biggest question typically is: “What PII was exposed?” But what is PII? PII stands for “Personally Identifiable Information.” It generally refers to any information that can be used to identify a particular person, such as a name, credit card number, email address, or social insurance number (SIN). The definition of PII varies depending on the jurisdiction, the agency, and the context. In the following article we cover the origins of the term PII and whether and how it is used and defined in US federal laws and selected state laws, in Europe’s GDPR, and in Canada’s federal and provincial private sector privacy laws.

Examples of PII

Examples of Personally Identifiable Information listed by the Department of Homeland Security (DHS) include: name, date of birth, mailing address, telephone number, Social Security number (SSN), email address, zip code, account numbers, certificate/license numbers, vehicle identifiers including license plates, uniform resource locators (URLs), static Internet protocol addresses, biometric identifiers (e.g., fingerprints), photographic facial images, or any other unique identifying number or characteristic, and any information where it is reasonably foreseeable that the information will be linked with other information to identify the individual. PII can include both direct identifiers and indirect (or quasi-) identifiers. Learn more about direct and indirect identifiers.
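To illustrate why indirect identifiers matter, here is a minimal Python sketch that measures how many records in a dataset are singled out by a combination of fields that would each be harmless on their own. The records, field names, and the `unique_fraction` helper are all made up for illustration; this is not a legal test of identifiability under any of the laws discussed in this article.

```python
from collections import Counter

# Hypothetical toy records; every value here is invented for illustration.
records = [
    {"zip_code": "90210", "birth_year": 1985, "gender": "F"},
    {"zip_code": "90210", "birth_year": 1985, "gender": "M"},
    {"zip_code": "90210", "birth_year": 1990, "gender": "F"},
    {"zip_code": "10001", "birth_year": 1985, "gender": "F"},
    {"zip_code": "10001", "birth_year": 1985, "gender": "F"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose combination of `fields` occurs exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    singled_out = sum(1 for key in keys if counts[key] == 1)
    return singled_out / len(rows)


# Each added indirect identifier singles out more people.
for fields in (["zip_code"],
               ["zip_code", "birth_year"],
               ["zip_code", "birth_year", "gender"]):
    print(fields, unique_fraction(records, fields))
```

In this toy dataset, the zip code alone singles out no one, while the combination of zip code, birth year, and gender singles out three of the five records. Real datasets behave the same way at scale, which is why many definitions cover information that identifies a person only in combination with other information.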

Origins

The origins of the term PII are difficult to trace. It may have originated in the US government’s use of the term “sensitive but unclassified.” PII is not consistently used, even within the US. Several states simply use ‘personal information’ or ‘personal data.’

One of the first definitions of PII can be found in the Office of Management and Budget (OMB) Memorandum M-06-19 (Reporting Incidents Involving Personally Identifiable Information and Incorporating the Cost for Security in Agency Information Technology Investments): “For purposes of this policy, the term Personally Identifiable Information means any information about an individual maintained by an agency, including, but not limited to, education, financial transactions, medical history, and criminal or employment history and information which can be used to distinguish or trace an individual’s identity, such as their name, social security number, date and place of birth, mother’s maiden name, biometric records, etc., including any other personal information which is linked or linkable to an individual.”

A shortened version of this definition was provided by the OMB in Memorandum M-07-16 (Safeguarding Against and Responding to the Breach of Personally Identifiable Information): “The term ‘personally identifiable information’ refers to information which can be used to distinguish or trace an individual’s identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.”

This definition is used, for example, on the website of the US Office of Privacy and Open Government. A very similar definition can be found on the website of the IAPP as well.

DHS in its Privacy Incident Handling Guide from 2012 defined PII in section 1.4.9 as “Any information that permits the identity of an individual to be directly or indirectly inferred, including any other information which is linked or linkable to that individual regardless of whether the individual is a United States citizen, legal permanent resident, or a visitor to the U.S. PII includes any item, collection, or grouping of information about an individual that is maintained by an agency, including, but not limited to, education, financial transactions, medical history, and criminal or employment history.”

We will see in the following sections, perhaps surprisingly, that the term PII is not used in any of the data privacy legislation we compare. However, whatever terminology is used, the general aim of all laws considered is to protect information that is likely to identify an individual. Yet, notable differences remain.

United States Laws – Federal 

US Privacy Act 1974, § 552a(a)(4)

Term: ‘record’

Definition: any item, collection, or grouping of information about an individual that is maintained by an agency, including, but not limited to, his education, financial transactions, medical history, and criminal or employment history and that contains his name, or the identifying number, symbol, or other identifying particular assigned to the individual, such as a finger or voice print or a photograph

E-Government Act of 2002, § 208(d)

Term: information in ‘identifiable form’

Definition: any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means.

(Proposed) American Data Privacy and Protection Act

Term: ‘covered data’

Definition: information that identifies or is linked or reasonably linkable, alone or in combination with other information, to an individual or a device that identifies or is linked or reasonably linkable to an individual, and may include derived data and unique persistent identifiers.

Excluded:

(i) de-identified data;

(ii) employee data;

(iii) publicly available information; or

(iv) inferences made exclusively from multiple independent sources of publicly available information that do not reveal sensitive covered data with respect to an individual.

 

United States – State Specific Definitions

There is currently no comprehensive federal law in place that uniformly applies to data collected by private organizations in the US. There are instead several federal laws that apply to specific kinds of personal information, for example in the healthcare sector.

Like the federal government with its proposed American Data Privacy and Protection Act, several states have started to enact or propose comprehensive privacy laws that apply regardless of the industry. The following is a selection of enacted or proposed laws showcasing the different approaches taken by the states in this regard.

California

California Consumer Privacy Act of 2018 (CCPA), section 1798.140, as amended by the CPRA

Term: ‘personal information’

Definition: information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.

Included (examples from the definition):

  • – Identifiers such as a real name, alias, postal address, unique personal identifier, online identifier, Internet Protocol address, email address, account name, social security number, driver’s license number, passport number.
  • – Commercial information, including records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies.
  • – Internet or other electronic network activity information, including, but not limited to, browsing history, search history, and information regarding a consumer’s interaction with an internet website application, or advertisement.
  • – Biometric information
  • – Geolocation data

Excluded:

  • – publicly available information or lawfully obtained, truthful information that is a matter of public concern
  • – consumer information that is deidentified or aggregate consumer information

New York (Proposed)

New York privacy act § 1100(16)

Term: ‘personal data’

Definition: any data that identifies or could reasonably be linked, directly or indirectly, with a specific natural person, household, or device.

Excluded:

Deidentified data

Arkansas

Arkansas’s Personal Information Protection Act § 4-110-103(7)

Term: ‘personal information’

Definition: an individual’s first name or first initial and his or her last name in combination with any one (1) or more of the following data elements when either the name or the data element is not encrypted or redacted:

(A) Social Security number;

(B) Driver’s license number or Arkansas identification card number;

(C) Account number, credit card number, or debit card number in combination with any required security code, access code, or password that would permit access to an individual’s financial account;

(D) Medical information; and

(E)(i) Biometric data.

Excluded:

Encrypted or redacted data
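Unlike the broad California and New York definitions, the Arkansas definition above is a combination rule: a name plus at least one listed data element, where the name or the element is unprotected. The Python sketch below encodes a literal reading of the quoted text to make that structure concrete. The `Record` class, its field names, and the element labels are hypothetical, and the code is an illustration of the rule's logic rather than an authoritative interpretation of the statute.

```python
from dataclasses import dataclass, field

# Hypothetical labels for data elements (A) to (E) of the quoted definition.
DATA_ELEMENTS = {
    "social_security_number",
    "drivers_license_or_state_id_number",
    "financial_account_number_with_access_code",
    "medical_information",
    "biometric_data",
}


@dataclass
class Record:
    has_name: bool        # first name or first initial, plus last name
    name_protected: bool  # True if the name is encrypted or redacted
    # Maps a data element label to True if that element is encrypted or redacted.
    elements: dict = field(default_factory=dict)


def is_personal_information(record):
    """Literal reading: name plus a listed element, with either one unprotected."""
    if not record.has_name:
        return False
    for element, protected in record.elements.items():
        if element not in DATA_ELEMENTS:
            continue
        # "when either the name or the data element is not encrypted or redacted"
        if not record.name_protected or not protected:
            return True
    return False


# A plain-text name with an encrypted SSN still meets this reading of the rule.
print(is_personal_information(Record(True, False, {"social_security_number": True})))
# A fully encrypted name and SSN does not.
print(is_personal_information(Record(True, True, {"social_security_number": True})))
```

Under this reading, a record with no name, or a name with none of the listed elements, falls outside the definition entirely, which shows how much narrower this approach is than the ‘reasonably linkable’ standard used elsewhere.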

 

Europe

The General Data Protection Regulation (GDPR) of the European Union does not use the term PII, but rather ‘personal data.’ Personal data is defined in Article 4(1) as 

“any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.” 

This definition is substantively similar to many of the US definitions of personal information. Its examples capture well the breadth of information that can now be used to build a profile of and to identify individuals even in the absence of direct identifiers.

Canada

Canadian legislation does not use the US terminology of PII. The federal data protection law, Personal Information Protection and Electronic Documents Act (PIPEDA) defines ‘personal information’ as “information about an identifiable individual.” This definition remains unaltered in the proposed Consumer Privacy Protection Act (CPPA). 

There are three provincial privacy laws that apply in the private sector instead of PIPEDA, namely the Alberta, British Columbia, and Québec private sector privacy laws. All three acts use the term ‘personal information.’

Alberta: Personal Information Protection Act

Definition: information about an identifiable individual

British Columbia: Personal Information Protection Act

Definition: information about an identifiable individual and includes employee personal information but does not include:

  1. contact information, or
  2. work product information

Québec: Act respecting the protection of personal information in the private sector

Definition: any information which relates to a natural person and allows that person to be identified either directly or indirectly

 

One difficulty that arises when interpreting the Canadian definitions of personal information is the little word “about” (or “relates to” in Québec). The Supreme Court of Canada (SCC), for example, has interpreted the term “about” expansively, justifying many categories of information to be considered personal, whether they are sensitive, private, or well-known. Note, however, that the SCC made this interpretation in the context of the federal Privacy Act, which is important because the protection of information held by the government may be regarded in a different light than information held by private organizations. Yet, in the same context the Federal Court of Canada interpreted the definition more narrowly, capturing only information that connotes concepts of intimacy, identity, dignity, and integrity of the individual. The Alberta Privacy Commissioner regards the term as a “highly significant restrictive modifier” and quite different from “relates to.” For example, if an individual makes a complaint or a suggestion and information is gathered or created as a result, this information is not necessarily “about” that individual, although it is in some way connected to the individual. 

An example of a surprisingly narrow interpretation of personal information is the Court of Appeal of Alberta’s position that a licence plate number as well as a street address is not personal information because it is not about an individual. The court said: 

“The fact that the vehicle is owned by somebody does not make the licence plate number information about that individual. It is “about” the vehicle. The same reasoning would apply to vehicle information (serial or VIN) numbers of vehicles. Likewise a street address identifies a property, not a person, even though someone may well live in the property. The licence plate number may well be connected to a database that contains other personal information, but that is not determinative. The appellant had no access to that database, and did not insist that the customer provide access to it.”

This position seems to be an outlier. We have seen above that most jurisdictions require only indirect identifiers and a reasonable possibility of identification, even if only in combination with other information that is accessible, to meet the definition of personal information.

Canada’s Privacy Commissioner, too, takes a broader view, giving the following examples of personal information in the context of PIPEDA:

  • – age, name, ID numbers, income, ethnic origin, or blood type;
  • – opinions, evaluations, comments, social status, or disciplinary actions; and
  • – employee files, credit records, loan records, medical records, existence of a dispute between a consumer and a merchant, intentions (for example, to acquire goods or services, or change jobs).

Considering the fact that individuals retain their license plate number when selling their cars and are supposed to surrender it when taking residence in a new province, it seems reasonable to consider the license plate number as a number that can be used to identify the individual.  

Summary

As we can see, the different approaches taken in the jurisdictions considered in this article lead to significantly different definitions of PII (or equivalent terms). Not all authorities even agree that personal information is, at a minimum, likely able to identify (alone or in combination with other data) an individual person. It is therefore very important to establish the context in which the term PII (or equivalent) is being used and to consult the actual text of the legislation and authoritative guidance. 

Most commonly, though, PII refers to information that can likely be used to identify an individual, if only in connection with other available information. Powerful AI now makes identification ‘likely’ in cases where it would have been deemed impossible just a little while ago, and ‘other available information’ is growing at unprecedented rates. As a result, more and more information will have to be considered PII. Most of the additional information available is made up of indirect identifiers, to be sure, but indirect identifiability suffices for the most common definitions of the term.

How Private AI Can Help With Compliance

Being in the know on what PII exists in your organization and where it lives will allow you to determine what is entailed in compliance with the applicable legislation or industry standard. 

Private AI can help you make that determination, even in unstructured data and across 47 languages. Using the latest advancements in Machine Learning, the time to identify and categorize your data can be minimized and compliance facilitated. Private AI can identify over 50 different entities of PII, PCI, and PHI. To see the tech in action, try our web demo, or request an API key to try it yourself on your own data.

 


