San Francisco COVID-19 Data Privacy Explainer
See answers to other commonly asked questions on our FAQ.
The City is committed to keeping residents safe, and their private health information secure.
Your name and health information are private. The City follows federal and state healthcare privacy laws and never shares protected health information with the public.
The City is also committed to transparency and keeping the public informed.
We share data on the COVID-19 Data Tracker so everyone has access to high-quality and current information. We strive to be one of the most transparent jurisdictions in the country, with live datasets available to the public that update daily. A dataset is a collection of similar information such as the number of people tested for COVID-19 on a particular day.
We work to balance these commitments by ensuring that none of the COVID-19 data we publish puts resident privacy at risk.
The City only shares information in ways that protect resident privacy. Before releasing any data, we complete a thorough analysis to consider the risks of releasing the data.
Here's what that looks like in practice.
One of the main risks we consider before sharing data is the possibility that someone could use an individual dataset or a combination of datasets to identify a specific resident. To prevent this, we ensure that:
1. The underlying population for any data is large enough that no one can be easily identified. Releasing data for the entire City is the best way to mitigate this risk. With over 880,000 residents, it is highly unlikely that citywide data could be used to identify any one individual. When sharing data on subgroups or smaller populations, the population of the subgroup must be 1,000 residents or more.
2. The count of cases or tests (or other data of interest) is high enough to protect privacy. For example, we report on cases by gender identity once a category has at least five cases. This threshold ensures that privacy isn't jeopardized with small numbers.
3. The data cannot be linked to other publicly accessible data in a way that identifies a case. We first assess how a dataset could be linked to other datasets (on the COVID-19 Data Tracker or from other sources). We then assess these linkages to ensure that no one could combine data to identify an individual. We weigh this assessment together with the small-number thresholds described in #2.
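The first two checks above can be sketched in a few lines of Python. This is only an illustration, not the City's actual process: the `Group` class and the function names are hypothetical, and it assumes the five-case threshold given for gender identity applies to every category. The third check (linkage to other datasets) requires human review and is not modeled here.

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
MIN_POPULATION = 1_000  # a subgroup must have 1,000 residents or more
MIN_COUNT = 5           # assumed to apply generally; the text gives it for gender identity


@dataclass
class Group:
    """Hypothetical record for one category in a dataset."""
    name: str
    population: int  # number of residents in the underlying subgroup
    count: int       # number of cases or tests reported for the subgroup


def can_publish(group: Group) -> bool:
    """True only if both the population and the count meet their thresholds."""
    return group.population >= MIN_POPULATION and group.count >= MIN_COUNT


def publishable_rows(groups):
    """Return (name, count) pairs, replacing the count with None when it must be withheld."""
    return [(g.name, g.count if can_publish(g) else None) for g in groups]
```

With this sketch, a category with 800 residents, or one with only 3 cases, would be returned with `None` in place of its count.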
Here are two examples:
- Case data by neighborhood: We performed an analysis to ensure that releasing COVID-19 case data by neighborhood over time does not risk resident privacy. We assessed whether there are sufficient residents in each neighborhood and how many other neighborhood datasets could be linked to this dataset. We determined that the risk that any one individual could be identified in the data is minimal.
- Cases by age group: Our analysis found that releasing this data for the entire City does not put resident privacy at risk. Each age group contains over 15,000 residents, and there are no other datasets that could be easily linked to this data.
There are some datasets that would be more risky to release.
For example, we do not publish new cases by neighborhood and age (or any other characteristic). In this case, the risk of identifying a particular person is too high. If there is a positive case among a small age group in a small neighborhood, the risk that someone might be re-identified in the data is higher.
These small cross sections (granular data for multiple overlapping characteristics) are more risky and could put resident privacy in jeopardy. We do not release this type of data. Instead, we share as much data as we can to ensure residents and journalists are informed without risking anyone’s privacy.
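To see why cross sections are riskier, consider the minimal Python sketch below. The records are invented for illustration and are not real case data, and the names `records` and `small_cells` are hypothetical. It counts cases by neighborhood, by age group, and by both at once, and lists any cell that falls below a threshold of five.

```python
from collections import Counter

MIN_COUNT = 5  # small-number threshold, as in the gender identity example

# Invented (neighborhood, age group) records, for illustration only.
records = (
    [("North", "18-30")] * 9
    + [("North", "31-50")] * 7
    + [("South", "18-30")] * 6
    + [("South", "31-50")] * 2
)


def small_cells(keys, min_count=MIN_COUNT):
    """Count occurrences of each key and return only the cells below min_count."""
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n < min_count}


by_neighborhood = small_cells(r[0] for r in records)  # one characteristic
by_age = small_cells(r[1] for r in records)           # one characteristic
by_both = small_cells(records)                        # two overlapping characteristics
```

In this made-up data, every neighborhood total and every age group total meets the threshold, but the combined table contains a cell with only two cases. That is the kind of small cross section the City does not release.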
The Department of Public Health releases research studies on sub-populations, which share key findings without risking privacy.
For questions related to specific populations, the Department of Public Health often publishes research reports. A report is different from the raw data sharing we do on the COVID tracker. A report shares analysis findings, often without producing the underlying data. This enables the City to share key findings on smaller populations and protect resident privacy.
For example, SFDPH researchers recently released a study on COVID-19 susceptibility and outcomes among people living with HIV in San Francisco. In San Francisco, gay and other men who have sex with men bear a disproportionate burden of the HIV epidemic. The SFDPH researchers found that living with HIV can result in immune suppression, which is a risk factor for more severe COVID-19 disease. Susceptibility to COVID-19 was increased among people living with HIV over the first six months of the pandemic. Homelessness and higher rates of congregate living situations among people living with HIV likely accounted for this disparity.