Our Disclosure Risk Calculator allows anyone to quickly and easily explore disclosure risks in a dataset. The data is stored in your browser unless you explicitly ask to upload it to our server for more detailed results.

Elliott, Milne, Roberts, Simpson, and Sporle (2022). Disclosure Risk Calculator. https://risk.terourou.org.

What it does

The app reads the dataset and asks you to select the variables that might be used to identify individuals, and to provide either a population size or a sampling fraction.

  • Population size: the total size of the population the data was collected from (e.g., if the data is a sample of adults, this is the total number of adults in the population).
  • Sampling fraction: the fraction of the population included in the sample (e.g., the fraction of adults in your sample).
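The two inputs above carry the same information once the sample size is known, which is why either one is enough. A minimal sketch of the conversion, assuming the usual definition of a sampling fraction (sample size divided by population size); the function names are hypothetical and are not taken from the app:

```python
def sampling_fraction(sample_size: int, population_size: int) -> float:
    """Fraction of the population that appears in the sample."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must lie between 0 and population_size")
    return sample_size / population_size


def population_from_fraction(sample_size: int, fraction: float) -> float:
    """Population size implied by a sample size and a sampling fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    return sample_size / fraction
```

For example, a sample of 2,000 adults drawn from a population of 4,000,000 adults corresponds to a sampling fraction of 2000 / 4000000.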

Using the chosen variables, the software calculates the number of unique combinations (i.e., individuals in the sample with a unique set of responses for the chosen variables) and pairs of combinations. This is combined with the sampling fraction to calculate the disclosure risk.
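The counting step described above can be sketched as follows. This is an illustration only: the function name and the example rows are hypothetical, and "pairs" is assumed here to mean combinations shared by exactly two records. The formula that turns these counts and the sampling fraction into a risk estimate is not reproduced.

```python
from collections import Counter


def count_uniques_and_pairs(rows, key_variables):
    """Count combinations held by exactly one record and by exactly two.

    rows          -- list of dicts, one per record
    key_variables -- names of the variables chosen for identification
    """
    # Tally how many records share each combination of key-variable values.
    combos = Counter(tuple(row[v] for v in key_variables) for row in rows)
    uniques = sum(1 for n in combos.values() if n == 1)
    pairs = sum(1 for n in combos.values() if n == 2)  # assumed meaning of "pairs"
    return uniques, pairs


# Hypothetical example data.
example_rows = [
    {"age": "30-39", "sex": "F", "region": "North"},
    {"age": "30-39", "sex": "F", "region": "North"},
    {"age": "40-49", "sex": "M", "region": "South"},
    {"age": "50-59", "sex": "F", "region": "South"},
    {"age": "50-59", "sex": "M", "region": "North"},
]
```

With all three variables selected, three of the five example records have a combination nobody else shares and one combination is shared by two records. Selecting only "sex" leaves no unique records, which shows how the choice of variables drives the counts.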

Since this calculation is simple, it can be performed in your browser without uploading the data to our server.

Uploading data for extra details

If you choose to upload your data by clicking the ‘Upload’ button, an encrypted version of the data (in which values are replaced by random labels) is sent to a secure process on our server that only your current connection can access. The server then uses functions from the ‘sdcMicro’ R package to calculate:

  • Variable contributions: the percentage of the estimated disclosure risk that is attributable to each variable;
  • Individual disclosure risks: the disclosure risk of each row/individual in the dataset, displayed in a table ordered from highest risk to lowest.
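The replacement of values by random labels before upload can be pictured with the sketch below. It is not the app's actual code: the function name and the label format are assumptions. The point it illustrates is that a consistent relabelling hides the original values while preserving which records share a value, and that is all a combination count needs.

```python
import secrets


def relabel_columns(rows):
    """Replace every distinct value in each column with a random label.

    Returns the relabelled rows and the per-column mappings. In this sketch
    the mappings would stay on the client and never be uploaded.
    """
    mappings = {}    # column name -> {original value: random label}
    relabelled = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            labels = mappings.setdefault(column, {})
            if value not in labels:
                label = secrets.token_hex(4)
                # Redraw on the (unlikely) chance of a clash within a column.
                while label in labels.values():
                    label = secrets.token_hex(4)
                labels[value] = label
            new_row[column] = labels[value]
        relabelled.append(new_row)
    return relabelled, mappings
```

Two records that were identical on the chosen variables remain identical after relabelling, and two that differed still differ, so counts of unique combinations are unchanged.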

You can adjust the chosen variables and sampling fraction to see the effect on the estimated disclosure risk(s).

Once you close your browser, the data will be released from memory and deleted.

How it works

We run an R server using Rserve, which the client (your browser) connects to using websockets. When you load the app, the websocket connection creates a new R instance on the server that is tied directly to your connection. Once the websocket closes (i.e., if you close your browser or refresh the page), the associated R session is killed and any data contained within it is lost. Importantly, no one else can access this R session.
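The lifecycle described above (one private session per connection, discarded when the connection closes) can be modelled with a toy registry. This is not how Rserve is implemented; the class and method names are hypothetical and the sketch only mirrors the behaviour described in the text.

```python
class SessionRegistry:
    """Toy model: one private workspace per open connection."""

    def __init__(self):
        self._workspaces = {}

    def open(self, connection_id):
        # A fresh, empty workspace is created when the connection opens.
        self._workspaces[connection_id] = {}

    def store(self, connection_id, name, value):
        # Raises KeyError if the connection is not open.
        self._workspaces[connection_id][name] = value

    def fetch(self, connection_id, name):
        # Only the workspace belonging to this connection is consulted.
        return self._workspaces[connection_id][name]

    def is_open(self, connection_id):
        return connection_id in self._workspaces

    def close(self, connection_id):
        # Dropping the workspace discards everything stored in it.
        self._workspaces.pop(connection_id, None)
```

In this model, data stored under one connection is never visible through another, and nothing survives a call to close().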

Once the connection is made and you click the ‘Upload’ button, the encrypted version of the data is sent to the R process and stored in memory as an R data frame. The process then returns a function (calculate_risk()) which can be called by your app session (and only yours) to estimate the disclosure risks. This function is scoped so that the data is available to it, which means the data doesn't need to be re-uploaded each time you change the settings.
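The scoping idea above is a closure: the data is stored once, and the returned function keeps access to it. Below is a minimal sketch of the pattern in Python rather than R. The name calculate_risk comes from the text, but load_data is hypothetical and the body is a placeholder summary, not the server's sdcMicro-based calculation.

```python
from collections import Counter


def load_data(rows):
    """Store the uploaded rows once and return a function bound to them."""
    stored = list(rows)  # reachable only through the closure below

    def calculate_risk(key_variables):
        # Placeholder summary; the real function returns risk estimates.
        combos = Counter(tuple(r[v] for v in key_variables) for r in stored)
        uniques = sum(1 for n in combos.values() if n == 1)
        return {"records": len(stored), "sample_uniques": uniques}

    return calculate_risk
```

The caller holds on to the returned function and calls it again with a different set of variables. The rows themselves are passed in only once.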

If you want more details on how this all works, this project is completely open-source:

View on GitHub

If you have questions or feedback, please either open an issue on GitHub or send us an email at terourounz@gmail.com.