Data Anonymization at SDIL

Analyses of personally identifiable information (PII) may interfere with the right to privacy of the persons concerned. Processing PII may even violate the law (e.g. the German Federal Data Protection Act or the European Union’s General Data Protection Regulation).

Anonymization of PII can remedy this by preventing the identification of a particular person. The regulations for processing anonymized data are substantially less strict than those for processing PII, so anonymized datasets can be analysed with less organizational effort. Anonymization also minimizes the personal risk to the data subjects and the legal risk to the smart data analyst.

Complete anonymization is not easy to achieve, however: there are a number of cases of successful de-anonymization attacks on supposedly anonymized data. How to achieve robust anonymization is a research question that has not yet been answered satisfactorily. Nonetheless, a few approaches can already be used in practice.
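One widely used approach of this kind is k-anonymity: attributes that could identify a person in combination (so-called quasi-identifiers) are generalized until every combination of their values occurs in at least k records. The following Python sketch illustrates the idea only. The records, the field names (age, zip, diagnosis) and the helper functions are hypothetical and are not part of any SDIL tool.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` characters of a postal code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for all quasi-identifiers (0 for an empty dataset)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


# Hypothetical example data; not taken from any real dataset.
records = [
    {"age": 34, "zip": "76131", "diagnosis": "A"},
    {"age": 36, "zip": "76133", "diagnosis": "B"},
    {"age": 38, "zip": "76137", "diagnosis": "A"},
    {"age": 52, "zip": "70173", "diagnosis": "C"},
    {"age": 57, "zip": "70176", "diagnosis": "B"},
]

# Generalize the quasi-identifiers and leave the other attributes untouched.
generalized = [
    {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
    for r in records
]

k_before = k_anonymity(records, ["age", "zip"])
k_after = k_anonymity(generalized, ["age", "zip"])
print(f"k before generalization: {k_before}, after: {k_after}")
```

In this toy dataset every record is unique before generalization, whereas afterwards each combination of age range and masked postal code is shared by at least two records. A k-anonymous dataset can still leak information, for example when all records in a group share the same sensitive value, so such a check is a starting point rather than a guarantee.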

Besides the protection of PII, SDIL also takes the protection of trade secrets seriously. All datasets provided by SDIL’s project partners are handled in strict confidence. Datasets are kept in a secured computer cluster located in Germany.

In some cases, the anonymization tools used by SDIL can also protect the trade secrets of SDIL’s partners by removing them from the data before it is handed to SDIL. SDIL will gladly advise and support its partners regarding the necessary data preprocessing.
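The simplest form of such preprocessing is to drop confidential columns before a dataset leaves the partner’s premises. The sketch below uses only the Python standard library. The column names (machine_id, recipe_code, etc.) and the sample data are hypothetical, and which columns count as confidential has to be decided by the data owner.

```python
import csv
import io


def drop_columns(rows, confidential):
    """Yield a copy of each row without the confidential columns."""
    for row in rows:
        yield {key: value for key, value in row.items() if key not in confidential}


# Hypothetical sensor export; "recipe_code" stands in for a trade secret.
raw = io.StringIO(
    "machine_id,timestamp,temperature,recipe_code\n"
    "M1,2020-01-01T10:00,71.5,R-17\n"
    "M2,2020-01-01T10:05,69.8,R-42\n"
)

cleaned = list(drop_columns(csv.DictReader(raw), {"recipe_code"}))

# Write the cleaned rows back out as CSV.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(cleaned[0].keys()))
writer.writeheader()
writer.writerows(cleaned)
print(out.getvalue())
```

Removing a column does not help if the confidential information can be inferred from the remaining columns, so the result should still be reviewed before it is shared.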

SDIL will also gladly support its project partners in anonymizing datasets prior to the actual analysis. A number of software tools are available:


ARX Deidentifier is free software for anonymizing databases. It supports several scientific notions of anonymization and features a graphical user interface, which makes it easy to use.


μ-ARGUS is software used by Statistics Netherlands (the Dutch Central Bureau of Statistics) to anonymize datasets. The application is available free of charge and can therefore be used in smart data analyses.


sdcMicro can also be used to anonymize PII. It is a package for the R programming language, which is designed for statistical analysis. sdcMicro is free to use.