Statistical database security is concerned with protecting confidential individual values that are stored in so-called statistical databases and used for statistical purposes. Examples include patient records used by medical researchers and detailed phone call records that phone companies analyze statistically in order to improve their services. The problem became apparent in the 1970s and has grown in recent years due to massive data collection and increasing social awareness of individual privacy. Techniques for preventing statistical database compromise fall into two categories: noise addition, where all data and/or statistics are available but only approximately rather than exactly, and restriction, where the system provides only those statistics and/or data that are considered safe. In either case, a technique is evaluated by measuring both the information loss it causes and the level of privacy it achieves. The goal of statistical data protection is to maximize privacy while minimizing information loss. To evaluate a particular technique, it is important to establish a theoretical lower bound on the information loss necessary to achieve a given level of privacy. In this chapter, we present an overview of the problem and the most important results in the area.
Security, Privacy and Trust in Modern Data Management, pp. 167-181
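
The two categories named in the abstract, restriction and noise addition, can be illustrated with a minimal Python sketch. It is not taken from the chapter. The records, the function names `restricted_mean` and `noisy_mean`, the threshold `min_size`, and the Gaussian noise with standard deviation `scale` are all assumptions chosen for illustration. Restriction is shown here as a simple query-set-size check, and noise addition as perturbation of the returned statistic; other variants of both categories exist.

```python
import random
from statistics import mean

# Hypothetical toy records (age, salary); the values are invented for illustration.
RECORDS = [(34, 52000), (41, 61000), (29, 48000),
           (55, 75000), (38, 58000), (47, 67000)]


def restricted_mean(records, predicate, min_size=2):
    """Restriction: answer only when the query set is neither too small
    nor too large, so the answer (or its complement) does not single
    out a few individuals. Returns None when the query is refused."""
    query_set = [salary for age, salary in records if predicate(age)]
    n = len(query_set)
    if n < min_size or n > len(records) - min_size:
        return None  # query considered unsafe, no statistic released
    return mean(query_set)


def noisy_mean(records, predicate, scale=1000.0, rng=None):
    """Noise addition: always answer, but perturb the exact mean with
    zero-mean Gaussian noise (an assumed noise model)."""
    rng = rng or random.Random()
    query_set = [salary for age, salary in records if predicate(age)]
    if not query_set:
        return None  # nothing to average
    return mean(query_set) + rng.gauss(0.0, scale)


if __name__ == "__main__":
    over_50 = lambda age: age > 50      # matches a single record
    at_least_38 = lambda age: age >= 38  # matches four records

    print(restricted_mean(RECORDS, over_50))      # refused
    print(restricted_mean(RECORDS, at_least_38))  # exact answer
    print(noisy_mean(RECORDS, at_least_38))       # approximate answer
```

In this sketch, the information loss of the restriction approach shows up as refused queries, while for noise addition it shows up as the gap between the released value and the exact mean. A larger `min_size` or `scale` trades more information loss for more privacy.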