Data Management
The insights gained from large-scale analysis of health-related data can have an enormous impact on public health and medical research, but access to such personal and sensitive data poses serious privacy implications for the data provider and places a heavy data security and administrative burden on the data consumer. The discussion of policies for balancing scientific advancement against privacy is very relevant, but it should be complemented by the equally relevant discussion of whether there is any tension at all between data privacy and data-driven research. In other words, it might be possible to simultaneously satisfy privacy considerations and address the needs of data-intensive medical research.
In RADIO, we have developed the secure summation protocol RASSP. The RASSP protocol provides privacy-preserving, peer-to-peer distributed computation of statistics. The querying application obtains the sum of values that it never sees individually: the members of the network exchange and process derivative values that cannot be transformed back into the original secret values. The core conceptual infrastructure already existed but had never been worked into a full, implementable communications protocol. We designed and implemented a protocol and stack for peer-to-peer communication, a Scala/Java library for node clients, and an R library that abstracts the library calls into statistical functions (t-test, average, etc.).
A full description of the protocol has been published by Zamani et al. (2016) and the source code of the implementation is also publicly available.
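To give a feel for how a sum can be computed from values that no party sees individually, the following is a minimal sketch of textbook additive secret sharing in Python. It is not the RASSP protocol itself (see the published description for that), and it simulates all nodes in a single process rather than over a network. The names MODULUS, make_shares and secure_sum, as well as the choice of modulus, are assumptions made for illustration only.

```python
import secrets

# Assumed public modulus; it must be larger than any sum the network can produce.
MODULUS = 2**61 - 1


def make_shares(secret, n_shares, modulus=MODULUS):
    """Split `secret` into `n_shares` random-looking values that add up to it modulo `modulus`."""
    # The first n-1 shares are uniformly random; the last one makes the total come out right.
    shares = [secrets.randbelow(modulus) for _ in range(n_shares - 1)]
    last = (secret - sum(shares)) % modulus
    return shares + [last]


def secure_sum(secret_values, modulus=MODULUS):
    """Simulate one round of secure summation among len(secret_values) nodes."""
    n = len(secret_values)
    # Node i splits its secret into one share per node; outgoing[i][j] is sent to node j.
    outgoing = [make_shares(value, n, modulus) for value in secret_values]
    # Node j adds up the shares it received and reveals only this partial sum.
    partial_sums = [sum(outgoing[i][j] for i in range(n)) % modulus for j in range(n)]
    # The querying application combines the partial sums into the total.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    values = [12, 7, 30, 5]      # each node's secret value (hypothetical data)
    print(secure_sum(values))    # prints 54
```

In this sketch a single share, or a single partial sum, is a uniformly random number that reveals nothing about any one node's secret on its own; only the combination of all partial sums yields the total.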
The RASSP Challenge
In order to test the RASSP protocol and our implementation, we are organizing a capture-the-flag hacking competition. We have set up 100 RASSP nodes as Debian VMs in the cloud, executing the code at
https://bitbucket.org/dataengineering/rassp
The RASSP nodes are deployed as Docker containers. The Docker image used can be found at
https://github.com/gmouchakis/docker-rassp
Competition participants are granted full control of one of the nodes, so they know the secret database of exactly one node. Aggregate queries can be executed by any participant from their own client machine. Although they control the query and participate in the network, participants should not be able to discover other nodes' secret values. A monetary reward is offered for discovering secret values, and a further reward for patching the security gap.