Project Definition and Motivation
The Internet of Things (IoT) is one of the most prominent technologies available today: Swisscom estimates around 50 billion IoT devices by 2020. IoT devices are also widely used in personalized medicine, with 3.7 million medical IoT devices counted in 2018.
As the number of IoT devices keeps increasing over the years, the amount of data to be analyzed increases significantly as well, making it very difficult to store the data locally. Given that IoT devices have energy and resource constraints, and that their data has to be further processed in order to extract meaningful statistics, this data has to be uploaded to dedicated remote servers that carry out the necessary computations. However, user data is sensitive: it can include personally identifiable information (PII). So, instead of having users "blindly" trust the servers to respect the privacy of their data, our solution uses a homomorphic cryptosystem. Homomorphic encryption is a type of encryption that allows the desired computations to be executed on the encrypted data without first decrypting it. The servers can therefore calculate the desired statistics (e.g. average, variance) and send the encrypted response back to the corresponding user.
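To illustrate what computing on encrypted data means, the following is a minimal sketch of an additively homomorphic scheme. It uses a toy Paillier cryptosystem as a stand-in; the text above does not name the cryptosystem used in the project, so the choice of Paillier, the tiny hard-coded primes and all function names are assumptions made purely for illustration. The key sizes are far too small to offer any security.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier key generation (assumed scheme, insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # standard simple choice of generator
    mu = pow(lam, -1, n)      # valid inverse because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

# Two hypothetical RR-interval readings in milliseconds.
pub, priv = keygen()
c_sum = add_encrypted(pub, encrypt(pub, 812), encrypt(pub, 790))
total = decrypt(pub, priv, c_sum)  # the server never saw 812 or 790
```

The party calling `add_encrypted` only needs the public key, which is what lets an untrusted server sum values it cannot read.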
Consequently, the main objective of this project is to compute aggregate statistics over encrypted medical data collected by IoT devices in a secure and privacy-conscious way.
The main use case of the project is to detect arrhythmia using the RR-interval duration signal. Arrhythmia describes an irregular heartbeat (too fast, too slow, too early), and the RR interval is the interval from the peak of one QRS complex to the peak of the next, as shown on an electrocardiogram (ECG). The RR interval is important because studies show that the average heart rate, together with the variance of the RR interval, can help diagnose arrhythmia.
One of the use case scenarios is the following: patients (e.g. in a retirement home) wear Kenzen patches (the IoT devices used in this project) that record ECGs of their cardiac activity. The ECGs are then sent to local Raspberry Pis (RPis), which process them using the Pan-Tompkins algorithm (a QRS complex detection algorithm). The output is the heart rate (HR) and the RR-interval duration, whose average and variance, respectively, can be computed over a certain time period (e.g. 24 hours, 3 days). At the end of that process, a specialist (e.g. a doctor) queries for these results in order to potentially detect arrhythmia.
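The two statistics named in this scenario can be sketched in a few lines. The snippet below assumes the QRS detection step has already produced a list of RR intervals in milliseconds; the function names, the sample values and the use of the population variance are our own illustrative choices, not details taken from the project.

```python
from statistics import fmean, pvariance

def heart_rates_bpm(rr_ms):
    """Convert RR intervals (ms) to instantaneous heart rates (beats/min)."""
    return [60000.0 / rr for rr in rr_ms]

def summarize(rr_ms):
    """Average heart rate and variance of the RR interval over one window."""
    return {
        "mean_hr_bpm": fmean(heart_rates_bpm(rr_ms)),
        "rr_variance_ms2": pvariance(rr_ms),  # population variance (assumption)
    }

# Hypothetical RR intervals for a short window.
stats = summarize([800, 1000, 1200])
```

A strongly varying RR interval inflates `rr_variance_ms2` even when the mean heart rate looks normal, which is why the two values are reported together.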
The requirements of the described system are the following:
- Confidentiality of patients' data
- Correct computations at RPis: the RPis must not cheat by outputting random or fake values for the heart rate and RR interval, so that the resulting diagnosis is meaningful
- Privacy of the final result: the final result is revealed only to an entity with enough privileges (e.g. the assigned doctor)
- Fast encryption at RPis: encryption should be reasonably fast, given that RPis are not very powerful machines
Envisioned System Model
In this section, we present our envisioned system model, which conforms to the system requirements defined above.
Figure 1: Envisioned System Model With 9 Kenzen Patches, 3 Raspberry Pis, 3 Servers, 3 Verifying Nodes and a Querier.
We first see that we have 9 Kenzen patches, connected in groups of 3 via Bluetooth v4 to an RPi equipped with a local database. The patches monitor the heart activity of the patients and send ECGs to the RPi, which processes them and outputs HR and RR, as previously mentioned; these are stored as samples in the RPi's database. The RPis play the role of data providers (DPs), a kind of distributed database across the system. Let us now detail the progress of a query issued by the querier Q.
The query is first sent to the server S1 (in direct communication with the querier), which forwards it to the other servers. Each server in turn forwards the query to the DPs assigned to it. Once a DP gets the query, it calculates the desired statistic (depending on the query), encrypts it homomorphically and sends it back to the corresponding server, which aggregates the answers of all its DPs. A final collective aggregation step is done at S1, where S1 sums up the values sent to it by all other servers and then forwards the final (encrypted) result to Q, the only entity able to decrypt the result and output the desired statistic.
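The query flow can be sketched as a two-level aggregation. In the snippet below, encryption is deliberately left out so that the data flow stays visible; in the system described here every DP answer would be encrypted homomorphically before leaving the RPi and only Q would decrypt the total. Encoding an average as a (sum, count) pair, the topology dictionary and all names are assumptions made for illustration.

```python
def dp_answer(samples):
    """A data provider's local answer: (sum, count) of its stored samples."""
    return (sum(samples), len(samples))

def aggregate(answers):
    """Component-wise sum of (sum, count) pairs, as a server would do."""
    return (sum(s for s, _ in answers), sum(c for _, c in answers))

def run_query(topology):
    """topology maps each server to the sample lists of its assigned DPs."""
    per_server = [
        aggregate([dp_answer(samples) for samples in dps])
        for dps in topology.values()
    ]
    # Final collective aggregation, performed at the root server.
    return aggregate(per_server)

# Hypothetical heart-rate samples held by one DP per server.
topology = {"S1": [[70, 72]], "S2": [[65, 75, 80]], "S3": [[90]]}
total, count = run_query(topology)
average = total / count  # computed by the querier after decryption
```

Because both levels only add values, the same flow works unchanged on ciphertexts of an additively homomorphic scheme.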
Furthermore, to make sure that the DPs and servers do not cheat (send false values instead of the true value), we make use of Verifying Nodes (VNs) that collectively maintain a blockchain in which they immutably store proofs of computation, to be verified later by Q.
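The immutability property can be illustrated with a simple hash chain, in which each block commits to the previous one. This is only a sketch of the idea: the proofs are treated as opaque bytes, there is no consensus among the VNs, and the block layout and function names are hypothetical rather than those of the project.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, proof_hex):
    """Hash of a block's contents, binding it to its predecessor."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "proof": proof_hex},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def append_proof(chain, proof):
    """Append an opaque proof (bytes) as a new block."""
    prev = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    proof_hex = proof.hex()
    chain.append({
        "index": index,
        "prev": prev,
        "proof": proof_hex,
        "hash": block_hash(index, prev, proof_hex),
    })

def verify_chain(chain):
    """Recompute every hash; any edited block breaks the chain."""
    prev = GENESIS
    for i, block in enumerate(chain):
        expected = block_hash(i, prev, block["proof"])
        if block["index"] != i or block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True

chain = []
append_proof(chain, b"proof from DP 1")
append_proof(chain, b"proof from server S2")
ok = verify_chain(chain)
```

Changing a stored proof after the fact alters its block hash, so every later block no longer matches and `verify_chain` returns False.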
Experiments and Results
In this section, we include some preliminary results. Shown below is the simulation of the Average operation, where the results are averaged over 20 different runs for consistency. The purpose is to perform a scalability analysis in which we increase the number of DPs and keep the number of servers constant at 3.
Regarding the hardware setup, the local databases at the RPis are populated with 302,400 rows of synthetic data. Furthermore, we used servers from EPFL's IC cluster as the actual servers in our system model.
Moreover, the average encryption time (at the RPis) and decryption time (at the querier) are 95 ms and 570 s respectively, which is very good considering that the RPis are somewhat resource-limited.
We were able to securely and correctly solve a real problem, namely providing a meaningful arrhythmia diagnosis, using a system that scales well with an increasing number of data providers and servers.