%0 Computer Program
%A Vos, Jelle
%A Pentyala, Sikha
%A Golob, Steven
%A Maia, Ricardo José Menezes
%A Kelley, Dean
%A Erkin, Zekeriya
%A De Cock, Martine
%A Nascimento, Anderson
%D 2025
%T Code underlying: Privacy-Preserving Membership Queries for Federated Anomaly Detection
%U 
%R 10.4121/4e1739c5-f743-47cc-aa01-df52481e3fb3.v1
%K privacy enhancing technologies
%K anomaly detection
%K federated learning
%K private membership queries
%K secure computation
%K cryptography
%K elliptic curves
%X <p>Privacy-Preserving Feature Extraction for Detection of Anomalous Financial Transactions</p><p><br></p><p>------------------------------------------------------------------------</p><p><br></p><p>This repository holds the code written by the PPMLHuskies for the 2nd Place solution in the PETs Prize Challenge, Track A.</p><p><br></p><p><strong>Description</strong></p><p><br></p><p>The task is to predict the probability that each transaction is anomalous, given a synthetic database of international transactions and several synthetic databases of banking account information. We provide two solutions. The first, our centralized approach, found in `solution_centralized.py`, uses the transactions database (PNS) and the banking database with no privacy protections. The second, which provides the robust privacy guarantees outlined in our report, follows a federated architecture and is found in `solution_federated.py` and `model.py`. In this approach, PNS data resides in one client, banking data is divided up across other clients, and an aggregator handles all communication between clients. We have built in privacy protections so that clients and the aggregator learn minimal information about each other while communicating to detect anomalous transactions in PNS.</p><p><br></p><p>Training and inference are fundamentally the same in the centralized and federated architectures (other than the privacy protections in the latter). Several new features are engineered from the given PNS data. Then a model is trained on those features from PNS. Next, during inference, a check is made to determine if attributes from a PNS transaction match the banking data, or if the associated account in the banking data is flagged. 
If any of these attributes are amiss, we assign the transaction a value of 1, and 0 otherwise. Lastly, we take the maximum of the probability inferred by the PNS model and the result of the banking data validation, and use it as our final prediction for the probability that the transaction is anomalous.</p><p><br></p><p>The difference between the federated and centralized logic is that in the federated setup, where the banking data is split into one or more partitions across clients, the PNS client engages in a cryptographic protocol based on homomorphic encryption with the banking clients, routed through the aggregator, to perform feature extraction. To ensure privacy, and that PNS does not learn anything from the banks beyond the set membership of a select few features, this protocol is carried out over r = 7 + n rounds, where n is the number of bank clients.</p>
%I 4TU.ResearchData