%0 Computer Program %A Vos, Jelle %A Pentyala, Sikha %A Golob, Steven %A Maia, Ricardo José Menezes %A Kelley, Dean %A Erkin, Zekeriya %A De Cock, Martine %A Nascimento, Anderson %D 2025 %T Code underlying: Privacy-Preserving Membership Queries for Federated Anomaly Detection %U %R 10.4121/4e1739c5-f743-47cc-aa01-df52481e3fb3.v1 %K privacy enhancing technologies %K anomaly detection %K federated learning %K private membership queries %K secure computation %K cryptography %K elliptic curves %X <p>Privacy-Preserving Feature Extraction for Detection of</p><p>Anomalous Financial Transactions</p><p><br></p><p>------------------------------------------------------------------------</p><p><br></p><p>This repository holds the code written by the PPMLHuskies for the 2nd Place solution in the PETs Prize Challenge, Track A.</p><p><br></p><p><strong>Description</strong></p><p><br></p><p>The task is to predict probabilities for anomalous transactions from a</p><p>synthetic database of international transactions and several synthetic</p><p>databases of banking account information. We provide two solutions. One</p><p>solution, our centralized approach, found in `solution_centralized.py`,</p><p>uses the transactions database (PNS) and the banking database with no</p><p>privacy protections. The second solution, which provides the robust privacy</p><p>guarantees outlined in our report, follows a federated architecture,</p><p>found in `solution_federated.py` and `model.py`. In this approach, PNS</p><p>data resides in one client, banking data is divided up across other</p><p>clients, and an aggregator handles all the communication between</p><p>clients. 
We have built in privacy protections so that clients and the</p><p>aggregator learn minimal information about each other, while engaging in</p><p>communication to detect anomalous transactions in PNS.</p><p><br></p><p>The way in which we conduct training and inference in both the</p><p>centralized and the federated architectures is fundamentally the same</p><p>(other than the privacy protections in the latter). Several new features</p><p>are engineered from the given PNS data. Then a model is trained on those</p><p>features from PNS. Next, during inference, a check is made to determine</p><p>whether attributes from a PNS transaction match the banking data, and whether</p><p>the associated account in the banking data is flagged. If any of these</p><p>attributes is amiss, the validation result is 1, and 0 otherwise.</p><p>Lastly, we take the maximum of the inferred probability from the PNS</p><p>model and the result of the banking data validation, and use it as</p><p>our final prediction for the probability that the transaction is</p><p>anomalous.</p><p><br></p><p>The difference between the federated and centralized logic is that in</p><p>the federated setup, where the banking data is split into one or more</p><p>partitions across clients, the PNS client engages in a</p><p>cryptographic protocol based on homomorphic encryption with the banking</p><p>clients, routed through the aggregator, to perform feature extraction.</p><p>To ensure privacy, and that PNS learns nothing</p><p>from the banks beyond the set membership of a select few features, this</p><p>protocol is carried out over r rounds, where r = 7 + n and n is the number of</p><p>bank clients.</p> %I 4TU.ResearchData