Federated Learning. Yang Liu
Читать онлайн книгу.conducted via a mask encrypted by Paillier’s scheme and the intermediate data computed by each party. The encrypted masked intermediate results are exchanged between the two parties in the secure gradient descent algorithm. Finally, the encrypted gradient is sent to a coordinator for decryption and model update.
CryptoNets [Gilad-Bachrach et al., 2016] is an HE-based methodology announced by Microsoft Research that allows secure evaluation (inference) of encrypted queries over already trained neural networks on cloud servers: queries from the clients can be classified securely by the trained neural network model on a cloud server without inferring any information about the query or the result. The CryptoDL [Hesamifard et al., 2017] framework is a leveled HE-based approach for secure neural network inference. In CryptoDL, several activation functions are approximated using low-degree polynomials and mean-pooling is used as a replacement for max-pooling. The GAZELLE [Juvekar et al., 2018] framework is proposed as a scalable and low-latency system for secure neural network inference. In GAZELLE, to conduct secure nonlinear function evaluation in neural network inference, HE and traditional secure two-party computation techniques (such as GC) are combined in an intricate way. The packed additive homomorphic encryption (PAHE) embraced in GAZELLE allows single instruction multiple data (SIMD) arithmetic homomorphic operations over encrypted data.
FedMF [Chai et al., 2019] uses Paillier’s HE for secure federated matrix factorization assuming honest-but-curious server and honest clients. Secure federated transfer learning is studied via Paillier’s HE scheme in Liu et al. [2019], where the semi-honest third party is into the discard by mixing HE with additive secret sharing in decryption process.
2.4.3 DIFFERENTIAL PRIVACY
DP was originally developed to facilitate secure analysis over sensitive data. With the rise of ML, DP has become an active research field again in the ML community. This is motivated by the fact that many exciting results from DP can be applied to PPML [Dwork et al., 2016, 2006]. The key idea of DP is to confuse the adversaries when they are trying to query individual information from the database so that adversaries cannot distinguish individual-level sensitivity from the query result.
Definition
DP is a privacy definition initially proposed by Dwork et al. [2006], developed in the context of statistical disclosure control. It provides an information-theoretic security guarantee that the outcome of a function to be insensitive to any particular record in the dataset. Therefore, DP can be used to resist the membership inference attack. The (∊, δ)-differential privacy is defined as follows.
Definition 2.2 (∊, δ)-differential privacy. A randomized mechanism M preserves (∊, δ)-differential privacy if given any two datasets D and D′ differing by only one record, and for all S ⊂ Range (M),
where ∊ is the privacy budget and δ is the failure probability.
The quantity
DP has utility-privacy trade-offs as it introduces noise to data. Jayaraman and Evans [2019] found out that current mechanisms for differential privacy for ML rarely offer acceptable utility-privacy trade-offs: settings that provide limited accuracy loss provide little effective privacy, and settings that provide strong privacy result in large accuracy loss.
Categorization of DP Schemes
Typically, there are mainly two ways to achieve DP by adding noise to the data. One is the addition of noise according to the sensitivity of a function [Dwork et al., 2006]. The other is choosing noise according to an exponential distribution among discrete values [McSherry and Talwar, 2007].
The sensitivity of a real-valued function expresses the maximum possible change in its value due to the addition or removal of a single sample.
Definition 2.3 Sensitivity. For two datasets D and D′ differing by only one record, and a function M : D → Rd over an arbitrary domain, the sensitivity of M is the maximum change in the output of M over all possible inputs:
where ‖·‖ is a norm of the vector. The l1-sensitivity or the l2-sensitivity is defined when the l1-norm or l2-norm is applied, respectively.
We denote the Laplace distribution with parameter b as Lap(b). Lap(b) has a probability density function
Theorem 2.4 Given a function M : D → Rd over an arbitrary domain D, for any input X, the function:
provides ∊-differential privacy. The ∊-differential privacy can also be achieved by adding independently generated Laplace noise from distribution Lap(ΔM/∊) to each of the d output terms.
The addition of Gaussian or binomial noise, scaled to the l2-sensitivity of the function, sometimes yields better accuracy, while only ensuring the weaker (∊, δ)-differential privacy [Dwork et al., 2006, Dwork and Nissim, 2004].
The exponential mechanism [McSherry and Talwar, 2007] is another way to obtain DP. The exponential mechanism is given a quality function q that scores outcomes of a calculation, where higher scores are better. For a given database and ∊ parameter, the quality function induces a probability distribution over the output domain, from which the exponential mechanism samples the outcome. This probability distribution favors high-scoring outcomes, while ensuring ∊-differential privacy.
Definition 2.5 Let q : (Dn × R) → R be a quality function, which given a dataset d ∈ Dn, assigns a score to each outcome r ∈ R. For any two datasets D and D′ differing by only one record, let
provides ∊-differential privacy.
The DP algorithms can be categorized according to how and where the perturbation is applied.
1. Input perturbation: The noise is added to the training data.
2. Objective perturbation: The noise is added to the objective function of the learning algorithms.
3. Algorithm perturbation: The noise is added to the intermediate values such as gradients in iterative algorithms.
4. Output perturbation: The noise is added to the output parameters after training.
DP still exposes the statistics of a party, which are sensitive in some cases, such as financial data, medical data and other commercial and health applications. Readers who are interested in DP and willing to learn more about it can refer to the tutorial given by Dwork and Roth [2014].
Application in PPML
In federated learning, to enable model training on distributed datasets held by multiple parties, local differential privacy (LDP) can be used. With local