https://arxiv.org/abs/1610.06918 Learning to Protect Communications with Adversarial Neural Cryptography

We ask whether neural networks can learn to use secret keys to protect information from other neural networks. Specifically, we focus on ensuring confidentiality properties in a multiagent system, and we specify those properties in terms of an adversary. Thus, a system may consist of neural networks named Alice and Bob, and we aim to limit what a third neural network named Eve learns from eavesdropping on the communication between Alice and Bob. We do not prescribe specific cryptographic algorithms to these neural networks; instead, we train end-to-end, adversarially. We demonstrate that the neural networks can learn how to perform forms of encryption and decryption, and also how to apply these operations selectively in order to meet confidentiality goals.
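A minimal sketch of the adversarial setup, assuming small MLPs in place of the paper's mix-and-transform networks and an L1 reconstruction error on ±1-encoded bits; the loss terms mirror the idea that Bob should recover the plaintext while Eve should do no better than chance:

```python
# Hedged sketch, not the paper's exact architecture or training schedule.
import torch
import torch.nn as nn

N = 16  # bits per plaintext and per key, encoded as values in {-1, +1}

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 2 * N), nn.ReLU(),
                         nn.Linear(2 * N, out_dim), nn.Tanh())

alice = mlp(2 * N, N)   # (plaintext, key) -> ciphertext
bob   = mlp(2 * N, N)   # (ciphertext, key) -> plaintext guess
eve   = mlp(N, N)       # ciphertext only  -> plaintext guess

opt_ab = torch.optim.Adam(list(alice.parameters()) + list(bob.parameters()), lr=8e-4)
opt_e  = torch.optim.Adam(eve.parameters(), lr=8e-4)

def batch(size=256):
    return (torch.randint(0, 2, (size, N)).float() * 2 - 1,
            torch.randint(0, 2, (size, N)).float() * 2 - 1)

for step in range(5000):
    # Alice/Bob step: Bob should reconstruct P; Eve should stay at chance level
    # (random guessing on +/-1 bits gives an L1 error of about 1 per bit).
    p, k = batch()
    c = alice(torch.cat([p, k], dim=1))
    bob_err = (bob(torch.cat([c, k], dim=1)) - p).abs().mean()
    eve_err = (eve(c) - p).abs().mean()
    loss_ab = bob_err + (1.0 - eve_err) ** 2
    opt_ab.zero_grad(); loss_ab.backward(); opt_ab.step()

    # Eve step on a fresh batch (the paper trains Eve more often; once here for brevity).
    p, k = batch()
    c = alice(torch.cat([p, k], dim=1)).detach()
    loss_e = (eve(c) - p).abs().mean()
    opt_e.zero_grad(); loss_e.backward(); opt_e.step()
```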

https://arxiv.org/abs/1610.05755v1 Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

We demonstrate a generally applicable approach to providing strong privacy guarantees for training data. The approach combines, in a black-box fashion, multiple models trained with disjoint datasets, such as records from different subsets of users. Because they rely directly on sensitive data, these models are not published, but instead used as teachers for a student model. The student learns to predict an output chosen by noisy voting among all of the teachers, and cannot directly access an individual teacher or the underlying data or parameters. The student's privacy properties can be understood both intuitively (since no single teacher and thus no single dataset dictates the student's training) and formally, in terms of differential privacy. These properties hold even if an adversary can not only query the student but also inspect its internal workings.
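The aggregation step lends itself to a short sketch. The following noisy-vote function is a simplification under assumed inputs (placeholder teacher labels, Laplace noise of scale 1/gamma on the vote counts); the paper's contribution is the differential-privacy analysis of such a mechanism and the semi-supervised training of the student:

```python
# Hedged sketch of noisy plurality voting over teacher predictions.
import numpy as np

def noisy_aggregate(teacher_preds, num_classes, gamma, rng=None):
    """teacher_preds: one predicted class label per teacher, for a single query."""
    if rng is None:
        rng = np.random.default_rng()
    votes = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    votes += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)  # privacy noise
    return int(np.argmax(votes))                                        # noisy winner

# Example: 250 teachers label one student query. The student only ever sees the
# noisy winning label, never an individual teacher's vote or parameters.
rng = np.random.default_rng(0)
teacher_preds = rng.integers(0, 10, size=250)
student_label = noisy_aggregate(teacher_preds, num_classes=10, gamma=0.05, rng=rng)
```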

https://regmedia.co.uk/2016/09/30/sec16_paper_tramer.pdf Stealing Machine Learning Models via Prediction APIs

https://arxiv.org/abs/1610.06940v1 Safety Verification of Deep Neural Networks

https://arxiv.org/abs/1611.03186 SoK: Applying Machine Learning in Security - A Survey

We examine the generalized system designs, underlying assumptions, measurements, and use cases in active research. Our examinations lead to 1) a taxonomy on ML paradigms and security domains for future exploration and exploitation, and 2) an agenda detailing open and upcoming challenges. Based on our survey, we also suggest a point of view that treats security as a game theory problem instead of a batch-trained ML problem.

https://blog.acolyer.org/2016/11/14/acing-the-ioc-game-toward-automatic-discovery-and-analysis-of-open-source-cyber-threat-intelligence/

Even if security isn’t your thing, the general approach of extracting useful information from large volumes of blog posts or other web pages containing semi-structured information mixed with informal prose has lots of applications. There are many challenges, both small and large, to be overcome along the way, and this paper does a good job of describing approaches to many of them.
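As a toy illustration of the kind of semi-structured data involved (not the paper's method, which uses NLP and graph analysis to keep only indicators mentioned in a genuinely malicious context), even a crude regex pass pulls candidate IOCs out of prose:

```python
# Hedged, deliberately naive baseline; the example post and hash are made up.
import re

IOC_PATTERNS = {
    "ipv4":   r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "md5":    r"\b[a-fA-F0-9]{32}\b",
    "sha256": r"\b[a-fA-F0-9]{64}\b",
    "domain": r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)+\b",
}

def extract_iocs(text):
    return {name: sorted(set(re.findall(pat, text)))
            for name, pat in IOC_PATTERNS.items()}

post = "The dropper (md5 44d88612fea8a8f36de82e1278abb02f) beacons to 198.51.100.7."
print(extract_iocs(post))
```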

Notes from OpenAI: https://docs.google.com/document/d/1wtkYm1aVMI7fDxLgaAIhIoBrodbDQmbYDyQTfe1b5CM/edit

https://arxiv.org/abs/1702.07464v1 Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning

In our generative model attack, all techniques adopted to scramble or obfuscate shared parameters in collaborative deep learning are rendered ineffective with no possibility of a remedy under the threat model considered.

https://arxiv.org/abs/1701.04082 Embedding Watermarks into Deep Neural Networks

Deep neural networks have recently achieved significant progress. Sharing trained deep neural network models is very important to the rapid progress of research on and development of deep neural network systems. At the same time, it is necessary to protect the rights to shared trained models. To this end, we propose to use digital watermarking technology to protect intellectual property or detect intellectual property infringement in trained models. First, we formulate a new problem: embedding watermarks into deep neural networks. We also define requirements, embedding situations, and attack types for watermarking deep neural networks. Second, we propose a general framework to embed a watermark into model parameters using a parameter regularizer. Our approach does not impair the performance of networks into which a watermark is embedded. Finally, we perform comprehensive experiments to reveal the potential of watermarking deep neural networks as a basis for this new problem. We show that our framework can embed a watermark when training a network from scratch, fine-tuning, or distilling, without hurting the performance of the deep neural network. The embedded watermark does not disappear even after fine-tuning or parameter pruning; it remains fully detectable even after 65% of the parameters are pruned.
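A minimal sketch of the regularizer idea, assuming a secret random projection matrix X and a binary cross-entropy penalty that pushes the projected host-layer weights toward the watermark bits (the shapes and the choice of host layer here are illustrative, not the paper's):

```python
# Hedged sketch of embedding/extracting a watermark via a parameter regularizer.
import torch
import torch.nn.functional as F

def watermark_loss(weights, X, bits):
    """weights: flattened parameters of the chosen host layer;
    X: secret random projection matrix, shape (num_bits, len(weights));
    bits: 0/1 watermark vector of length num_bits."""
    logits = X @ weights
    return F.binary_cross_entropy_with_logits(logits, bits)

def extract_watermark(weights, X):
    # Detection: project the weights with the secret matrix and threshold.
    return (torch.sigmoid(X @ weights) > 0.5).float()

# During training, the owner adds lambda * watermark_loss(...) to the task loss,
# so the watermark is embedded as part of ordinary training or fine-tuning.
num_bits, dim = 64, 4608
X = torch.randn(num_bits, dim)              # keep secret
bits = torch.randint(0, 2, (num_bits,)).float()
w = torch.zeros(dim, requires_grad=True)    # stands in for the host layer's weights
loss = watermark_loss(w, X, bits)
loss.backward()
```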

https://arxiv.org/abs/1705.05427 Repeated Inverse Reinforcement Learning for AI Safety

https://arxiv.org/pdf/1705.10720v1.pdf Low Impact Artificial Intelligences

https://arxiv.org/abs/1707.08476v1 Guidelines for Artificial Intelligence Containment

We propose a number of guidelines which should help AI safety researchers develop reliable sandboxing software for intelligent programs of all levels. Such safety container software will make it possible to study and analyze intelligent artificial agents while maintaining a certain level of safety against information leakage, social engineering attacks, and cyberattacks from within the container.

https://arxiv.org/abs/1703.09471v2 Adversarial Image Perturbation for Privacy Protection – A Game Theory Perspective

Game theory provides tools for studying the interaction between agents with uncertainty about each other's strategies. We introduce a general game-theoretic framework for the user-recogniser dynamics, and present a case study involving current state-of-the-art adversarial image perturbation (AIP) and person recognition techniques. We derive the optimal strategy for the user that assures an upper bound on the recognition rate, independent of the recogniser's countermeasure.
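Deriving such a strategy amounts to solving a zero-sum game. A minimal sketch, assuming a finite menu of perturbation methods for the user, a finite set of recogniser countermeasures, and a hypothetical recognition-rate matrix (lower is better for the user):

```python
# Hedged sketch: the user's maximin mixed strategy via a standard LP formulation.
import numpy as np
from scipy.optimize import linprog

def user_minimax_strategy(game_matrix):
    m, n = game_matrix.shape                         # m user actions, n recogniser actions
    # Variables: p_1..p_m (mixed strategy over perturbation methods) and v (rate bound).
    c = np.r_[np.zeros(m), 1.0]                      # minimise v
    A_ub = np.c_[game_matrix.T, -np.ones(n)]         # (p^T A)_j <= v for every column j
    b_ub = np.zeros(n)
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)     # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]                       # strategy, guaranteed upper bound

# Hypothetical payoffs: 3 perturbation methods vs 2 recogniser countermeasures.
rates = np.array([[0.30, 0.80],
                  [0.70, 0.20],
                  [0.55, 0.50]])
strategy, bound = user_minimax_strategy(rates)
```

Whatever countermeasure the recogniser picks, playing the mixed strategy keeps the expected recognition rate at or below the returned bound.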

https://arxiv.org/abs/1708.08022 On the Protection of Private Information in Machine Learning Systems: Two Recent Approaches

https://arxiv.org/abs/1702.07464v3 Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning

Unfortunately, we show that any privacy-preserving collaborative deep learning is susceptible to a powerful attack that we devise in this paper.

https://blog.acolyer.org/2017/11/01/deepxplore-automated-whitebox-testing-of-deep-learning-systems/

https://arxiv.org/abs/1802.08908 Scalable Private Learning with PATE

https://arxiv.org/abs/1803.04585 Categorizing Variants of Goodhart's Law

https://arxiv.org/abs/1606.06565 Concrete Problems in AI Safety

We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from

* having the wrong objective function (“avoiding side effects” and “avoiding reward hacking”),
* an objective function that is too expensive to evaluate frequently (“scalable supervision”), or
* undesirable behavior during the learning process (“safe exploration” and “distributional shift”).

https://arxiv.org/pdf/1801.05507.pdf GAZELLE: A Low Latency Framework for Secure Neural Network Inference

https://arxiv.org/abs/1808.07261 Increasing Trust in AI Services through Supplier's Declarations of Conformity

https://arxiv.org/abs/1810.08130 Private Machine Learning in TensorFlow using Secure Computation

https://ai.google/education/responsible-ai-practices?twitter=@bigdata

https://arxiv.org/abs/1812.00564v1 Split learning for health: Distributed deep learning without sharing raw patient data

https://arxiv.org/abs/1806.01186 Measuring and avoiding side effects using relative reachability

We introduce a general definition of side effects, based on relative reachability of states compared to a default state, that avoids these undesirable incentives. Using a set of gridworld experiments illustrating relevant scenarios, we empirically compare relative reachability to penalties based on existing definitions and show that it is the only penalty among those tested that produces the desired behavior in all the scenarios.
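A minimal sketch of the penalty, assuming undiscounted reachability over a small, fully known state space (the paper also considers discounted reachability and different choices of baseline state):

```python
# Hedged sketch: penalise states from which fewer states remain reachable
# than from the baseline (default) state.
import numpy as np

def reachability(adjacency):
    """Transitive-closure reachability of a directed transition graph (1 = reachable)."""
    n = adjacency.shape[0]
    R = ((np.eye(n) + adjacency) > 0).astype(float)   # include self-loops
    for _ in range(n):
        R = ((R + R @ R) > 0).astype(float)
    return R

def relative_reachability_penalty(R, current_state, baseline_state):
    """Average loss of reachability relative to the baseline state."""
    deficit = np.maximum(0.0, R[baseline_state] - R[current_state])
    return deficit.mean()

# Tiny example: 3 states where state 2 is absorbing (an irreversible side effect),
# so ending up there is penalised relative to the baseline state 0.
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [0, 0, 1]])
R = reachability(adj)
penalty = relative_reachability_penalty(R, current_state=2, baseline_state=0)
```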