Neural networks are vulnerable to input perturbations such as additive noise and adversarial attacks. In contrast, human perception is much more robust to such perturbations. The Bayesian brain hypothesis states that human brains use an internal generative model to update the posterior beliefs about the sensory input. This mechanism can be interpreted as a form of self-consistency between the maximum a posteriori (MAP) estimation of an internal generative model and the external environment. Inspired by this hypothesis, we enforce self-consistency in neural networks by incorporating generative recurrent feedback. We instantiate this design on convolutional neural networks (CNNs). The proposed framework, termed Convolutional Neural Networks with Feedback (CNN-F), introduces generative feedback with latent variables into existing CNN architectures, where consistent predictions are made through alternating MAP inference under a Bayesian framework. In our experiments, CNN-F shows considerably improved adversarial robustness over conventional feedforward CNNs on standard benchmarks.
Vulnerability in feedforward neural networks Conventional deep neural networks (DNNs) often contain many layers of feedforward connections. With ever-growing network capacity and representational ability, they have achieved great success. For example, recent convolutional neural networks (CNNs) have achieved impressive accuracy on large-scale image classification benchmarks
Feedback in the human brain To address the weaknesses of CNNs, we can take inspiration from how human visual recognition works and incorporate certain mechanisms into the CNN design. While the human visual cortex has hierarchical feedforward connections, it also has feedback connections from higher-level to lower-level cortical areas, which current artificial networks lack
Predictive coding and generative feedback Computational neuroscientists speculate that Bayesian inference models human perception
Our contributions are as follows:
Self-consistency We introduce generative feedback to neural networks and propose the self-consistency formulation for robust perception. Our internal model of the world reaches a self-consistent representation of an external stimulus. Intuitively, self-consistency says that given any two of the label, the image, and the auxiliary information, we should be able to infer the third. Mathematically, we use a generative model to describe the joint distribution of labels, latent variables, and input image features. If the MAP estimate of each of them is consistent with the other two, we call the label, latent variables, and image features self-consistent (Figure
CNN with Feedback (CNN-F) We incorporate generative recurrent feedback modeled by the DGM into CNNs and term this model CNN-F. We show that Bayesian inference in the DGM is achieved by a CNN with adaptive nonlinear operators (Figure
Adversarial robustness We show that the recurrent generative feedback in CNN-F promotes robustness, and we visualize the behavior of CNN-F over iterations. We find that more iterations are needed to reach a self-consistent prediction for images with larger perturbations, indicating that recurrent feedback is crucial for recognizing challenging images. When combined with adversarial training, CNN-F further improves the adversarial robustness of CNNs on both the Fashion-MNIST and CIFAR-10 datasets.
In this section, we first formally define self-consistency. Then we give a specific form of generative feedback in CNN and impose self-consistency on it. We term this model CNN-F. Finally, we show the training and testing procedures of CNN-F. Throughout, we use the following notation:
Let x ∈ ℝn be the input of a network and y ∈ ℝK be the output. In image classification, x is an image and y = (y(1), …, y(K)) is a one-hot encoded label, where K is the total number of classes; K is usually much smaller than n. We use L to denote the total number of network layers and index the input layer of the feedforward network as layer 0. Let h ∈ ℝm be the encoded feature of x at layer k of the feedforward pathway. The feedforward pathway computes feature maps f(ℓ) from layer 0 to layer L, and the feedback pathway generates g(ℓ) from layer L down to layer k; g(ℓ) and f(ℓ) have the same dimensions. To generate h from y, we introduce latent variables for each layer of the CNN. Let z(ℓ) ∈ ℝC × H × W be the latent variables at layer ℓ, where C, H, and W are the number of channels, height, and width of the corresponding feature map. Finally, p(h, y, z; θ) denotes the joint distribution parameterized by θ, where θ includes the weights W and bias terms b of the convolutional and fully connected layers. We use ĥ, ŷ, and ẑ to denote the MAP estimates of h, y, and z conditioned on the other two variables.
The human brain and neural networks are similar in that both have a hierarchical structure. In human visual perception, external stimuli are first preprocessed by the lateral geniculate nucleus (LGN) and then processed by V1, V2, V4, and the Inferior Temporal (IT) cortex in the ventral cortical visual system. Conventional NNs use feedforward layers to model this process and learn a one-directional mapping from input to output. However, numerous studies suggest that in addition to the feedforward connections from V1 to IT, there are feedback connections among these cortical areas
Inspired by the Bayesian brain hypothesis and the predictive coding theory, we propose to add generative feedback connections to NN. Since h is usually of much higher dimension than y, we introduce latent variables z to account for the information loss in the feedforward process. We then propose to model the feedback connections as MAP estimation from an internal generative model that describes the joint distribution of h, z and y. Furthermore, we realize recurrent feedback by imposing self-consistency (Definition
(Self-consistency) Given a joint distribution p(h, y, z; θ) parameterized by θ, (ĥ, ŷ, ẑ) are self-consistent if they satisfy the following constraints:
$$\begin{aligned}
\label{eqn:selfconsis}
{\hat{y}}= \arg\,\max_y p(y|{\hat{h}},{\hat{z}}), \qquad
{\hat{h}}= \arg\,\max_h p(h|{\hat{y}},{\hat{z}}), \qquad
{\hat{z}}= \arg\,\max_z p(z|{\hat{h}},{\hat{y}}) \end{aligned}$$
In words, self-consistency means that MAP estimates from an internal generative model are consistent with each other. In addition to self-consistency, we also impose the consistency constraint between ĥ and the external input features (Figure
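As a rough illustration of the self-consistency definition above, the sketch below alternates the three conditional MAP updates until they stabilize. The callables `map_y`, `map_h`, and `map_z` are hypothetical stand-ins for the conditional MAP estimators of the internal generative model; they are not part of the paper's implementation.

```python
def self_consistent_inference(h_init, z_init, map_y, map_h, map_z, num_iters=5):
    """Alternate the three conditional MAP updates of self-consistency."""
    h, z = h_init, z_init
    for _ in range(num_iters):
        y = map_y(h, z)  # y_hat = argmax_y p(y | h_hat, z_hat)
        z = map_z(h, y)  # z_hat = argmax_z p(z | h_hat, y_hat)
        h = map_h(y, z)  # h_hat = argmax_h p(h | y_hat, z_hat)
    return h, y, z
```

In practice, the number of iterations needed depends on how strongly the three estimates initially disagree, as discussed in the experiments.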
CNNs have been used to model the hierarchical structure of human retinotopic fields
We choose to use the DGM
$$g(\ell - 1) = W^{*\top}(\ell)\left(z(\ell) \odot g(\ell)\right)$$
In this paper, we assume p(y) to be uniform, which is realistic under the balanced-label scenario. We assume that h follows a Gaussian distribution centered at g(k) with standard deviation σ.
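For intuition, the following is a minimal PyTorch sketch of one generation step of the equation above; `weight` is assumed to be the kernel of the corresponding feedforward convolution, and `stride`/`padding` are illustrative placeholders rather than values taken from the paper.

```python
import torch
import torch.nn.functional as F

def dgm_feedback_step(g, z, weight, stride=1, padding=1):
    """One feedback step: gate g(l) by the latent z(l), then apply the
    transposed convolution of the feedforward layer to produce g(l-1)."""
    gated = z * g  # z(l) ⊙ g(l), elementwise gating
    return F.conv_transpose2d(gated, weight, stride=stride, padding=padding)
```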
In this section, we show that a self-consistent (ĥ, ŷ, ẑ) in the DGM can be obtained by alternately propagating along the feedforward and feedback pathways in CNN-F.
The feedback pathway in CNN-F takes the same form as the generation process in the DGM (Equation (
$${\sigma_{\text{AdaReLU}}}(f) =
\begin{cases}
{\sigma_{\text{ReLU}}}(f), \quad\text{if } g \geq 0 \\
{\sigma_{\text{ReLU}}}(-f), \quad\text{if } g<0
\end{cases}
\quad
{\sigma_{\text{AdaPool}}}(f) =
\begin{cases}
{\sigma_{\text{MaxPool}}}(f), \quad\text{if } g \geq 0 \\
-{\sigma_{\text{MaxPool}}}(-f), \quad\text{if } g<0
\end{cases}$$
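A minimal PyTorch sketch of these adaptive operators is given below; for σAdaPool we assume, purely for illustration, that the feedback signal is already available at the pooled output resolution (an assumption of this sketch, not a statement from the paper).

```python
import torch
import torch.nn.functional as F

def ada_relu(f, g):
    """sigma_AdaReLU: ReLU(f) where the feedback g is non-negative,
    ReLU(-f) where it is negative."""
    return torch.where(g >= 0, F.relu(f), F.relu(-f))

def ada_pool(f, g_pooled, kernel_size=2):
    """sigma_AdaPool: max-pool f where the feedback is non-negative,
    min-pool (written as -MaxPool(-f)) where it is negative."""
    return torch.where(g_pooled >= 0,
                       F.max_pool2d(f, kernel_size),
                       -F.max_pool2d(-f, kernel_size))
```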
Given a joint distribution of h, y, and z modeled by the DGM, we aim to show that we can make predictions using a CNN architecture following Bayes' rule (Theorem
(Constancy assumption in the DGM). A. The generated image g(k) at layer k of the DGM satisfies $\|g(k)\|_2^2 = \text{const}$. B. The prior distribution on the label is uniform: $p(y) = \text{const}$. C. The normalization factor in $p(z|y)$ for each category is constant: $\sum_z e^{\eta(y, z)} = \text{const}$.
To meet Assumption
Under Assumption
Please refer to the Appendix.
Theorem
We also find the form of MAP inference for image feature ĥ and latent variables ẑ in the DGM. Specifically, we use zR and zP to denote latent variables that are at a layer followed by AdaReLU and AdaPool respectively. 𝟙( ⋅ ) denotes indicator function.
[MAP inference in the DGM] Under Assumption
A. Let h be the feature at layer k, then ĥ = g(k).
B. MAP estimate of z(ℓ) conditioned on h, y and {z(j)}j ≠ ℓ in the DGM is:
$$\begin{aligned}
{\hat{z}}_{R}(\ell) &= \mathbb{1}{({\sigma_{\text{AdaReLU}}}(f(\ell)) \geq 0)} \\
{\hat{z}}_{P}(\ell) &= \mathbb{1}{(g(\ell) \geq 0)}\odot \arg\,\max_{r\times r} (f(\ell))
+ \mathbb{1}{(g(\ell)<0)}\odot \arg\,\min_{r\times r} (f(\ell)) \label{eqn:mainlatentp}\end{aligned}$$
For part A, we have $\hat{h} = \arg\max_h p(h|\hat{y}, \hat{z}) = \arg\max_h p(h|g(k)) = g(k)$. The second equality holds because g(k) is a deterministic function of ŷ and ẑ. The third equality holds because $h \sim \mathcal{N}(g(k), \mathrm{diag}(\sigma^2))$. For part B, please refer to the Appendix.
Proposition
Proposition
We find self-consistent (ĥ, ŷ, ẑ) by iterative inference and online update (Algorithm
$$\begin{aligned}
{\hat{h}}_{t+1} & \leftarrow {\hat{h}}_t + \eta (g_{t+1}(k) - {\hat{h}}_t) \label{eqn:upd_h} \\
f_{t+1}(\ell) & \leftarrow f_{t+1}(\ell) + \eta (g_t(\ell) - f_{t+1}(\ell)), \ell=k,\dots,L \label{eqn:upd_f}\end{aligned}$$
where η is the step size. Greedy replacement is a special case of the online update rule with η = 1.
Encode image x to h₀ with the first k convolutional layers. Initialize $\{\hat{z}(\ell)\}_{\ell=k:L}$ by σReLU and σMaxPool as in the standard CNN.
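At a high level, the iterative inference can be sketched as follows; `encoder`, `feedforward`, and `feedback` are hypothetical stand-ins for the first k convolutional layers, the adaptive feedforward pass, and the generative feedback pass, and the step size and iteration count are illustrative rather than the paper's settings.

```python
def cnn_f_inference(x, encoder, feedforward, feedback, num_iters=3, eta=0.1):
    """Hedged sketch of CNN-F iterative inference (not the reference code)."""
    h = encoder(x)                # encode x into the feature h at layer k
    z = None                      # first pass uses standard ReLU / MaxPool
    for _ in range(num_iters):
        y, z = feedforward(h, z)  # forward pass with AdaReLU / AdaPool
        g_k = feedback(y, z)      # generative pass from layer L down to layer k
        h = h + eta * (g_k - h)   # online update: h <- h + eta * (g(k) - h)
    return y, h, z
```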
During training, we have three goals: 1) train a generative model to model the data distribution, 2) train a generative classifier and 3) enforce self-consistency in the model. We first approximate self-consistent (ĥ, ŷ, ẑ) and then update model parameters based on the losses listed in Table
Table
| Loss | Form | Purpose |
| --- | --- | --- |
| Cross-entropy loss | $\log p(y \mid \hat{h}_t, \hat{z}_t; \theta)$ | classification |
| Reconstruction loss | $\log p(h \mid \hat{y}_t, \hat{z}_t; \theta)$, i.e. $\|h - \hat{h}\|_2^2$ | generation, self-consistency |
| Intermediate reconstruction loss | $\|f_0(\ell) - g_t(\ell)\|_2^2$ | stabilizing training |
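As an illustration, one way to combine these three terms during training is sketched below; the loss weights `lam_rec` and `lam_inter` are assumptions made for the sketch, not values used in the paper.

```python
import torch.nn.functional as F

def cnn_f_loss(logits, labels, h, g_k, f0_list, g_list,
               lam_rec=1.0, lam_inter=0.1):
    """Cross-entropy + reconstruction + intermediate reconstruction terms."""
    loss_ce = F.cross_entropy(logits, labels)           # classification
    loss_rec = F.mse_loss(g_k, h)                       # reconstruction of h (MSE)
    loss_inter = sum(F.mse_loss(g, f)                   # per-layer feature reconstruction
                     for f, g in zip(f0_list, g_list))
    return loss_ce + lam_rec * loss_rec + lam_inter * loss_inter
```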
As a sanity check, we train a CNN-F model with two convolution layers and one fully-connected layer on clean Fashion-MNIST images. We expect that CNN-F reconstructs the perturbed inputs to their clean version and makes self-consistent predictions. To this end, we verify the hypothesis by evaluating adversarial robustness of CNN-F and visualizing the restored images over iterations.
Since CNN-F is an iterative model, we consider two attack methods: attacking the first or the last output of the feedforward stream. We use "first" and "e2e" (short for end-to-end) to refer to these two attack approaches, respectively. Due to the approximation of non-differentiable activation operators and the depth of the unrolled CNN-F, the end-to-end attack is weaker than the first-pass attack (Appendix). We report the adversarial accuracy against the stronger attack in Figure
Figure
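For reference, a minimal L∞ PGD sketch is given below; `model` can stand for either the first feedforward pass ("first") or the unrolled CNN-F ("e2e"), and the ε, step size, and number of steps are placeholders rather than the settings used in our experiments.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Projected gradient descent in the L-infinity ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()           # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                          # keep valid pixel range
    return x_adv
```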
Given that CNN-F models are robust to adversarial attacks, we examine the models' mechanism for robustness by visualizing how the generative feedback moves a perturbed image over iterations. We select a validation image from Fashion-MNIST. Using its two largest principal components, we construct a two-dimensional hyperplane in $\mathbb{R}^{28 \times 28}$ that passes through the image, with the image at its center. Vector arrows visualize the movement induced by the generative feedback at each position on the hyperplane. In Figure
We further explore this principle with regard to adversarial examples. The CNN-F model can correct initially wrong predictions. Figure
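The visualization procedure can be sketched as follows; `images` (flattened training images), `x0` (the selected image), and `delta` (a function returning the feedback's one-step movement of an image) are hypothetical inputs used only for illustration.

```python
import numpy as np

def feedback_field_on_pca_plane(images, x0, delta, extent=5.0, steps=11):
    """Arrows of the feedback's movement on the plane spanned by the two
    leading principal components, centered at the image x0."""
    X = images - images.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u, v = vt[0], vt[1]                       # two largest principal directions
    grid = np.linspace(-extent, extent, steps)
    arrows = []
    for a in grid:
        for b in grid:
            x = x0 + a * u + b * v            # a point on the 2-D hyperplane
            d = delta(x)                      # movement produced by the feedback
            arrows.append((a, b, d @ u, d @ v))  # arrow projected onto the plane
    return np.array(arrows)
```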
Adversarial training is a well-established method to improve the adversarial robustness of a neural network
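For concreteness, a minimal adversarial training epoch is sketched below; `model`, `loader`, `optimizer`, and `attack` (for example, the PGD sketch above) are hypothetical stand-ins, and training only on attacked inputs is one common variant, not necessarily the exact recipe used here.

```python
import torch.nn.functional as F

def adversarial_train_epoch(model, loader, optimizer, attack):
    """One epoch of adversarial training on attacked inputs only."""
    model.train()
    for x, y in loader:
        x_adv = attack(model, x, y)             # craft an adversarial batch
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```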
Figure
We train CNN-F on the Fashion-MNIST and CIFAR-10 datasets. For Fashion-MNIST, we train a network with 4 convolutional layers and 3 fully-connected layers; 2 of the convolutional layers encode the image into feature space, and the generative feedback reconstructs to that feature space. For CIFAR-10, we use the WideResNet architecture
CNN-F further improves the robustness of CNN when combined with adversarial training. Table
Table
| Model | Clean | PGD (first) | PGD (e2e) | SPSA (first) | SPSA (e2e) | Transfer | Min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN | 89.97 ± 0.10 | 77.09 ± 0.19 | 77.09 ± 0.19 | 87.33 ± 1.14 | 87.33 ± 1.14 | — | 77.09 ± 0.19 |
| CNN-F (last) | 89.87 ± 0.14 | 79.19 ± 0.49 | 78.34 ± 0.29 | 87.10 ± 0.10 | 87.33 ± 0.89 | 82.76 ± 0.26 | 78.34 ± 0.29 |
| CNN-F (avg) | 89.77 ± 0.08 | 79.55 ± 0.15 | 79.89 ± 0.16 | 88.27 ± 0.91 | 88.23 ± 0.81 | 83.15 ± 0.17 | 79.55 ± 0.15 |
Table
| Model | Clean | PGD (first) | PGD (e2e) | SPSA (first) | SPSA (e2e) | Transfer | Min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN | 79.09 ± 0.11 | 42.31 ± 0.51 | 42.31 ± 0.51 | 66.61 ± 0.09 | 66.61 ± 0.09 | — | 42.31 ± 0.51 |
| CNN-F (last) | 78.68 ± 1.33 | 48.90 ± 1.30 | 49.35 ± 2.55 | 68.75 ± 1.90 | 51.46 ± 3.22 | 66.19 ± 1.37 | 48.90 ± 1.30 |
| CNN-F (avg) | 80.27 ± 0.69 | 48.72 ± 0.64 | 55.02 ± 1.91 | 71.56 ± 2.03 | 58.83 ± 3.72 | 67.09 ± 0.68 | 48.72 ± 0.64 |
Latent variable models are a unifying theme in robust neural networks. The consciousness prior
Recurrent models and Bayesian inference have been two prevalent concepts in computational visual neuroscience. Recently,
Feedback Network
The generative feedback in CNN-F takes a form similar to target propagation, where the targets at each layer are propagated backwards. In addition, difference target propagation uses auto-encoder-like losses at intermediate layers to promote network invertibility
Inspired by recent studies of the Bayesian brain hypothesis, we propose to introduce recurrent generative feedback to neural networks. We instantiate the framework on CNNs and term the model CNN-F. In the experiments, we demonstrate that the proposed feedback mechanism can considerably improve adversarial robustness compared to conventional feedforward CNNs. We visualize the dynamical behavior of CNN-F and show its capability of restoring corrupted images. Our study shows that the generative feedback in CNN-F presents a biologically inspired architectural design that encodes inductive biases to benefit network robustness.
Convolutional neural networks (CNNs) can achieve superhuman performance on image classification tasks. This advantage allows their deployment in computer vision applications such as medical imaging, security, and autonomous driving. However, CNNs trained on natural images tend to overfit to image textures. Such a flaw can cause a CNN to fail against adversarial attacks and on distorted images, which may further lead to unreliable predictions, potentially causing false medical diagnoses, traffic accidents, and false identification of criminal suspects. To address the robustness issues in CNNs, CNN-F adopts an architectural design that resembles human vision mechanisms in certain aspects. Deploying CNN-F can therefore lead to more robust AI systems.
Despite the improved robustness, the current method does not tackle other social and ethical issues intrinsic to a CNN. A CNN can imitate human biases present in image datasets. In automated surveillance, biased training datasets can improperly calibrate CNN-F systems to make incorrect decisions based on race, gender, and age. Furthermore, while robust, human-like computer vision systems can provide a net positive societal impact, there exist potential use cases with nefarious, unethical purposes. More human-like computer vision algorithms, for example, could circumvent human verification software. Motivated by these limitations, we encourage research into human bias in machine learning and security in computer vision algorithms. We also recommend that researchers and policymakers examine how people may abuse CNN models and work to mitigate such exploitation.
We thank Chaowei Xiao, Haotao Wang, Jean Kossaifi, and Francisco Luongo for the valuable feedback. Y. Huang is supported by DARPA LwLL grants. J. Gornet is supported by the NIH Predoctoral Training in Quantitative Neuroscience 1T32NS105595-01A1. D. Y. Tsao is supported by the Howard Hughes Medical Institute and the Tianqiao and Chrissy Chen Institute for Neuroscience. A. Anandkumar is supported in part by the Bren endowed chair, DARPA LwLL grants, the Tianqiao and Chrissy Chen Institute for Neuroscience, and Microsoft, Google, and Adobe faculty fellowships.
Athalye, Anish, Nicholas Carlini, and David Wagner. 2018. “Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples.” In ICLR.
Bengio, Yoshua. 2019. “The Consciousness Prior.” arXiv:1709.08568.
Dodge, S., and L. Karam. 2017. “A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions.” In ICCCN.
Eickenberg, Michael, Alexandre Gramfort, Gaël Varoquaux, and Bertrand Thirion. 2017. “Seeing It All: Convolutional Network Layers Map the Function of the Human Visual System.” NeuroImage. Elsevier.
Elsayed, Gamaleldin, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alexey Kurakin, Ian Goodfellow, and Jascha Sohl-Dickstein. 2018. “Adversarial Examples That Fool Both Computer Vision and Time-Limited Humans.” In NeurIPS.
Felleman, D. J., and D. C. Van Essen. 1991. “Distributed Hierarchical Processing in the Primate Cerebral Cortex.” Cerebral Cortex.
George, Dileep, Wolfgang Lehrach, Ken Kansky, Miguel Lázaro-Gredilla, Christopher Laan, Bhaskara Marthi, Xinghua Lou, et al. 2017. “A Generative Vision Model That Trains with High Data Efficiency and Breaks Text-Based CAPTCHAs.” Science.
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. 2015. “Explaining and Harnessing Adversarial Examples.” In ICLR.
Goyal, Anirudh, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf. 2019. “Recurrent Independent Mechanisms.” arXiv:1909.10893.
Horikawa, Tomoyasu, and Yukiyasu Kamitani. 2017. “Hierarchical Neural Representation of Dreamed Objects Revealed by Brain Decoding with Deep Neural Network Features.” Front Comput Neurosci. Frontiers.
Kar, Kohitij, Jonas Kubilius, Kailyn Schmidt, Elias B Issa, and James J DiCarlo. 2019. “Evidence That Recurrent Circuits Are Critical to the Ventral Stream’s Execution of Core Object Recognition Behavior.” Nature Neuroscience.
Kietzmann, Tim C, Courtney J Spoerer, Lynn KA Sörensen, Radoslaw M Cichy, Olaf Hauk, and Nikolaus Kriegeskorte. 2019. “Recurrence Is Required to Capture the Representational Dynamics of the Human Visual System.” PNAS. National Acad Sciences.
Knill, David C, and Whitman Richards. 1996. Perception as Bayesian Inference. Cambridge University Press.
Kok, Peter, Janneke FM Jehee, and Floris P De Lange. 2012. “Less Is More: Expectation Sharpens Representations in the Primary Visual Cortex.” Neuron. Elsevier.
Kubilius, Jonas, Martin Schrimpf, Aran Nayebi, Daniel Bear, Daniel LK Yamins, and James J DiCarlo. 2018. “CORnet: Modeling the Neural Mechanisms of Core Object Recognition.” bioRxiv Preprint. Cold Spring Harbor Laboratory.
Lamb, Alex, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Denis Kazakov, Yoshua Bengio, and Michael C. Mozer. 2019. “State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations.” In ICML.
Lee, Dong-Hyun, Saizheng Zhang, Asja Fischer, and Yoshua Bengio. 2015. “Difference Target Propagation.” In ECML-Pkdd.
Linsley, Drew, Junkyung Kim, Vijay Veerabadran, Charles Windolf, and Thomas Serre. 2018. “Learning Long-Range Spatial Dependencies with Horizontal Gated Recurrent Units.” In NeurIPS.
Madry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. “Towards Deep Learning Models Resistant to Adversarial Attacks.” arXiv:1706.06083.
Meng, Dongyu, and Hao Chen. 2017. “MagNet: A Two-Pronged Defense Against Adversarial Examples.” In CCS.
Meulemans, Alexander, Francesco S Carzaniga, Johan AK Suykens, João Sacramento, and Benjamin F Grewe. 2020. “A Theoretical Framework for Target Propagation.” arXiv:2006.14331.
Mittal, Sarthak, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, and Yoshua Bengio. 2020. “Learning to Combine Top-down and Bottom-up Signals in Recurrent Neural Networks with Attention over Modules.” In ICML.
Nayebi, Aran, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J DiCarlo, and Daniel L Yamins. 2018. “Task-Driven Convolutional Recurrent Models of the Visual System.” In NeurIPS.
Nguyen, Tan, Nhat Ho, Ankit Patel, Anima Anandkumar, Michael I. Jordan, and Richard G. Baraniuk. 2018. “A Bayesian Perspective of Convolutional Neural Networks Through a Deconvolutional Generative Model.” arXiv:1811.02657.
Nimmagadda, Tejaswi, and Anima Anandkumar. 2015. “Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models.” arXiv:1505.00308.
Piekniewski, Filip, Patryk Laurent, Csaba Petre, Micah Richert, Dimitry Fisher, and Todd Hylton. 2016. “Unsupervised Learning from Continuous Video in a Scalable Predictive Recurrent Network.” arXiv:1607.06854.
Rao, Rajesh P. N., and Dana H. Ballard. 1999. “Predictive Coding in the Visual Cortex: A Functional Interpretation of Some Extra-Classical Receptive-Field Effects.” Nature Neuroscience.
Samangouei, Pouya, Maya Kabkab, and Rama Chellappa. 2018. “Defense-Gan: Protecting Classifiers Against Adversarial Attacks Using Generative Models.” In ICLR.
Selvaraju, Ramprasaath R, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization.” In ICCV.
Sulam, Jeremias, Aviad Aberdam, Amir Beck, and Michael Elad. 2019. “On Multi-Layer Basis Pursuit, Efficient Algorithms and Convolutional Neural Networks.” IEEE Trans. PAMI.
Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2016. “Rethinking the Inception Architecture for Computer Vision.” In CVPR.
Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. “Intriguing Properties of Neural Networks.” In ICLR.
Uesato, Jonathan, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. 2018. “Adversarial Risk and the Dangers of Evaluating Against Weak Attacks.” In ICML.
Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. 2016. “Instance Normalization: The Missing Ingredient for Fast Stylization.” arXiv:1607.08022.
Wang, Tianlu, Kota Yamaguchi, and Vicente Ordonez. 2018. “Feedback-Prop: Convolutional Neural Network Inference Under Partial Evidence.” In CVPR.
Warde-Farley, David, and Yoshua Bengio. 2017. “Improving Generative Adversarial Networks with Denoising Feature Matching.” In ICLR.
Wen, Haiguang, Kuan Han, Junxing Shi, Yizhen Zhang, Eugenio Culurciello, and Zhongming Liu. 2018. “Deep Predictive Coding Network for Object Recognition.” In ICML.
Zagoruyko, Sergey, and Nikos Komodakis. 2016. “Wide Residual Networks.” arXiv:1605.07146.
Zamir, Amir R., Te-Lin Wu, Lin Sun, William B. Shen, Bertram E. Shi, Jitendra Malik, and Silvio Savarese. 2017. “Feedback Networks.” In CVPR.
σ takes the form of σAdaPool or σAdaReLU.