Network Dissection

Beware the Chimera

In Greek mythology, the Chimera was a monster with the body of a lion, the head of a goat, and the tail of a snake: a beast made of the parts of many other animals. Today the term is still used for an improbable cross between creatures, but it is also used to describe a highly desired fantasy that is merely a figment of the imagination.

As we scrutinize the internals of deep neural networks hunting for meaning, we must beware the Chimera. In other words, we need to make sure that the interpretable phenomena we see are faithful reflections of the operation of the network, and not illusory combinations created by our own limited imagination.

Network Dissection

Discerning chimeras has been one of my concerns as I have worked in Antonio Torralba's lab, together with Bolei Zhou and Aditya Khosla, on understanding the interpretations of internal neurons. My work is motivated by the amazing emergent structure discovered by Bolei and Aditya in 2014. They found that when you train a deep network to solve a whole-image scene classification problem (deciding whether an image belongs to one of 205 place categories), hidden internal units emerge that identify and localize the presence of specific types of objects, such as dogs, cars, houses, and boats, that were never labeled in the training set. Since object segmentation is a difficult problem in its own right, it is amazing to see it being solved without any specific training for that task.

Examples of emergent detectors can be seen below. Each block of images represents the areas of highest activation of single convolutional units inside a deep network.

Network Dissection samples
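For readers curious how images like these are produced, here is a minimal sketch of the common approach: pick one convolutional unit, upsample its activation map to image resolution, and keep only the regions where the unit fires most strongly. This is a generic PyTorch reconstruction with assumed names, not the lab's actual pipeline.

    import torch
    import torch.nn.functional as F

    def highlight_unit(model, conv_layer, unit_index, image, quantile=0.99):
        """Mask an image down to the regions where one conv unit fires hardest."""
        cache = {}
        handle = conv_layer.register_forward_hook(
            lambda module, inputs, output: cache.update(act=output.detach()))
        with torch.no_grad():
            model(image.unsqueeze(0))          # image: (C, H, W) tensor
        handle.remove()
        fmap = cache["act"][0, unit_index]     # (h, w) activation map for the unit
        fmap = F.interpolate(fmap[None, None], size=image.shape[1:],
                             mode="bilinear", align_corners=False)[0, 0]
        threshold = fmap.flatten().quantile(quantile)
        return image * (fmap > threshold).float()   # keep only the hot regions

Applying something like this to the images that activate a unit most strongly yields blocks like the ones shown above.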

The emergence of these common-sense concepts is striking. When you look inside smaller networks (such as the MNIST networks I examined last summer) you do not find this effect.

Nevertheless, I am still left with the question: are our interpretations of these units a chimera? Is the network truly discovering the visual concept of a "dog" or an "airplane", or is the network discovering a more general way of organizing the images, onto which we impose our own concept of object classes when we peer into one aspect of that organization?

Isotropic Phenomena

How can we tell the difference between a chimera and a true interpretation? One way was proposed by Szegedy in 2013: if random combinations of units are just as interpretable as individual units, then

more ...

Neuron Specialization

Video by David Bau; music by David Szesztay, used under a Creative Commons license.

Why does a deep network need to have so many neurons to work?

Intuitively, the hidden units must be needed to represent different subfeatures that are necessary for recognizing the target classes. For example, there are two major ways of writing a handwritten 7: most people write just a single horizontal stroke attached to a single diagonal stroke. However, some people add a cross, striking a second horizontal stroke through the diagonal.

A good reader of handwritten digits must recognize both forms, but since they look somewhat different, it would make sense to recognize them separately. The same situation is true for the digit "2" written with or without an open loop in the bottom-left corner, or other variations.

In LeNet-5, is there a neuron devoted to recognizing the seven-with-stroke?

To determine this, I think it is necessary to:

  1. Use visualizations to determine which neurons might have that role.
  2. Show that by removing that neuron, the network loses the knowledge (see the ablation sketch after this list).
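For step 2, an ablation test might look something like the following sketch. It assumes a PyTorch implementation of LeNet-5 with a hypothetical fully-connected layer attribute called linear_0; the real layer names in my code may differ.

    import torch

    def ablate_unit(layer, unit_index):
        """Silence one unit by zeroing its activation on every forward pass."""
        def hook(module, inputs, output):
            output = output.clone()
            output[:, unit_index] = 0.0
            return output
        return layer.register_forward_hook(hook)

    @torch.no_grad()
    def per_class_accuracy(model, loader, num_classes=10):
        correct = torch.zeros(num_classes)
        total = torch.zeros(num_classes)
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            for c in range(num_classes):
                mask = labels == c
                correct[c] += (preds[mask] == c).sum()
                total[c] += mask.sum()
        return correct / total.clamp(min=1)

    # Compare accuracy before and after silencing a candidate unit:
    # baseline = per_class_accuracy(model, test_loader)
    # handle = ablate_unit(model.linear_0, 58)
    # ablated = per_class_accuracy(model, test_loader)
    # handle.remove()
    # print(baseline - ablated)   # which digits got worse?

If the unit really carries the stroked-7 knowledge, the accuracy drop should be concentrated on that variant rather than spread across all digits.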

My previous attempts to ablate all knowledge of entire classes by ablating sets of hidden units (guided by the visualizations) have not been very successful. However, I am still trying to make things work. In the movie above, I show the gradient examples (as also shown in the last blog entry) adjacent to neuron activations (shown in the row underneath each gradient visualization).

In most cases, strong gradients for a class correspond to strong average activations for that class.
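The comparison I am drawing here can be computed directly: for each class, accumulate a layer's average activation and the average gradient flowing into it during training. The sketch below shows one way to do that in PyTorch; the layer handle and names are assumptions, not my actual code.

    import torch

    def per_class_activation_and_gradient(model, layer, loader, criterion,
                                          num_classes=10):
        """Per-class mean activation and mean incoming gradient for a Linear layer."""
        n_units = layer.out_features
        act_sum = torch.zeros(num_classes, n_units)
        grad_sum = torch.zeros(num_classes, n_units)
        counts = torch.zeros(num_classes, 1)
        cache = {}
        handle = layer.register_forward_hook(
            lambda module, inputs, output: cache.update(act=output))
        for images, labels in loader:
            output = model(images)
            cache["act"].register_hook(lambda g: cache.update(grad=g))
            model.zero_grad()
            criterion(output, labels).backward()
            for c in range(num_classes):
                mask = labels == c
                act_sum[c] += cache["act"][mask].detach().sum(dim=0)
                grad_sum[c] += cache["grad"][mask].sum(dim=0)
                counts[c] += mask.sum()
        handle.remove()
        counts = counts.clamp(min=1)
        return act_sum / counts, grad_sum / counts   # each is (classes, units)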

However, there are a few anomalies.

For example, notice neuron #58 in the linear-0 layer: during training it is very clear that this neuron gets positive gradients for "7" digits (as well as for "4" digits, and negative gradients for "9"). However, its average activation for "7" is negative.

The image of the strong positive gradients during training shows many 7 digits with strokes through the middle. So neuron 58 is a natural one to look at to see if it is actually a stroked-7 detector. Is it?

Currently my visualizations are not interactive, and it requires programming to ask any question. I should build a tool that allows questions to be asked and answered interactively.

But here is a visualization of neuron #58 where we have rectified the activation; that is, rather than applying a negative weight when the neuron does not fire, we zero out non-firing contributions. As you can see, the average "7" that activates this neuron does indeed have a crossbar. This differs from most other neurons that fire on 7.

unit linear-0-58 showing a crossed 7
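The rectified average shown above can be produced by weighting each "7" image by the unit's ReLU'd activation, so examples that do not fire the unit contribute nothing. A sketch, with assumed PyTorch names:

    import torch

    def rectified_average_image(model, layer, unit_index, loader, digit=7):
        """Average one digit's images, weighted by the unit's rectified activation."""
        cache = {}
        handle = layer.register_forward_hook(
            lambda module, inputs, output: cache.update(act=output.detach()))
        weighted_sum = 0.0
        total_weight = torch.tensor(0.0)
        with torch.no_grad():
            for images, labels in loader:
                mask = labels == digit
                if mask.sum() == 0:
                    continue
                model(images[mask])
                weights = cache["act"][:, unit_index].clamp(min=0)   # rectify
                weighted_sum = weighted_sum + (
                    weights.view(-1, 1, 1, 1) * images[mask]).sum(dim=0)
                total_weight += weights.sum()
        handle.remove()
        return weighted_sum / total_weight.clamp(min=1e-8)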

The phenomenon of a neuron on which the gradient is strong but the activation is not seems interesting. Here is another such neuron, unit 28 of the linear-0 layer. This one shows yet another way to draw a 7: with a vertical hook overhanging the left end of the horizontal bar.

unit linear-0-28 showing a hooked 7

For other such neurons, I cannot discern any particular kind of specialization. For example, unit 8 in linear-0 learns with a very strong gradient for "8", but it activates only weakly for "8", activating much more strongly for "1". However, I cannot see anything special about the set of "8" instances that activate it: perhaps this neuron specializes in something negative, for example, its ability to fire on "8" while not firing on any digit other than "1".

unit linear-0-8 showing a low-activation 8

more ...

The Purpose of a Neuron

Video by David Bau; music by David Szesztay, used under a Creative Commons license.

One of the great scientific debates of the 19th century was whether the structure of the brain could be decomposed. Reticulists such as Golgi believed the brain was monolithic, a single complex body of protoplasm, crisscrossed by dendrites and axons, but fundamentally indivisible. Neuronists, led by Santiago Ramón y Cajal, believed the brain to be composed of a collection of separate individual cells that communicated with each other.

Of course, Cajal was proven correct: brains are made up of separate neurons, and his neuron doctrine became the driving model for neuroscientists. However, the conceptual debate at the heart of the matter is still not completely settled, even today. Although there is no question that a brain, or an artificial neural network, is physically decomposable into separate neurons, there is still a question of whether this physical decomposition corresponds to a decomposition of high-level thoughts, concepts, and knowledge.

Grandmother Cells

Teaching at MIT in 1969, pioneering cognitive scientist Jerry Lettvin would tell a humorous story imagining removing a specific high-level concept from a brain by removing a selection of neurons. He imagined how remarkable it might be that there could be cells representing high-level concepts such as "mother" and "grandmother". Surprisingly, some evidence has emerged that grandmother cells might exist. For example, in 2005, Quian Quiroga discovered individual cells in the medial temporal lobe of an individual that respond to a specific concept such as "Halle Berry", firing for different views of the same person, even when dressed up as Catwoman, and even firing in response to the written text "HALLE BERRY". This seemed like compelling evidence for the presence of "grandmother neurons": individual neurons that strongly localize a high-level concept.

However, Quiroga himself and others have vociferously argued that a simplistic model of localized representation need not be implied by these experimental results. He argues that a distributed code is potentially much more efficient (for example, 10 bits in a localized code can represent only 10 concepts, while 10 bits in a distributed code could represent 2^10 = 1024 concepts), and that, given the large numbers of untested neurons and concepts, it would be difficult to distinguish a localized code from a distributed code on the basis of his experiments.

So the debate today is whether individual neurons localize meaning, or whether meaning is an emergent property of many neurons working together in a distributed code.

Localizing Meaning Intentionally

Regardless of whether neurons in biological brains do localize meaning, any engineer would agree that it would be useful for neurons to localize meaning. A well-engineered system is modular, which means that it can be split on meaningful boundaries. A monolithic system is "spaghetti code," hard to understand and hard to debug. So when we create neural networks, we should strive for localized meaning and clean boundaries. Whether this is possible is an open question.

I think the science of modular neural networks comes down to three steps:

  1. Find a way to measure the purpose of a neuron, localized or not.
  2. Prove that we can identify and use localized knowledge by using ablation or transplant.
  3. Create ways of intentionally localizing important knowledge, so it can be modularized.

On step 1, I have been helping Aditya Khosla on work that localizes interpretable concepts that emerge within neural networks. The work follows the tradition of the neuroscientists: to discern the meaning of a neuron, Khosla exposes the network to various inputs, and then identifies which stimuli cause the neuron to fire most strongly. We are finding some interesting properties of networks …
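In code, that probe amounts to running a dataset through the network, recording each unit's activations, and keeping the top-k stimuli per unit. Here is a rough sketch under assumed names; it is not the actual code used in the lab.

    import torch

    @torch.no_grad()
    def top_activating_images(model, layer, loader, k=9):
        """Return, for each unit in a layer, the k images that activate it most."""
        cache = {}
        handle = layer.register_forward_hook(
            lambda module, inputs, output: cache.update(act=output.detach()))
        scores, images_seen = [], []
        for images, _ in loader:
            model(images)
            act = cache["act"]
            if act.dim() == 4:                    # conv layer: reduce over space
                act = act.amax(dim=(2, 3))
            scores.append(act)
            images_seen.append(images)
        handle.remove()
        scores = torch.cat(scores)                # (N, units)
        images_seen = torch.cat(images_seen)      # (N, C, H, W)
        topk = scores.topk(k, dim=0).indices      # (k, units)
        return images_seen[topk.t()]              # (units, k, C, H, W)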

more ...

Norvig on Debugging with Machine Learning

At an EmTech Digital talk last week, Peter Norvig spoke about the challenges presented by using machine learning. He talked about understandability, testability, and debugging of machine-learned systems. The points he makes are the same ones that drive my research. For example:

  • Traditional software is modular, which means that you can decompose it and understand it. Each module has inputs and outputs that can be defined and isolated.
  • Machine-learned systems appear to be monolithic, which means that it seems as if everything depends on everything else, and changing any one thing changes everything else.

In the machine learning world, we can identify mistakes. We can also retrain a network from scratch to try to fix a mistake. However, we do not know how to make small local changes, fixing a small bug without changing everything at once, which is the everyday practice that defines bugfixing in traditional programming. In machine learning, fixing a bug means restarting and rebuilding the whole system.

Norvig points out that this "rebuild the whole thing" approach impedes understanding, quality assurance, and stability of behavior, and he concludes by saying that we need an entirely new toolset for programming with machine learning. The talk is worth a watch.

Starting in on a New Toolset

My goal is to develop tools that attack these problems. For example:

  • I am developing a way to localize and explain knowledge within a deep neural network. In particular, I am trying to get to the bottom of "why" a specific neuron appears in a neural network: not only what it does, but what it is for.

  • I am developing a way of altering neural networks in small ways, without destroying all of their other behavior. In particular, I am looking for ways of transplanting portions of networks from one instance to another (a simple sketch of a weight transplant follows below). I am also interested in targeted ablations.
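As a starting point, the crudest form of a transplant is just copying one layer's learned parameters from a donor network into a second network with the same architecture. The sketch below does that in PyTorch with an assumed layer name; the interesting experiments are in what happens to behavior afterwards.

    import copy
    import torch

    def transplant_layer(recipient, donor, layer_name):
        """Copy one named layer's parameters from donor into a copy of recipient."""
        patched = copy.deepcopy(recipient)
        donor_params = dict(donor.named_parameters())
        with torch.no_grad():
            for name, param in patched.named_parameters():
                if name.startswith(layer_name + "."):
                    param.copy_(donor_params[name])
        return patched

    # e.g. patched = transplant_layer(net_a, net_b, "linear_0")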

There are plenty of things to try here; it is a very interesting area. One theme of my current work is to see if we can break through the feeling that neural networks are monolithic. If you make small changes in the wrong way, then it does feel like everything can be destroyed very easily; but it is also possible to make small changes in the right way that do not perturb behavior too much. Similarly, individual neurons do seem to have semantic roles, and I am working on getting a clearer picture of these roles in a robust way.

Modularity does not need to be in opposition to neural networks; there are hints that neural networks already have some emergent modularity. We just need to find ways to measure it, maximize it, and exploit it.

more ...

Seeing Dandelions

I spent the morning with my son Cody digging dandelions out of our lawn. Some dandelions are easy to spot since they are sprouting yellow flowers, but others are harder to see. Cody helped me on the hunt. Also, since dandelions spread out, when you dig one out you also need to look carefully for the center of the plant in order to locate the taproot. We filled half a trash bag with dandelions.

My brain is doing an odd thing now that I am resting. Whenever I close my eyes, I see dandelion plants, a different plant each time I blink. My brain is imagining symmetric top-down images of these plants, with white and red veins radiating from the center, festooned with spiked green leaves. This is not how I saw the plants in the garden: I spotted them from far away, at diagonal and side views. But now I am replaying images of dandelions in abstract, symmetrical, circular perfection.

What is my brain doing? Neuroscientists have long suspected that memories "consolidate" during rest. But recently, a Nature Neuroscience paper by H. Freyja Ólafsdóttir actually measured this! The UCL scientists recorded rat brains during 30 minutes of running on a track, and then during 90 minutes of rest afterwards. During rest, the rat brains reproduced the signals of their experience running on the track, but replayed the experience 10-20 times faster than real time.

It is interesting to contemplate why replay is necessary, or what might be in the transformation between initial experience and replay. For example, when training artificial neural networks, programmers use "training set augmentation", applying transformations to the training data without altering the labels, in order to generate a larger training set. Is this what consolidation is all about?
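For concreteness, a typical augmentation pipeline looks something like this (a torchvision sketch; the specific transforms are just plausible choices for digit images, not a claim about any particular experiment):

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomRotation(10),                      # small rotations
        transforms.RandomAffine(0, translate=(0.1, 0.1)),   # small shifts
        transforms.ToTensor(),
    ])
    # Each call yields a slightly different image carrying the same label.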

Or is it the opposite? My brain seems to be imagining canonicalized dandelions, centered in the field of view with perspective skew removed. Is it canonicalizing the view for better compression? Should we be trying to do the same thing in artificial neural networks?

more ...

Choosing a Research Path

I am approaching the end of the first year of my EECS graduate program at MIT, and although a PhD typically stretches on for years, I feel acutely how limited and precious this time is. It is a unique moment of academic freedom. What really interests me?

Creating a Programmable World

The problem that motivates me most is the problem of programmability. How can we put people in charge of our computational systems, instead of the other way around? I am very proud of my work in helping to implement programmable web standards, democratizing the internet by making web browsers radically easy to program. I am also proud of my work helping to teach young people to program. When I came to MIT, I had planned to continue work on programming tools geared toward making programming understandable to everyday users and beginners.

But in doing my required coursework, I was struck by a seductive new problem.

We all know that deep neural networks are making remarkable strides. For the first time, simple optimization techniques are automatically creating complexity that is worthy of being called artificial intelligence. We are training neural networks with dozens of layers and hundreds of millions of learned parameters, the equivalent of thousands of lines of automatically generated code. And yet we do not know, really, how they work, why they fail when they do, or how to intentionally create behavior within them.

Meanwhile, within the deep network community, understandable AI is generally perceived as an ineffective approach: people can only comprehend a few things at once, but neural networks are capable of balancing thousands of signals at every neuron, and this capability seems essential to their function. For example, biasing a network towards sparse connections to aid understandability appears to penalize performance. So practitioners generally believe it is best to set aside human comprehension and let the algorithms optimize freely.

For my whole career, I have built tools that make it easier for people to program, debug, and control increasingly complex software systems. So this is a disquieting moment. Whereas all my previous work has made human programmers progressively more capable, more aware, and more expressive, now, for the first time, the best way to program a computer is to take the human out of the loop. For the first time, it is best to let the computers devise their own algorithms.

Can Deep Neural Networks be Designed?

But does stochastic gradient descent really mean the end of human software engineering?

I think there are reasons to believe that opening the black box of deep neural networks is still worthwhile, even when measured purely by the metric of performance. Here are a few reasons.

  1. There has been significant success with transfer learning, where models trained on large data sets have been retrained on different problems with good results. Oquab et al. found that the hidden layers of a network pretrained on object recognition could be reused to achieve state-of-the-art performance in other contexts.
  2. When examining the activations of individual hidden units, some interpretable meaning seems to emerge. Zhou, Khosla, et al. observe that object detectors emerge in a network trained only to classify places.
  3. There are relatively simple problems that are not amenable to direct learning by a deep network via stochastic gradient descent, but that are easy to learn when an intermediate goal is learned first. Gulcehre and Bengio demonstrate this on a problem of identifying images of pentominos in which all the pentominos are of the same type.
  4. Biological brains seem to do a good job at creating single neurons that represent well-factored single …
more ...