Sunday, June 2, 2019

More on machine learning and materials

In the last post, I looked at a paper in which machine learning was used to predict properties of doped graphene. One of my thoughts was that the study seemed unsatisfying: it concluded that a neural network could be trained to make the prediction, but made no attempt to figure out how the network made it - even though that might have told us something interesting both about the network and about doped graphene.

Oddly enough, the paper contained references to an interesting study in which researchers had done exactly that, albeit for a very different problem. That paper, by Ziletti et al., published in Nature Communications in 2018, tackles the problem of classifying crystal structures in a way that is robust and does not depend on a myriad of hand-tuned thresholds and parameters. Along the way, the authors adapt a method for probing the internal workings of a neural network to their own application.

Admittedly, crystal structure classification doesn't sound like the most exciting problem in the world, but within materials science and condensed matter physics it is very important. A lot of materials are crystalline, i.e. they consist of periodically repeating arrangements of atoms. Knowing what these periodically repeating arrangements look like and in what ways they are symmetric is important for understanding, investigating and modelling the material - and often for figuring out what it can be useful for or how it can be improved. Classification is also a rather tedious process with a lot of potential for error, due to noisy measurements and the fact that real-world materials are not perfect crystals but always contain defects of various kinds. The aim of the study is to develop a robust classification method that can handle the presence of defects without misclassifying the structures.

The first step is to decide what sort of input data to use. This is more complex than it seems, since just using atomic positions might make the classifier inherently sensitive to defects. Instead, the researchers have chosen to use simulated diffraction patterns, which condense the information about atom placements and inter-atomic distances into a number of bright spots. (If you recall being shown diffraction in some high-school physics class, this is the same thing, only with periodic atomic structures instead of slits and with electromagnetic radiation of much shorter wavelength.) The diffraction patterns are fed into a convolutional neural network with multiple layers, which extracts features from the patterns and then classifies the patterns based on these features. Tests of the network showed good performance, even when the data was noisy or the structures contained a high number of defects.

Now for the interesting part. As described in the previous post, feature extraction in a convolutional neural network can be likened to a process where small sections of an image are compared to a smaller image, and a positive response is given if they match. The output of the first comparison is then used in another comparison that extracts more complicated features, and so on. Training the neural network amounts to adjusting the smaller images, or filters, so that they respond to features of the image that enable the network to make the correct classification. If picking out straight lines enables correct classification, at least some of the filters will end up responding to straight lines. If curves are important, some of the filters will respond to curves.

This also means that once the neural network has been trained and an image is fed into it, there will, at some deep level of the network, be a vector representing the features that are present in the image and that the network has been trained to extract and classify. This vector could tell us exactly what information the network is using when classifying a particular image, but due to the complexity of the preceding layers it is hard to interpret. It is, however, possible to start from this representation of the extracted features and essentially go through all the layers of the network in reverse, finally arriving at a generated picture that shows just the features picked out by the network, in a form humans can easily recognize (these pictures are also known as attentive response maps). Using this method, the researchers found that the neural network had in fact learned to use many of the characteristics that humans use when classifying crystal structures, such as distances between atomic planes.

So why is this interesting? For one thing, it demonstrates a way of checking whether the classification performed by the network is based on something we would consider significant, or whether it has learned to classify based on something obviously irrelevant - say, some kind of noise that happens to be more common in some types of images than in others. It also suggests that we could use neural networks not just to make predictions or classify data points, but to better understand the differences between the data points themselves. It is, after all, entirely possible that a network could extract some feature whose importance we do not yet realize. Personally, I think this is the way to use machine learning in physics - not just looking for the how, but also the why.

Finally, I should mention that the method itself is adapted from a 2018 paper on classifying X-ray images of body parts, which in turn builds on a much earlier paper on understanding how convolutional networks classify more ordinary images. It is perhaps telling that it was first picked up in the medical field, since knowing that a neural network classifies based on the right information could be vital there.
