LLM Prompts for Learning About Hopfield Networks

John Hopfield and Geoffrey Hinton won the 2024 Nobel Prize in Physics. There has been a lot of debate about whether their work counts as physics. I’m biased: I think it does. WHY MACHINES LEARN has an entire chapter describing the physics rationale for Hopfield networks. The chapter is titled: “With A Little Help from Physics”!

This blog post is about using a large language model (I used Claude 3.5 Sonnet) to generate code and help us learn about the inner workings of Hopfield networks. I’m only showing the prompts and the outputs of running the code here. Your mileage will vary depending on the LLM/code assistant you use, but this should give you a good idea of how to go about coding a Hopfield network. Here’s a brief intro (more details of how they work, i.e., the algorithm, can be found in Chapter 8 of WHY MACHINES LEARN):

Training a Hopfield network, which is akin to storing some information in the network, is a simple procedure. First, the size of the network is dictated by the number of bits of information you want to store. Let’s say you want to store a 28x28 image. If you flatten the image into a vector, it’ll be 784 bits long. So, you need 784 neurons in your Hopfield network, each connected to every other neuron by a bidirectional link.

Of course, you don’t really need neurons: you just need the 784x784 matrix that gives you the weights of the connections between the neurons. In a Hopfield network, no neuron connects to itself, so the diagonal of this weight matrix is zero. Also, the weights are symmetric: the weight of the connection going from the output of neuron i to the input of neuron j is the same as the weight of the connection going from the output of neuron j to the input of neuron i. This symmetry was one of Hopfield’s key insights! The neuron states are either +1 or -1, and when a single such pattern is stored, the resulting weights also come out to +1 or -1.
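To make that concrete, here’s a minimal sketch (mine, not from Claude’s session) of how you might build that weight matrix for a single ±1 pattern with NumPy:

```python
import numpy as np

def hebbian_weights(pattern):
    """Weight matrix for storing one +/-1 pattern: W[i, j] = x[i] * x[j]."""
    x = pattern.astype(np.float64).flatten()  # e.g., shape (784,) for a 28x28 image
    W = np.outer(x, x)                        # symmetric by construction
    np.fill_diagonal(W, 0)                    # no neuron connects to itself
    return W
```

Storing several patterns just sums their outer products, which is when the weights stop being exactly ±1.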

The essence of a Hopfield network is that once you store some information, such that the outputs of the neurons taken together make up the memory, then that state is designed to be an energy minimum. If you now corrupt the memory, by setting the outputs of neurons to other values, then that raises the energy of the network. Its dynamics are such that the network will find its way to an energy minimum; if you read off the neurons now, you’d have retrieved the memory.
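In symbols, the network’s energy in state s is E(s) = -(1/2) Σᵢⱼ wᵢⱼ sᵢ sⱼ, and retrieval repeatedly flips neurons to agree with their weighted inputs, which can only lower the energy or leave it unchanged. A small sketch of those dynamics, again mine and purely for illustration:

```python
import numpy as np

def energy(W, s):
    """Hopfield energy E(s) = -1/2 * s^T W s; memories sit at minima."""
    return -0.5 * s @ W @ s

def retrieve(W, s, n_sweeps=5):
    """Asynchronous updates: each flip lowers the energy or leaves it alone."""
    s = s.copy()
    for _ in range(n_sweeps):
        for i in np.random.permutation(len(s)):     # update neurons in random order
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0   # align with the weighted input
    return s
```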

Here’s a series of prompts that begin with MNIST images, which are wonderful images to store and restore:

(Yann LeCun (Courant Institute, NYU) and Corinna Cortes (Google Labs, New York) hold the copyright of the MNIST dataset, which is a derivative work from the original NIST datasets. The MNIST dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.)

Prompt 1: Please write code to load the MNIST dataset and turn each image in that dataset into an image where each pixel is either 0 or 1, depending on whether the grayscale value in the original image is less than or equal to 120 or greater than 120, respectively

CLAUDE:

This code does the following:

  1. Imports necessary libraries: TensorFlow for loading the MNIST dataset, and NumPy for array operations.

  2. Loads the MNIST dataset using tf.keras.datasets.mnist.load_data().

  3. Defines a binarize_image function that takes an image and a threshold (default 120) and returns a binary version of the image.

  4. Applies the binarization to both the training and test sets using list comprehension.

  5. Prints the shapes of the original and binarized datasets to verify the operation.

  6. Includes an optional visualization step to display an original image and its binarized version side by side.
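Claude’s code isn’t reproduced verbatim here, but a minimal sketch along the lines of that description might look like this (the threshold of 120 comes from the prompt; the rest is my guess at the shape of the code):

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Load MNIST: 60,000 training and 10,000 test images, each 28x28 grayscale
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

def binarize_image(image, threshold=120):
    """Pixels <= threshold become 0; pixels > threshold become 1."""
    return (image > threshold).astype(np.uint8)

x_train_bin = np.array([binarize_image(img) for img in x_train])
x_test_bin = np.array([binarize_image(img) for img in x_test])

print(x_train.shape, x_train_bin.shape)   # both (60000, 28, 28)

# Optional: show an original next to its binarized version
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(x_train[0], cmap='gray');     ax1.set_title('Original')
ax2.imshow(x_train_bin[0], cmap='gray'); ax2.set_title('Binarized')
plt.show()
```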

I noticed that the code was simply showing me the first image in the list. I wanted to select a digit based on its label. This needed additional prompting.

Prompt 2: These images have associated labels. Could you also provide a way to select an appropriate digit, such as 8 or 5, and plot only that?

If you look at the two rows of images shown above, it’s clear that the five images from MNIST and the five binarized images are different. It required further prompting to get the code to use the same set of five images.

Prompt 3: You are selecting different random indices for the original and binarized images. Make sure that the random indices are the same.
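Putting Prompts 2 and 3 together, and continuing from the earlier sketch, the corrected selection-and-plotting code presumably ended up doing something like this (the variable names are mine, hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

digit = 8
indices = np.where(y_train == digit)[0]                     # all images labeled 8
chosen = np.random.choice(indices, size=5, replace=False)   # pick the indices once...

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for col, idx in enumerate(chosen):                          # ...and reuse them for both rows
    axes[0, col].imshow(x_train[idx], cmap='gray')
    axes[1, col].imshow(x_train_bin[idx], cmap='gray')
    axes[0, col].axis('off'); axes[1, col].axis('off')
plt.show()
```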

It’s clear that the code is working. The two sets of images shown above are of the same digits, one set from MNIST and the other binarized. Now, I want to turn the binarized images, with 0s and 1s for pixels, into images that have -1s and +1s. I guess I could have done it in one shot, but well…

Prompt 4:

Okay, now, create function to map the binarized images into images where 0 is -1 and 1 is 1.

Note: It’s worth looking at Claude’s response, because it “gets” the reason for the prompt!

CLAUDE:

Certainly! I'll create a function to map the binarized images from {0, 1} to {-1, 1}. This kind of mapping is often useful in machine learning applications, particularly when working with certain types of neural networks or algorithms that benefit from centered data.

Let's update our code to include this new functionality:

[Stuff here from Claude about the details of the coding changes]

This updated version allows you to visualize the original image, its binarized form, and the version mapped to {-1, 1} all at once. This can be particularly useful for understanding how the data is transformed at each step and how these transformations might affect downstream machine learning tasks.
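The mapping itself is essentially a one-liner; a hypothetical sketch of the function:

```python
import numpy as np

def map_to_pm1(binary_image):
    """Map pixels from {0, 1} to {-1, +1}: 0 -> -1, 1 -> +1."""
    return 2 * binary_image.astype(np.int8) - 1
```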

Prompt 5: Now write code for a Hopfield Network that can store a mapped image. We should then be able to corrupt the mapped image using Gaussian noise, plot the corrupted image, and then give the corrupted image to the trained Hopfield Network, and have it retrieve the uncorrupted image. Plot what is retrieved.

This code generated an error. Instead of debugging it myself, I showed Claude the error and asked it to fix it.

Prompt 6: The code produces this error:

Traceback (most recent call last):
  File "/Volumes/mydisc/Dropbox/BACKUP/Why Machines Learn/ChaptersNew/python/code/ai-assist/hopfield/test5.py", line 51, in <module>
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data
    ^^
NameError: name 'tf' is not defined

The code worked! You can see the stored image (above left), the corrupted image in the center, and the retrieved image (above right). I now wanted the corrupted image to have more noise.
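For reference, here’s a minimal sketch of the kind of store-corrupt-retrieve pipeline this stage amounts to, reconstructed from the description above (the missing import tensorflow as tf that triggered the NameError is in place; the storage and update rules are the ones sketched earlier):

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

# Pick one digit image; binarize and map to {-1, +1} in one step
img = x_train[np.where(y_train == 8)[0][0]]
pattern = np.where(img > 120, 1.0, -1.0).flatten()        # shape (784,)

# Store it: Hebbian weights with a zero diagonal
W = np.outer(pattern, pattern)
np.fill_diagonal(W, 0)

# Corrupt it: add Gaussian noise, then snap back to +/-1
noisy = np.sign(pattern + np.random.normal(0, 1.5, pattern.shape))

# Retrieve it: asynchronous updates until the state settles
state = noisy.copy()
for _ in range(5):
    for i in np.random.permutation(len(state)):
        state[i] = 1.0 if W[i] @ state >= 0 else -1.0

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, data, title in zip(axes, [pattern, noisy, state],
                           ['Stored', 'Corrupted', 'Retrieved']):
    ax.imshow(data.reshape(28, 28), cmap='gray')
    ax.set_title(title); ax.axis('off')
plt.show()
```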

Prompt 7: Make the Gaussian noise stronger
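In the sketch above, that just means raising the noise parameters, for example:

```python
# Stronger corruption: larger standard deviation (value chosen for illustration)
noisy = np.sign(pattern + np.random.normal(0, 3.0, pattern.shape))
```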

You can see that the image in the center (above) is noisier. Yet, the Hopfield network retrieves the stored image. The next thing I did was to use a series of prompts to first generate an interactive user interface and then ask Claude to convert that code to run inside a Jupyter notebook. Here are the prompts:

Prompt 8: Provide a button on the user interface to do the same process for another digit, and replace the plots for the new digit

Prompt 9: Can you only use matplotlib; don't use Flask or React and all that jazz.

Prompt 10: Can you turn the entire code into something that can run inside a Jupyter Notebook?

Claude generated the code.
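Claude’s code isn’t shown here, but a matplotlib-only version of the button might be wired up roughly like this sketch (run_hopfield_demo is a hypothetical stand-in for the store-corrupt-retrieve pipeline above; inside Jupyter you’d also want an interactive backend such as %matplotlib widget):

```python
import matplotlib.pyplot as plt
from matplotlib.widgets import Button

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
plt.subplots_adjust(top=0.8)                      # leave room for the button

def process_digit(event=None):
    """Run the demo on a fresh random digit and redraw the three panels."""
    stored, corrupted, retrieved = run_hopfield_demo()   # hypothetical helper
    for ax, data, title in zip(axes, [stored, corrupted, retrieved],
                               ['Stored', 'Corrupted', 'Retrieved']):
        ax.clear()
        ax.imshow(data.reshape(28, 28), cmap='gray')
        ax.set_title(title); ax.axis('off')
    fig.canvas.draw_idle()

button_ax = fig.add_axes([0.35, 0.9, 0.3, 0.07])  # [left, bottom, width, height]
button = Button(button_ax, 'Process New Digit')
button.on_clicked(process_digit)

process_digit()
plt.show()
```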

The output of running the code inside the Jupyter Notebook was almost the same as what’s shown on the left, with some changes: the Process New Digit button was above the images, and the code generated two rows of such images, only one of which updated when the button was pressed. I didn’t debug it further.

Sometimes you will see results like the one shown above. I changed the amount of Gaussian noise that was added to the image to corrupt it (tweaked the mean and standard deviation from (0, 1.5) to (2, 3.5)). The Hopfield network recovered a bit-flipped image: black became white and vice versa. Can you figure out why? Think about energy minima. For more detail, please have a look at Chapter 8 of WHY MACHINES LEARN.
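Here’s a hint you can check numerically: the energy E(s) = -(1/2) sᵀWs is quadratic in the state, and this network has no bias terms, so flipping every bit leaves the energy unchanged. The inverted image is therefore an equally deep minimum, and strong enough noise can push the state into its basin:

```python
import numpy as np

def energy(W, s):
    return -0.5 * s @ W @ s

pattern = np.random.choice([-1.0, 1.0], size=784)
W = np.outer(pattern, pattern)
np.fill_diagonal(W, 0)

# E(-s) = -1/2 (-s)^T W (-s) = E(s): the flipped pattern is just as "deep"
print(energy(W, pattern), energy(W, -pattern))   # identical values
```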
