LLM Prompts for Learning About Hopfield Networks
[Illustration: Niklas Elmehed © Nobel Prize Outreach]
John Hopfield and Geoffrey Hinton won the 2024 Nobel Prize in Physics. There has been a lot of debate about whether their work counts as physics. I’m biased: I think it does. WHY MACHINES LEARN has an entire chapter describing the physics rationale for Hopfield networks. The chapter is titled: “With A Little Help from Physics”!
This blog post is about using a large language model (I used Claude 3.5 Sonnet) to generate code and help us learn about the inner workings of Hopfield networks. I’m only showing the prompts and the outputs of running the code here. Your mileage will vary depending on the LLM/code assistant you use, but this should give you a good idea of how to go about coding a Hopfield network. Here’s a brief intro (more details of how they work, i.e., the algorithm, can be found in Chapter 8 of WHY MACHINES LEARN):
Training a Hopfield network, which is akin to storing some information in the network, is a simple procedure. First, the size of the network is dictated by the number of bits of information you want to store. Let’s say that you want to store a 28x28 image. If you turn the image into a vector, it’ll be 784 bits long. So, you need 784 neurons in your Hopfield network, connected to one another by bidirectional links.
Of course, you don’t really need neurons: you just need the 784x784 matrix that gives you the weights of the connections between the neurons. In a Hopfield network, any given neuron doesn’t connect to itself, so the diagonal of this weight matrix will be zero. Also, the weights are symmetric, meaning the weight of the connection going from the output of neuron i to the input of neuron j is the same as the weight of the connection going from the output of neuron j to the input of neuron i. This symmetry was one of Hopfield’s key insights! And for a single stored pattern, each weight works out to be either +1 or -1 (it’s the product of the two ±1 pixel values it connects).
The essence of a Hopfield network is that once you store some information, such that the outputs of the neurons taken together make up the memory, then that state is designed to be an energy minimum. If you now corrupt the memory, by setting the outputs of neurons to other values, then that raises the energy of the network. Its dynamics are such that the network will find its way to an energy minimum; if you read off the neurons now, you’d have retrieved the memory.
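To make this concrete, here is a minimal sketch of storage and retrieval in Python, using the standard Hebbian storage rule and asynchronous updates. This is an illustration, not the code Claude generated:

```python
import numpy as np

def train_hopfield(pattern):
    """Store a single {-1, +1} pattern via the Hebbian rule: W = x x^T, zero diagonal."""
    x = pattern.flatten().astype(float)
    W = np.outer(x, x)
    np.fill_diagonal(W, 0)  # no neuron connects to itself
    return W

def energy(W, state):
    """Hopfield energy E = -1/2 s^T W s; stored patterns sit at energy minima."""
    return -0.5 * state @ W @ state

def retrieve(W, state, n_sweeps=5):
    """Asynchronous updates: each neuron takes the sign of its weighted input.
    Each flip never increases the energy, so the state settles into a minimum."""
    s = state.astype(float).copy()
    for _ in range(n_sweeps):
        for i in np.random.permutation(len(s)):
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s
```

For a 28x28 image, `pattern` would be the 784-long vector of ±1 values described above.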
Here’s a series of prompts that begin with MNIST images, which are wonderful images to store and restore:
(Yann LeCun (Courant Institute, NYU) and Corinna Cortes (Google Labs, New York) hold the copyright of MNIST dataset, which is a derivative work from original NIST datasets. MNIST dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license).
CLAUDE:
This code does the following:
- Imports necessary libraries: TensorFlow for loading the MNIST dataset, and NumPy for array operations.
- Loads the MNIST dataset using `tf.keras.datasets.mnist.load_data()`.
- Defines a `binarize_image` function that takes an image and a threshold (default 120) and returns a binary version of the image.
- Applies the binarization to both the training and test sets using a list comprehension.
- Prints the shapes of the original and binarized datasets to verify the operation.
- Includes an optional visualization step to display an original image and its binarized version side by side.
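The generated code itself isn’t shown here, but from that description its core is presumably something like the following (the threshold of 120 comes from Claude’s summary; the rest is a reconstruction):

```python
import numpy as np
import tensorflow as tf

# MNIST images are 28x28 arrays of uint8 pixel values in [0, 255]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

def binarize_image(image, threshold=120):
    """Set each pixel to 1 if it exceeds the threshold, else 0."""
    return (image > threshold).astype(np.uint8)

x_train_bin = np.array([binarize_image(img) for img in x_train])
x_test_bin = np.array([binarize_image(img) for img in x_test])

print(x_train.shape, x_train_bin.shape)  # both should be (60000, 28, 28)
```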
I noticed that the code was simply showing me the first image in the list. I wanted to select a digit based on its label. This needed additional prompting.
If you look at the two rows of images shown above, it’s clear that the five images from MNIST and the five binarized images are different. It required further prompting to get the code to use the same set of five images.
It’s clear that the code is working. The two sets of images shown above are of the same digits, one set from MNIST and the other binarized. Now, I want to turn the binarized images, with 0 and 1 for pixels, into images that have -1 and +1. I guess I could have done it in one shot, but well…
CLAUDE:
Certainly! I'll create a function to map the binarized images from {0, 1} to {-1, 1}. This kind of mapping is often useful in machine learning applications, particularly when working with certain types of neural networks or algorithms that benefit from centered data.
Let's update our code to include this new functionality:
[Stuff here from Claude about the details of the coding changes]
This updated version allows you to visualize the original image, its binarized form, and the version mapped to {-1, 1} all at once. This can be particularly useful for understanding how the data is transformed at each step and how these transformations might affect downstream machine learning tasks.
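The mapping itself is a one-liner in NumPy, something along these lines:

```python
def map_to_plus_minus_one(binary_image):
    """Map pixel values {0, 1} to {-1, +1}: 0 -> -1, 1 -> +1."""
    return 2 * binary_image.astype(int) - 1
```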
Prompt 5: Now write code for a Hopfield Network that can store a mapped image. We should then be able to corrupt the mapped image using Gaussian noise, plot the corrupted image, and then give the corrupted image to the trained Hopfield Network, and have it retrieve the uncorrupted image. Plot what is retrieved.
This code generated an error. Instead of debugging it myself, I showed Claude the error and asked it to fix it.
Prompt 6: The code produces this error:

Traceback (most recent call last):
  File "/Volumes/mydisc/Dropbox/BACKUP/Why Machines Learn/ChaptersNew/python/code/ai-assist/hopfield/test5.py", line 51, in <module>
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data
NameError: name 'tf' is not defined
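Once the missing import was fixed, the corrupt-and-retrieve step boils down to something like this (reusing the helper functions from the sketches above; the noise level is illustrative):

```python
# Store one mapped image, corrupt it with Gaussian noise, and retrieve it
pattern = map_to_plus_minus_one(binarize_image(x_train[0])).flatten()
W = train_hopfield(pattern)

# Add Gaussian noise, then snap each pixel back to {-1, +1}
noisy = np.where(pattern + np.random.normal(0, 1.0, size=pattern.shape) >= 0, 1.0, -1.0)

restored = retrieve(W, noisy)
print(np.mean(restored == pattern))  # fraction of pixels recovered; ideally 1.0
```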
You can see that the image in the center (above) is noisier. Yet, the Hopfield network retrieves the stored image. The next thing I did was use a series of prompts to first generate an interactive user interface, and then ask Claude to turn that code into something that runs inside a Jupyter notebook. Here are the prompts:
Prompt 8: Provide a button on the user interface to do the same process for another digit, and replace the plots for the new digit
Prompt 9: Can you only use matplotlib; don't use Flask or React and all that jazz.
Prompt 10: Can you turn the entire code into something that can run inside a Jupyter Notebook?
Claude generated the code.
The Monty Hall Problem: Could an LLM have convinced Paul Erdős?
Could a large language model have convinced one of the most prolific mathematicians—Paul Erdős—that he was wrong about the Monty Hall Problem, as so many others have been?
In WHY MACHINES LEARN, I begin the chapter on probability and statistics with a brief description of the Monty Hall Problem. But here’s how Claude 3.5 Sonnet answered when asked to describe it:
Prompt: Do you know the Monty Hall Dilemma?
Claude: Yes, I'm familiar with the Monty Hall Dilemma. It's a famous probability puzzle named after Monty Hall, the host of the television game show "Let's Make a Deal."
The basic scenario is:
1. There are three doors. Behind one door is a prize (usually a car), and behind the other two doors are goats (or some other non-prize).
2. The contestant chooses a door, but it isn't opened yet.
3. The host, who knows what's behind all the doors, opens one of the other two doors, always revealing a goat.
4. The host then offers the contestant the chance to switch their choice to the other unopened door.
The key question is: Should the contestant switch doors?
The counterintuitive correct answer is that the contestant should always switch doors, as it increases their chances of winning from 1/3 to 2/3.
You can read the chapter in WHY MACHINES LEARN for a thorough analysis of multiple ways of thinking about this problem, and why most of us get it wrong: our reasoning often gets messed up in the presence of uncertainty. Erdős thought there was no point in switching, because the probability of winning would be 1/2 regardless. He was wrong.
In 1995, the Hungarian mathematician Andrew Vázsonyi used a computer program that ran 100,000 simulations of the game (where each trial involved making a choice about whether to switch or not) to show that you win about 2/3 of the time if you switch.
For WHY MACHINES LEARN, I too wrote some code to do the simulation and create an illustration (with due credit, for inspiration, to data scientist Paul van der Laken’s blog post on the same).
Could one use an LLM to do the same? Of course! This blog post is about the way you’d go about prompting an LLM.
But first, here’s the final output of my efforts at prompting Claude to generate an animation, depicting how the probability of winning converges to 2/3 if you switch, and to 1/3 if you don’t.
Here’s how you can use an LLM (I used Claude 3.5 Sonnet) to get it to simulate the game. You can tell that the LLM has most likely seen Paul van der Laken’s code during training, because the plot it generates is very much like his! I’m only showing the prompts and the outputs here, not the code.
The conversation began with the same “Do you know the Monty Hall Dilemma?” exchange quoted above, after which I asked Claude to generate code to simulate the game.
Prompt: Please describe the code you generated using pseudo-code
Claude: Certainly! I'll describe the code using pseudo-code to give you a high-level overview of the algorithm and structure.
Pseudo-code for Monty Hall Simulation
This pseudo-code outlines the main structure and logic of the Monty Hall simulation:
We define two main functions:
- `simulate_monty_hall_game`: This simulates a single game of the Monty Hall problem.
- `run_simulation`: This runs multiple trials of the game and calculates probabilities.

In the `simulate_monty_hall_game` function:
- We randomly select the prize door and the contestant's initial choice.
- We determine which door the host opens (it must not be the prize door or the contestant's choice).
- Based on whether the contestant switches, we determine their final choice.
- We return whether they won or not.

In the `run_simulation` function:
- We run the specified number of trials, keeping track of wins for both switching and staying strategies.
- For each trial, we calculate and store the current probability of winning for each strategy.

In the main part of the script:
- We run the simulation for 10,000 trials.
- We plot the results, showing how the probabilities change as more trials are conducted.
- We print the final probabilities for both strategies.
This pseudo-code captures the essence of the simulation without going into implementation details.
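If you’d like to see that pseudo-code as actual code, a minimal version of the two functions might look like this (the function names follow Claude’s description; the code an LLM generates will differ in its details):

```python
import random

def simulate_monty_hall_game(switch):
    """Play one game; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = random.choice(doors)
    choice = random.choice(doors)
    # The host opens a door that is neither the prize nor the contestant's choice
    opened = random.choice([d for d in doors if d != prize and d != choice])
    if switch:
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == prize

def run_simulation(n_trials=10000):
    """Run n_trials per strategy, tracking the running probability of winning."""
    switch_wins, stay_wins = 0, 0
    switch_probs, stay_probs = [], []
    for t in range(1, n_trials + 1):
        switch_wins += simulate_monty_hall_game(switch=True)
        stay_wins += simulate_monty_hall_game(switch=False)
        switch_probs.append(switch_wins / t)
        stay_probs.append(stay_wins / t)
    return switch_probs, stay_probs

switch_probs, stay_probs = run_simulation()
print(f"switch: {switch_probs[-1]:.3f}, stay: {stay_probs[-1]:.3f}")  # ~0.667 vs ~0.333
```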
At this stage, I wanted Claude to generate a video file or a GIF, to show the probabilities changing with the number of trials. Again, it’s worth looking at the prompts to see how you might get to clean, working code that does what you want. I’m only showing the series of prompts here, and not their outputs.
Prompt: Great. Can you change the plot, so that it appears as an animation. Show the probabilities changing over a span of about 5 seconds, for a total of 10,000 trials
Prompt: The animation takes way too long to complete. Finish it in 5 seconds
Prompt: At the end, please save the animation as video file
Prompt: Running the code generates the following error; please fix it:
Saving animation as video file... MovieWriter ffmpeg unavailable; using Pillow instead.
Traceback (most recent call last):
  File "/Users/anilananth/anaconda3/lib/python3.11/site-packages/PIL/Image.py", line 2390, in save
    format = EXTENSION[ext]
KeyError: '.mp4'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Volumes/mydisc/Dropbox/BACKUP/Why Machines Learn/ChaptersNew/python/code/ai-assist/prob/test3.py", line 67, in <module>
    ani.save('monty_hall_animation.mp4', writer='ffmpeg', fps=20)
  File "/Users/anilananth/anaconda3/lib/python3.11/site-packages/matplotlib/animation.py", line 1089, in save
    with writer.saving(self._fig, filename, dpi), \
  File "/Users/anilananth/anaconda3/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/Users/anilananth/anaconda3/lib/python3.11/site-packages/matplotlib/animation.py", line 245, in saving
    self.finish()
  File "/Users/anilananth/anaconda3/lib/python3.11/site-packages/matplotlib/animation.py", line 515, in finish
    self._frames[0].save(
  File "/Users/anilananth/anaconda3/lib/python3.11/site-packages/PIL/Image.py", line 2393, in save
    raise ValueError(msg) from e
ValueError: unknown file extension: .mp4
Claude: I apologize for the error. It seems that the ffmpeg writer is not available on your system, and Matplotlib is falling back to using Pillow, which doesn't support MP4 format. Let's modify the code to save the animation as a GIF instead, which should work with the Pillow writer. Here's the updated version:
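[Claude’s updated code is omitted here. The essential change is the save call, presumably something along these lines:]

```python
# Save with matplotlib's Pillow writer, which supports GIF (ffmpeg isn't available here)
ani.save('monty_hall_animation.gif', writer='pillow', fps=20)
```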
Prompt: Please annotate the plot so that it shows the final probabilities after 10,000 trials
Prompt: Move the final probability labels to inside the bounding box of the plot
Prompt: Make the labels for the two probabilities BOLD font
The GIF that appears above in this post is the final output of running the code Claude generated after all these prompts.
From Rosenblatt to Claude
Getting a modern large language model to generate code to implement and visualize Frank Rosenblatt’s perceptron algorithm is, in a way, a tribute to Rosenblatt’s visionary ideas. In 1958, he designed the first learnable artificial neurons and the first single-layer artificial neural networks. LLMs are descendants of those early networks. It’s particularly sweet to get an LLM to help us code and visualize Rosenblatt’s perceptron.
When I started writing WHY MACHINES LEARN, one of the first algorithms I coded for the book was the perceptron algorithm. I designed a simple, interactive user interface that would allow me to select my data points on the 2D X-Y plane, so that I could visualize the algorithm as it tried to find a line separating two clusters of data.
The figures in the book are images generated using the same UI (using the Python plotting library matplotlib). I did all this sometime in late 2020, well before ChatGPT came on the scene, and certainly well before any LLM-based coding assistant such as Copilot.
But it’s a different world now. As Harvard professor Boaz Barak said in a recent tweet: “Just realized that the next time I teach my ML foundations course, the primary programming language we use will likely be English. (Students will still need to know math, and be able to read model-generated python.)”
I have been thinking along the same lines: creating a Codebook for WHY MACHINES LEARN using code assistants, so that interested readers can read about the algorithms and basic mathematical ideas in WHY MACHINES LEARN and then prompt an LLM to generate the code, to learn how the algorithms work in code. (I used Anthropic’s Claude 3.5 Sonnet, the paid version, but I’m sure there are many open-source models out there that would do the job just as well.)
This post is about the process of generating Python code, so that you can engage with the perceptron algorithm and see it working. Details of Rosenblatt’s work, the history and the math, etc., can be found in the first two chapters of WHY MACHINES LEARN.
Some lessons I learned regarding code generation: It really helps if you know exactly what you want, so that your prompts can be precise. You also need to be reasonably familiar with coding, to be able to understand the coding mistakes made by the LLM, so that you can ask it to correct the errors.
The first thing I did was take one of the images of the perceptron algorithm from the book, which shows a linearly separating hyperplane (in this case a line, as the data is 2D), drop it into Claude’s context window, and give it my first prompt (I find myself being weirdly polite while interacting with an LLM, hence the over-the-top usage of “please”!).
Prompt: Can you modify the code such that you draw every 3rd line the perceptron finds. Show the wrong lines as gray dotted lines, and the final correct line as a solid, black line. But plot it slowly, so that there is a 1-second delay between the plotting of each line.
Prompt: Something is not right. The code is creating a separate plot for each line. Please don't redraw the plot each time, but use the same canvas. It should seem like an animation.
Prompt: Also, for drawing the line, please use the same fig and ax you use for drawing the circles and triangles. This means your perceptron class will need extra arguments: to take in the fig and ax. Once you have the fig and ax inside the perceptron class, then use the artist to draw the line.
Prompt: Also, the code doesn't have a check to see if the perceptron has found a solution. Modify code to check if the perceptron has found a solution and then terminate the loop.
Prompt: Instead of drawing the perceptron's lines for every 3 iterations, do it for every iteration. Also, make the circles and triangles a little bigger.
Prompt: You removed the 1-second pause between drawing the perceptron's lines. Reintroduce the pause, but keep it to 0.5 seconds.
Prompt: So, everything is great, except for one detail. You have used values of 0 for circles and 1 for triangles, for the classification. The perceptron algorithm requires it to be -1 for circles and 1 for triangles. Can you redo the code with this change?
Prompt: After the perceptron has converged and you have drawn the black solid line, can you turn the entire sequence of lines drawn to convergence into a GIF file?
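For reference, the learning rule that all these prompts are steering toward fits in a few lines. Here is a minimal sketch using the {-1, +1} labels the algorithm expects (an illustration without the plotting, not Claude’s generated code):

```python
import numpy as np

def train_perceptron(X, y, max_epochs=100):
    """Rosenblatt's perceptron: X holds 2D points (one per row), y holds {-1, +1} labels.
    Returns weights w and bias b such that sign(w . x + b) separates the two classes,
    assuming the data are linearly separable."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:  # misclassified (or exactly on the line)
                w += y_i * x_i            # nudge the line toward this point's side
                b += y_i
                errors += 1
        if errors == 0:  # converged: every point is classified correctly
            break
    return w, b
```

Each intermediate (w, b) pair corresponds to one of the gray dotted lines in the animation; the final pair gives the solid black line.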
For readers of WHY MACHINES LEARN: I’ll be writing a series of blog posts detailing my attempts to generate code using Claude or, preferably, some open-source code assistant. I think it’s a great way to learn the conceptual and mathematical basics of machine learning—the subject of WHY MACHINES LEARN—while also learning how to use code assistants, inspect the generated code, and understand HOW the machines work, by seeing/coding the algorithms at work.