Simulated Annealing (SA) is a very basic, yet very useful optimization technique. In principle, it’s a modification of what’s sometimes called a “hill climbing” algorithm.
Let’s look at a practical example to explain what hill climbing is, and what SA addresses. Imagine you’re in a 1-dimensional landscape and you want to get to the highest possible point. Further, a crazed optimization expert has blindfolded you so you can’t see anything; all you can do is randomly try to go either left or right, by tapping your foot to feel if a step in that direction is higher than where you’re currently standing. If it is, you take that step, and repeat.
A while ago, I did a post on beating OpenAI games using neuroevolution (NE). Go read that if you’re interested, but here’s the gist: a typical strategy for training an agent to beat those games is to have a neural network (NN) play the games a bunch, and then improve the weights of the NN using a reinforcement learning algorithm that uses gradient descent (GD), and it of course works pretty well.
However, an alternative to those methods is to use a gradient free method (which I’ll call “GD-free”), like I did in that post: you try a bunch of random changes to the NN’s weights, and only keep the resulting NNs that play the game well. That’s the “evolutionary” aspect of it, and using methods like that to create NNs is often called “neuroevolution” (NE).
Similar to…most of? my ideas, I don’t remember why I thought of this. I think after I made the reinforcement learning robot, I was on a robot kick, and came up with this. Hexapods are of course a robot classic, but I don’t think I had ever seen a centipede robot.
Why a centipede? Well… I can make up a few “practical” reasons: because of its length, it could potentially bridge gaps, or bend “upwards” to have height, or possibly even climb. But the real reason is because they haven’t been done that much and I thought it would be cool, funny, and creepy.
After I trained an agent to play “puckworld” using Q-learning, I thought “hey, maybe I should make a real robot that learns this. It can’t be that hard, right?”
Hooooooooo boy. I did not appreciate how much harder problems in the physical world can be. Examples of amateurs doing Reinforcement Learning (RL) projects are all over the place on the internet, and robotics are certainly touted as one of the main applications for RL, but in my experience, I’ve only found a few examples of someone actually using RL to train a robot. Here’s a (very abridged!) overview of my adventure getting a robot to learn to play a game called puckworld.
Let’s start with a fun gif!
Something I’ve been thinking about recently is neuroevolution (NE). NE is changing aspects of a neural network (NN) using principles from evolutionary algorithms (EA), in which you try to find the best NN for a given problem by trying different solutions (“individuals”) and changing them slightly (and sometimes combining them), and taking the ones that have better scores.
In my first post on Genetic Algorithms (GA), I mentioned at the end that I wanted to try doing some other applications of them, rather than just the N Queens Problem. In the next post, I built the “generic” GA algorithm structure, so it should be easy to test with other “species”, but didn’t end up using it for any applications.
I thought I’d do a bunch of applications, but the first one actually ended up being pretty interesting, so… here we are.
Last time I messed around with RL, I solved the classic Mountain Car problem using Q-learning and Experience Replay (ER).
However, it was very basic in a lot of ways:
- There are really only two actions, and the state space had only two dimensions (position and velocity).
- The way I was representing the state space was very simple, “coarse coding”, which breaks the continuous state space into discrete chunks, so in a way it still has discrete states. More interesting problems have continuous, many dimensional state spaces.
- The representation of Q was just a state vector times a weight vector, so just linear. You can actually get a decent amount done with linear, but of course all the rage these days is in using neural networks to create the Q function.
- The problem was very “stationary”, in the sense that the flag (where the car wanted to go) was always in the same place. Even if I had the flag move around from episode to episode, the strategy would always be the same: try to pick up enough momentum by going back and forth. A more interesting problem is one where the goal moves.
Last time, in case you missed it, I left off with a laundry list of things I wanted to expand on with Genetic Algorithms (GA). Let’s see which of those I can do this time!
This is pretty wordy and kind of dry, since I was just messing around and figuring stuff out, but I promise the next one will have some cool visuals.
Mountain Car (MC) is a classic Reinforcement Learning (RL) problem. It was briefly shown in a video I was watching, so I figured I’d give it a shot.