This is a follow-up to my article “Training a real robot to play Puckworld with reinforcement learning”. In that one, to make it a little punchier, I showed the overview and end results of the project, but left out the insane number of little hurdles and decisions I had to figure out.
So this article will be about those details instead: partly for me to justify the pain, but maybe more charitably, to show that for any project with a neat (hopefully?) presentation, there’s probably a harrowing saga of hair-pulling roadblocks behind it. I’ll go roughly in the order I encountered things, and there’s still plenty I’m leaving out.
Simulated Annealing (SA) is a very basic, yet very useful optimization technique. In principle, it’s a modification of what’s sometimes called a “hill climbing” algorithm.
Let’s look at a practical example to explain what hill climbing is, and what SA addresses. Imagine you’re in a 1-dimensional landscape and you want to get to the highest possible point. Further, a crazed optimization expert has blindfolded you so you can’t see anything; all you can do is randomly pick left or right, tapping your foot to feel whether a step in that direction would be higher than where you’re currently standing. If it is, you take that step, and repeat.
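That blindfolded foot-tapping loop is easy to sketch in code. Here’s a toy version (mine, just for illustration; the `landscape` function is an arbitrary bumpy curve): with `temperature=0` it’s plain hill climbing, and setting `temperature > 0` adds the SA twist of occasionally accepting a downhill step so you can escape a local peak.

```python
import math
import random

def landscape(x):
    # A bumpy 1-D "terrain" with several local peaks (made up for illustration).
    return math.sin(3 * x) + 0.5 * math.sin(0.5 * x)

def climb(x, step=0.1, iters=5000, temperature=0.0, cooling=0.999):
    """Blindfolded 1-D search; temperature=0 is hill climbing, >0 is simulated annealing."""
    height = landscape(x)
    for _ in range(iters):
        trial = x + random.choice([-step, step])   # tap a foot to the left or right
        trial_height = landscape(trial)
        delta = trial_height - height
        # Hill climbing only ever steps uphill; SA also accepts a downhill step
        # with probability exp(delta / temperature), which lets it escape local peaks.
        if delta > 0 or (temperature > 0 and random.random() < math.exp(delta / temperature)):
            x, height = trial, trial_height
        temperature *= cooling                     # gradually "cool" toward pure hill climbing
    return x, height
```

The `exp(delta / temperature)` acceptance rule is the classic Metropolis criterion: big downhill steps are unlikely to be taken, and everything gets stricter as the temperature cools.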
A while ago, I did a post on beating OpenAI games using neuroevolution (NE). Go read that if you’re interested, but here’s the gist: a typical strategy for training an agent to beat those games is to have a neural network (NN) play the games a bunch, then improve the NN’s weights using a reinforcement learning algorithm based on gradient descent (GD). That, of course, works pretty well.
However, an alternative is to use a gradient-free method (which I’ll call “GD-free”), like I did in that post: you try a bunch of random changes to the NN’s weights, and only keep the resulting NNs that play the game well. That’s the “evolutionary” aspect of it, and using methods like that to create NNs is often called “neuroevolution” (NE).
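The basic GD-free loop is short enough to sketch. This is a toy version (mine, not the code from that post), with a dummy score function standing in for “play the game with these weights and report how well it went”:

```python
import random

def perturb(weights, noise=0.1):
    # Mutate: add a little Gaussian noise to every weight.
    return [w + random.gauss(0, noise) for w in weights]

def evolve(weights, score_fn, generations=200, pop_size=20):
    """Try random perturbations of the weights; keep a mutant only if it scores better."""
    best_score = score_fn(weights)
    for _ in range(generations):
        candidates = [perturb(weights) for _ in range(pop_size)]
        scores = [score_fn(c) for c in candidates]
        top = max(range(pop_size), key=lambda i: scores[i])
        if scores[top] > best_score:               # only keep improvements
            weights, best_score = candidates[top], scores[top]
    return weights, best_score

def dummy_score(weights):
    # Stand-in for an episode of the game: closer to some target weights = better.
    target = [1.0, -2.0, 0.5]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))
```

In the real thing, `score_fn` would run the NN through a few episodes of the game and return the total reward. No gradients are ever computed, which is the whole point.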
Similar to…most of? my ideas, I don’t remember why I thought of this. I think after I made the reinforcement learning robot, I was on a robot kick, and came up with this. Hexapods are of course a robot classic, but I don’t think I had ever seen a centipede robot.
Why a centipede? Well… I can make up a few “practical” reasons: because of its length, it could potentially bridge gaps, or bend “upwards” to have height, or possibly even climb. But the real reason is because they haven’t been done that much and I thought it would be cool, funny, and creepy.
I’m not sure when I got the urge to make a CNC… or maybe it was always there. I did a summer job in a machine shop when I was 19, where I was given a minimum of training by the 88-year-old machinist: “keep the pink things [wiggling his fingers] away from the sharp things [pointing to the milling machine’s cutting edge], boy.”
(I’m not joking. He did show me more things later, but only because I asked. He would mostly just motion for me to come over so he could tell me filthy jokes from the silent film era and cackle to himself.)
I just got back from a trip to Chile with my friends! We were visiting our friend Will. After defending, he did a trip pretty similar to the one I did after my own defense, but the South American version instead of the Southeast Asian one. I believe he started in Peru, then went through Bolivia and met us in Santiago, Chile.
It worked out pretty much perfectly, because I think he was reaching the same level of “travel done-ness” that I did after about the same length of time. So he wanted to come back, but we also wanted to visit him down there before he left. We had also been wanting to go to Patagonia for ages (long before his trip was on the table), so it was perfect.
After I trained an agent to play “puckworld” using Q-learning, I thought “hey, maybe I should make a real robot that learns this. It can’t be that hard, right?”
Hooooooooo boy. I did not appreciate how much harder problems in the physical world can be. Examples of amateurs doing Reinforcement Learning (RL) projects are all over the internet, and robotics is certainly touted as one of the main applications for RL, but I’ve only found a few examples of someone actually using RL to train a real robot. Here’s a (very abridged!) overview of my adventure getting a robot to learn to play a game called puckworld.
Let’s start with a fun gif!
Something I’ve been thinking about recently is neuroevolution (NE). NE means changing aspects of a neural network (NN) using principles from evolutionary algorithms (EA): you try to find the best NN for a given problem by evaluating different candidate solutions (“individuals”), changing them slightly (mutation), sometimes combining them (crossover), and keeping the ones that score better.
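As a very stripped-down sketch of that loop, here’s a toy EA over flat weight vectors; in actual NE each vector would be the weights of an NN, and the fitness function would score how well that NN does on the problem. All the names and parameter choices here are mine, just for illustration:

```python
import random

def mutate(ind, rate=0.1):
    # Change an individual slightly: Gaussian noise on every weight.
    return [w + random.gauss(0, rate) for w in ind]

def crossover(a, b):
    # Combine two parents: each weight is taken from one parent at random.
    return [random.choice(pair) for pair in zip(a, b)]

def neuroevolve(fitness, n_weights=4, pop_size=30, generations=100, elite=5):
    """Truncation-selection EA over flat weight vectors (a toy stand-in for NN weights)."""
    pop = [[random.gauss(0, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # best scorers first
        parents = pop[:elite]                      # keep the top individuals unchanged
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - elite)]
        pop = parents + children
    return max(pop, key=fitness)
```

Each generation keeps the top scorers unchanged (elitism), then refills the population with mutated crossovers of those parents, so the best individual never gets worse.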
I’m pretty late to hop on the 3D printing bandwagon, but I heard its siren song and couldn’t stay away much longer!
After the briefest search online and asking a few friends, I decided to go with the Monoprice Delta Mini. My main reasons were that I didn’t want to have to tinker and build much (at least to start), I wanted to be able to get decent quality, I didn’t want to spend a ton, and I don’t especially need a large print size. The MPDM matched all of this (supposedly works out of the box, can do 0.05 mm layer height, 160 bucks, 4″ height by 3″ diameter print volume), so I went for it.