Training a real robot to play Puckworld with reinforcement learning

After I trained an agent to play “puckworld” using Q-learning, I thought “hey, maybe I should make a real robot that learns this. It can’t be that hard, right?”

Hooooooooo boy. I did not appreciate how much harder problems in the physical world can be. Examples of amateurs doing Reinforcement Learning (RL) projects are all over the internet, and robotics is certainly touted as one of the main applications for RL, but I’ve only found a few examples of someone actually using RL to train a real robot. Here’s a (very abridged!) overview of my adventure getting a robot to learn to play a game called puckworld. read more

Beating OpenAI games with neuroevolution agents: pretty NEAT!

Let’s start with a fun gif!

Something I’ve been thinking about recently is neuroevolution (NE). NE means changing aspects of a neural network (NN) using principles from evolutionary algorithms (EA): you try to find the best NN for a given problem by trying different solutions (“individuals”), changing them slightly (and sometimes combining them), and keeping the ones that score better. read more
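To make that loop concrete, here’s a minimal sketch of the kind of thing I mean. The fitness function, population size, and mutation scale below are made-up placeholders, not what the post actually uses:

```python
import numpy as np

# Toy neuroevolution loop: each "individual" is just a flat vector of NN weights.
# evaluate() is a stand-in for running the network on the task and returning a score.
def evaluate(weights):
    return -np.sum(weights ** 2)  # placeholder fitness; higher is better

POP_SIZE, N_WEIGHTS, N_GENERATIONS, MUTATION_STD = 50, 20, 100, 0.1

population = [np.random.randn(N_WEIGHTS) for _ in range(POP_SIZE)]
for gen in range(N_GENERATIONS):
    scores = [evaluate(ind) for ind in population]
    # Selection: keep the better-scoring half of the population...
    order = np.argsort(scores)[::-1]
    parents = [population[i] for i in order[: POP_SIZE // 2]]
    # ...mutation: refill the population with slightly perturbed copies of the parents.
    children = [p + MUTATION_STD * np.random.randn(N_WEIGHTS) for p in parents]
    population = parents + children
```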

First project with the new 3D printer: a TOF sensor mount

I’m pretty late to hop on the 3D printing bandwagon, but I heard its siren song and couldn’t stay away much longer!

After the briefest search online and asking a few friends, I decided to go with the Monoprice Delta Mini. My main reasons were that I didn’t want to have to tinker and build much (at least to start), I wanted to be able to get decent quality, I didn’t want to spend a ton, and I don’t especially need a large print size. The MPDM matched all of this (supposedly works out of the box, can do 0.05 mm layer height, 160 bucks, 4″ height by 3″ diameter print volume), so I went for it. read more

Solving the Brachistochrone and a cool parallel between diversity in genetic algorithms and simulated annealing

In my first post on Genetic Algorithms (GA), I mentioned at the end that I wanted to try doing some other applications of them, rather than just the N Queens Problem. In the next post, I built the “generic” GA algorithm structure, so it should be easy to test with other “species”, but didn’t end up using it for any applications.
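For what it’s worth, the “generic” structure I have in mind looks roughly like this: a GA loop that only talks to a given “species” through a few callables. This is just an illustrative sketch (the bit-string example at the bottom is invented for the demo, not something from the post):

```python
import random

def evolve(random_individual, fitness, mutate, crossover,
           pop_size=100, generations=200):
    """Generic GA loop: the four callables define the 'species' being evolved."""
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

# Example "species": bit strings scored by how many 1s they contain.
best = evolve(
    random_individual=lambda: [random.randint(0, 1) for _ in range(20)],
    fitness=sum,
    mutate=lambda ind: [b ^ (random.random() < 0.05) for b in ind],   # rare bit flips
    crossover=lambda a, b: [random.choice(pair) for pair in zip(a, b)],
)
```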

I thought I’d do a bunch of applications, but the first one actually ended up being pretty interesting, so… here we are. read more

Animation stand: from design to build with Onshape

My mom does lots of animation, and one day mentioned that she would love to have an animation stand, but professional ones are either cheap but too small, or very expensive. With the holidays coming up, this seemed like a good present!

I Googled “animation stand” first, to see what others have done and get some inspiration.

Training an RL agent to play Puckworld with a DDQN

Last time I messed around with RL, I solved the classic Mountain Car problem using Q-learning and Experience Replay (ER).

However, it was very basic in a lot of ways:

  • There were really only two actions, and the state space had only two dimensions (position and velocity).
  • The way I represented the state space was very simple: “coarse coding”, which breaks the continuous state space into discrete chunks, so in a way it still had discrete states. More interesting problems have continuous, many-dimensional state spaces.
  • The representation of Q was just a state vector times a weight vector, so just linear. You can actually get a decent amount done with linear, but of course all the rage these days is using neural networks to create the Q function. (There’s a quick sketch of coarse coding and a linear Q after this list.)
  • The problem was very “stationary”, in the sense that the flag (where the car wanted to go) was always in the same place. Even if the flag moved around from episode to episode, the strategy would always be the same: try to pick up enough momentum by going back and forth. A more interesting problem is one where the goal moves.
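As referenced in the list above, here’s roughly what I mean by coarse coding and a linear Q. The patch sizes, bounds, and action count are illustrative, not my actual Mountain Car setup:

```python
import numpy as np

# Coarse coding: overlay the 2D (position, velocity) box with overlapping "patches";
# a state's feature vector has a 1 for every patch that contains it, 0 elsewhere.
POS_BOUNDS, VEL_BOUNDS = (-1.2, 0.6), (-0.07, 0.07)

def make_patches(n=8, width_frac=0.25):
    pos_centers = np.linspace(*POS_BOUNDS, n)
    vel_centers = np.linspace(*VEL_BOUNDS, n)
    pos_w = width_frac * (POS_BOUNDS[1] - POS_BOUNDS[0])
    vel_w = width_frac * (VEL_BOUNDS[1] - VEL_BOUNDS[0])
    return [(pc, vc, pos_w, vel_w) for pc in pos_centers for vc in vel_centers]

PATCHES = make_patches()

def features(pos, vel):
    # Binary feature per patch: is the state inside this patch?
    return np.array([abs(pos - pc) < pw and abs(vel - vc) < vw
                     for pc, vc, pw, vw in PATCHES], dtype=float)

# Linear Q: one weight vector per action, so Q(s, a) = features(s) . w[a]
N_ACTIONS = 2
w = np.zeros((N_ACTIONS, len(PATCHES)))

def q_values(pos, vel):
    return w @ features(pos, vel)   # shape (N_ACTIONS,), one Q value per action
```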

The Red Lama (Red Llama clone)

After making the worst fuzz pedal ever (that’s for another post) and Orange Ya Glad (which was fine, but didn’t add quite as much fuzz as I wanted, and added a weird buzz on some speakers even when I wasn’t playing), I just wanted a normal fuzz pedal. After a bit of reading, I found that the Red Llama overdrive pedal (by Way Huge) is a classic, and after watching a few YouTube demos, it seemed good. (To be honest, people are crazy about the “different” sounds that various antique/obscure transistors or configurations give fuzz/distortion/overdrive pedals, but they all sound pretty similar to me, and I suspect people think they’re hearing differences more often than they actually are.)

Anyway, I wanted to pay tribute to the original Red Llama circuit I was cloning, so I went for… read more

Genetic Algorithms, part 2

Last time, in case you missed it, I left off with a laundry list of things I wanted to expand on with Genetic Algorithms (GA). Let’s see which of those I can do this time!

This is pretty wordy and kind of dry, since I was just messing around and figuring stuff out, but I promise the next one will have some cool visuals.

Using Reinforcement Learning to solve the Egg drop puzzle

So last time, I solved the egg drop puzzle in a few ways. One of them used something I had recently learned: Markov Decision Processes (MDPs). It worked, which got me really stoked about them, because it was such a cool new method to me.

However, it’s kind of a baby process that’s mostly used as a basis to learn about more advanced techniques. In that solution to the problem, I defined the reward matrix R and the transition probability matrix P, and then used them explicitly to iteratively solve for the value function v and the policy p. This works, but isn’t very useful for the real world, because in practice you don’t know R and P; you just get to try stuff and learn the best strategy through experience. So the real challenge would be letting my program try a bunch of actual egg drops, and have it learn the value function and policy from them.
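As a rough sketch of the difference, here’s what the two settings look like side by side. The state/action counts and the random R and P below are placeholders, not the actual egg-drop matrices:

```python
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9

# "Planning" setting: R and P are known up front.
R = np.random.rand(n_states, n_actions)                                  # R[s, a]
P = np.random.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']

# Value iteration: repeatedly back up v using the explicit model.
v = np.zeros(n_states)
for _ in range(500):
    q = R + gamma * (P @ v).T   # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * v[s']
    v = q.max(axis=1)
policy = q.argmax(axis=1)

# "Learning" setting (the real challenge): no access to R or P, just sampled
# (s, a, r, s') transitions from actually trying egg drops.
Q = np.zeros((n_states, n_actions))
alpha = 0.1

def q_learning_update(s, a, r, s_next):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```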