Last time I messed around with RL, I solved the classic Mountain Car problem using Q-learning and Experience Replay (ER).

However, it was very basic in a lot of ways:

Last time I messed around with RL, I solved the classic Mountain Car problem using Q-learning and Experience Replay (ER).

However, it was very basic in a lot of ways:

Last time, in case you missed it, I left off with a laundry list of things I wanted to expand on with Genetic Algorithms (GA). Let’s see which of those I can do this time!

This is pretty wordy and kind of dry, since I was just messing around and figuring stuff out, but I promise the next one will have some cool visuals.

Hey there!

Mountain Car (MC) is a classic Reinforcement Learning (RL) problem. It was briefly shown in a video I was watching, so I figured I’d give it a shot.

So last time, I solved the egg drop puzzle in a few ways. One of them was using a recent learn, Markov Decision Processes (MDP). It worked, which got me really stoked about them, because it was such a cool new method to me.

However, it’s kind of a baby process that’s mostly used as a basis to learn about more advanced techniques. In that solution to the problem, I defined the reward matrix and the transition probability matrix , and then used them explicitly to iteratively solve for the value function v and the policy p. This works, but isn’t very useful for the real world, because in practice you don’t *know* and , you just get to try stuff and learn the best strategy through experience. So the real challenge would be letting my program try a bunch of actual egg drops, and have it learn the value function and policy from them.

My friend Mike recently showed me a puzzle game called Skyscrapers, which you can play here. It’s a neat idea, in the general theme of “fill in the numbers with these constraints” puzzles like Sudoku or Verbal Arithmetic.

The rules are like so. You’re given a board like this, representing a group of city blocks (one building per square), with numbers around the sides:

I first heard this puzzle when taking an algorithms class in undergrad. The prof presented it as a teaser for the type of thing you could solve using algorithmic thinking, though he never told us the answer, or what the way of thinking is. Then, it more recently came up with my friends while we were hiking, and we were talking about it. I’ll talk about what I have so far, but first let’s say what the puzzle actually is.

There’s a building with 100 floors. You have two identical crystal eggs. They will break if dropped from (or above) some height (the same height for both), and you’d like to find that height using the fewest number of drops possible. If you drop an egg from some height and it doesn’t break, you can use that egg again. Once an egg is broken (i.e., you dropped it from that breaking height or above), you can’t use that egg again. So the question is, what’s the best dropping strategy?

Round 3! Okay, this is where I try and polish it up in a couple ways.

Here are the things I said last time I needed to make better:

Genetic Algorithms are cool!

I was recently skimming through Russel and Norvig’s AI: A Modern Approach and came to the section on Genetic Algorithms (GA). Briefly, they’re a type of algorithm inspired by genetics and evolution, in which you have a problem you’d like to solve and some initial attempts at solutions to the problem, and you combine those solutions (and randomly alter them slightly) to hopefully produce better solutions. It’s cool for several reasons, but one really cool one is that they’re often used to “evolve” to an optimal solution in things like design of objects (see the antenna in the Wikipedia article). So, that’s kind of doing evolution on *objects* rather than living things. Just take a look at the applications they’re used for.

Week 4 is where it gets *really* good. Week 3 was cool because it got into heuristic search, which is the start of what feels like a glimmer of “intelligence”, but week 4 is on *adversarial search and games*. Hot damn that’s cool. Additionally (skip to the bottom if you’re only interested in that), the project for the week was to make an AI that plays the game 2048!

Theory

I’ve been doing this Coursera Discrete Optimization course with my friends. It’s a lot of fun and I’m learning a bunch. The instructor is a total goofball too, which is a plus. I’ve taken a handful of online courses before, but to be honest, the assignments have usually been pretty easy. Not so with this Discrete Optimization (DO (my initials!)) course! Each week, you have to solve 6 problems, and each is graded out of 10 depending on how well you do. I believe the breakdown is: 0/10 if your answer doesn’t even match the output format required, 3/10 if you do basically anything (even if your answer is quite wrong), 7/10 if you have an answer above some threshold but still not perfect, and 10/10 if your answer is optimal (I guess they know the optimal solution for all of them?). Usually, the problems increase in hardness throughout the set; often, the last one is difficult enough that (I believe we saw this said in the course forums by the instructor) it would be a challenge for people who specialize in DO for a living. I think that’s pretty cool! They usually give you a ton of practice problems of various difficulties, and (though I’m not 100% sure) I think the 6 you’re graded on are usually among those.

So what is DO? I certainly didn’t know when I started this course, though I guess I should’ve been able to guess. Optimization is what it sounds like, finding the best solution you can for given problems. The “discrete” part is that the quantities involved are integers or discrete (*that’s the name in the title!!!)* components. It turns out I had actually heard of many of the problems that DO applies to before, but didn’t know they were DO. I had heard most of them in the context of P vs NP complexity.