Last time I messed around with RL, I solved the classic Mountain Car problem using Q-learning and Experience Replay (ER).

However, it was very basic in a lot of ways:

Last time I messed around with RL, I solved the classic Mountain Car problem using Q-learning and Experience Replay (ER).

However, it was very basic in a lot of ways:

Hey there!

Mountain Car (MC) is a classic Reinforcement Learning (RL) problem. It was briefly shown in a video I was watching, so I figured I’d give it a shot.

So last time, I solved the egg drop puzzle in a few ways. One of them was using a recent learn, Markov Decision Processes (MDP). It worked, which got me really stoked about them, because it was such a cool new method to me.

However, it’s kind of a baby process that’s mostly used as a basis to learn about more advanced techniques. In that solution to the problem, I defined the reward matrix and the transition probability matrix , and then used them explicitly to iteratively solve for the value function v and the policy p. This works, but isn’t very useful for the real world, because in practice you don’t *know* and , you just get to try stuff and learn the best strategy through experience. So the real challenge would be letting my program try a bunch of actual egg drops, and have it learn the value function and policy from them.

I first heard this puzzle when taking an algorithms class in undergrad. The prof presented it as a teaser for the type of thing you could solve using algorithmic thinking, though he never told us the answer, or what the way of thinking is. Then, it more recently came up with my friends while we were hiking, and we were talking about it. I’ll talk about what I have so far, but first let’s say what the puzzle actually is.

There’s a building with 100 floors. You have two identical crystal eggs. They will break if dropped from (or above) some height (the same height for both), and you’d like to find that height using the fewest number of drops possible. If you drop an egg from some height and it doesn’t break, you can use that egg again. Once an egg is broken (i.e., you dropped it from that breaking height or above), you can’t use that egg again. So the question is, what’s the best dropping strategy?