edX Artificial Intelligence, week 4

Week 4 is where it gets really good. Week 3 was cool because it got into heuristic search, which is the start of what feels like a glimmer of “intelligence”, but week 4 is on adversarial search and games. Hot damn that’s cool. Additionally (skip to the bottom if you’re only interested in that), the project for the week was to make an AI that plays the game 2048!

Theory

Adversarial search problems are basically the same thing as games (a certain type of game, anyway). The idea is that there’s an opponent, whom we can’t control, and they’re planning against us! So the main difference between a game and a typical search problem is that the optimal solution for a game isn’t a sequence of actions, but a “policy” for what to do in response to the opponent’s actions. Fortunately, games can still be modeled very well by search trees if you know how.

You can divide games up into the cross of two categories, each with two possibilities: how much information you can see (fully vs. partially observable) and determinism (fully deterministic vs. involving some randomness), which changes exactly how you should play the game. We’re mostly going to look at deterministic games with fully observable information. Further, we’re also going to look at zero-sum games, where by definition a good thing for your opponent is a bad thing for you, and the amount of “goodness” (i.e., points) in the game is conserved.

A big part of what we do here is called “embedded thinking”. This means that, to decide what choice to make, you’ll need to consider what it will allow your opponent to do, but of course what he does (to win) will need to consider what you’ll do. So you end up going back and forth (thinking alternately as yourself and your opponent) to decide your best move. To illustrate the following, we’re gonna consider two players, Max (who tries to maximize the resulting score) and Min (who…tries to minimize it).

The first search tactic we look at is the minimax algorithm. This is a recursive algorithm (but in a slightly indirect way, like I’ll explain below) that starts at the top of the game tree and tries to figure out the choice that maximizes Max’s possible score, given that at every choice Min gets to make, Min will try to minimize the score (it’s really solving for Min as well, because you just take the opposite choice at each point to get the correct plays for Min).

It works like this. There are two pretty similar functions, Minimize() and Maximize(). Maximize() is called first, on the top (root) node, because we’re trying to maximize the score from the root. Within Maximize(), if the node is a terminal node (game over), it returns the value of that node. If it’s not, it calls Minimize() on each of the child nodes, and then, if any of those Minimize() returns is bigger than the best value achieved so far (which starts off as negative infinity in each function call, so at least one child gets chosen), that child is selected and its value becomes the new best value. Finally, it returns whatever the best node and value are after calling Minimize() on each of the children. What does Minimize() do? Well, nearly the exact same thing, but with the sign switched at each place: it calls Maximize() on each of its children and tries to find the minimum value among them (starting with a best value of positive infinity).

So, the algorithm is recursive, but in a “flip flop” way, where Maximize() calls Minimize(), which then calls Maximize(), etc, until a terminal node is found (as opposed to what I’ve usually seen recursive algorithms do, just call themselves).
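
To make that flip-flop concrete, here’s a minimal sketch in Python. The state interface here (is_terminal(), value(), children()) is hypothetical, just something to hang the mutual recursion on:

def maximize(state):
    # Game over: return the true value of this position.
    if state.is_terminal():
        return None, state.value()
    best_child, best_value = None, float("-inf")
    for child in state.children():
        # Ask what Min would do from each child, and keep the best reply.
        _, value = minimize(child)
        if value > best_value:
            best_child, best_value = child, value
    return best_child, best_value

def minimize(state):
    # Mirror image of maximize(): same loop, opposite comparison.
    if state.is_terminal():
        return None, state.value()
    best_child, best_value = None, float("inf")
    for child in state.children():
        _, value = maximize(child)
        if value < best_value:
            best_child, best_value = child, value
    return best_child, best_value

Calling maximize(root) then hands back the child to move to and the score Max can guarantee with best play from both sides.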

This is actually pretty familiar to anyone who’s played chess (or lots of strategy games, really). When considering a move to make, you then put yourself in your opponent’s shoes and think, “what move would I make if I were them?” Of course, you don’t always get to complete it to “terminal nodes” (i.e., checkmate), so you just kind of say “okay, 5 moves down the line, if we both do our best that I can see, this looks like it maximizes the game for me” (and of course sometimes you didn’t look far enough, and there are surprises!). But what you’re really doing is some version of the minimax algorithm!

So what can you do when you can’t really get to a terminal node? They offer 3 strategies:

  1. Use an evaluation function for non-terminal nodes. That is, only go D layers deep, but have some function that tells you how well you’re doing at the node where you stop (see the sketch after this list).
  2. Use Iterative Deepening Search (IDS). You basically limit the depth, keep re-running the search with deeper limits, and take the best result you have when you’re forced to stop.
  3. Prune, which means cutting out big chunks of the tree when you either know or suspect they won’t give you better results.
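
For strategy 1, the only change to the minimax sketch above is a depth counter and a heuristic at the cutoff. evaluate() here is a stand-in for whatever scoring function you pick:

def maximize(state, depth):
    if state.is_terminal():
        return None, state.value()
    # Out of depth: guess with a heuristic instead of the true value.
    if depth == 0:
        return None, evaluate(state)
    best_child, best_value = None, float("-inf")
    for child in state.children():
        _, value = minimize(child, depth - 1)
        if value > best_value:
            best_child, best_value = child, value
    return best_child, best_value

# minimize() gets the same two extra lines, with its usual sign flips.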

They focus on pruning first, and specifically on a famous algorithm called alpha-beta pruning. The basic idea of pruning is that if you’re smart, and keep track of the right things while traversing the tree, you can sometimes skip whole sections of the tree because you realize they can’t possibly do better than what you’ve already found. That’s pruning generally; alpha-beta pruning is more specifically about passing the best values for Max (alpha) and Min (beta) down the tree, and then propagating new discoveries back up it.

The specifics can actually get a little confusing, so I’ll try to explain them here for my own benefit. The first thing to note is that alpha and beta are values passed to each (expanded) node, not global variables. Similar to minimax, there’s a “flip flop” recursion between Maximize() and Minimize(); the difference here is that each call now carries an alpha and a beta value. The first time, Maximize() is called on the root node with alpha = -infinity and beta = +infinity.

At the beginning of Maximize() (everything is the same for Minimize(), but mirrored), the (best node, best value) combo for that node’s expansion is set to (Null, -infinity). Then it goes through a for loop over the children. For each child, Minimize() is called on it, and a value comes back. If this value is better than the best value so far, it becomes the new best value (the first time through, anything beats -infinity). Then this best value (whether or not it just got updated) is compared to the beta that was passed in. If it’s bigger, that means the Minimize() call above us already has an option worth beta, which is better for Min (worse for Max) than anything we can force here, so Min will never let play reach this node. Therefore, we can ignore the rest of that branch and prune. If we don’t prune, alpha is updated to the larger of alpha and the best value before moving on to the next child, so the remaining children get searched with the tighter bound.
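
Here’s how the Maximize() half looks in the same sketch style as before (Minimize() is the mirror image: it starts at +infinity, prunes when its best value drops to alpha or below, and tightens beta instead of alpha):

def maximize(state, alpha, beta):
    if state.is_terminal():
        return None, state.value()
    best_child, best_value = None, float("-inf")
    for child in state.children():
        _, value = minimize(child, alpha, beta)
        if value > best_value:
            best_child, best_value = child, value
        # Min above us already has an option worth beta, so if we can
        # force beta or better (for Max), Min will never come here: prune.
        if best_value >= beta:
            break
        # Otherwise pass the tightened bound down to the remaining children.
        alpha = max(alpha, best_value)
    return best_child, best_value

The first call on the root is maximize(root, float("-inf"), float("inf")).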

There’s also a brief section on move ordering, which is the order in which you evaluate a node’s children. Since alpha-beta pruning can let you skip the rest of a node’s children once evaluating one of them pushes the best value past beta (or below alpha, for Min), you can imagine that you’d want to evaluate the children most likely to cause that cutoff first.
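
In practice this can be as simple as sorting the children by a cheap evaluation before recursing, so the strongest-looking moves get a chance to raise alpha early. A minimal sketch, assuming some quick evaluate() function for ranking states:

def ordered_children(state):
    # Expand the most promising children first; the sooner alpha rises,
    # the more of the later siblings can be pruned.
    return sorted(state.children(), key=evaluate, reverse=True)

Maximize() would then loop over ordered_children(state) instead of state.children() (and Minimize() would sort ascending).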

Project

Alright, now for the actual project! The cool part. The goal here is to make an AI that plays 2048. There are a few constraints. You only get 0.2 seconds per move, so you have to cut off your search of the game tree when the time is up and just return the best move you’ve found. Additionally, in the typical minimax setup you assume your opponent (if you’re Max) wants to minimize, that is, to give you the worst possible move. In 2048, however, the 2 or 4 tile that’s placed each turn is random. So there’s a good chance you could get better scores if you programmed your AI with this in mind: if a move leaves a 1% chance that the computer (playing randomly, as it actually does) produces the placement that’s worst for you, but the other 99% of placements are great for you, that’s actually a pretty great move to make. However, the instructions say to program it as though the game is actively playing against you, giving you the worst placement at each point.

I mentioned above what you have to do when you can’t get to a terminal node. So far they showed trees that seemed to assume you could see the outcomes of Max and Min choosing their best moves and finishing the game to completion, or something like that. I’m not sure you even could, theoretically (given infinite calculation time), get all the ending game states for 2048, because it seems possible that there are some rare branches of the tree where the game could continue indefinitely (actually, is this true? If you could also place the tiles yourself, and choose the moves, could the game go on forever? I’m not sure, because it seems like, as the game goes on, to get the higher numbers you need to have the lower ones that build up to them present as well, which would clog the board up (edit: actually, see below for an example of this)).

Anyway, the point is that when you can’t reach terminal states in the game tree (that is, the game doesn’t end, so you don’t get a “definite value” for that state), you basically have to make up your own value for a state. And that’s a heuristic, in a sense! It turns out the choice of heuristic is very important, and it’s the hard part of this assignment. At each node, you calculate the heuristic for that game state, which lets you make choices.

They suggest starting by just implementing a-b pruning with the heuristic of “number of free cells”, which should let you get some combo of 256’s and 512’s if you run it repeatedly (the way they score it is to run your program 10 times and take the 5 best scores). This actually isn’t a bad heuristic, but it could use a little tweaking. In general you probably do want more open cells, and rewarding that also encourages combining tiles, which tends to give you higher-numbered ones. So, I implemented this, and it indeed gave me a bunch of 256’s and 512’s. I also implemented IDS from the start, so for a given move I have a “depth limit” to search the tree to, which I keep increasing until I run out of time. That still had me at the same scores.
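
My move loop ended up looking roughly like the sketch below. This is a reconstruction rather than my literal code: I’m assuming the skeleton’s getMove() entry point and a maximize() that takes the depth limit (and the deadline, so it can bail out mid-search):

import time

MOVE_TIME_LIMIT = 0.2  # seconds allowed per move

def getMove(self, grid):
    # Iterative deepening: search to depth 1, then 2, then 3, ...,
    # keeping the move from the deepest search that finished in time.
    deadline = time.time() + MOVE_TIME_LIMIT
    best_move, depth = None, 1
    while time.time() < deadline:
        move, _ = self.maximize(grid, float("-inf"), float("inf"),
                                depth, deadline)
        if move is not None:
            best_move = move
        depth += 1
    return best_move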

Okay, to the code. I’m mostly going to focus on what I did for the heuristic here, because that’s the interesting part, and honestly, the main parts in Minimize() and Maximize() came out pretty ugly.

The simplest, the starting heuristic, was just:

def heuristic(self, grid):
    # For the heuristic, higher is better for the playerAI.
    N_free_cells = len(grid.getAvailableCells())
    return N_free_cells

But now let’s try to get better. I don’t know a ton about 2048, but the one thing I did know was that you want your highest-value tile in the corner. To do that, I added another piece to the heuristic function that just adds the value of the corner tile to the heuristic, still keeping the N_free_cells term:

def heuristic(self, grid):
    # For the heuristic, higher is better for the playerAI.
    N_free_cells = len(grid.getAvailableCells())
    corner_weight_val = grid.getCellValue((0, 0))
    return self.weight1 * N_free_cells + self.weight2 * corner_weight_val