Fun with Genetic Algorithms and the N Queens Problem

Genetic Algorithms are cool!

I was recently skimming through Russell and Norvig’s AI: A Modern Approach and came to the section on Genetic Algorithms (GA). Briefly, they’re a type of algorithm inspired by genetics and evolution, in which you have a problem you’d like to solve and some initial attempts at solutions to the problem, and you combine those solutions (and randomly alter them slightly) to hopefully produce better solutions. It’s cool for several reasons, but one really cool one is that they’re often used to “evolve” to an optimal solution in things like design of objects (see the antenna in the Wikipedia article). So, that’s kind of doing evolution on objects rather than living things. Just take a look at the applications they’re used for.

A lot of it makes more sense when you look at it in the context of evolution, because it’s a pretty decent analogy. A GA “solution attempt” I mentioned above is called an “individual”, like the individual of a species. It could be in many different formats (a string, a 2D array, etc), depending on the problem. You call the group of individuals you currently have the “population”. For species in nature, the “problem” they’re trying to solve is finding the best way to survive and pass on their genes, given the environment. For real evolution and genetics, the success of an individual is somewhat binary: does it pass on its genes, or not? (I guess you could actually also consider that there are grades of passing on your genes; i.e., it might be better to have 10 offspring than 1.) For GA, the success is measured by a “fitness function”, which is a heuristic that you have to choose, depending on the problem.

For each generation, you have a population of different individuals. To continue the analogy, real species mate, and create offspring with combined/mixed genes. In GA we do something called “crossover”, in which the attributes of two individuals are mixed to produce another individual. Similarly, we introduce “mutations” to this new individual, where we slightly change some attribute of them. This is actually very important (see evidence below!), because it allows new qualities to be introduced to the population, which you wouldn’t necessarily get if you just mixed together the current population repeatedly (exactly analogous with real evolution).

So, that’s the rough idea: you have a population, you have them “mate” in some aspect to produce new individuals, you mutate those new ones a little, and now apply your fitness function (i.e., how good is this individual?), and keep some subset of this new population. You could keep the whole population if you wanted to — but the number would quickly grow so large that you’d basically just be doing the same thing as a brute force search over all possible individuals.

I was aware of GA already, but had never actually implemented one. The example they use in AIMA is the 8 Queens problem (8QP), which is a fun little puzzle. Embarrassingly, for a chess player, I had never actually solved it! So I thought I’d dip my toe into GA and do it, and also maybe characterize it a little.

So, let’s set it up! An “individual” here is obviously going to be a board state (i.e., where all the queens are). Most naively, you might think that means a board state is an 8×8 2D array, with 0’s for “no queen on this spot” vs 1 for “queen here”. But, if you look at the 8QP for a second, you’ll quickly see that the queens each have to be on a different row, and a different column. Therefore, you can really specify a board by an 8-long array of integers, where each index/entry represents a row, and the value of that entry is the column number that queen is in. This automatically constrains them to be in different rows, and it makes the coding a lot simpler.
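As a minimal sketch of that representation (the function name and random initialization are my own, just for illustration):

```python
from random import randint

BOARD_SIZE = 8

def random_board(n=BOARD_SIZE):
    # board[row] = column of the queen in that row, so having
    # every queen on a different row is guaranteed by construction
    return [randint(0, n - 1) for _ in range(n)]

board = random_board()
print(board)  # e.g. [3, 0, 6, 2, 7, 1, 4, 5]
```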

What’s the fitness function (FF)? You want some way of deciding how good a board is. A pretty obvious one for this problem is the number of pairs of queens that are attacking each other, for a given board state. If you solve the problem, FF = 0. So for this problem, you want a lower FF.
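That fitness function is easy to write down with the array representation above (this is a sketch in that spirit, not the post’s exact code; since rows are distinct by construction, only columns and diagonals can clash):

```python
def num_attacking_pairs(board):
    # board[row] = column of the queen in that row.
    # Two queens attack each other if they share a column, or if
    # their column difference equals their row difference (a diagonal).
    n = len(board)
    pairs = 0
    for r1 in range(n):
        for r2 in range(r1 + 1, n):
            same_col = board[r1] == board[r2]
            same_diag = abs(board[r1] - board[r2]) == abs(r1 - r2)
            if same_col or same_diag:
                pairs += 1
    return pairs

# A known 8-queens solution scores 0:
print(num_attacking_pairs([2, 4, 1, 7, 0, 6, 3, 5]))  # 0
```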

Here, crossover is combining two boards. To do this, we just choose a row number, and then split the two parents at that index, and create two new boards by swapping the sides. It’s probably more easily explained in the code:

```python
def mate(self, board1, board2):
    # Combine two parent boards by swapping everything
    # before a randomly chosen crossover point
    board1 = deepcopy(board1)
    board2 = deepcopy(board2)
    crossover = randint(1, board1.board_size - 1)
    temp = board1.board_state[:crossover]
    board1.board_state[:crossover] = board2.board_state[:crossover]
    board2.board_state[:crossover] = temp
    return (board1, board2)
```
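The mutation step isn’t shown there, but a minimal sketch in the same spirit (the name `mutate` and the rate parameter are my assumptions, using the same board representation) might look like:

```python
from copy import deepcopy
from random import randint, random

def mutate(board_state, mutation_rate=0.1, board_size=8):
    # With small probability, move one randomly chosen queen
    # to a random column. board_state[row] = column of that queen.
    board_state = deepcopy(board_state)
    if random() < mutation_rate:
        row = randint(0, board_size - 1)
        board_state[row] = randint(0, board_size - 1)
    return board_state
```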

Word clouds for Slack

Hey there! It’s been a while. I’ve been working on lots of stuff, but here’s a small thing I did recently.

My friends and I have a Slack we’ve now been using casually for a few years. You can download the entire logs of your Slack workspace, even if you use the free one (which will cut off the messages it shows you after 10,000 messages, I believe). So I wanted to do a few little projects with it.

One thing my friends and I were talking about was making bots that were crappy, funny imitations of us. So there would be a Declan-bot, Ben-bot, etc., that would talk like we do. Maybe we’ll try that in the future, but after doing the thing in this post, I have a suspicion that the bots might be kind of indistinguishable without extreme tailoring (though I’d love if they weren’t!).

So, what I wanted to do here was make a word cloud of each person’s total corpus in Slack. It actually all came together pretty quick, mostly because I did it in a quick, hack-y way.

First, to get the Slack data, you have to be an administrator. You go to the menu, administration, workspace settings, import/export, export, and then choose the date range. The folder is pretty huge, several GB. You’ll only get public data (not private messages), which makes sense. When you download it, it’s organized by folders corresponding to the channels everything was in, and in each of those, .json files for each day.

json is fortunately super easy to use in python, since it basically gets read as a dictionary. I made a little program that goes through recursively and gets all the files, and then piles everything together into one huge json… dictionary? That lets me easily select everything from a single user, or channel, or other aspect.
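That loading step can be sketched like this (the `<channel>/<date>.json` layout is as described above; the function name is mine):

```python
import json
from pathlib import Path

def load_all_messages(export_dir):
    # The Slack export is organized as <channel>/<date>.json;
    # pile every message from every channel and day into one list.
    messages = []
    for json_file in sorted(Path(export_dir).glob('*/*.json')):
        with open(json_file, encoding='utf-8') as f:
            day = json.load(f)  # each file is a list of message dicts
        for msg in day:
            msg['channel'] = json_file.parent.name  # remember the channel
            messages.append(msg)
    return messages
```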

Another little detail is that it doesn’t label the users by our names (which you can change), it labels us by a unique identifier string, like U2189232 or something. So I had to make a little translation dictionary to go back and forth between names and IDs.
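The export includes a users.json file that pairs IDs with names, so the translation dictionary can be built from that (I’m assuming the `id` and `name` field names here):

```python
import json

def build_id_name_maps(users_json_path):
    # users.json is a list of user dicts; build both directions
    # of the ID <-> display-name translation
    with open(users_json_path, encoding='utf-8') as f:
        users = json.load(f)
    id_to_name = {u['id']: u['name'] for u in users}
    name_to_id = {name: uid for uid, name in id_to_name.items()}
    return id_to_name, name_to_id
```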

I decided to use this guy’s great python word cloud generator to make the word clouds. It’s even installable via pip3!

So, that’s the basics. Import all the data into one big json database/dictionary thing, choose a user, translate to their ID, grab all the text with that ID, turn it into a big ol’ list of words (with repeats), and then feed it into that word cloud generator. And it works! Here are a few:
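The “grab all the text with that ID” step might look something like this (the message format follows the Slack export; the function name and word-splitting regex are my own choices):

```python
from collections import Counter
import re

def user_word_list(messages, user_id):
    # Grab all text from one user and split it into a big
    # list of lowercase words, with repeats
    words = []
    for msg in messages:
        if msg.get('user') == user_id and 'text' in msg:
            words.extend(re.findall(r"[a-z']+", msg['text'].lower()))
    return words

# The resulting list, joined back into one string, is what gets fed
# to the word cloud generator, e.g.:
#   WordCloud().generate(' '.join(words)).to_file('declan.png')
```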

But you’ll immediately notice a few things. One is that there’s some stuff we probably don’t want there, like http and user tags (because if you say someone’s name with the @, it just technically calls their ID and renders it as their name). Additionally, there are a ton of common words. It turns out that we like saying “yeah” and “one” a lot, so that tends to give rise to kind of lame word clouds.

This problem is actually a lot more interesting than you might think at first glance. I wanted to give the clouds a little more “personality”. That is, there are a few unique words in each word cloud that, knowing my friends, I’m able to point out and say “yeah, Max plays Magic, so he probably does say ‘deck’ a lot”, but there aren’t many of those words. What I’d really like is if the top N words of each person’s word cloud were pretty unique to them, but it turns out this is actually kind of tricky.

One thing I tried, that worked with some success, was taking the “megacorpus” from everyone’s combined corpuses (corpi? or is it like octopuses?), taking the 400 most common words from that, removing them from each person’s corpus, and then making them. This is definitely a slight improvement:
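That megacorpus filtering step can be sketched like this (variable and function names are mine, not from the original script):

```python
from collections import Counter

def remove_most_common(users_words, n_common=400):
    # users_words: {name: list of words}. Pool everyone's words into
    # one megacorpus, find the n_common most frequent words, and
    # strip those from each user's own corpus.
    megacorpus = Counter()
    for words in users_words.values():
        megacorpus.update(words)
    common = {word for word, _ in megacorpus.most_common(n_common)}
    return {user: [w for w in words if w not in common]
            for user, words in users_words.items()}
```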

It’s not great, though. For example, there could be a word that’s used a ton by one or two users, and no one else, that might get removed. This would be a very “personal” word that I’d definitely want kept in their word cloud(s). I think it’s also hard to know where the point is where you stop removing common, lame words and start removing interesting, personal words.

Here’s what I’d ideally like: to make a list of the top N words, for each user, that are in the top N words of at most X other users I’m considering. So, if I’m making the top 10 lists for 8 of my friends, I might say that a “top 10” word can stay in a given list if 2 people have it in their top 10, but not if 3 do (then it’s just too common).

How do you do this, though? The naive way I tried it was this:

1. Get each user’s corpus and sort it by commonality (just for that corpus), so you have a list with one occurrence of each word, sorted by decreasing use.
2. Start with user 0, at their most common word (index 0 of that list), and check if it’s in the top N of each other user. If it is, add to the tally of how many others share that word (which starts at 0 for each new word).
3. If more users have that word in their top N than are allowed, remove that word from everyone’s corpus. In that case, keep the index the same; it now points at a new word in the top N, because the one it was just referring to got removed.
4. If the word didn’t have to be removed, increase the index (so now it’s looking at the next most common word in user 0’s top N).
5. When you get to index N of user 0, go to the next user and restart.

Here’s the relevant code for that:

```python
for user in users:
    print('\n\ngetting unique words for', user)
    other_users = copy(users)
    other_users.remove(user)
    print('other users:', other_users)
    index = 0
    while index < N_unique:
        others_with_word = 0
        cur_word = users_corpuses[user][index]
        print('\ncur word is \'{}\''.format(cur_word))
        for other_user in other_users:
            if cur_word in users_corpuses[other_user][:N_unique]:
                print('{} also has this word'.format(other_user))
                others_with_word += 1
        if others_with_word > allowed_others:
            print('removing \'{}\' from all users'.format(cur_word))
            # guard the removal: not every user necessarily has this word
            for tempuser in users:
                if cur_word in users_corpuses[tempuser]:
                    users_corpuses[tempuser].remove(cur_word)
        else:
            index += 1
    print('\n\nTop {} unique words for {} at this point'.format(N_unique, user))
    [print('{}. {}'.format(i, word)) for i, word in enumerate(users_corpuses[user][:N_unique])]
```

EDX Artificial Intelligence, weeks 2 and 3


I started this AI course with my friends a while ago, but we never ended up finishing it. I’m interested in AI these days, so I thought I’d try it on my own. Week 1 is some fluff that’s not worth going over. I’m doing weeks 2 and 3 together because there is one project for both combined. The first couple sections are just notes I took on the videos and concepts. The stuff for the project is at the bottom.

Getting back on the horse…er, Python

As of this writing, I just defended and I’m considering various options for what I’ll do next. That’s a whole other story, but the important part for this post is that, probably for whatever I do, I’ll be coding.

I’ve coded a decent amount in my life. I started with dinky web stuff wayyy back, then picked up a now-tattered and probably outdated “C++ for Dummies” book in high school. I did small programs with that, as well as some silly things for crappy AVR projects I did. In college, I used mostly Java because that’s what the computer science classes I took asked for. Towards the end of college, though, I was working on my own research, and used C++ instead (why? I honestly don’t remember. Wait, I just did! My advisor had heard of some multiprocessor module for C++ that he wanted me to try, so that’s why I didn’t stick with Java).

I didn’t code a ton for my first couple years of grad school. When I began again, I don’t remember exactly how, but I got into using Mathematica (I think because I had seen a few people do what seemed like ~~magick~~ at the time during my first year of grad school; stuff I stupidly spent a while doing with pencil and paper).

Oh, Mathematica (M). What a blessing and a curse. Let me just briefly tout its blessings: it’s very fully featured. A common response I’ve gotten is “but you can do <<XYZ thing>> in any language!”, and that’s usually true — but it’s not always really easy, like it is with M. The documentation (with a few rare exceptions) is pretty excellent. What I (and I suspect most users) want most of all in a manual/doc page is examples. It drives me nuts when I go to the man page for a bash command, and it gives the command syntax in some general form; yeah, I can figure it out if I spend a few minutes, but why make me waste time parsing some command syntax? M gets this, and if you look at the doc for a function, there’s a really solid chance that the exact application you want is already one of the examples and you can just copy and paste.

The other thing is that, because it’s all part of a central program (aside from using user-generated packages, which I’ve almost never done), it follows the same syntax, is generally coherent, and works together. I’ve just been amazed time and time again when I’ve wanted to do something fairly complex, googled “Mathematica <<complex thing>>”, and found that there’s already a pretty fully featured set of functions for it: graph theory stuff, FEA stuff, 3D graphics, etc.

Here’s the thing: a lot of this is essentially just lauding the benefits of any huge, expensive piece of software. Almost all of the things I just said would apply equally well to something like Adobe Photoshop: thorough, well documented, easy to use, etc.

And this brings me to the curse of M: it is a great piece of software in many respects, but it’s proprietary and huge. The proprietary part is a big deal, because a company or school has to pay for a large number of potential users, and site licenses can be really expensive (I tried to find a number but all they have online is “request a quote”; anyone have a rough estimate?). So this eliminates a lot of potential users, like startups that want to be “lean” or whatever. Additionally, I’m guessing that for a company, having a ton of their framework depending solely on another company is something they’d like to avoid, if possible.

Briefly looking into a few career options (data science is the best example of this) and talking to people, I quickly realized how un-used Mathematica is outside of academia. I’m sure there are some users, but it’s definitely not something employers are looking for, from what I gather.

Data science seems like a very possible route to take, so I was looking into the most commonly used languages in it, and the consensus seems to be: Python and R. I went with Python for a few reasons: 1) a couple videos said that if you’re starting new to both (which I essentially am), go with Python, 2) to contradict that first point, I’m actually not starting totally fresh with Python; my experience with it is definitely minimal but I’ve used it a tiny bit, and 3) it seems like, and correct me if I’m wrong here, Python is used for lots of applications outside of data science/stats, such as application building, machine control, etc, whereas R isn’t (true? eh?).

So I’m getting back on the Python. I’m a fairly firm believer that the best method to learn a coding language (or maybe anything, really) is to just start using it. Pick a task, and try doing it. When you run into something you don’t know how to do, Google it.

(Obviously, this doesn’t work at extremes; at some level of ignorance of a subject you simply don’t know what questions you should be asking. But by now, I’ve used bits of enough languages to know concepts that tend to span all languages, to search for.)

The thing I’m starting with is good ol’ Project Euler. If you’re not familiar with it, it’s a series of coding challenges that start very simple and get harder. For example, they list the number of people who have successfully done each problem. The first few problems are in the several hundred thousand range; the later ones are in the ~100 range (you could argue that that’s more about most people just not being that into spending a decent amount of effort on a task with essentially no outward reward, but they actually are a lot harder). The first bunch of them are really simple, being things like string manipulation, slightly tricky sums, and little counting tasks, where you really just need to think about how you’d do it in the most naive way, and then code it (perfect for getting back into a language!)… but they quickly get devilish and far from trivial. One type they’re a fan of, when you get to the slightly trickier ones, are problems where the naive, brute force approach is obvious, but would take an impossibly long time to calculate. However, there’s a trick involved that allows it to be calculated “on any modern computer in under a minute”, I believe is their promise.
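To give a flavor of the early ones, Project Euler’s first problem asks for the sum of all multiples of 3 or 5 below 1000, and the naive approach really is a one-liner:

```python
# Project Euler problem 1: sum of all multiples of 3 or 5 below 1000
total = sum(n for n in range(1000) if n % 3 == 0 or n % 5 == 0)
print(total)  # 233168
```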

So I’ve done the first 25 or so problems using python. I’m definitely going about it in a better way than I did before, trying to use neater notation (like list comprehension rather than a for loop, when I can). I think I’ve definitely benefited from my time with Mathematica, which has a strong emphasis on that type of stuff (for example, using inline functions and the Map shorthand /@).
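As a tiny illustration of what I mean by neater notation:

```python
# The for-loop way
squares = []
for n in range(10):
    squares.append(n ** 2)

# The same thing as a list comprehension, which is Python's rough
# analogue of Mathematica's Map shorthand (#^2 & /@ Range[0, 9])
squares = [n ** 2 for n in range(10)]
```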

Overall, it’s going pretty well and I’m liking it. I remember not liking whitespace-based syntax (or whatever it’s called), but I’m finding that with even a basic text editor like Notepad++ or Atom, it’s actually pretty neat.

But of course I have a couple complaints, so let me kvetch a bit.

First, there seems to be a dearth of simple solutions for simple problems that I’d expect to be kind of common. For example, in a few PE problems, I had a list of lists of the same length (so, a matrix), that I wanted to transpose. Now, in M, you’d literally just do Transpose@mat. However, I was having trouble finding how to do it neatly in Python. Basically, the exact problem being asked about here. Now, what I’m looking for is something nice and simple like one of the answers given:

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])

b = a.transpose()
```

But unfortunately, if you notice, for the same reason the OP in that post didn’t choose that answer, this leaves your matrix in a different form: np.array will actually take square brackets just as happily as the parentheses shown, but what comes back is a NumPy ndarray rather than a plain list of lists. Now, I could recreate the matrix as an np array, but… now we’re talking about another operation, and I’d have to convert it back if I wanted a list of lists at the end. I guess I could have built it as an np array from the get-go, but you might not always have the option.

The solution that works for this is:

```python
>>> lis = [[1, 2, 3],
...        [4, 5, 6],
...        [7, 8, 9]]
>>> [list(x) for x in zip(*lis)]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```