RPi camera, part 3: a few incremental fixes

Round 3! Okay, this is where I try and polish it up in a couple ways.

Here are the things I said last time I needed to make better:

  • Send pics more conventionally
  • Fix detection sensitivity (still often picking up strong shade/sunlight quirks)
  • Total design flaw: since the log file currently gets sent with each picture, but is updated when each picture is written, it is actually sometimes more updated than the pics in the folder. That is, if 30 pictures are created by the camera function, and those are immediately added to the log file, the log file is sent with the first of those 30, and it contains all 30 of them even though only one has been sent
  • Better image classifier architecture
  • Better labeled image dataset (32×32 is tiny)
  • Sliding windows over detected images

Sliding windows

What is “sliding windows”? It’s a pretty simple idea. Here’s the motivating problem, first. When my convnet looks at an image I give it and tries to guess whether it has a car in it, it was trained on images where the car mostly filled the frame, so it will tend to only correctly classify images where the car also mostly fills the frame. If you have an image (like the one below) where there is a car but it’s mostly in one corner, it may not get it (because it’s looking for features bigger than that. There are a couple other effects too. One is that if we’re only classifying 32×32 square images, then what I’ve been doing so far (resizing the image to have the smaller side be 32 and then squeezing the bigger side to also be 32) will distort the image, which will make it harder to classify. Lastly, you can imagine that if the actual image size we’re giving it is something like 256×512, then even if it actually would have correctly classified it given these other problems, by the time it smooshes it down to 32×32, it might not have the resolution in the car region of the image to do it.

So what can fix a lot of these problems? You define a “window”, a subset of the image, and “slide” it over the image in steps. Then, you pass this window subset to the classifier, so it’s actually only classifying the subset. You might choose a stride for the sliding window such that you get M windows in the vertical dimension and N windows in the horizontal. So you’re really doing MxN total classifications for the whole window, and then if one of them says “car!”, you say that the image contains a car.

Here’s a little illustration of mine, where the red grid over the green outlined window shows the windows being used (it’s a little hard to tell them apart, but they’re squares. There are three in the vertical direction):

There are of course a million little quirks and variants and choices to make with this. For example, I think it’s common to choose two sizes for the window, which should let you look at two different “scales”. Also, you have to choose some balance between having more sub windows and the computation time it will take to actually process them. I’m also pretty sure some convnets can have this built in by choosing different filter sizes (like, one that would group a block of pixels as a single pixel to make a larger “window”).

Anyway, how does it work? Here are the results using my CIFAR-10 trained convnet from last time, on the same little group of detected images. I show the certainty distribution, which is the certainty that it thinks it detects a car.

No sliding windows:


Not detected:

Sliding windows:


Not detected:

Definitely better! But still getting a ton of false positives, which is annoying. Honestly it may be because they’re 32×32.

Fixing image sending

So I had a bit of a mystery on my hands. I was finding that after a while, my program was just…stopping. Not crashing, not giving any error, just stopping after about an hour. What I mean by stopping is that, while normally it outputs a line for each image it detects (on the RPi side), it would stop outputting anything, but not stop running. It took me embarrassingly long to figure out, but here’s what I did. I first made a Debug class that basically logs everything the program does, right at the moment of doing it. This is actually a pretty handy thing to have around anyway, and basically doesn’t slow it down. You’ll notice that I’m periodically logging the CPU/Mem/temp, since I read somewhere that that can cause a problem, but all the values I’ve seen on mine are fine. Anyway, here was the first clue, you can see where it stops, after about an hour:

So you can see that it’s saving them steadily for a while, and then stops saving them, but continues to send them. Welp, you probably guessed before I did, but while I was aware of how little space my RPi had on it (~700MB to play with), I thought that because I was removing the files right after sending them, I’d be okay. Howeverrrr:

So I was running out of space!

One thing I did was immediately get a 32GB micro SD card and clone my RPi onto it, just to have a bit more wiggle room. To be honest, that might solve the problem, since I doubt I’d ever keep the program running long enough to generate that much data, but that would be not addressing the real problem here, which is that my files are sending way too heckin’ slow!

My files are usually ~100kB, which should be easy to send and keep up with, even if something like 10 a second are being produced. For example, I know off the top of my head that when I send files via scp between my desktop and RPi, the transfer rate it shows is usually something like 1.5 MB/s. So what’s going on?

It turns out that that “S” that stands for “secure” in SCP (or SSH, which it’s based on) is pretty important! As they discuss in this thread where it seems like the person was having exactly my problem, there’s actually some pretty nasty overhead involved in encrypting the file you’re going to send. Of course, I don’t care about that! I’m just sending stuff I don’t care about over my LAN.

So one option in that thread was using a weaker cipher, while another was to use the rcp command, which is kind of like a totally unencrypted scp. I’m going to do a little diversion for a minute here because I wanted to know just how much these compared.

What I did was create a few dummy files, smallfile.txt (100 kB), mediumfile.txt (1 MB), and bigfile.txt (5 MB). First I just sent smallfile.txt 10 times to get a rough sense of the speed and overhead:

for i in range(10): file = files[0] print('processing file',file) start = time() subprocess.check_call(['scp',file,remoteHostPath]) total = time()-start print('time elapsed:',total) times.append(total) print('done') print(times) read more

Fun with Genetic Algorithms and the N Queens Problem

Genetic Algorithms are cool!

I was recently skimming through Russel and Norvig’s AI: A Modern Approach and came to the section on Genetic Algorithms (GA). Briefly, they’re a type of algorithm inspired by genetics and evolution, in which you have a problem you’d like to solve and some initial attempts at solutions to the problem, and you combine those solutions (and randomly alter them slightly) to hopefully produce better solutions. It’s cool for several reasons, but one really cool one is that they’re often used to “evolve” to an optimal solution in things like design of objects (see the antenna in the Wikipedia article). So, that’s kind of doing evolution on objects rather than living things. Just take a look at the applications they’re used for.

A lot of it makes more sense when you look at it in the context of evolution, because it’s a pretty decent analogy. A GA “solution attempt” I mentioned above is called an “individual”, like the individual of a species. It could be in many different formats (a string, a 2D array, etc), depending on the problem. You call the group of individuals you currently have the “population”. For species in nature, the “problem” they’re trying to solve is finding the best way to survive and pass on their genes, given the environment. For real evolution and genetics, the success of an individual is somewhat binary: does it pass on its genes, or not? (I guess you could actually also consider that there are grades of passing on your genes; i.e., it might be better to have 10 offspring than 1.) For GA, the success is measured by a “fitness function”, which is a heuristic that you have to choose, depending on the problem.

For each generation, you have a population of different individuals. To continue the analogy, real species mate, and create offspring with combined/mixed genes. In GA we do something called “crossover”, in which the attributes of two individuals are mixed to produce another individual. Similarly, we introduce “mutations” to this new individual, where we slightly change some attribute of them. This is actually very important (see evidence below!), because it allows new qualities to be introduced to the population, which you wouldn’t necessarily get if you just mixed together the current population repeatedly (exactly analogous with real evolution).

So, that’s the rough idea: you have a population, you have them “mate” in some aspect to produce new individuals, you mutate those new ones a little, and now apply your fitness function (i.e., how good is this individual?), and keep some subset of this new population. You could keep the whole population if you wanted to — but the number would quickly grow so large that you’d basically just be doing the same thing as a brute force search over all possible individuals.

I was aware of GA already, but had never actually implemented one. The example they use in AIMA was the 8 Queens problem (8QP), which is a fun little puzzle. Embarrassingly, for a chess player, I had never actually solved it! So I thought I’d dip my toe into GA and do it, and also maybe characterize it a little.

So, let’s set it up! An “individual” here is obviously going to be a board state (i.e., where all the queens are). Most naively, you might think that means a board state is an 8×8 2D array, with 0’s for “no queen on this spot” vs 1 for “queen here”. But, if you look at the 8QP for a second, you’ll quickly see that they each have to be on a different row, and different column. Therefore, you can really specify a board by an 8-long array of integers, where each index/entry represents a row, and the value of that entry is the column number than queen is in. So it’s automatically constraining them to be in different rows, and it makes the coding a lot simpler.

What’s the fitness function (FF)? You want some way of deciding how good a board is. A pretty obvious one for this problem is the number of pairs of queens that are attacking each other, for a given board state. If you solve the problem, FF = 0. So for this problem, you want a lower FF.

Here, crossover is combining two boards. To do this, we just choose a row number, and then split the two parents at that index, and create two new boards by swapping the sides. It’s probably more easily explained in the code:

def mate(self,board1,board2): board1 = deepcopy(board1) board2 = deepcopy(board2) crossover = randint(1,board1.board_size-1) temp = board1.board_state[:crossover] board1.board_state[:crossover] = board2.board_state[:crossover] board2.board_state[:crossover] = temp return(board1,board2) read more

EDX Artificial Intelligence, week 4

Week 4 is where it gets really good. Week 3 was cool because it got into heuristic search, which is the start of what feels like a glimmer of “intelligence”, but week 4 is on adversarial search and games. Hot damn that’s cool. Additionally (skip to the bottom if you’re only interested in that), the project for the week was to make an AI that plays the game 2048!


Adversarial search problems are basically the same as a game (a certain type, anyway). The idea is that there’s an opponent, whom we can’t control, and they’re planning against us! So the main difference between a game and a typical search is that the optimal solution for a game isn’t a sequence of actions, but a “policy” for what to do in response to the opponent’s actions. Fortunately, they can still be very well modeled by search trees if you know how.

You can divide games up into the cross of two categories, each with two possibilities: information you see (full vs non-full) and determinism (fully vs some randomness), which will change exactly how you play the game. We’re mostly going to look at fully available information, deterministic games. Further, we’re also going to look at zero-sum games, where by definition a good thing for your opponent is bad thing for you, and the amount of “goodness” (i.e., points) in the game is conserved.

A big part of what we do here is called “embedded thinking”. This means that, to decide what choice to make, you’ll need to consider what it will allow your opponent to do, but of course what he does (to win) will need to consider what you’ll do. So you end up going back and forth (thinking alternately as yourself and your opponent) to decide your best move. To illustrate the following, we’re gonna consider two players, Max (who tries to maximize the resulting score) and Min (who…tries to minimize it).

The first search tactic we look at is the minimax algorithm. This is a recursive algorithm (but in a slightly indirect way, like I’ll explain below) that starts at the top of the game tree and tries to figure out the choice that should be made to maximize Max’s possible score, given that at each choice Min can make, Min will try to minimize the score (it’s really solving for Min as well, because you just take the opposite choice to get the correct plays for Min). It works like this. There are two pretty similar functions, Minimize() and Maximize(). Maximize is called on the top, first node (because we’re trying to maximize the tree). Within Maximize, if it’s a terminal node (game over), it returns the value of that node. If it’s not, it calls Minimize on each of the child nodes, and then, if any of those returns from Minimize are bigger than the best value achieved so far (which starts off as negative infinity in each function call so at least one child gets chosen), that node is selected and that value becomes the new best value. Finally, it returns whatever is the best node and value after calling Minimize on each of the children. What does Minimize do? Well, nearly the exact same, but with the sign switched at each place. It calls Maximize on each of its children, and tries to find the minimum value of them (starting with a minimum value of positive infinity).

So, the algorithm is recursive, but in a “flip flop” way, where Maximize() calls Minimize(), which then calls Maximize(), etc, until a terminal node is found (as opposed to what I’ve usually seen recursive algorithms do, just call themselves).

This is actually pretty familiar to anyone who’s played chess (or lots of strategy games, really). When considering a move to make, you then put yourself in your opponent’s shoes and think, “what move would I make if I were them?” Of course, you don’t always get to complete it to “terminal nodes” (i.e., checkmate), so you just kind of say “okay, 5 moves down the line, if we both do our best that I can see, this looks like it maximizes the game for me” (and of course sometimes you didn’t look far enough, and there are surprises!). But what you’re really doing is some version of the minimax algorithm!

So what can you do when you can’t really get to a terminal node? They offer 3 strategies:

  1. Use an evaluation function for non terminal nodes. That is, only go D layers deep, but have some function that tells you how good you’re doing at that node when you stop.
  2. Use Iterative Deepening Search (IDS). You basically limit the depth, keep re-calculating it for deeper depths, and then take the best result you got when you had to stop.
  3. Prune, which means cut out big chunks of the tree when you either know or suspect that it won’t give you better results.

They focus on pruning first, and specifically, a famous algorithm called alpha-beta pruning. The basic idea of pruning is that if you’re smart, and keep track of the right stuff while traversing the tree, sometimes you can skip whole sections of the tree because you realize they just can’t do better than what you’ve already done. So this is pruning generally, but alpha-beta pruning is more specifically about passing the best values for max (alpha) and min (beta) down the tree, and the propagating new discoveries back up the tree.

The specifics can actually get a little confusing, so I’ll try and explain it here for my own knowledge. The first thing to note is that alpha and beta are values at each (expanded) node, not global variables. Similar to minimax, there will be a “flip flop” recursion between Maximize and Minimize. The difference here is that each time one of them is called, they’re called with an alpha and beta value. The first time, Maximize is called on the root node with alpha = -infinity and beta = +infinity. At the beginning of Maximize (everything will be the same for Minimize, but backwards), the (best node, best value) combo for that node’s expansion is set to (Null, -infinity). Then, they go through a for loop of each of the children. For each child, Minimize() is called on it, and the resulting value is gotten. If this value is better than the best value, that value becomes the new best value. So, the first time this happens, anything will be bigger than -infinity, so it will be the new best (biggest) value. Then, this best value (whether or not it got updated) is compared to beta (the one passed to Maximize). If it’s bigger, that means that Minimize will certainly choose that beta, which is worse (for Max) than the best Max can already achieve on another branch. Therefore, we can ignore the rest of that branch, and prune.

There’s also a brief section on move ordering, which is the order of the nodes you evaluate. If alpha-beta pruning potentially allows you to not have to evaluate the rest of a node’s children, if evaluation of one of them gives you alpha > beta, you can imagine that you’d want to evaluate that node first.


Alright, now for the actual project! The cool part. The goal here is to make an AI that plays 2048. There are a few constraints. You only get 0.2 second per move, so you have to cut off your search of the game tree after that time and just return what you’ve found for your best move. Additionally, typically in the minimax algorithm, you assume your opponent (if you’re Max) wants to minimize, that is, give you the worst possible move. However, in 2048, the 2 or 4 tile that’s placed each turn is random. So there’s a good chance that you could get better scores if you programmed your AI with this in mind (i.e., if you make a move where, if the computer was playing randomly as it actually does, and there’s a 1% chance it would make the move that’s worst for you, but that other 99% is great for you, that would actually be a pretty great move to make). However, the instructions say to program it as though the game is actively playing against you, giving you the worst move at each point.

I mentioned above what you have to do when you can’t get to a terminal mode. So far they showed you trees that seemed to assume that you could see the outcomes of Max and Min choosing their best moves, and finishing the game to completion or something. I’m not sure you even could theoretically (given infinite calculation time) get all the ending game states for 2048, because it seems possible that there are some rare branches of the tree where the game could continue indefinitely (actually, is this true? If you could also place tiles yourself, and choose the moves, could the game go on forever? I’m not sure, because it seems like as the game goes on, to get higher numbers, you need to have the lower ones that build up to them present as well, which would clog the board up (edit: actually, see below for an example of this)).

Anyway, the point is that when you can’t reach terminal states in the game tree (that is, the game doesn’t end, giving you a “definite vaule” for that state), you basically have to make up your own value for a state. And that’s a heuristic, in a sense! It turns out choice of heuristic is very important and the hard problem of this assignment. At each node, you’ll calculate the heuristic for that game state, which will let you make choices.

They suggest starting by just implementing a-b pruning with the heuristic of “number of free cells”, which should let you get some combo of 256’s and 512’s if you run it repeatedly (the way they score it is to run your program 10 times and take the 5 best scores). This actually isn’t a bad heuristic, but could use a little tweaking. In general, you probably do want more open cells, and that also encourages combining them, which will tend to give you higher number tiles. So, I implemented this, and it indeed gave me a bunch of 256’s and 512’s. I also implemented IDS to start, so for a given move, I have a “depth limit” to search in the tree, which I increase, until I run out of time. That still had me at the same scores.

Okay, to the code. I’m mostly going to focus on what I did for the heuristic here, because that’s the interesting part, and honestly, I did the main parts in Minimize() and Maximize() pretty ugly.

The simplest, the starting heuristic, was just:

def heuristic(self,grid):
        #For the heuristic, higher is better for the playerAI.
        N_free_cells = len(grid.getAvailableCells())

But now let’s try to get better. I don’t know a ton about 2048 but the one thing I did know was that you want your highest value in the corner. To do that, I added another piece to the heuristic function, that just adds the value of the tile in the corner to the heuristic, still keeping the N_free_cells thing:

def heuristic(self,grid): #For the heuristic, higher is better for the playerAI. N_free_cells = len(grid.getAvailableCells()) corner_weight_val = grid.getCellValue((0,0)) return(self.weight1*N_free_cells + self.weight2*corner_weight_val) read more

Word clouds for Slack

Hey there! It’s been a while. I’ve been working on lots of stuff, but here’s a small thing I did recently.

My friends and I have a Slack we’ve now been using casually for a few years. You can download the entire logs of your Slack workspace, even if you use the free one (which will cut off the messages it shows you after 10,000 messages, I believe). So I wanted to do a few little projects with it.

One thing my friends and I were talking about was making bots that were crappy, funny imitations of us. So there would be a Declan-bot, Ben-bot, etc., that would talk like we do. Maybe we’ll try that in the future, but after doing the thing in this post, I have a suspicion that the bots might be kind of indistinguishable without extreme tailoring (though I’d love if they weren’t!).

So, what I wanted to do here was make a word cloud of each person’s total corpus in Slack. It actually all came together pretty quick, mostly because I did it in a quick, hack-y way.

First, to get the Slack data, you have to be an administrator. You go to the menu, administration, workspace settings, import/export, export, and then choose the date range. The folder is pretty huge, several GB. You’ll only get public data (not private messages), which makes sense. When you download it, it’s organized by folders corresponding to the channels everything was in, and in each of those, .json files for each day.

json is fortunately super easy to use in python, since it basically gets read as a dictionary. I made a little program that goes through recursively and gets all the files, and then piles everything together into one huge json… dictionary? That lets me easily select everything from a single user, or channel, or other aspect.

Another little detail is that it doesn’t label the users by our names (which you can change), it labels us by a unique identifier string, like U2189232 or something. So I had to make a little translation dictionary to go back and forth between names and IDs.

I decided to use this guy’s great python word cloud generator to make the word clouds. It’s even installable via pip3!

So, that’s the basics. Import all the data into one big json database/dictionary thing, choose a user, translate to their ID, grab all the text with that ID, turn it into a big ol’ list of words (with repeats), and then feed it into that word cloud generator. And it works! Here are a few:

But you’ll immediately notice a few things. One is that there’s some stuff we probably don’t want there, like http and user tags (because if you say someone’s name with the @, it just technically calls their ID and renders it as their name). Additionally, there are a ton of common words. It turns out that we like saying “yeah” and “one” a lot, so that tends to give rise to kind of lame word clouds.

This problem is actually a lot more interesting than you might think at first glance. I wanted to give the clouds a little more “personality”. That is, there are a few unique words in each word cloud that, knowing my friends, I’m able to point out and say “yeah, Max plays Magic, so he probably does say ‘deck’ a lot”, but there aren’t many of those words. What I’d really like is if the top N words of each person’s word cloud were pretty unique to them, but it turns out this is actually kind of tricky.

One thing I tried, that worked with some success, was taking the “megacorpus” from everyone’s combined corpuses (corpi? or is it like octopuses?), taking the 400 most common words from that, removing them from each person’s corpus, and then making them. This is definitely a slight improvement:

It’s not great, though. For example, there could be a word that’s used a ton by one or two users, and no one else, that might get removed. This would be a very “personal” word that I’d definitely want kept in their word cloud(s). I think it’s also hard to know where the point is where you stop removing common, lame words and start removing interesting, personal words.

Here’s what I’d ideally like: to make a list of the top N words, for each user, that are in the top N words of at most X other users I’m considering. So, if I’m making the top 10 lists for 8 of my friends, I might say that a “top 10” word can stay in a given list if 2 people have it in their top 10, but not if 3 do (then it’s just too common).

How do you do this, though? The naive way I tried it was this. Get each user’s corpus, sort by commonality (just for that corpus). So, you have a list with one occurrence for each word, sorted by decreasing use. Then, start with user 0. Start with their most common word (index 0 of that list), and check if it’s in the top N of each other user. If it is, add to the tally of how many others share that word (that starts at 0 for each new word). If more users have that word in their top N than are allowed, remove that word from everyone’s corpus. If you removed the word, you keep the index the same, but now it will be looking at a new word in the top N because the one it was just referring to just got removed. If the word didn’t have to be removed, increase the index (so now it’s looking at the next most common word in user 0’s top N). When you get to index N of user 0, go to the next user and restart. Here’s the relevant code for that:

for user in users: print('\n\ngetting unique words for',user) other_users = copy(users) other_users.remove(user) print('other users:',other_users) index = 0 while index<=N_unique: others_with_word = 0 cur_word = users_corpuses[user][index] print('\ncur word is \'{}\''.format(cur_word)) for other_user in other_users: if cur_word in users_corpuses[other_user][:N_unique]: print('{} also has this word'.format(other_user)) others_with_word += 1 if others_with_word>allowed_others: print('removing \'{}\' from all users'.format(cur_word)) for tempuser in users: users_corpuses[tempuser].remove(cur_word) else: index += 1 print('\n\nTop {} unique words for {} at this point'.format(N_unique,user)) [print('{}. {}'.format(i,word)) for i,word in enumerate(users_corpuses[user][:N_unique])] read more

Motion detection with the Raspberry Pi, part 2

Hi hi!

In this post, I’m really just going to concentrate on building the whole pipeline. It’s going to be rife with inefficiencies, inaccuracies, and stuff I 100% plan on fixing, but I think it’s good to get a working product, even if it’s very flawed. Someone I once worked for told me that projects in the US gov’t kind of work that way: there was high emphasis on getting a product out the door, even if it was hacky and awful (though hopefully not). I think that makes sense a lot of the time. It’s probably more motivating to see a project that does something to completion, even if it’s crappy, than a project that is partly carefully done, but still very incomplete. A crappy car is cooler than a really nice wheel. Also, it seems like iterative, smaller fixes are relatively easy.

ANYWAY, that said, last time I left off, I said that the things that needed to be done were:

  • Fix the sending thing to do in parallel
  • Make monitoring program on other side that adds the files, etc to a CSV file to be analyzed with pandas
  • Use keras with CIFAR datasets to figure out if detected object is car, person, etc
  • Attach lens to get better view of cars
  • Make rain shield with PVC pipe so I can leave it out for days or weeks

In retrospect, a lot of these were obvious pretty incremental, silly things (like the lens and rain shield (I guess it was also a “someday in the future” list)). In this post, I’m actually gonna cover three main things:

  • Sending detected images in parallel with the sensing
  • Making a “monitoring” program on my desktop
  • Using keras to recognize cars vs not cars in the images that are sent over

Here’s an extremely bootleg flowchart of how stuff is connected:

Parallel image detection and sending

At the end of last time, I mentioned that the images being detected and sent over weren’t great because it was detecting stuff immediately, but taking a while to send, which, in the meantime prevented new images from being detected. This is called “blocking”, since the sending is “blocking” the program from continuing until it’s done sending. There are a few solutions to this, but the one that intuitively appealed to me was using multiple processes, one responsible for capturing and saving the images, and the other for sending them to my desktop. You could also just spawn a new process for each time you want to send, I think, but I went for this.

I was a little worried that this wouldn’t speed stuff up much, because this would still be saving the image inside the camera/detection part of the program, which I assumed would be a slow operation. But, I timed it, and a whole iteration of detecting/image manipulation/saving/etc is about 30ms! So it’s a huge speedup.

So, I won’t paste the whole code because it’s large, but here are the new/instrumental parts:

def processFile(fName,remoteHost,remotePath): remoteHostPath = '{}:{}'.format(remoteHost,remotePath) subprocess.check_call(['scp','-q',fName,remoteHostPath]) subprocess.check_call(['rm',fName]) def fileMonitor(logFileName,localPath,remoteHost,remotePath): print('entering filemonitor') processedFiles = [] while True: #files = os.listdir(dir) files = glob(localPath+'/'+'*.jpg') if len(files)>0: #print('sending these files:',files) [processFile(file,remoteHost,remotePath) for file in files if file not in processedFiles] [processedFiles.append(file) for file in files if file not in processedFiles] remoteHostPath = '{}:{}'.format(remoteHost,remotePath) time.sleep(0.5) subprocess.check_call(['scp','-q',localPath+'/'+logFileName,remoteHostPath]) def cameraStream(logFileName,localPath,startDateTimeString): #Camera stuff #............................ tempFName = dateString + '_' + str(boxCounter) tempPicName = tempFName + ext cv2.imwrite(localPath + '/' + tempPicName,frameDraw) fLog = open(localPath + '/' + logFileName,'a') fLog.write("{}\t{}\t{}\t{}\t{}\n".format(tempFName,x,y,x + w,y + h)) fLog.close() #Main section pool = Pool(processes=2) p1 = pool.apply_async(fileMonitor,args=(logFileName,localPath,remoteHost,remotePath)) p2 = pool.apply_async(cameraStream,args=(logFileName,localPath,startDateTimeString)) print(p1.get(timeout=3600)) print(p2.get(timeout=3600)) read more

EDX Artificial Intelligence, weeks 2 and 3


I started this AI course with my friends a while ago, but we never ended up finishing it. I’m interested in AI these days, so I thought I’d try it on my own. Week 1 is some fluff that’s not worth going over. I’m doing weeks 2 and 3 together because there is one project for both combined. The first couple sections are just notes I took on the videos and concepts. The stuff for the project is at the bottom. read more

Motion detection with the Raspberry Pi, part 1

Okay Declan, let’s try making this post a short and sweet update, not a rambling Homerian epic about simple stuff.

I got a Raspberry Pi (RPi) and an RPi camera because I wanted to learn about them and mess around with them. If I could do image recognition with them, that’d be a good platform to do ML, NN, and if I got enough data, maybe even DS type stuff. Luckily, there’s a ton of resources and code out there already. I drew upon heavily from www.pyimagesearch.com, which is a REALLY useful site, explained very great for beginners. Two articles that I basically copied code from and then butchered were this and this.

He’s not quite doing “image recognition” in this code, it’s more like “difference recognition”. Very simply, he has a stream of frames coming in from the camera. He starts off by taking what will be considered a “background frame”. Then, for all subsequent frames, he subtracts the background from the current frame, and then looks at the absolute difference (all done in grayscale, to make it simpler) of pixels. If two frames were identical, you’d expect very little different. If an object appeared in the new frame, the difference would show that object. Then, he uses some opencv tools to figure out where the object is, and draw a box around it.

I was able to put his code together and run it pretty quickly (though I removed some stuff like uploading it to dropbox, instead doing a kind of naive thing of sending the files via scp to my other machine), producing this gif of local traffic outside my window:

Of course, the devil is in the details. If you watch it a few times, you’ll notice some weird behavior. Most obviously, boxes are detected around the objects, but then the boxes appear to remain where the object was for several frames. Here you can see it frame by frame:

Why does this happen? Well it’s actually a smart feature, but done in a somewhat clumsy way. In his code, he has the following (I combined the few relevant snippets) inside the main frame capturing loop:

if avg is None: print("[INFO] starting background model...") avg = gray.copy().astype("float") rawCapture.truncate(0) continue cv2.accumulateWeighted(gray, avg, alpha) frameDelta = cv2.absdiff(gray, cv2.convertScaleAbs(avg)) read more

Kaggle Housing challenge, my take

In this article, I’m doing the Kaggle Housing challenge, which is probably the second most popular after Titanic. This was very much a “keeping track of what I’m doing for learning/my own sake” thing, but by the end I’ve gotten a ranking of 178/5419 on the public leaderboard (LB). That said, this is super long because I tried a million things and it’s kind of a full log of my workflow on this problem.

I’ve really learned a bunch from going through this very carefully. What I did here was to try the few techniques I knew when I started, and then I looked at notebooks/kernels for this challenge on Kaggle. A word on these kernels: even the very most top rated ones vary in quality immensely. Some are excellently explained and you can tell they tried different things to try and get an optimal result. Others are clearly people just trying random stuff they’ve heard of, misapplying relatively basic techniques, and even copying code from other kernels. So I viewed these as loose suggestions and guideposts for techniques.

This challenge is a good one. In contrast to the Titanic’s classification, this is a regression for the label “SalePrice”, the price a given property sold for. Another key characteristic of this one is that it has 80 features, which is a large number (for me, anyway), and of those, a decent number of missing values (NA’s).

I’m evaluating the scores by sometimes looking at the accuracy from the score() function of sklearn models (which is out of 100, increasing score is better), and then towards the end only looking at the Root mean square logarithmic error (RMSLE), for which lower is better.

Starting off:

Using literally only the feature “GrLivArea”, the most obviously linear:

TTdat = train_df[["GrLivArea","SalePrice"]] display(TTdat.isnull().sum()) X_train, X_test, y_train, y_test = train_test_split(TTdat.drop("SalePrice",axis=1), TTdat["SalePrice"], test_size=0.3, random_state=42) lm = linear_model.LinearRegression() lm.fit(X_train,y_train) print("\nLM score: {}".format(lm.score(X_test,y_test))) print("\n\n\tLM RMSLE: {}".format(rmsle(np.array(y_test),np.array(lm.predict(X_test))))) rfr = RandomForestRegressor(n_estimators=300) rfr.fit(X_train,y_train) print("\n\nRFR score: {}".format(rfr.score(X_test,y_test))) print("\n\n\tRFR RMSLE: {}".format(rmsle(np.array(y_test),np.array(rfr.predict(X_test))))) read more

The Knapsack Problem: Discrete Optimization, week 2

I’ve been doing this Coursera Discrete Optimization course with my friends. It’s a lot of fun and I’m learning a bunch. The instructor is a total goofball too, which is a plus. I’ve taken a handful of online courses before, but to be honest, the assignments have usually been pretty easy. Not so with this Discrete Optimization (DO (my initials!)) course! Each week, you have to solve 6 problems, and each is graded out of 10 depending on how well you do. I believe the breakdown is: 0/10 if your answer doesn’t even match the output format required, 3/10 if you do basically anything (even if your answer is quite wrong), 7/10 if you have an answer above some threshold but still not perfect, and 10/10 if your answer is optimal (I guess they know the optimal solution for all of them?). Usually, the problems increase in hardness throughout the set; often, the last one is difficult enough that (I believe we saw this said in the course forums by the instructor) it would be a challenge for people who specialize in DO for a living. I think that’s pretty cool! They usually give you a ton of practice problems of various difficulties, and (though I’m not 100% sure) I think the 6 you’re graded on are usually among those.

So what is DO? I certainly didn’t know when I started this course, though I guess I should’ve been able to guess. Optimization is what it sounds like, finding the best solution you can for given problems. The “discrete” part is that the quantities involved are integers or discrete (that’s the name in the title!!!) components. It turns out I had actually heard of many of the problems that DO applies to before, but didn’t know they were DO. I had heard most of them in the context of P vs NP complexity.

Anyway, enough of that. Week 1 didn’t have an assignment, so the first one was in week 2, and it was the famous “knapsack problem”. I’m guessing if you’re reading this (sike, no one reads this!), you’re familiar with it, but in case you aren’t, it’s as follows. You have a knapsack with a certain (weight) capacity, K. There are a bunch of items in front of you, and each one has a certain weight (w) and value (v). You wanna fill your backpack with the most valuable (i.e., largest combined value) set of items you can, and you can only take the items whole (i.e., you can’t break one in half to fill the bag completely).

Because I’ll be referring to it a few times, the format of a given problem (just a text file) they give you looks like this:

n K
v_0 w_0
v_1 w_1
v_n-1 w_n-1

n is the number of items there are, and K is the capacity of your bag in this problem. v and w are the value and weight of each item. (I’ll refer to the items by their line number, starting with 0, in the following.)

Here’s where the “discrete-iness” comes into play. If you were allowed to take non-whole pieces of the items, the solution would be trivial: you would calculate the density (d = v/w) of each item, and start taking the most (densely) valuable items in decreasing order. When you get to the point where your bag can’t take the next whole item, you take as much of it as you can fit, and this is the theoretically most valuable set.

Now, for the actual problem (where you can’t take fractional pieces of items) it turns out that this is still often a decent solution, for intuitive reasons; denser items are giving you more bang for your buck. This is called the “greedy solution”, because it just greedily takes the best item it can at any moment. However, the problem gets interesting when that strategy will actually give you sub-optimal results. In fact, helpfully, the simplest problem they give you already has a non-greedy optimal solution (OS):

4 11
8 4
10 5
15 8
4 3

The items already happen to be ordered by decreasing density. If you did the greedy solution and took item 0 (8, 4) and then item 1 (10, 5), you couldn’t take any more items and your total value would be 18. However, this isn’t OS! You can see that you actually wanna take the least two densest items, 2 and 3, which perfectly fill the bag and also add up to 19, the OS.

So, that’s the problem! Now, how do you actually solve it? (And, what does it mean to solve it?)

The most naive way you can imagine is to try every possible set, and take the largest valid (i.e., all the items fit in the bag) solution. You have a list of n items, so for each one, you can either take it (1) or leave it (0). So there are really only (hah!) 2^n possible sets of items. Naturally, this lends itself to a tree data structure, but it should also be clear that 2^n quickly becomes infeasible for surprisingly low values of n.

Therefore, the real game is in traversing that tree in some clever way such that you don’t have to go through the whole tree. I should make a distinction here. There are two types of solution you can solve for. You can go for the OS, where you’re 100% sure the solution you end up with is optimal. This actually doesn’t necessitate going through the whole tree. Sometimes you can know that you don’t need to go farther down a tree, which allows you to not have to look at the solutions contained within that part of the tree, but still know that a solution you find will be the OS. On the other hand, sometimes this is still infeasible, so you have to go for an approximate solution (AS), where you’re just doing the best you can. However, not requiring the solution you get to be the OS can let you find great solutions way faster, and still sometimes even let you find the OS. The downside is that you won’t know 100% that you got the OS.

So let’s actually start. The way my friends and I originally did it (and I stuck with), we actually don’t explicitly have a tree data structure; however, by dint of doing recursion with two possible recursive calls, this implicitly forms a tree.

The first thing done is reading in the data and storing it in a list of namedtuple Item objects, which I believe was given to us. Then, because we’re going to go through the tree in order of decreasing density, we sort the list by that:

def sortListDensity(inList): return(sorted(inList, key=lambda item:-item.density)) lines = input_data.split('\n') firstLine = lines[0].split() item_count = int(firstLine[0]) capacity = int(firstLine[1]) items = [] for i in range(1, item_count+1): line = lines[i] parts = line.split() items.append(Item(i-1, int(parts[0]), int(parts[1]),float(parts[0])/float(parts[1]))) sortedList = sortListDensity(items) read more

Grouping IMDb top movies by runtime


This is a fun lil one. For an upcoming article, I need to know a list of (hopefully good) movies I haven’t yet seen, with similar runtimes. Now, I could have just scrolled down the list of IMDb.com’s top 250 movies, ctrl + clicking on the ones I haven’t seen, and then compared them by eye, because, to be honest, I think I’ve seen many (/most?) of them (we’ll see shortly).

But of course, that would be an efficient use of my time, and I’m learning pandas these days anyway, so why not use it!

To do what I want, I basically need to take that top 250 list (let’s say I don’t really care about the ratings within that list, I just want to select movies from it), get a column with the runtimes, and then manually make a column with 0/1 entries for if I haven’t/have seen it, and then select for the ones I have. Then, I could group, or at least visualize them.

The first (tiny) hurdle comes from the data source. IMDb.com very helpfully provides several zipped files of ALL their movies (beware, the “basics” one is 420MB unzipped!), buuut… they have separate files for the table with the runtimes (title.basics.tsv.gz) and the ratings (title.ratings.tsv.gz). That wouldn’t be too bad, you might think: you could just sort the ratings file by the rating column, take those entries, and select from the basics file, to get the runtimes.

Buuuut… a quick glance (or if you’ve ever just perused the dark back alleys of IMDb itself) reveals that there are many entries with very high ratings, higher than the top 250 scores (which range from 8.0 to 9.2). This is probably because there are lots of smaller productions where you get a selection bias, such that you pretty much only get people who really like the movie voting, so they’re all voting 10. Of course, IMDb links actually links to an explanation of this effect:

As indicated at the Top Rated Movies page, only votes from regular IMDb voters are considered when creating the top 250 out of the full voting database. This explains any difference between the vote averages reported in the top 250 lists and those on the individual title pages. This also explains why movies or shows you might think from their averages ought to appear on the list yet do not actually appear there.

To maintain the effectiveness of the top 250 lists, we deliberately do not disclose the criteria used for a person to be counted as a regular voter.

and says on the top 250 page itself:

The list is ranked by a formula which includes the number of ratings each movie received from users, and value of ratings received from regular users
To be included on the list, a movie must receive ratings from at least 25000 users

I assumed this before starting this thing but wasn’t sure how they did it. I always assumed that the simplest way would be to just have some threshold of a minimum number of votes (which they do), to even qualify. The “small, religiously devoted fanbase theory” of those artificially high ratings would probably break if you set it correctly — I mean, once you set the threshold of minimum votes high enough, if the rating is still high, it’s not really “artificial” anymore, is it? There are potential problems with that, like only selecting for really large productions (depending on the threshold). It appears they actually do this, but also add a secret “special sauce” where they weigh certain votes more, but they don’t say how.

Anyway, that’s a bit of a long winded way to say that it’d be hard to do what I said above, to get the runtimes of the top 250. At this point, I saw a few options: I probably could try their method manually, using the ratings file (which has average ratings and number of votes for each one), just taking the subset of movies that have a rating at and above the minimum of the top 250 list, and then thresholding those for the minimum number of votes. Maybe I’ll try this anyway, but I assumed (because they say they do something else in addition) that I might end up with another list if I did that.

Another thing I briefly considered (that, at this point, may have been much easier) would be some sort of web scraping. It’d be reaaaaaal easy (in theory) to have a script go to the link for each entry on the top 250 page (which would lead directly to the actual movie, which as we’ll see shortly, is actually a bit of a pain), and then each page has a well defined “runtime” field right below the title. I briefly debated this (and maybe I’ll try it later), but I don’t actually know how to do web scraping in python yet, so it would probably be a really hacky job on my part.

So, speaking of hack-y, here’s what I ended up doing. Everything is presented in bits because I did it in a Jupyter notebook.

To get around the “which are actually the top 250 movies” problem, I literally copied and pasted the top 250 list from the page, and pasted it into a text file, which I imported and dropped two things that ended up being garbage columns. Then I had to do a tiny bit of parsing, because using “\t” as a delimiter worked to separate the title and year, but not the rank and title. So, I had to delimit that column with the “.” after the rank #, but setting n=1, because some titles have a period in their name as well (Monsters, Inc. for example), so you wouldn’t want to chop those up into separate fields. Ahhh details!

movieRatings_df = pd.read_csv("copypasteRatings.csv","\t") display(movieRatings_df.head()) movieRatings_df = movieRatings_df.drop(["Your Rating","Unnamed: 3"],axis=1) display(movieRatings_df.head()) dotsplit = movieRatings_df["Rank & Title"].str.split(".",n=1) titleYears = pd.Series([entry[1] for entry in dotsplit]) rank = pd.Series([entry[0] for entry in dotsplit]).rename("Rank") years = pd.Series([int(title[-5:-1]) for title in titleYears]).rename("Year") titles = pd.Series([title[1:-7] for title in titleYears]).rename("Title") ratings_df = pd.concat([rank,titles,years],axis=1) ratings_df.head() read more