Training a real robot to play Puckworld with reinforcement learning

After I trained an agent to play “puckworld” using Q-learning, I thought “hey, maybe I should make a real robot that learns this. It can’t be that hard, right?”

Hooooooooo boy. I did not appreciate how much harder problems in the physical world can be. Examples of amateurs doing Reinforcement Learning (RL) projects are all over the place on the internet, and robotics are certainly touted as one of the main applications for RL, but in my experience, I’ve only found a few examples of someone actually using RL to train a robot. Here’s a (very abridged!) overview of my adventure getting a robot to learn to play a game called puckworld. read more

Beating OpenAI games with neuroevolution agents: pretty NEAT!

Let’s start with a fun gif!

Something I’ve been thinking about recently is neuroevolution (NE). NE is changing aspects of a neural network (NN) using principles from evolutionary algorithms (EA), in which you try to find the best NN for a given problem by trying different solutions (“individuals”) and changing them slightly (and sometimes combining them), and taking the ones that have better scores. read more

Solving the Brachistochrone and a cool parallel between diversity in genetic algorithms and simulated annealing

In my first post on Genetic Algorithms (GA), I mentioned at the end that I wanted to try doing some other applications of them, rather than just the N Queens Problem. In the next post, I built the “generic” GA algorithm structure, so it should be easy to test with other “species”, but didn’t end up using it for any applications.

I thought I’d do a bunch of applications, but the first one actually ended up being pretty interesting, so… here we are. read more

Training an RL agent to play Puckworld with a DDQN

Last time I messed around with RL, I solved the classic Mountain Car problem using Q-learning and Experience Replay (ER).

However, it was very basic in a lot of ways:

  • There are really only two actions, and the state space had only two dimensions (position and velocity).
  • The way I was representing the state space was very simple, “coarse coding”, which breaks the continuous state space into discrete chunks, so in a way it still has discrete states. More interesting problems have continuous, many dimensional state spaces.
  • The representation of Q was just a state vector times a weight vector, so just linear. You can actually get a decent amount done with linear, but of course all the rage these days is in using neural networks to create the Q function.
  • The problem was very “stationary”, in the sense that the flag (where the car wanted to go) was always in the same place. Even if I had the flag move around from episode to episode, the strategy would always be the same: try to pick up enough momentum by going back and forth. A more interesting problem is one where the goal moves.

Genetic Algorithms, part 2

Last time, in case you missed it, I left off with a laundry list of things I wanted to expand on with Genetic Algorithms (GA). Let’s see which of those I can do this time!

This is pretty wordy and kind of dry, since I was just messing around and figuring stuff out, but I promise the next one will have some cool visuals.

Using Reinforcement Learning to solve the Egg drop puzzle

So last time, I solved the egg drop puzzle in a few ways. One of them was using a recent learn, Markov Decision Processes (MDP). It worked, which got me really stoked about them, because it was such a cool new method to me.

However, it’s kind of a baby process that’s mostly used as a basis to learn about more advanced techniques. In that solution to the problem, I defined the reward matrix and the transition probability matrix , and then used them explicitly to iteratively solve for the value function v and the policy p. This works, but isn’t very useful for the real world, because in practice you don’t know  and , you just get to try stuff and learn the best strategy through experience. So the real challenge would be letting my program try a bunch of actual egg drops, and have it learn the value function and policy from them.

Some mathy tesselating stamp art!

I was recently at the art store for some reason, just browsing. I found the linoleum stamp section at the back and immediately wanted to make some! We had made them in 5th grade art class or something, and I remember liking it a lot, but had never since then. They’re kind of the perfect type of art for me, since I seem to like 3D things with more of a “crafts” element. I like carving/whittling anyway, so this was perfect.

I grabbed a few (pretty cheap), and on the way home thought of what I’d do: make a square stamp with weaving paths, asymmetric, such that it could be stamped out in a grid to either create cool repeating patterns, or random ones.

First I’ll briefly talk about the craft section, and then the coding part. If you wanna see the big figures, skip to the bottom!

I bought two, so I could have the small square be a “1×1” unit square “tile”, and the bigger one be a 2×2:

It turns out the sizes they are sold as aren’t totally accurate, so I had to resize them:

The linoleum is real easy to cut with a thin scroll saw, but the wood was pretty annoying.

Ta da!

Here’s the 1×1 tile. I tested a few patterns and decided to put “ticks” (i.e., entry/exit points for the paths) at the quarter, half, and three quarters marks of each side, so when you rotate it and put it next to another, it will match up with them and continue the paths, sometimes creating closed loops (either between adjacent ones, or over many tiles).

For the pattern, it was somewhat arbitrary, but I wanted a combination of ones that would connect the same side to itself, to adjacent ones, and ones across (you can see the one I chose in the pad to the back):

I also wanted it to be totally asymmetric, since it would offer more pattern making.

The carving was pretty easy. The carving kit I got had “scoop” blades that I originally though would be more useful, but it turns out that it’s easier and cleaner to just use a straight blade and cut at a “V” angle into it to make the channels.

Additionally, I’m not sure if you can see here, but I intentionally put a little gap in between paths when one crosses another, to give the impression that it’s crossing “under” the other, adding a bit of depth.

So how does it look?

Neato! The print was splotchy, but I love the wandering, wild patterns it quickly makes.

The 2×2 was even more fun, because I had a lot more space to go nuts. It now had 6 ticks per side, which let me do some pretty wide, sweeping circles. Here it is carved next to the 1×1:

It was a bit of a pain to line up the ticks exactly after carving so many, but it only took a few adjustments and testing all sides against each other to make them pretty decent (keep in mind that you’re not gonna have a ton of precision with stamping anyway):

So how does the 2×2 look?

PRETTY SWEET IN MY OPINION.

Here’s the 2×2 first:

Then for fun, I did another 2×2 and a bunch of 1×1’s in the area around them:

I was doing it kind of fast and loose, so the edges don’t line up perfectly. Still, I love the swooping, huge circles created by the 2×2. It seems like I made the lines a little thicker for the 1×1, so maybe I’ll go back at some point and widen the ones for the 2×2.

So that’s the craft part! I dunno what I’ll do with them. I think it would be pretty cool to cover a wall in them a some point or maybe make a shirt if I get some fabric ink.

 

Now let’s code!

I could (and will!) make lots of prints, but I wanted the ability to test out large scale patterns of various configurations of the blocks without having to manually print them. Then, in the process of doing that, I found some nifty patterns that occur.

The first thing I did was take the above photos, crop out and perspective shift the tiles out of them to make them square, then “posterize” them to decrease the number of colors used. This allowed me to easily grab the basic shapes of the tiles and make new, cleaner images with them:

From here, it was pretty to use PIL (python image library) to whip up a program with a Board class, which uses a Tile class, to create large scale patterns quickly. Here’s the most simple, plotting just the 1×1 tile repeating uniformly:

It already forms a neat thing! If you follow a single path, you’ll see that it goes down 1 and right 1 each step, in a winding path.

What if we add the smallest of variation? To do this, first lemme define a few things. To be consistent with PIL, the origin starts at the upper left, the x coordinate is to the right, and the y coordinate is down. I can rotate a given tile, and PIL rotates CCW. There are only 4 unique rotations, so rot=0 is not rotated at all (i.e., the rastered 1×1 tile above), rot=1 is rotated 90 degrees CCW, etc.

Here, what I’ll do is create horizontal stripes. To do this, as I add tiles, I’ll just rotate each tile in a row by its y value, modulo some value. So, if I do mod=2, the stripes will be every other tile. For example, here’s the relevant code for that:

def hStripesPopulate(self,mod=2):
  self.label = 'hstripes_mod'+str(mod)
  for i in range(self.N_board):
    for j in range(self.N_board):
      self.insertTile((i,j),type=1,rot=j%mod)

The important part is the rot=j%mod. Here’s what that looks like!

Daaang. I dunno about you, but I find that really pleasing.

Here’s mod=3:

mod=4:

Mod 2 and 4 look pretty similar at a glance, but if you look closer they’re actually a bit different.

I won’t bother with vertical stripes, because they’re pretty much the same aggregate, just rotated. But, what’s really cool, is a “both stripes” pattern, where I basically do the same thing as above, but use rot=(i+j)%mod:

def bothStripesPopulate(self,mod=2):
  self.label = 'bothstripes_mod'+str(mod)
  for i in range(self.N_board):
    for j in range(self.N_board):
      self.insertTile((i,j),type=1,rot=(i+j)%mod)

Here’s mod=2:

I looooove that squiggly, repeating closed path. If you notice, it actually has a bit of an MC Escher-y vibe due to the overlapping, if you follow the path, because it’s always overlapping the next one it crosses.

mod=3:

mod=4:

This one is even cooler to me. It has two enclosed paths!

Now let’s try a few other random things. Here’s using rot=(i*j)%mod.

mod=2:

That one actually is just a series of closed paths, although you wouldn’t guess at a glance! Try following one.

mod=3:

That one takes a very long, wandering path before it repeats again, if you follow it.

mod=4:

This one is neat, because it has two different types of paths, for each row, that weave around each other, but never connect.

Here’s a neat one! A radial one, where it rotates it by roughly how far it is from the center, rot=floor(sqrt(((i-self.N_board/2)**2+(j-self.N_board/2)**2)))%mod.

Here it is for size 15:

If you follow the path at the very center of the image, it takes a VERY long path for being a fairly small board.

zooming out to give a bit more of the large scale pattern, 30:

You can see that, since the rotation has to be an integer, it doesn’t have the most precision in its “circle-y-ness”, but you get the picture. There’s definitely some weirdness happening at the corners.

Okay, I promise I’m almost done!

A cool one is using the Fibonacci sequence, rotating each tile of the double for loop above by Fibonacci[i*j + i]%4 (so it’s like you’re just counting across and down the grid, but using that term of the Fibonacci sequence):

You can see (if you use the little circles as a guide) some very interesting, non repeating behavior!

Lastly, to make a more classic tessellation, I did a Pythagorean Tiling, which you’ve almost certainly seen on a bathroom floor somewhere. The idea is that you have a big square that’s one color (for example), that’s twice as big as a small square, that’s another color. Putting the small square in the upper right of the big square allows you to tile the whole plane. For my application, the big square was just 4 tiles of the same rotation, and the smaller one is one tile of another rotation. You can also make the bigger square size (its width) any integer number of the smaller tiles, but I found that 2 looks the best.

Here are a couple variations of that, just rotating the smaller one differently with respect to the bigger one:

To do this, each entry in the for loop (where I insert tiles) is actually entering a “unit” of a big square (which itself is entering four 1×1 tiles), and one 1×1 tile. I did this in a bit of a wonky way (where it plots more tiles than necessary), but it made the coding real easy because a Pythagorean Tiling produces its own “grid” (see the Wiki article) which is easier to code with respect to:

for x in range(-int(self.N_board/2),self.N_board):
  for y in range(-1,self.N_board):

    i = big_sq_size*x + y
    j = big_sq_size*y - x

    #The big square
    for a in range(big_sq_size):
      for b in range(big_sq_size):
        self.insertTile((i+a,j+b),type=1,rot=bigrot)

    #the smaller square
    self.insertTile((i-1,j),type=1,rot=smallrot)

Here, x and y are the coordinates in the “Pythagorean grid”, and you get i and j for each of them.

 

Okay, so at this point I got a little curious. It seems like some patterns give rise to a ton of different paths, while some cause a minimum of them, and I was wondering if I could quantify what to expect, or at least find some pattern in them. One thing you can say is that the number of paths per tile (in a large board of them) is bounded, in general (like, considering other 1×1 tiles I didn’t make but you could). If you had a small 1×1 tile with 3 ticks on each side, but any arrangement of the lines connecting the ticks within it (such that they’re only connected in pairs), the smallest number of paths would be if you just connected each tick to the other tick straight across it, like this:

This would be that orientation doesn’t matter, and that as you add more tiles, it’s always just continuing existing paths, decreasing the (# paths/# tiles) ratio. So if you added another row of N tiles here, you’d get 3 more paths but increase the number of tiles by N, meaning it tends towards 6 paths/N tiles, or 0 for large N.

On the other end, you can only make so many paths per tile. It wasn’t immediately clear how to do it with 3 tick/side (actually, is there a way? or are odd number tick sides more bounded?), so I just sketched it with 4 ticks per side, where you can connect each with a pair, ensuring that there’s 4 paths per tile when you repeat it:

So you’re never getting more than (N_ticks/side) per tile paths, and you can basically get as low as zero paths/tile.

However, I’m still not sure how to predict, given just the tile and how we’ll orient it, how to predict how many paths it could produce.

As a starting point, I plotted the # paths/# tiles for a given layout pattern, for a bunch of sizes of boards. Here’s the one for the simplest, the uniform one:

It goes as the sqrt, which I thought was strange. But, if you look at this one, you can see that it actually makes sense. When you add the next “layer” of tiles, it continues all the paths there were already, but also adds one extra path for each tile:

Something interesting happens if you do it so each tile is placed with random orientation:

I tried to plot lines that seemed like they were similar to what it tends towards, but I dunno (the jumpy parts are where I added a couple samples at that same # tiles to see how much variation in the randomness there is). It seems pretty clear that the ones for low # tiles (which there are a bunch of samples for) do tend towards sqrt(# tiles), but then it evens out towards linear. I also plot the number of closed paths, and I think it’s interesting that it looks like it tracks it as a proportion of the total number fairly well.

For curiosity, here’s a random one:

I won’t try it today, but I think one approach that could work for this (the random one) is using some statistical mechanics-y technique. Maybe something where, for a given ensemble of already placed tiles, I try to figure out the chance of another tile placed on the edge cutting off existing paths vs adding a new one. This is one of those things where I bet if I spent a bit of time looking into it, it’s an insanely well known field that many people have written books on. (I mean, I know tessellations are for sure, but this has the paths element to it too. Does anyone know what this might be called?)

I’ll probably look at this stuff in another post, since it’s intriguing me. Well, that’s about all for now. I have a bunch of ideas I’ll try next time. Some more functions of x and y, sequences, etc. I also want to build more “units” (like the Pythagorean tiling), where I can engineer them to have extremely long paths that are nevertheless closed. Speaking of, I want to also characterize the average path length for different configurations (in # tiles it crosses, not counting the actual distance on a tile). I think with engineering bigger unit groups of 1×1 tiles could give some really interesting behavior. I also want to figure out what “rules” would be specific to the 1×1 tile I did, vs any tiles. Lastly, I didn’t even use my 2×2! It turned out that it was complicated enough with just the 1×1.

The (messy) code for all this is here.

The egg drop puzzle: brute force, Dynamic Programming, and Markov Decision Processes

I first heard this puzzle when taking an algorithms class in undergrad. The prof presented it as a teaser for the type of thing you could solve using algorithmic thinking, though he never told us the answer, or what the way of thinking is. Then, it more recently came up with my friends while we were hiking, and we were talking about it. I’ll talk about what I have so far, but first let’s say what the puzzle actually is.

There’s a building with 100 floors. You have two identical crystal eggs. They will break if dropped from (or above) some height (the same height for both), and you’d like to find that height using the fewest number of drops possible. If you drop an egg from some height and it doesn’t break, you can use that egg again. Once an egg is broken (i.e., you dropped it from that breaking height or above), you can’t use that egg again. So the question is, what’s the best dropping strategy?

So you can use the first egg to do a faster “search” between floors 1-100 (i.e., if you drop it at floor 20 and it doesn’t break, you’ve saved searching floors 1-20!). However, once your first egg is broken, the second egg has to be used to “scan” the remaining floors from the highest one you know it doesn’t break at to the one you just broke it from. For example, if you dropped your first egg at 20 and it didn’t break, and so you tried again at floor 40 and it broke, you know the breaking height is somewhere between floors [21-39], so you have to drop your second egg on floor 21, 22, 23, etc, until it breaks and you’ve found the floor.

So you can tell that there’s a bit of a balance here between “search” and “scan”. The more aggressively you search with the first egg, the more you get to skip floors when it doesn’t break, but you also have to scan a larger number of floors in between when it does.

Another detail is that this problem is a little ill-posed (at least what we remember): it’s unclear whether the question wants the average number of drops needed for a strategy to find the height, or maybe the least-bad worst case (i.e., if someone knew your dropping strategy and they could choose the floor to make you have to use the max number of drops) ? The average number seems better, but I think it actually ends up not mattering much (see below).

I solved this in three ways. First I did a brute search thing, which should give the actual average/worst case numbers, because it’s literally trying a given strategy for the egg being on each floor, and then averaging the results. Then I do a method Phil suggested based on Dynamic Programming. Last, I do a similar thing, but with a Markov Decision Process.

 

Brute force

In this section, I’m going to try the brute force method. I.e., for a given drop strategy I test, I’m going to build an “ensemble” where I have it use that strategy for every possible “break floor” (the floor the eggs happen to break on). This will give me the definitive average and worst case numbers for each strategy, though it will be only “empirical”; i.e., I’ll only get the best strategy for ones I try, but not necessarily the best strategy there is (see below for that!).

So I’ll admit: both when I heard this problem long ago and when I heard it again here, my mind instantly went to a bisecting log type search. That means drop the first egg on floor 50. If it doesn’t break, you split the difference of the remaining floors and do 75. If it doesn’t break again, drop from 88 (ceil(87.5)), etc. I guess I carelessly thought that because it seems like so many CS things end up being like that, but I should’ve thought more!

Here’s my quick and dirty code for trying the bisecting search. Excuse any messiness; I wanted to keep those comments in for diagnosing different variants I tried during this post and I’m kind of a code hoarder. The instrumental line is the curFloor = ceil((buildingHeight + curFloor)/2) one, which defines the next floor that it will be dropped from.

import matplotlib.pyplot as plt
import numpy as np
from math import log,ceil,floor

minList = []
maxList = []
avgList = []

buildingHeight = 100

dropCountList = []

for breakFloor in range(1,buildingHeight+1):

    firstEggWhole = True
    drops = 0
    lastUnbrokenFloor = 0
    curFloor = floor(buildingHeight/2)
    #print()
    while firstEggWhole:
        drops += 1
        #print("drop {}: dropping from floor {}".format(drops,curFloor))
        if curFloor>=breakFloor:
            #print("first egg broken dropping from floor",curFloor)
            firstEggWhole = False
        else:
            lastUnbrokenFloor = curFloor
            curFloor = ceil((buildingHeight + curFloor)/2)

    if curFloor==breakFloor:
        #print("have to search {} more floors ({} to {})".format(((breakFloor-1) - lastUnbrokenFloor),lastUnbrokenFloor+1,(breakFloor-1)))
        drops += ((breakFloor-1) - lastUnbrokenFloor)
        #print("total drops:",drops)
    else:
        #print("have to search {} more floors ({} to {})".format((breakFloor - lastUnbrokenFloor),lastUnbrokenFloor+1,breakFloor))
        drops += (breakFloor - lastUnbrokenFloor)
        #print("total drops:",drops)

    dropCountList.append(drops)

minDrops = min(dropCountList)
maxDrops = max(dropCountList)
avgDrops = sum(dropCountList)/len(dropCountList)
print("min: {}, max: {}, avgDrops: {}".format(minDrops,maxDrops,avgDrops))

So I calculate both the max (worst case) and avg of the ensemble of the 100 cases of when the breaking height is on each floor. I get:

min: 2, max: 50, avgDrops: 19.12

That 50 is really a killer, because for half of the ensemble (floors < 50), you lose your first egg immediately and then have to scan up to it. That’s painful and probably why this method doesn’t work.

Anyway, my smarter friends more immediately thought of a constant search method, where the first egg has some stride that it checks at until the egg breaks. So if you had a stride of 20, you might take the first egg and check at 20, 40, 60, etc, until it breaks, and then scan the rest. It’s pretty easy to calculate the worst case for this method; it’s basically setting the breaking floor to use the first egg as much as possible, and then set it again to make the second egg have to scan as much as possible. So for 20, you want to make the first egg do 20, 40, 60, 80, and break on 100, meaning it will have to use the second egg to search 81-99.

Like above, there’s definitely going to be some sweet spot where the aggressiveness of the search stride isn’t outweighed by the cases where you have to scan a bunch with the second egg. My friends immediately recognized that it was probably a stride of 10, and it’s probably not a coincidence that 10 = sqrt(100). So I made another quick dirty little program (be merciful pls) to try a bunch of these strides and plot the avg and worst cases:

import matplotlib.pyplot as plt
import numpy as np

#skipLengthList = list(range(15,70))+[100,200,300,500,800,980]
skipLengthList = list(range(1,20))+[30,40]
minList = []
maxList = []
avgList = []

buildingHeight = 1000

for skipFloorLength in skipLengthList:
    dropCountList = []
    #print("\nskipFloorLength =",skipFloorLength)
    for breakFloor in range(1,buildingHeight+1):
        firstEggWhole = True
        drops = 0
        lastUnbrokenFloor = 0
        #skipFloorLength = 5
        curFloor = skipFloorLength
        #print()
        while firstEggWhole:
            drops += 1
            #print("drop {}: dropping from floor {}".format(drops,curFloor))
            if curFloor>=breakFloor:
                #print("first egg broken dropping from floor",curFloor)
                firstEggWhole = False
            else:
                lastUnbrokenFloor = curFloor
                curFloor += skipFloorLength

        if curFloor==breakFloor:
            #print("have to search {} more floors ({} to {})".format(((breakFloor-1) - lastUnbrokenFloor),lastUnbrokenFloor+1,(breakFloor-1)))
            drops += ((breakFloor-1) - lastUnbrokenFloor)
            #print("total drops:",drops)
        else:
            #print("have to search {} more floors ({} to {})".format((breakFloor - lastUnbrokenFloor),lastUnbrokenFloor+1,breakFloor))
            drops += (breakFloor - lastUnbrokenFloor)
            #print("total drops:",drops)

        dropCountList.append(drops)

    minDrops = min(dropCountList)
    maxDrops = max(dropCountList)
    avgDrops = sum(dropCountList)/len(dropCountList)
    #print("min: {}, max: {}, avgDrops: {}".format(minDrops,maxDrops,avgDrops))
    minList.append(minDrops)
    maxList.append(maxDrops)
    avgList.append(avgDrops)


avgListArgMin = np.argmin(np.array(avgList))
maxListArgMin = np.argmin(np.array(maxList))
print('best skip length in terms of avg case:',skipLengthList[avgListArgMin])
print('best skip length in terms of worst case:',skipLengthList[maxListArgMin])
print('best avg case:',avgList[avgListArgMin])
print('best worst case:',maxList[maxListArgMin])
avgLine = plt.plot(skipLengthList,avgList,'bo-',label='avg')
maxLine = plt.plot(skipLengthList,maxList,'ro-',label='worst')
plt.axvline(skipLengthList[avgListArgMin], color='b', linestyle='dashed', linewidth=1)
plt.axvline(skipLengthList[maxListArgMin], color='r', linestyle='dashed', linewidth=1)
plt.xlabel('skip floors length')
plt.ylabel('# of drops')
plt.title('building height = '+str(buildingHeight))
plt.legend()
plt.show()

So you can see that my friends were right. The dotted lines show the positions of the best values for the two metrics (the red one is actually a plateau at that, it just selected 8 because argmin() selects the first value it finds, so ignore that). Interestingly, the best avg case is also about 10. Hmmm. You can also see that the worst case pretty much perfectly tracks the average case.

To check, I also tried it with a building size of 1000 (and therefore, a step size of floor(sqrt(1000)) = 32, which gave similar results:

Same deal. Here’s a neat little detail, though. This is the very “zoomed in” set of test strides, because I guessed it would be around the sqrt(building height). If you zoom out:

In the region that’s a bad strategy anyway, the worst case increases as you might expect, but you see some pretty interesting behavior of the average case, where there are different “regimes” or something. Maybe I’ll investigate this more at some point, but I wonder if it’s a coincidence that there’s what looks like a cusp at (building height)/2…

Anyway, one last thing for now. When coding this, I realized that if you’re doing the optimal stride of sqrt(building height), and your first egg doesn’t break… well, it’s almost as if you’re restarting the problem, but with a slightly shorter building! So, you shouldn’t use the same stride that was calculated for the “original” building. That is, if building height = 100 and stride = 10, then if it survives the 10 and 20 floor drops, you still have two eggs and now it’s kind of like you’re doing the same problem with building height prime = 80, and ceil(sqrt(80)) = 9. So if you adjust this as you drop for each case, it should be better.

And it is! …very incrementally.

Here it is for building height = 100:

and 1000:

(I just included more output so you can see it adjusting the stride.)

So it’s better but not crazy better. At this point, there’s a really good chance this method (best stride = sqrt of remaining floors) is the best strategy for this more general method (the “stride” method), though I still haven’t actually proven it. But even if I did, that would only prove that it’s the best strategy within the stride method. How do we know there isn’t something better?

 

DP method

My much smarter friend Phil came up with a very clever way to get what has to theoretically be the best strategy, independent of any general strategy like the stride thing, using concepts from dynamic programming (DP). At the time I didn’t understand it beyond the vaguest idea of “build up from the solutions to smaller subproblems”, but since then I’ve learned a tiny bit about DP. So, I’m going to try it here in a few ways!

Until recently, my only experience with DP was a brief mention of building the Fibonacci sequence up from smaller terms we did in some CS class ages ago. I’ll take the Fibonacci sequence as an example. It’s frequently defined recursively as f(n) = f(n-1) + f(n-2), with the base case of f(1) = 1 and f(2) = 1. So you can calculate it that way. If you want f(100), now just calculate f(99) and f(98), and then to calculate each of those, calculate… etc. The problem is that, if you called this in the most naive way with a recursive function, it would explode into a huge number of terms, and more importantly, very redundant terms. For example (if you’re trying to get f(100)), calculating f(99) needs f(98) and f(97), but you also need f(98) for f(100). So it’s basically totally impractical for anything large.

An alternate way, the DP way, is to “build up to” the goal you want. So if we want f(100), we calculate f(3), which is easy since we have f(1) and f(2). Then f(4) from f(3) and f(2), etc. So you can see that it’s way easier and involves storing way fewer numbers (basically just a single list of the ones you have calculated so far). Incidentally, this feels like a pretty contrived example, since I don’t think any person who isn’t already familiar with and eager to use recursion would define the Fibonacci sequence that way. I’m pretty sure I’ve usually heard people say “start with 1, 1, and then add the last two numbers to get the next number”. So, peoples’ intuitive default seems to be more like DP anyway.

But the point still stands: if we can break the problem into sub problems and then solve those easy subproblems first, the bigger problem will be a lot easier.

As I mentioned above, an important part of this problem is that, if there are 100 floors and you initially drop it at d=10 and it doesn’t break, you now have to solve it for floors 11-100. However, since you still have two eggs and there’s nothing at all special about these remaining floors, the remaining problem is completely identical to solving the initial problem, but with 90 floors instead of 100!

So here’s my strategy. There’s inherently a probabilistic nature to this that gets neatly taken care of with this formulation. I define a function v(f,d,e), which is the average number of egg drops that will be needed if there are f floors left to search, you drop the egg from floor d, and you have e eggs left, and for all following drops, you choose the optimal drop height. The last part is important because it mean that if you’re calculating v(100,78,2), which is obviously a bad move (from what we’ve seen before, d=78 with 100 floors), we’re calculating that quantity as if after that drop, all the following choices are optimal. When there are two eggs, I’ll refer to the optimal value of v(f,d,2) as v(f) as shorthand.

This function can be divided up into two groups: 1 egg, and 2 eggs. 1 egg is simple, you just have to scan up from your highest non-break floor to the top. Therefore, v(f,*,1) = f-1, where I write * because it doesn’t matter.

For two eggs, for a given drop, there are really two possibilities: it breaks or it doesn’t. So we can define v(f,d,2) as the probability of it breaking times the drops needed if it breaks (v(d,*,1) because now you only have to scan up to floor d) plus the probability of it not breaking times the drops needed for solving the new subproblem with two eggs (v(f-d)).

If there are f floors left and you drop it from d, assuming the probability of it breaking from each floor is equal, then the chance of it breaking is (d/f) and the chance of it not breaking is ((f-d)/f).

So, using the above: v(f,d,2) = 1 + (d/f)*v(d,*,1) + ((f-d)/f)*v(f-d). The +1 is because you need to include this current drop for all the ones you consider.

And now we have a recursive relationship that we can use DP to build from! Remember, that v(f-d) is the number of drops using best play.

Lastly, how do you actually use this? It’s assuming you can just plug in the value for the best play, but how do we get that? Let’s just start.

So we start with v(1) and v(2). If you have one floor to search, I think we’re basically defining the problem to say it’s not at height 0, so you can assume that you don’t have to search at all because you know it has to break on floor 1 (maybe this is wrong, but it shouldn’t change the principal or the answers more than 1, I think). So v(1) = 0. For v(2), you can either do d=1 or d=2. If you do d=1, and it breaks, you’re set. If it doesn’t, you know it breaks on 2 anyway, so still good. If you do d=2, it has to break, but you still don’t know whether it would have broken on d=1, so you haven’t learned anything and have to try again.

So: v(2,1,2) =1 + (1/2)*0 + (1/2)*v(1) = 1. On the other hand, v(2,2,2) = 1 + (2/2)*1 + (0)*v(0) = 2. So we see that d=1 is optimal on average for f=2, so v(2) = 1.

Now, you can do the same thing to get v(3), which will have to consider d={1,2,3} and evaluate each. So, at each step, we’re taking the argmin to get the best d, and add it to the list.

So how does it do? Pretty sweet!

You can see that the best d tracks sqrt(f) perfectly (with some weird wobbles?). v(100), which should be the average number of drops needed from 100, doing it perfectly, is actually 12.8. That’s a little weird because if you look above, my brute force method for d = 10 gave an average of 10.9. I haven’t looked into why this is yet, but I’m probably undercounting in one or overcounting in another (I’m guessing at one of the edge cases, like d=1 or d=100).

Here’s the code for that:

import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

def v1egg(f):
    return(f-1)

#This is for floors 0 and 1, the anchor cases.
v2egg = [0,0]
best_d = [0,1]

f_limit = 100

for f in range(2,f_limit+1):

    vd = [1 + (d*1.0/f)*v1egg(d) + ((1.0*f-d)/f)*v2egg[f-d] for d in range(1,f+1)]

    min_d = min(vd)
    best_d_ind = np.argmin(vd)

    best_d.append(best_d_ind + 1)
    v2egg.append(min_d)


print('best for f = {}: avg={}, best move={}'.format(f_limit,v2egg[f_limit],best_d[f_limit]))
print('best d', best_d[:20])

plt.xlabel('f')
plt.ylabel('v(f), best d')
plt.plot(best_d,label='best_d')
plt.plot(v2egg,label='v(f)')
x = np.arange(1,f)
plt.plot(x,np.sqrt(x),label='sqrt(f)',color='darkred')
plt.legend()

date_string = datetime.now().strftime("%H-%M-%S")
plt.savefig('eggdrop_DP_{}floor_{}.png'.format(f_limit,date_string))
plt.show()

 

Markov Decision Process

So that worked pretty quick, but I recently learned a little about Markov Decision Processes (MDP) and wanted to use my new fancy knowledge! I realized that it could probably also be used to solve this, so I tried that. I don’t think I can fully explain what MDPs are, but I’ll just say a tiny bit that’s relevant for what I did.

In a MDP, you have a set of states your system can be in. For this problem, a state is (f,e), where f is the number of floors left (i.e., floors you still need to search to find the break floor) and e is how many eggs left, plus one state I call the “solved” state. So if there are N floors, there are 2*N+1 states. You also have a set of actions that can take you from one state to another. Here, it’s the set of floors you can drop to (plus the solved state), so there are N+1 actions. You also have a reward matrix, R_s,a, which is a reward you get if you take action a from state s. You also have a transition matrix, P_a,s,s’, which is the probability, if you take action a in state s, that you actually get to state s’. So if the system is stochastic, and if you’re in state s and take action a, you could end up in a few different states defined by P_a,s,s’. Because this matrix has an entry for every action and every pair of states, it can be pretty big. In our case, it will be (N+1)*(2*N+1)^2. You also have a state-value function, v(s), which gives you the value of being in a current state, so here is just N+1 long.

Lastly, you have your policy, p_s,a (usually denoted by pi but I’ll use p here). This is generally the goal in MDP or maybe more generally reinforcement learning, because if you solve p, it tells you the optimal action to take in every state. For every state s, it has an array of weights for each action a, so is size (2*N+1)*(N+1). Eventually, these weights should settle (at least in this problem, but maybe always as long as P and R are not time dependent?) so that only one of them is nonzero. Until then, it will add its own stochastic-ness (like P_a,s,s’) by trying different actions in the same state.

In this problem, pi and v will be changing, because you’ll be solving for them. P and R will be static and known. P and R will essentially fully describe the system, but kind of indirectly and not in the most useful way. Therefore, we want to solve for v, which will describe the system in a more useful way (i.e., “it’s good to be in this state, bad to be in that one”). We’ll use this to solve for p, because we want to know what to actually do, also.

Okay, so how do we solve this?

It gets a little mathy at this point. Here’s the idea, though. There’s also a matrix, similar to v(s), called the action-value function, q_s,a. Similar to how v(s) is just the value of being in that state, q_s,a is the value of taking action a in state s. This is obviously going to contain R_s,a, but also has to include all the possible future rewards the system could get from other states and their rewards/actions.

So: q_s,a = R_s,a + gamma*sum_s'(P_a,s,s’*v(s’)). So q is basically the reward for immediately taking that action, plus the value of every state it could immediately end up in times the chance it’ll actually get in that state. The gamma variable is just how much you want to count future rewards. We want to completely in this problem, so we’ll say gamma=1 and forget about it.

Okay, so that’s q. On the other hand, the original v(s) can be calculated by, for each action a you can take from s, adding up the product of your policy for taking that action in this state, p_s,a, and the action-value for taking a in state s, q_s,a: v(s) = sum_a(p_s,a*q_s,a). So now we have v(s) as a function of q_s,a, and q_s,a as a function of v(s). This allows us to plug q in, and get a kind of recursive function of v:

v(s) = sum_a(p_s,a*sum_s'(P_a,s,s’*v(s’)))

This is kind of funny at first glance, v(s) containing itself (partly). However, there’s a theorem that says that if you update v with the equation above using the values it currently has, and the current values of p (as well as the static values of P and R), it has to approach the correct v. At each step, we also update p by, for each state, taking the argmax of the action-value q for all the actions it can take in that state. Basically, looking at the best choice it has in a state, based on our current v.

 

Oof, okay. To the actual problem. So the main task here is choosing P and R matrices that accurately describe the system. In these matrices, I’ve ordered the states such that index 0 is the solved state, [1,N] are the 1 egg states, and [N+1,2N+1] are the 2 egg states. For the actions, index 0 is going into the solved state, and [1,N] are dropping on that index’s floor.

So it’s a little confusing, because you have to incentivize the process in a strange way. For example, I only want states with one egg to be able to go to the solved state (which will mean they’re scanning up from floor 1), so R_(f,1 egg state),(to solved state) will have reward -(f-1), but 2 egg states will have a huge negative reward (like -900). R_(solved, to solved) is zero, which is fine.

For ordinary 2 egg drops, the reward for every action to a state with the floor below its current floor will be -1 (takes one drop). I think we could actually also let the reward for going to states above our current state also be -1, and it would be disincentivized by it just being a longer path to the solved state, but to be sure, I also set the reward for those to be a large negative.

So at this point, for N=10 floors, here’s what R_s,a looks like. The rows are for each s, the columns are for a.

You can see that the only “allowed” (i.e., not massively negative) actions are either the scan up for 1 egg states, or an ordinary drop for two egg states.

So that’s for R. P_a,s,s’ is basically three dimensional (or 2D and huge), so I can’t really write it here. I’m not even sure I could have it in code as a matrix, because it would be of size N^3 (maybe for 100 would be okay). So I basically made it a function instead, which you call with the three arguments a,s,s’. Because it’s kind of enforcing some rules, it is a trainwreck of if/else statements, but it works. The main idea is that most the P values will be 0 or 1, except for the “ordinary 2 egg drops”, which will “branch” with probabilities d/f and 1-d/f (like above).

Anyway, once I’ve set up P and R, I just initialize p and v randomly, and I’m off to the races! For some iterations, I’ll repeatedly calculate the new v, use that to update p, and then repeat. Here’s the relevant code:

def policyEval(v,pi,R):
    gamma = 1
    v_next = np.zeros(v.shape)

    for s in range(len(v_next)):
        v_sum = sum( [ pi[s][a]*(R[s][a] + gamma*sum([Pmat(a,s,s2)*v[s2] for s2 in range(len(v))])) for a in range(pi.shape[1])] )
        v_next[s] = v_sum

    return(v_next)


def policyImprove(v,pi,R):
    pi_new = np.zeros(pi.shape)

    for s in range(len(v)):
        q_list = [sum( [R[s][a]] + [Pmat(a,s,s2)*v[s2] for s2 in range(len(v))]) for a in range(pi.shape[1])]
        best_a = np.argmax(q_list)
        pi_new[s][best_a] = 1.0

    return(pi_new)

v_log = np.array([v])

for i in range(20):
    print(i)
    v = policyEval(v,Pi_sa,R_sa)
    v_log = np.concatenate((v_log,[v]))
    Pi_sa = policyImprove(v,Pi_sa,R_sa)

After this is done, we can look at p:

A little harder to read, but if you look at the bottom half of it, those are the probabilities for each action (column) in each state (row). We can plot the values of v(s) as it was improving:

And also the 2 egg state entries of p, for each row (if you look, there is just a single 1 in each row. So we’re plotting the indices of those 1’s here):

Pretty cool! You can see that for N=10, v plateaus at roughly iteration 8.

N=100:

v values:

argmax(p):

This actually takes a couple minutes to run. You can see that while N=10 needed 10 iterations, N=100 only needed about 20. Hmmmm.

The last value for v is actually -12.85, which is exactly what my DP method above found, so either they’re both wrong in the same way, or I’m guessing my original brute force method wasn’t counting the first or last drop or something.

Well, that’s it for now. It’s worth pointing out that this is only useful when you’re already given R and P, so it’s not really RL at this point, more just a method for solving some fully, but inconveniently described system. However, it should also be solvable if a P and R are defined, but not given to it, and it’s allowed to sample many drops. Maybe I’ll try that next time!

Neato sequence art

A while ago I watched this video by Numberphile (a very cool channel!). In terms of the actual math, it’s pretty light, but what I love about it is the idea of creating very cool images via very simple rules.

The idea of it is this. In the video, the guy shows a way of drawing a sequence of numbers (more on that in a minute). You draw a number line, and then you draw a semicircle above the line from the first number of the sequence to the second, with a diameter equal to the distance between the points. Then, you draw a semicircle from the second to third numbers of the sequence in the same way, except this time, the semicircle goes under the number line. You continue this way as long as you want, alternating above/below for the semicircles.

So here’s what the design for the first few Fibonacci terms looks like:

This is neat, but to be honest it’s not super interesting if it’s just a monotonically increasing function; the bigger range you show, the more detail you’re going to lose for the smaller numbers/differences. So what really makes it is when you have a cool sequence that has more interesting behavior for a given range.

In the video, he uses the Recaman sequence as his sequence. The Recaman sequence is interesting. It goes as follows: starting at 1 (though you could definitely start elsewhere), each turn, you have a number you’re either going to subtract from, or add to, the current number of the sequence, to get the next number of the sequence. You subtract if the number that would result hasn’t been seen yet in the sequence (and is greater than 0), but if it has, instead you add that number to get the next in the sequence. The number you’re either adding or subtracting each iteration starts at 2 and increases by 1 each iteration. This is often stated as “subtract if you can, add if you can’t”.

For example, starting at 1: you have to subtract or add 2. You can’t subtract (<0), so you add and get 3. Now, you have to subtract or add 3 to 3. Again you can’t subtract, so you add and get 6. Now you have to subtract or add 4, and you can subtract, so now you’re at 2. Next, add 5 to 2 to get 7, and so on…

Let’s just skip to the pretty pictures. Here it is for the first 20 terms:

Very neat! You can get an idea of the behavior: if you get a term that’s kind of far out enough that it has space for it and nearby terms to subtract for their next term, but close enough to 0 that its subtracted terms aren’t big enough to subtract (so, they have to add), you get these sections of multiple adjacent terms (like the bunch towards the left in the above pic), which looks awesome in my opinion.

What’s also very cool to me is that it has this awesome, strange behavior where it definitely tends to increase as the sequence goes on, but it leaves little gaps so the sequence eventually goes waaaaay back from a higher number. In contrast to what I was mentioning above about how something like the Fibonacci sequence isn’t that interesting because you only really get to “see” one scale, for the Recaman you really get to!

Here are a few more, showing more and more terms.

N = 40:

 

N = 100:

 

N = 480:

One thing that’s interesting to me about the Recaman Sequence is that I’ve never seen this mechanism for generating a sequence before. Probably the simplest sort of sequence is more like a function, where you can just say ai = f(i). That is, you can calculate what the i-th term is by itself. For example, maybe just a geometric sequence like 1, 2, 4, 8, … where it’s just ai = 2i . The next simplest type is probably recursive, where you define each term in terms of one or more preceding terms, like ai = f(ai-1) or something. So, the Fibonacci or Collatz sequences are good examples of this. The important part is that for a recursive sequence, you usually can’t just calculate the ith term by itself, you need all the preceding terms to get there (unless the recursive relation is so simple that it has an obvious mapping to the first type of sequence I mentioned; i.e., that one could also be defined as ai = 2ai-1).

So the Recaman is recursive for sure, because it’s either subtracting from/adding to the current term to get the next. But the part that makes it unlike any other I’ve seen is its method for “forking” (choosing to subtract or add, basically an if/then statement): while some (see the Collatz, Tent, below) also fork at each term, they fork based on some objective, stable property of the term (i.e., is it even or odd). However, the Recaman is essentially keeping a register of visited values, which is something I’ve never really seen before in sequences. One thing I briefly experimented with was making a “double Recaman” sequence, where the “register” allows you to go to the same point twice, but no more. So it definitely looks a little different, and you can see it forking from the same point, where it went one way after its first visit to the point, and another way its second.

Double Recaman, first 30 terms:

So that’s the Recaman.

But what else can we do?

Let’s try a few other sequences!

We can’t go without the famous Collatz sequence. The rule is just “divide by 2 if it’s even, do 3*x+1 if it’s odd”. You can start wherever, and the length of the sequence produced (“ending” when it gets to 1) is very erratic with where you begin. For example, x0 = 26 has 11 steps, while x0 = 27 has 111.

x0 = 19:

 

x0 = 27:

The Collatz is cool because it has the backtracking effect the Recaman has, but tends to get monotonically larger, looking kinda like a seashell.

How about the Tent map? It’s pretty simple, but very cool. For x between 0 and 1, if x is less than 1/2, it returns 2*x. If it’s greater than or equal to 1/2, it returns 2*(1-x). It bounces around like crazy, but is unfortunately pretty confined to the 0,1 range.

Tent, x0 = 0.47, 80 terms:

Tent, x0 = 0.26, 80 terms:

 

There are a few others, but to be honest I couldn’t find too many others that look distinguishable from one of these.

Okay, patterns are cool, but how can we make them cooler?

One thing I tried was making a triangle whose height is dependent on the difference, instead of a semicircle:

Collatz, x0 = 263, 30 terms:

Recaman, 50 terms:

I really like these, actually. The more random ones like Collatz look kind of like a mountain range to me.

I wanted to spice it up a bit more though! One thing I tried was making the color of the circle dependent on the radius of the semicircle. Here’s how it came out:

Tent, x0 = 0.26, 80 terms:

Collatz, x0 = 43, 200 terms:

I think it looks slightly better in black:

Tent, x0 = 0.42, 80 terms:

 

One thing I didn’t love about this is that when two semicircles of very different sizes meet, the color changes abruptly, which I don’t think looks great. However, because of the way Recaman tends to have large sections of similar sized semicircles, it actually comes out looking pretty neat!

Recaman, 100 terms:

However, I still wanted a way to make colored ones without that abrupt color change. To do this, I used the triangle one (which would have the same problem as the above if I just naively colored it by the size of triangle), but with a slight change. It still has a color based on the triangle size, but the color of each fades to black as it approaches the number line. This makes all colors blend together nicely!

I also messed with the line widths here a little, because some looked better than others.

Recaman, 50 terms:

Collatz, 30 terms, x0 = 263:

Recaman, 100 terms:

Tent, x0 = 0.47, 80 terms:

At this point I tried to do a few original things, rather than just riffing on what they came up with. I thought, dang, this simple idea was really cool, maybe it’s easy to come up with other cool sequence art!

It turns out it is not. Here’s what I tried. Similar to the stuff above, I’m connecting sequence terms to the ones following them. However, this time, it’s like there are two number lines, perpendicular to each other, with their 0 points overlapping. The line each sequence term is drawn from/to alternates. I also do the line color/line length thing as above, and also the thing with the fade (where it fades towards either edge).

Here’s an example:

Recaman, 50 terms:

Then, to add a bit more interest, I rotate it 4 ways:

Collatz, x0 = 27, 115 terms:

Tent, x0 = 0.47, 80 terms:

Unfortunately, it doesn’t look thaaaat cool in my opinion, and there just don’t seem to be much variation once I make 5 or 10 of them.

Recaman, 1000 terms:

One thing I think is kind of cool is that, I do the 4-fold symmetry thing in each step (as opposed to drawing the whole sequence and then rotating it 4 times). This creates some cool “weaving” effects, like in the last one (look at the dark red and green in the corners).

Welp, that’s all for now. It’s definitely inspired me to make more math/programming generated art though, so look for that in the future!

RPi camera, part 3: a few incremental fixes

Round 3! Okay, this is where I try and polish it up in a couple ways.

Here are the things I said last time I needed to make better:

  • Send pics more conventionally
  • Fix detection sensitivity (still often picking up strong shade/sunlight quirks)
  • Total design flaw: since the log file currently gets sent with each picture, but is updated when each picture is written, it is actually sometimes more updated than the pics in the folder. That is, if 30 pictures are created by the camera function, and those are immediately added to the log file, the log file is sent with the first of those 30, and it contains all 30 of them even though only one has been sent
  • Better image classifier architecture
  • Better labeled image dataset (32×32 is tiny)
  • Sliding windows over detected images

 

Sliding windows

What is “sliding windows”? It’s a pretty simple idea. Here’s the motivating problem, first. When my convnet looks at an image I give it and tries to guess whether it has a car in it, it was trained on images where the car mostly filled the frame, so it will tend to only correctly classify images where the car also mostly fills the frame. If you have an image (like the one below) where there is a car but it’s mostly in one corner, it may not get it (because it’s looking for features bigger than that. There are a couple other effects too. One is that if we’re only classifying 32×32 square images, then what I’ve been doing so far (resizing the image to have the smaller side be 32 and then squeezing the bigger side to also be 32) will distort the image, which will make it harder to classify. Lastly, you can imagine that if the actual image size we’re giving it is something like 256×512, then even if it actually would have correctly classified it given these other problems, by the time it smooshes it down to 32×32, it might not have the resolution in the car region of the image to do it.

So what can fix a lot of these problems? You define a “window”, a subset of the image, and “slide” it over the image in steps. Then, you pass this window subset to the classifier, so it’s actually only classifying the subset. You might choose a stride for the sliding window such that you get M windows in the vertical dimension and N windows in the horizontal. So you’re really doing MxN total classifications for the whole window, and then if one of them says “car!”, you say that the image contains a car.

Here’s a little illustration of mine, where the red grid over the green outlined window shows the windows being used (it’s a little hard to tell them apart, but they’re squares. There are three in the vertical direction):

There are of course a million little quirks and variants and choices to make with this. For example, I think it’s common to choose two sizes for the window, which should let you look at two different “scales”. Also, you have to choose some balance between having more sub windows and the computation time it will take to actually process them. I’m also pretty sure some convnets can have this built in by choosing different filter sizes (like, one that would group a block of pixels as a single pixel to make a larger “window”).

Anyway, how does it work? Here are the results using my CIFAR-10 trained convnet from last time, on the same little group of detected images. I show the certainty distribution, which is the certainty that it thinks it detects a car.

No sliding windows:

Detected:

Not detected:

 

 

Sliding windows:

Detected:

Not detected:

 

Definitely better! But still getting a ton of false positives, which is annoying. Honestly it may be because they’re 32×32.

 

Fixing image sending

So I had a bit of a mystery on my hands. I was finding that after a while, my program was just…stopping. Not crashing, not giving any error, just stopping after about an hour. What I mean by stopping is that, while normally it outputs a line for each image it detects (on the RPi side), it would stop outputting anything, but not stop running. It took me embarrassingly long to figure out, but here’s what I did. I first made a Debug class that basically logs everything the program does, right at the moment of doing it. This is actually a pretty handy thing to have around anyway, and basically doesn’t slow it down. You’ll notice that I’m periodically logging the CPU/Mem/temp, since I read somewhere that that can cause a problem, but all the values I’ve seen on mine are fine. Anyway, here was the first clue, you can see where it stops, after about an hour:

So you can see that it’s saving them steadily for a while, and then stops saving them, but continues to send them. Welp, you probably guessed before I did, but while I was aware of how little space my RPi had on it (~700MB to play with), I thought that because I was removing the files right after sending them, I’d be okay. Howeverrrr:

So I was running out of space!

One thing I did was immediately get a 32GB micro SD card and clone my RPi onto it, just to have a bit more wiggle room. To be honest, that might solve the problem, since I doubt I’d ever keep the program running long enough to generate that much data, but that would be not addressing the real problem here, which is that my files are sending way too heckin’ slow!

My files are usually ~100kB, which should be easy to send and keep up with, even if something like 10 a second are being produced. For example, I know off the top of my head that when I send files via scp between my desktop and RPi, the transfer rate it shows is usually something like 1.5 MB/s. So what’s going on?

It turns out that that “S” that stands for “secure” in SCP (or SSH, which it’s based on) is pretty important! As they discuss in this thread where it seems like the person was having exactly my problem, there’s actually some pretty nasty overhead involved in encrypting the file you’re going to send. Of course, I don’t care about that! I’m just sending stuff I don’t care about over my LAN.

So one option in that thread was using a weaker cipher, while another was to use the rcp command, which is kind of like a totally unencrypted scp. I’m going to do a little diversion for a minute here because I wanted to know just how much these compared.

What I did was create a few dummy files, smallfile.txt (100 kB), mediumfile.txt (1 MB), and bigfile.txt (5 MB). First I just sent smallfile.txt 10 times to get a rough sense of the speed and overhead:

for i in range(10):
    file = files[0]
    print('processing file',file)
    start = time()
    subprocess.check_call(['scp',file,remoteHostPath])
    total = time()-start
    print('time elapsed:',total)
    times.append(total)

print('done')
print(times)

Since it’s apparently actually transferring at about 1.5 MB, an 88 kB file should take roughly 0.05 s, but you can see that it typically takes about 10x that. So SCP is definitely slowing it down.

Same when I send all of them:

If you do the math, you can see that they each have ~0.5 s of overhead, like the small ones above. What if we do the weaker encryption? As they mention here, apparently in the latest release of Ubuntu, they’ve disabled weaker encryptions by default.

Well, let me cut to the chase. It’s a pain to get those legacy ciphers working, but they mentioned in the comments of the SE thread above that the aes128-gcm@openssh.com cipher should be “blazingly fast” if you have the right CPU chipset or something. So, I tried that (which is still supported in Ubuntu 18.04), and it was no faster. I also tried rcp (which was actually very easy to set up at least), and it too was no faster.

So, the greater lesson I’m learning here is that there’s something kind of inherent to setting up a connection across a LAN where you’re not going to avoid about 0.5 s of overhead, even if the actual transfer is very fast.

At this point I have two paths forward in mind (other than just leaving it the way it is). The first is setting up my own thing, with python sockets and stuff. This might be doable, but seems like it would take a while to debug and stuff. The other that I’ve seen suggested a bunch is to mount a network drive on the RPi. This means that it will actually mount a folder from my desktop so it will look like a local directory to my RPi, in which case I won’t have to (explicitly) deal with sending stuff at all. I’m a little paranoid about the whole mounting process (I know with physical media you have to be careful when mounting/unmounting), but I’m going to try this.

Welp, it’s done. It actually didn’t take that much redoing of my code, and it’s a TON neater. I also took the opportunity to divide most of my code up into 4 main classes, which do most of their stuff internally: FileTools (which mounts the remote dir to the pi and creates the run dir), LogFile (which just creates and interfaces with the log file), Debugfile (same but for debug), and CameraTools (which handles the actual image detection and stuff). I also made a function in my main code that watches the keyboard for inputs, so you can enter a ‘q’ to quit. This is actually remarkably hard to figure out how to do properly when you have other processes running in parallel, especially if you want to make those processes able to exit cleanly (for example, in my case, unmount the filesystem as opposed to just killing the whole process).

So here’s the code from the main section now:

def watchKB(event,stdin):
    print('start kb loop')
    while True:
        k = stdin.readline().strip()
        #print(k)
        if k=='q':
            print('\n\nyou pressed q!')
            event.set()
            return(0)


#Get CLI arguments for notes for a run
if len(sys.argv)>1:
  notes = sys.argv[1]
else:
  notes = "no notes"

#Get event manager to close nicely
m = Manager()
close_event = m.Event()

ft = FileTools(notes,close_event)
lf = LogFile(ft)
df = DebugFile(ft)
ct = CameraTools(ft)

pool = Pool(processes=2)
p1 = pool.apply_async(ct.cameraStream,args=(ft,lf,df))
p2 = pool.apply_async(df.debugUpdateLoop)

watchKB(close_event,sys.stdin)

timeout_hours = 10
timeout_s = timeout_hours*3600
p1.get(timeout=timeout_s)
p2.get(timeout=timeout_s)

The functional part here is the close_event = m.Event(), which I then pass to the subprocesses that are created. It acts as a kind of global variable, which I can “set” in the main loop so that it will be detected in the subprocess. The subprocess is checking for “close_event.is_set()”, which will tell it to shut down nicely.

How does it work? Pretty great! And it feels a lot cleaner than making a million scp calls.

 

Well, that’s all for now. I have a couple more things already in the works, but I’ll just put them in the next post. Here are the things that still obviously need to be done:

  • It’s still just getting a lot of shade problems. I think if I did this carefully and smartly I could actually make this not happen by just changing how the background average is taken. Related to that but slightly different, looking at it in not just grayscale might do it. That is, when an alternating shady/bright spot changes, it’s currently sensing the change in intensity, but it’s actually mostly the same hue. So if I kept the colors and looked at the different in the different channels, actual object movement (that changes the color as well as the intensity) might still trigger it, but not things like the sun getting brighter or the shade of leaves moving in the wind.
  • Use a different convnet architecture — I’ve actually already gotten a pre-trained VGGNet working with it, which is trained on a wayyyy huger dataset, and larger images. So hopefully it will be able to use my images that are larger (than 32×32) anyway, and be better trained that the dinky thing I did with the CIFAR-10.
  • Relatedly, have it detect things like people, or bicycles: the VGGNet dataset has the most crazily specific classes.

Till next time!