Python Registry Software – v.1
Hi all. Recently, I’ve been fiddling around in Python, and stumbled across a real world problem with a pretty fun software solution. At my school, some class placements are done by 12 people in a room for 6 hours via marathon sessions that go on for a few days. As a programmer my first thought in the break down of this problems was: “Is the work hard?” My father and I are big believers in the idea that a computer can be stupid so fast it looks smart.
I recently watched an interesting talk by Jake VanderPlasĀ called “Statistics for Hackers”, where the speaker stressed how much easier it is to reason about a simple program, than it is to try and use a lot of statistical analysis tools. One of his tag-lines was
“If you can use a for loop, you can do statistics.”
So, I started trying to model my data (in a simplified manner).
I put together a place for me to store student class preferences "class_preferences"
, and a place to store the available slots in each class "class_slots"
. The preferences were kept a dict(ionary), a Python data structure that works a little like a Javascript object, that contained student ids as keys and an array of eight classes as ordered preferences. The class_slots
dict just stored {class_name: # of slots}
. I modeled my flow around the idea that I could create placement functions that took in class and preference data and returned something I called a state
, which was just a dictionary which mapped student ids to classes, and a dict that contained the new available class_slots
.
My first function was a bit of a mess. In registry it is called a “lottery.” The basic idea is that I shuffle my list of students randomly, and then I iterate through the list giving each person their first choice if it is open, or second choice, or third…… until I either place them in one of their eight preferences or assign them a random class.
Seeing as I wanted to be able to compare different strategies, I wrote a quick “cost” function that could compare state
and class_preferences
, and give me a single number telling me how bad my function did (lower is better)
def cost(state, target): _c = 0 for person in state: if state[person] in target[person]: _c += target[person].index(state[person]) else: _c += len(target[person]) return _c
In the spirit of Jake VanderPlas, I wrote a for loop to test this:
def gen_sample_data(class_list, num_prefs, slots): t = {} for n in range(slots): name = 'Person ' + str(n) choices = list(class_list) random.shuffle(choices) choices = choices[:num_prefs] t[name] = choices return t def test(class_slots, num_prefs, num=100): for epoch in range(num): print("epoch", epoch) t = gen_sample_data(class_slots.keys(), num_prefs) for i in range(100): _cost = cost(np_firstcome(t, class_slots)), print(i, _cost) return data
My next few steps were to create some other functions and test them along side this method. Just so you know, this simple method could get an average placement of people in their “1.2851th” choice. Tune in for an update on how I got this number down even lower, and put the tool up on the web for free use!
There are no comments yet.