Frictionless Bananas

Team name: Frictionless Bananas
Team members: Jeremy Sawicki (jeremy at sawicki dot us)
Code: submission.tar.gz (authorization token removed from the code)
Data: myproblems.json, status.json

The Frictionless Bananas had only one team member this year. This page describes my 2013 ICFP Programming Contest entry.

Implementation

My solution was written in C++. I used Jansson for JSON manipulation and libcurl for server communication.

Representation

I represented a program as a sequence of operations in postfix order, which can be evaluated using a stack. For example, the zero operation pushes the number 0 onto the stack, and the plus operation pops the top two values and pushes their sum. The fold operation invokes a loop that executes a second sequence of operations representing the lambda body.
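To make the representation concrete, here is a minimal sketch of this kind of stack-based evaluator for a handful of the operations (the Op type and the particular subset shown are illustrative, not taken directly from my code):

    #include <cstdint>
    #include <vector>

    // Illustrative subset of the operations, evaluated in postfix order.
    enum class Op { Zero, One, Var, Not, Shl1, Plus, And };

    uint64_t evaluate(const std::vector<Op>& prog, uint64_t x) {
        std::vector<uint64_t> stack;
        for (Op op : prog) {
            switch (op) {
            case Op::Zero: stack.push_back(0); break;      // push constant 0
            case Op::One:  stack.push_back(1); break;      // push constant 1
            case Op::Var:  stack.push_back(x); break;      // push the input value
            case Op::Not:  stack.back() = ~stack.back(); break;
            case Op::Shl1: stack.back() <<= 1; break;
            case Op::Plus: {                               // pop two values, push their sum
                uint64_t b = stack.back(); stack.pop_back();
                stack.back() += b;
                break;
            }
            case Op::And: {
                uint64_t b = stack.back(); stack.pop_back();
                stack.back() &= b;
                break;
            }
            }
        }
        return stack.back();                               // the result is the value left on top
    }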

Generation

The core of my program is a program generator that enumerates all programs of a given size using only operations in a given set. For each generated program, it invokes a callback function (an std::function<> object, typically a lambda expression provided by the caller).

My initial implementation generated the sequence of operations in postfix order, choosing the first operation and then calling itself recursively to choose the rest. I used that implementation for the lightning round, but it ultimately proved difficult to work with. It required awkward logic to avoid pushing more values onto the stack than can be consumed by the remaining operations, and it was not clear how to extend it to prune the search by prohibiting certain combinations of operations.

My final implementation works top-down, choosing the top-level operation and the size of each subexpression, then recursively generating subexpressions of the chosen sizes. To support operations with more than one subexpression, I used continuation-passing style. The program generator calls itself recursively to generate one subexpression, passing a continuation (a lambda expression) to generate the next subexpression. It was strange to see so many functional programming concepts like lambda expressions and continuations appearing in C++ code.
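The sketch below shows the general shape of this approach for a tiny grammar with three leaves, one unary operator, and one binary operator, reusing the Op type from the evaluator sketch above. The names and structure are illustrative; the real generator also handles the full operation set, fold, and the pruning described later:

    #include <functional>
    #include <vector>

    // Sketch of a top-down generator in continuation-passing style for a tiny
    // grammar: three leaves (Zero, One, Var), one unary operator (Not), and one
    // binary operator (Plus).  `prog` holds the postfix program built so far.
    struct Generator {
        std::vector<Op> prog;
        std::function<void(const std::vector<Op>&)> emit;  // called once per complete program

        // Append every expression of exactly `size` operations, invoking the
        // continuation `k` for each one.
        void gen(int size, const std::function<void()>& k) {
            if (size == 1) {
                static const Op leaves[] = {Op::Zero, Op::One, Op::Var};
                for (Op leaf : leaves) {
                    prog.push_back(leaf);
                    k();
                    prog.pop_back();
                }
                return;
            }
            // Unary operator: one subexpression of size-1.
            gen(size - 1, [&] {
                prog.push_back(Op::Not);
                k();
                prog.pop_back();
            });
            // Binary operator: choose a size split, then generate the right
            // subexpression in the continuation passed while generating the left.
            for (int left = 1; left <= size - 2; ++left) {
                gen(left, [&, left] {
                    gen(size - 1 - left, [&] {
                        prog.push_back(Op::Plus);
                        k();
                        prog.pop_back();
                    });
                });
            }
        }

        void run(int size) { gen(size, [&] { emit(prog); }); }
    };

Binding emit to a lambda that tests each candidate against the known input/output pairs and calling run(n) then visits every program of size n exactly once.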

Counting

I also implemented a routine to quickly compute the number of programs for a given size and operation set. It uses a dynamic programming algorithm that builds a table of program counts for each size in increasing order. When I started pruning the search space by prohibiting certain combinations of operations, I added additional dimensions to the table to represent the fact that different sets of expressions are permitted depending on the context in which the expression appears.
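In its simplest form, without the extra context dimensions, the table for the same tiny grammar can be filled in like this (an illustrative sketch, not my actual code):

    #include <cstdint>
    #include <vector>

    // count[n] = number of expressions of exactly n operations, given `leaves`
    // leaf choices, `unaryOps` unary operators, and `binaryOps` binary operators.
    std::vector<uint64_t> countPrograms(int maxSize, uint64_t leaves,
                                        uint64_t unaryOps, uint64_t binaryOps) {
        std::vector<uint64_t> count(maxSize + 1, 0);
        if (maxSize >= 1)
            count[1] = leaves;
        for (int n = 2; n <= maxSize; ++n) {
            count[n] = unaryOps * count[n - 1];             // a unary operator over size n-1
            for (int left = 1; left <= n - 2; ++left)       // a binary operator over a size split
                count[n] += binaryOps * count[left] * count[n - 1 - left];
        }
        return count;
    }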

Having an accurate program count is useful for several reasons. For one thing, it provides a useful estimate of the difficulty of a given problem. More importantly, the program generator is able to use the counts to generate a subset of the possible programs corresponding to an arbitrary range of indices, efficiently skipping parts of the search space and knowing exactly how many programs were skipped. I used that capability to generate arguments to commutative binary operators in only one of two possible orders. After an expression of index k is chosen for the left subexpression, the search for the right subexpression is restricted to indices greater than k. Another possible use is to divide the search space into equal-sized pieces for parallelization, but I never took advantage of that possibility.
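One way to picture the index-based skipping is an "unranking" function that decodes a single index into a program. My generator works with index ranges rather than single indices, but the arithmetic is the same. The sketch below assumes the tiny grammar above and a table built with countPrograms(maxSize, 3, 1, 1):

    #include <cstdint>
    #include <vector>

    // Sketch of "unranking": decode index k (0 <= k < count[n]) into the k-th
    // program of size n, following the same enumeration order as the Generator
    // sketch above (leaves, then the unary operator, then binary size splits).
    void unrank(int n, uint64_t k, const std::vector<uint64_t>& count,
                std::vector<Op>& prog) {
        if (n == 1) {
            static const Op leaves[] = {Op::Zero, Op::One, Op::Var};
            prog.push_back(leaves[k]);
            return;
        }
        uint64_t unaryBlock = count[n - 1];                 // one unary operator (Not)
        if (k < unaryBlock) {
            unrank(n - 1, k, count, prog);
            prog.push_back(Op::Not);
            return;
        }
        k -= unaryBlock;
        for (int left = 1; left <= n - 2; ++left) {         // one binary operator (Plus)
            uint64_t rightCount = count[n - 1 - left];
            uint64_t block = count[left] * rightCount;
            if (k < block) {
                unrank(left, k / rightCount, count, prog);          // left subexpression
                unrank(n - 1 - left, k % rightCount, count, prog);  // right subexpression
                prog.push_back(Op::Plus);
                return;
            }
            k -= block;
        }
    }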

Strategy

I used a relatively simple approach. My program solves problems in real time, first submitting an initial set of inputs to evaluate and then generating and testing as many candidate programs as possible until a correct solution is found or time runs out. I did not precompute anything ahead of time, and I only used a single CPU core for computations. I chose to focus my efforts on improving the set of generated programs rather than utilizing more computing resources.

My initial implementation generated programs of exactly the target size, and included the constraint that the program must contain every operation from the target set at least once. My final implementation generates programs less than or equal to the target size in increasing order, and permits some of the target operations to be unused.
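In outline, the resulting solving loop looks roughly like this. The requestEval and submitGuess functions are hypothetical stand-ins for the libcurl/JSON server communication; their names and signatures are assumptions, not my actual interface:

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for the server protocol (the real entry talks to
    // the contest server via libcurl and JSON).
    std::vector<uint64_t> requestEval(const std::string& problemId,
                                      const std::vector<uint64_t>& inputs);
    bool submitGuess(const std::string& problemId, const std::vector<Op>& prog,
                     uint64_t* counterInput, uint64_t* counterOutput);

    // Outline of the real-time generate-and-test loop, reusing the evaluate()
    // and Generator sketches from above.
    void solve(const std::string& problemId, int targetSize,
               std::vector<uint64_t> inputs) {
        std::vector<uint64_t> outputs = requestEval(problemId, inputs);

        Generator g;
        bool solved = false;
        g.emit = [&](const std::vector<Op>& prog) {
            if (solved) return;
            for (std::size_t i = 0; i < inputs.size(); ++i)
                if (evaluate(prog, inputs[i]) != outputs[i])
                    return;                                  // fails a known example
            uint64_t in = 0, out = 0;
            if (submitGuess(problemId, prog, &in, &out)) {
                solved = true;                               // guess accepted
            } else {
                inputs.push_back(in);                        // keep the counterexample
                outputs.push_back(out);
            }
        };
        // Try sizes in increasing order, up to the problem's stated size.
        for (int size = 1; size <= targetSize && !solved; ++size)
            g.run(size);
    }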

Evaluation

My initial evaluation uses the following inputs:

The random-looking values are constants taken from the SHA-512 hash function.

Pruning

To reduce the search space, my program avoids generating combinations of operations that are equivalent to other generated expressions. The following constructs are not generated:

After reading my writeup, another contestant pointed out that (if0 (not a) b c) is not equivalent to (if0 a c b), because not is bitwise: (not a) is zero only when a is all ones, not whenever a is nonzero. That means my program would have to search a bit harder to find an equivalent of (if0 (not a) b c), for example (if0 (plus 1 a) b c).

Heuristics

I took advantage of certain patterns observed in the training problems to further reduce the search space:

I observed that for some problem types if0 tends to appear only once, but I did not take advantage of that. I also did not think of the approach that some other teams came up with: independently searching for a program that produces the correct answer for half of the inputs and using it as one branch of an if0. I did not spend much time on bonus problems in general.

Precomputation

At one point I started an implementation that would precompute a table of programs and corresponding sample inputs/outputs which could quickly be queried, but I eventually abandoned that effort.

I am not convinced that precomputation is very helpful. When solving a problem in real time, it is easy to restrict the search to the set of operations used in that problem. When precomputing, it is necessary to generate either a much larger set of programs using all operations or a separate set of programs for each problem, neither of which is very appealing. For example, if I look at the simple (fold-free) problems of size 14 in my problem set, I must generate between 2 million and 700 million programs for each problem, depending on the set of operations used in the problem. My code can generate the largest of those sets within roughly 5 minutes. On the other hand, to generate programs of size 14 using all operators, I would have to generate 12.9 billion programs, likely taking an hour or two. If I knew in advance which problems were difficult, it might be worth precomputing programs for just those problems using the correct set of operations. But most problems can be solved easily, with only a few requiring large, difficult-to-find solutions, so I would be forced to do a lot of precomputation for all problems when only a few need it. Overall it did not seem like a good approach.

Results

I scored 425 points in the lightning round, which should be around 15th place based on the score distribution published by the organizers (10th place is 457, 25th place is 345). I greatly underestimated the time required to communicate with the server and solve problems due to rate limiting, so I was not able to run as many problems as I intended before the time expired. Still, I only intended to solve problems during the lightning round where I was fairly confident that my program could exhaustively enumerate all possible solutions.

I scored 1359 points in the full contest, which should also be around 15th place (11th place is around 1400, 25th place is around 1250). Despite my experience with the lightning round, I again underestimated the amount of time necessary to communicate with the server and try all of the problems. I finished running the non-bonus problems about five minutes before the contest ended, leaving little time to attempt the bonus problems. (I also somehow missed one of the tfold problems.) Here are the statistics:

Looking at only the non-bonus problems, I scored 1346 points. Based on unofficial scores published by some teams, the next highest score I have seen for the non-bonus problems is 1273, so I think there is a chance that my code was among the top performers in the contest. Unfortunately I did not successfully make use of it in time. I also did not implement a sophisticated solution for if0 in the bonus problems, so I still may not have been competitive with the best teams on the full set of problems.

Overall impression

This year’s task proved to be quite fun. It was a straightforward problem, not involving an unreasonable amount of preliminary code or complex reverse engineering to get started.

I am always a little skeptical of tasks where the judges will not run code submitted by participants. Of course that makes the judging process simpler, and it also lets teams work with any platforms and tools they choose, but it can sometimes make the contest less of a true “programming” contest because you do not need to write code that can handle any input it may be presented with. I think the judges succeeded with this year’s task.

The only downside I see to this year’s task was the large amount of time required to solve all of the problems due to rate limiting by the server. Ideally it should be possible to submit solutions as fast as your code is able to generate them. I understand, though, that the server’s job was not easy, since it involved proving the equivalence of programs.


The Frictionless Bananas