Frictionless Bananas

Nickname: Frictionless Bananas
Name: Jeremy Sawicki (jeremy at sawicki dot us)
Language: C++
Rank: 46th - Legend League

This page describes my CodinGame Summer Challenge 2024 - Olymbits entry. I reached the Legend League and achieved 46th place overall.

This was my first time participating in a CodinGame contest, though I have some experience with other contests, mainly ICFPC and AZsPCs.

The task

The task was to play four different arcade-style games. Each game takes a controller input of UP, DOWN, LEFT, or RIGHT on each time step. The twist is that you must use a single controller to play all four games at once.

Games are played against two opponents. At the end of each game, a gold medal earns 3 points and a silver medal earns 1 point. The simulation continues for long enough to play each game around 6 times. Scores within each of the games are added, and then the totals for the four games are multiplied. The multiplication means it is important to spread wins evenly across the four games.

Solution approach: expected scores

My basic approach is to consider possible sequences of my own future moves, evaluate the score impact of the game state reached at the end of each move sequence, and make the first move from the best move sequence.

Initially I used move sequences with a fixed depth of 8. Later I switched to beam search to be able to search to the end of each active game.

I evaluate the game state reached at the end of a move sequence based on what fraction of my opponents’ possible move sequences would result in my receiving a gold or silver medal. Looked at another way, I calculate my expected score from gold and silver medals assuming that my opponents play randomly. For example, if a given move sequence would result in a gold medal 80% of the time and a silver medal 10% of the time, the expected score is 3 * 0.80 + 1 * 0.10 = 2.5.

Performing the above calculation requires knowing for each game the distribution of scores that each opponent can receive. The details vary per game, but it generally requires pre-computing some tables via dynamic programming that can be used for quick lookups during the beam search.

Archery and diving

Archery and diving are the easiest cases. The actions of the players don’t affect each other, and the game runs for a fixed number of turns, so the medals are determined by a simple comparison of each player’s score at the end of the game.

For each opponent, I build a table that, for each game state, stores the number of move sequences resulting in that game state. I then combine all game states having the same score to produce a table of the number of move sequences resulting in each score. Dividing by the total number of move sequences gives the fraction of move sequences resulting in each score, and a running sum gives the fraction of move sequences resulting in a score equal to or worse than a given score.

With the above, I can look up my own score to find the fraction of move sequences where each opponent receives a score equal to or worse than mine. Multiply those fractions to produce the fraction of combined move sequences where I receive a gold medal. A similar calculation applies for silver.

Hurdles

Hurdles is slightly more complex because the game ends when the first player reaches the finish line. Thus, ranking between the bottom two players depends on the actions of the top player. I made the simplifying assumption that each player is allowed to finish the race, and ranking is determined by the number of moves to finish. Under that assumption, I build similar tables as in archery and diving. I never went back and figured out how to implement the correct logic.

Skating

Skating is by far the most difficult case. First, moves are scrambled after each turn, so it is impossible to predict the consequences of specific moves more than one turn in advance. Second, the actions of the players are not independent because players on the same space have their risk value increased, so it is not possible to evaluate the actions of each player in isolation.

On the other hand, each skating game is exactly the same (no randomized hurdle locations, wind directions, etc.). As a result, it is possible to compute a much larger lookup table on the first turn which remains valid for the whole game, taking advantage of the larger 1 second time budget. I managed to make it work after some significant optimizations.

The idea is that for every number of turns remaining, every combined game state of all three players, and every possible ending score, I can look up the number of future move sequences across all three players which will result in that score. To use the table, I first have to simulate one time step ahead for each of the 64 possible move combinations of the three players. Then I look up the score distributions for each of the three players and combine them to calculate the fraction of move sequences leading to a gold or silver medal. That information is combined across all opponent moves for each of my four moves to determine an expected medal score for each of my four moves.

To build the table within 1 second, I had to reduce both the number of entries in the table and the time spent computing those entries. The game state consists of a score and a risk for each player. There are 34 possible scores and 7 possible risk values, so the total number of game states is 34^3 * 7^3 = 13,481,272, which is too large. But to calculate which players are in the same position, it is not necessary to know the exact score (number of spaces moved)—only the current positions (score mod 10). That reduces the number of states to 10^3 * 7^3 = 343,000. To make that optimization work, the table cannot store information about total scores, but rather scores for the remainder of the game. At lookup time, we can add the score so far. Furthermore, it is not necessary to know the exact positions of the players, only their relative positions. Thus, we can assume one of the players is at position 0, reducing the number of states to 10^2 * 7^3 = 34,300. We can further require that the positions of the other two players be in sorted order, reducing the number of states to 55 * 7^3 = 18,865. I implemented various optimizations to compute the table more quickly, including not trying all four moves for stunned players.

Combining games

After computing expected scores for each of the four games, I need to combine them in a way that reflects the fact that the final score is the product of the four game scores. When a game’s current score is x and the expected additional score is y, the expected final score increase is y/x times the current final score. Therefore, it makes sense to weight each game by the reciprocal of the current score. Note that in some cases the current score may be zero, in which case the reciprocal is infinite. In those cases, I use a weight of 1 for games with zero score and a weight of 0 for games with non-zero score, so that only games with zero score are considered.

A problem with the above is that it assumes that future games will contribute nothing to the score. It does not make sense to place so much weight on games with a low or zero score early in the game, since there will be plenty of opportunities to earn more points. So as a modification, if fewer than four games have been played, I assume that the remaining games up to the fourth game will receive the average score of (3 + 1 + 0) / 3 = 4/3. Thus, it is only after four consecutive bronze medals that I will start ignoring other games and focusing exclusively on games with zero scores.

Favoring wins over ties

So far, I have described trying to increase my own score by earning gold and silver medals, but it is also valuable to lower opponents’ scores by robbing them of medals. In other words, a tie is good, but a win is better.

Reducing one opponent’s score by a certain amount is roughly equivalent to reducing both opponents’ scores by half the amount, which is equivalent to increasing my score by half the amount. So, while normal scoring rules earn 1 point for silver and an additional 2 for gold, I instead use 1 point for a silver tie, an additional 0.5 points or 1.5 total for a silver win, an additional 2 points or 3.5 total for a gold tie, and an additional 1 point or 4.5 total for a gold win.

The way I implement that is not entirely accurate. As described above, I weight my scores for each game based on the reciprocal of my score for that game. In reality, I should weight the points for a tie based on my own scores and the additional points for a win based on the scores of whichever opponent I am beating. Furthermore, the only thing that matters at the end of the game is the relative order of the three players’ final scores, so I should ideally weight any score changes based on the likelihood that it will change the relative order between me and an opponent. I don’t do any of that.

Refinement: guaranteed scores

A problem with the approach described so far is that it assumes my opponents play randomly, when they are in fact trying very hard to play well. I may choose a move sequence which beats a large fraction of possible move sequences at diving, but it is likely that an opponent will play the one perfect move sequence and render my efforts moot.

My solution is to pick one of the four games to prioritize, and play in such a way that I get the best achievable medal score (gold win, gold tie, silver win, or silver tie) for that game. I run beam search as usual, filtered to exclude branches that can’t reach the best medal score for that game, in order to choose moves that also benefit other games.

To choose a game to prioritize, I first determine the maximum and minimum medal scores achievable for each game (4.5, 3.5, 1.5, or 1, as described above) and subtract them to determine the largest score increase possible for the game. I then weight those scores by the reciprocal of the current score for that game (again, as described above). Finally, I divide the weighted score by the number of turns left in the game. The idea is to maximize the points gained per turn—it is better to pick a game where I can earn a gold in 3 turns than one in which I can earn a gold in 8 turns, because the extra 5 turns can be used to gain points in another game.

Non-skating

For the games other than skating, it is relatively straightforward to fill a table that stores the maximum and minimum scores achievable from each game state. I can look those values up for each player to determine the best and worst medals possible. I can use the same tables during beam search to determine whether it is possible to reach the best possible medal from a given state.

Skating

Skating is again the difficult case. I once again populate a large table on the first turn, storing information about each game state. This time, I store four bits for whether each medal score (gold win, gold tie, silver win, silver tie) can be achieved if I play optimally, and another four bits for whether each medal score will always be achieved regardless of what I do.

The previous skating table only considers the relative positions of players, not their absolute scores. To determine medals, we need the actual differences between scores (if not the absolute scores themselves). The maximum score difference is 23, so we need to represent score differences between -23 and 23, or 47 values total, for each player. We already perform table lookups using the score differences mod 10, which I normalize to a range of -5 to 4. So instead of each single bit, I store 5 * 5 = 25 bits, one for each combination of the five possible ranges (-25 to -16, -15 to -6, -5 to 4, 5 to 14, 15 to 24) for the two players whose relative positions we are storing. I pack two such sets of 25 bits into a 64-bit integer and compute the bits with efficient bit manipulation operations.

Results

My approach was enough to get into Legend League early on, though just barely. I never found any promising improvements after that, and I stayed towards the bottom, finishing in 46th place.

I think the main problem was that I was modeling the opponents in a simplistic way, rather than using a game-playing algorithm that models the opponents’ play with the same sophistication as my own. I am familiar with minimax, but I have never implemented MCTS and have never heard of some of the things that other participants have been discussing. I didn’t really have any idea how to deal with simultaneous moves and three players. Still, my approach did quite decently.

Overall impression

This was a really fun contest and interesting task. Thank you to everyone who put it together.


The Frictionless Bananas