Corporate Gifting (HackerCup)

Facebook HackerCup Round 1 ended yesterday. After comparing input and output files with a couple of friends I almost certainly know that I will fail to proceed to Round 2, due to a silly misinterpretation of stressful wins in the problem named Winning at Sports. I am quite frustrated with the fact that, despite I solved all the problems, I will not proceed to the next round due to carelessness unless miraculously 500th competitor of the round ends up with 75 points.

That said, the need to focus on something else than the frustrating events of yesterday led me to writing up this post on the solution to Problem D of Round 1, Corporate Gifting.

Problem Statement

The problem basically presents us a company hierarchy. Everyone in the company is to buy gifts with costs from 1 to N, but no employee should buy the same gift with their manager. Given the hierarchy of workers as a tree, we are to find the minimum sum of money the whole company is spending on gifts. Number of nodes on the tree(N) is not more than 200,000. Finally, there can be T≤100 in a single input file.

Arriving at the Solution

Even though I, like almost all the other competitors, noticed that the problem was about coloring a tree using a particular method, my initial candidate for a solution (i.e. a postfix traversal; assigning 1 to the leaves and working my way up) was disproved by a test case I thought of.

After that I actually had to do a couple of hours' googling only to end up with a paper by Leo G. Kroon, Arunabha Sen, Haiyong Deng, and Asim Roy titled The Optimal Cost Chromatic Partitioning Problem for Trees and Interval Graphs. As can be seen by my perfectly conventional referencing of the said paper, I haven't been an avid reader of papers recently. Most of the paper deals with the mathematical foundation behind the chromatic numbers of graphs and the proof of the properties that makes an efficient solution possible for trees and interval graphs. The polynomial-time algorithm they present, though, solves the problem Corporate Gifting. Below is that algorithm, hopefully with a more practical narrative.

Solution

First thing to cover in an effort to understand the algorithm presented in the paper is a bit of notation and theory. After all, this algorithm came straight out of a scientific paper.

Notation

We have N different nodes v1, v2, v3, ...vN. We also have N colors(i.e. gifts) which we will denote by c1, c2, c3, ...cN. Their corresponding costs are kc1, kc2, kc3, ...kcN. We are to assign colors so that, for a valid coloring, the sum of the color cost of each vertex becomes minimal.

Approach

The algorithm considers each node as the root of its own subtree, and computes the cost of two best coloring schemes(i.e. a primary and a secondary option) for each one, which use different colors. For each node v, it stores the costs of primary and secondary coloring schemes of the tree rooted at v as well as the color of node v if the primary scheme is used.

The Algorithm

The solution traverses the tree in postfix order, and using its children's values, computes for v the costs of primary coloring scheme P[v], the color C[v] that yields the optimal cost for that primary scheme, and the cost of the second cheapest scheme S[v] that uses a color different from C[v].

Algorithm computes the costs of primary and secondary coloring schemes based on the formula below. It essentially depends the node v on which it is used, and on the color c used to paint that node.

equation

If the number of children v has is d, as there are (d+1) possible colors vor v to take, the overall time it would take to compute the formula for the whole tree has the complexity O(d2), or rather O(N2). Obviously, it is too inefficient.

Thankfully the authors present a method to compute the formula which has an overall time complexity of O(N). If you closely look into the formula, you'll see that what it does for a particular color c is that it adds its cost kc once (i.e. the cost of painting v to color c), then adds the primary cost of coloring the subtrees of the children with colors different than c, and the secondary cost of coloring the subtrees of those with the color c. The proposed method stores in an intermidiary array A of size (d+1) and stores in each A[i] an initial value M, the sum of primary costs of all the children of v.

A[i] corresponds to K(v, ci). Then, the children of v are traversed (i.e. as opposed to all the possible colors v may take) and for each child vi , ( S[ vi ] - P[ vi ] ) is added to A[ C[ vi ] ]. The meaning of this operation is as follows. Since A[i] is accumulating the minimal cost of coloring the subtree rooted at v if node v has the color ci , then A[i] should only contain the primary cost of coloring of the subtrees rooted at children vi such that C[ vi ]ci. For the any child vi of v that has the color ci, it should contain the cost of the secondary coloring scheme of the subtree rooted at vi. That is why, for each such child, the corresponding A[i] is updated to contain their secondary cost. After a final traversal of A to add to each A[i] the cost of the color kci that may be used used on v, A[i] stores the values of K( v, ci ) for c=1,2,...,(d+1).

Then, all we need to compute P[v] is to get the minimum of all those values. The color ci which yields that minimal value is assigned to C[v] . Lastly, to get S[v] , all we need to do is to traverse A and get the minimum value such that the color being used to paint v is not C[v] .

After a postfix traversal of the tree in which we compute arrays P, S and C as explained above, the minimal value we should output resides in P[1], assuming the nodes of the tree are numbered from 1 to N.

Complexity

Obviously the postfix traversal of the tree has linear time complexity. One may be mistakenly thinking that the loops used while computing the values of array A adds another order of complexity to the solution. Yet, it is trivial to understand that overall complexity is linear too, once one notices that each node in a tree, by definition, is a child of a single node. (except the root, of course) In other words, primary and secondary costs of the subtree rooted in any node vi will be considered just once. (i.e. when computing the costs of coloring the tree rooted at its parent) As there are N such subtrees, the total number of iterations for the inner loops is linear as well.

Implementation

The single most important thing in implementing this solution was to make sure we implement the postfix traversal using a stack we allocate in the heap and manage ourselves instead of implementing it recursively. Given a valid case where N = 200,000 and each node (except the single leaf node) has one child, a recursive implementation surely yields a stack overflow.

Conclusion

I believe there may be a lot of incorrect submissions to this one. The sample cases provided for the problem is suggestive of quite a few different, yet incorrect solutions. My initial idea of assigning 1 to all leaves and crawling up accordingly was a result of that.

All in all, this was a fascinating problem to tackle, as it was the one that pushed me the most. Even if you did not participate in the Qualification Round or Round 1 of the contest, this is the first problem I suggest you solve.