This was the only problem in the contest solution to which I could not figure out intuitively. I looked at the editorial of the contest, as well, but some major portions regarding the solution were still missing. In the end I actually had to ask Vitaliy(witua), the problemsetter of the contest, who is also a friend, about the solution 3 years after he set the contest. Thankfully, he somehow managed remembering the tricky details and led me towards a solution.
Problem statement defines what a lucky number is, and then tell us about Petya who needs to pick, from a sequence of length N, two non-intersecting and continuous subsequences such that no lucky number occurs in both. The statement asks us to find the number of different pairs of subsequences Petya can pick. The upper bound for N is 105. Another key thing to note here is that the number of total appearances for a lucky number will not exceed 103.
I will explain the idea as Vitaliy did in his editorial. I will first solve a simpler version of this problem, and adapt the solution of that one to the original problem. What will be different from Vitaliy's narrative is I will use smaller steps in explaining the tricky details, which should help ease understanding them.
A Simpler Version of the Problem
Consider an array A of length L ≤ 103 and all elements of which are lucky numbers. Let's solve the same problem for this array first.
We have two intervals, or in other words, four endpoints to find. In this simpler version, since all numbers are lucky, basically no value on the right subsequence should be in the left subsequence.
A brute force approach to find the four endpoints would not work for this problem due to maximum input size being 103. An alternative that might work is fixing one interval, or at least one endpoint of one of the subsequences (e.g. the left endpoint i of the right subsequence) and progressively expanding the other endpoint, j, of the same(i.e. right) subsequence while somehow maintaining the current and total number of eligible left subsequences we may pick to accompany this right subsequence that consists of the elements from A[i] to A[j]. We need to maintain the eligible left subsequences in such a manner that the time complexity of updating them or counting them should not exceed O(logL) due to the input size of the problem.
Once we fix i, if we use a set to keep track of the endpoints of the continuous and maximal left subsequences we have to the left of index i, we may update those intervals as j (i.e. the right endpoint of the right subsequence) increases. For a fixed left endpoint at index i, every time we increase j and encounter a new lucky number A[j], we may look into all indices it appears at and split the intervals that contain them. Once we fix index i and j, updating the left intervals in the set would have O(logL) time complexity.
There is another thing to note here. There are L appearances of lucky numbers in A, and because of that, you may mistakenly think that we will be dividing left subsequences O(L) times for each pair of fixed indices i and j, as we are to update the set of left subsequences for each appearance of A[j]. However, the number of appearances of all the lucky numbers is L, which implies that the total number of times we will divide any left subsequences in the set is L. This means, for fixed i, overall time complexity of maintaining all eligible left subsequences as we shift j is O(L + LlogL).
Now we have the functionality that will suffice to count the number of eligible left subsequences we have for fixed i and j. At each step of the process described above, we have a fixed right subsequence [i,j] and all we need to do to count the pairs of subsequences included in the answer is to count the eligible left subsequences we can pick to accompany [i,j]. Let's start with how many eligible left subsequences a maximal continuous left subsequence(i.e. such as the ones in the set we maintain) contributes.
The reason I accentuated the necessity of the left intervals in the set we maintain being maximal has to do with the calculation of the total number of left subsequences Petya may pick. Given such a maximal continuous left subsequence as eligible(i.e. as a member of the set we maintain), we also know that all the subsequences of that particular subsequence are eligible as well. The number of eligible subsequences such a maximal subsequence interval of length G has is given below.
Thanks to our usage of set, at each step (i.e. for fixed i and j) we have the maximal continuous left subsequence intervals. We may, using the formula above, count the number of left subsequences we can pick to accompany our right subsequence [i, j]. Yet it is inefficient to count the number of left subsequences from scratch for each i, j pair. Instead we can have two values currentSum & currentNum and we may update them as we iterate j in range [i, (L-1)]. currentNum may hold the total number of eligible left subsequences for fixed i, and the current value of j. We may update it every time we divide an interval G into two by decreasing the number of all subsequences of G (including itself) from it, and adding the number of all subsequences of the new intervals we introduce into the set. After currentNum is updated, currentSum may be incremented by currentNum. If this is repeated for all values of j, after traversing all possible values of j, currentSum will contain the total number of left subsequences we can have for fixed i. (i.e. all the right subsequences that start at index i)
Consider the functionality described up to this point to be a function that counts the number of possible selections of left and right subsequences for fixed i. Let's name this function countForFixedI(i). Once we put another outer loop that iterates countForFixedI(i) through all the values index i can have, the simple version of the problem is solved. We may name the function that iterates i in range [1, (N-1)] and calls countForFixedI(i) as count(). The loop in count() adds an O(L) factor to the time complexity of countForFixedI(i), and the overall time complexity of the solution to the simple version of the problem is O(L2logL).
Extending the solution to the original problem
We should now adapt our solution to the case where we also have some numbers in A that are not lucky. The length of A will, from now on, be N while the number of lucky numbers remains as L. Also, array indices mentioned are assumed to be 0-indexed.
We have to keep indices i and j iterating only on lucky numbers due to the time constraints. If any one of them traverses O(N) values the overall complexity becomes O(NLogL) and it exceeds the time limit.
Indices i and j being only able to iterate through lucky numbers causes the need to have an array luckyLocations of the indices lucky numbers are located at, so that we actually can iterate through them in O(L). We also need to update the process of incrementing currentSum a little bit. We may now skip some not lucky numbers on the right subsequences we investigate in countForFixedI(i), as we iterate j only through the indices of the lucky numbers. So, before splitting the eligible maximal left intervals by the indices at which A[j] is located, we need to add to currentSum the product of currentNum * (luckyLocations[j] - luckyLocations[j-1] - 1) as we implicitly added (luckyLocations[j] - luckyLocations[j-1] - 1) not lucky numbers to the right subsequence, and after each addition we had currentNum-many eligible left subsequences that could accompany that new right subsequence starting at i and ending at any one of those not lucky numbers.
Obligation to keep indices i and j iterating on lucky numbers also shows us that we need to handle the new kinds of right subsequences we may encounter separately, if we want to adapt our algorithm for the simpler version to these new constraints.
Iterating i and j over the lucky numbers exclusively lets us count the total number of left subsequences for all possible right intervals [i,j] which contain at least 1 lucky number and that start and end with a lucky number. Considering this, the newly introduced cases we should handle become obvious. The new types of right intervals we introduced into the problem by adding not lucky numbers to A are;
- Right subsequences with no lucky numbers
- Right subsequences that have 1+ lucky numbers but end with not lucky numbers
- Right subsequences that have 1+ lucky numbers but start with not lucky numbers
Essential modifications to the algorithm to handle all three of these new cases are narrated below. Keep in mind that the modifications to handle the first case of right subsequences is made to count() function, while the latter two require direct modifications to be made on countForFixedI(i).
I used a lot of range notation to hopefully ease the understanding of the ideas. Especially to those of you who are less familiar with this particular type of notation, consider [a, b] to be the inclusive range of elements from index a to b in A. Also, in some cases, just to keep your focus on the point I am trying to make, I ommitted the fact that based on the input(i.e. the sequence A), sometimes such a range [a, b] may be non-existent as b < a. You have to handle such cases by checking if such a range is actually valid in your implementation.
Right subsequences with no lucky numbers
With luckyLocations array we defined above to help iterate i & j exclusively through lucky numbers' appearances, we now also implicitly have all the continuous maximal subsequences of not lucky numbers. If we iterate an index k in the index range [1, N) (remember, 0-indexed) the sequence [luckyLocations[k-1] + 1, luckyLocations[k] - 1] contains a continuous maximal subsequence of not lucky numbers, all subsequences of which are eligible to become a right subsequence Petya may choose.
There are only two exceptional cases that may cause us to have an eligible right subsequence without lucky numbers that is also not a subsequence of any of the maximal ranges [luckyLocations[k-1] + 1, luckyLocations[k] - 1]. Namely, we may have maximal continuous subsequences of not lucky numbers at index ranges [0, luckyLocations - 1] and [luckyLocations[L-1], (N-1)]. We need to cover them separately.
Now let's basically cover how we will compute the total number of pairs of left and right subsequences Petya may choose such that the right subsequence is a subsequence of such a maximal right subsequence [P, R] with no lucky numbers. Since no lucky number is going to be in the right subsequence, the maximal continuous left subsequence for any i (i.e. left endpoint of the right subsequence Petya may choose) we may pick is the whole range of indices [0, i-1] of length i. Consequently such a maximal left subsequence has i * (i+1) / 2 subsequences we can use, as pointed out earier in the solution to the simpler version of the problem. Taking this into account, the number of left subsequences we may have for all i and j we may assign in the range [P, R] is equal to the summation below.
The summation expression above is equal to an expression in terms of P and R, and can be calculated in constant time. Knowing this, as mentioned in the beginning of this section, we may iterate an index k over luckyLocations, and compute the number of pairs of sequences such that the right subsequence has no lucky numbers for each of the maximal continuous subsequences of not lucky numbers. This modification is done in scope of count() function, and you should not forget to handle the two exceptional cases separately.
Right subsequences that have 1+ lucky numbers but end with not lucky numbers
The second type of right subsequences we introduced with adding not lucky numbers to A is the ones that have some amount of lucky numbers, but ends with some amount of not lucky numbers. This new case as well as the case following this one actually modifies the function countForFixedI(i) as opposed to the path we followed for the previous case of right subsequences not having any lucky numbers.
This case is relatively straightforward. We basically have some not lucky numbers after index luckyLocations[L-1]. This really is not different from having some amount of not lucky numbers between luckyLocations[j-1] and luckyLocations[j]. Remember how countForFixedI(i) was adapted to handle those? We simply stated that currentNum never changed in the interval [luckyLocations[j-1] + 1, luckyLocations[j] - 1] and we added currentNum * (luckyLocations[j] - luckyLocations[j-1] - 1) to currentSum as we passed by (luckyLocations[j] - luckyLocations[j-1] - 1) elements by simply incrementing j by 1. Therefore, all we need to do to handle this case is to add to currentSum the value currentNum * (luckyLocations[N-1] - luckyLocations[L-1] - 1), and the case is solved!
Right subsequences that have 1+ lucky numbers but start with not lucky numbers
The modification to adapt countForFixedI(i) to this case may seem a little tricky and counterintuitive. The reason for it seeming so is that we will actually be counting the number of cases where the left endpoint of the right subsequence is not lucky and has an index smaller than i, even though we are in the scope of countForFixedI(i), inside which we assumed, till now, that i was the left endpoint of the right subsequence we were counting for.
Inside countForFixedI(i), for some fixed i and j, we need to consider the case of having a maximal subsequence of not lucky numbers in the interval [luckyLocations[i-1] + 1, luckyLocations[i] - 1]. Let's call that interval Z. Since our right subsequence is currently [i,j], the interval Z is actually included as a subsequence in one of the candidate intervals for the left subsequences we maintain in a set inside countForFixedI(i). We somehow need to increment currentSum appropriately by taking into account, for each fixed [i, j], the case of the right interval not starting from i but from any one of the not lucky numbers in range [luckyLocations[i-1] + 1, luckyLocations[i] - 1].
Let's examine, inside the scope of countForFixedI(i) the set I in which our maximal and continuous left intervals reside. Let's call the rightmost(i.e. the one with the right endpoint closest to i) of those intervals as G. We know for a fact that the alternative, not lucky left endpoints for right subsequence [i,j] reside in G. In other words, Z is included in G. Note that the intervals in I are only split by the appearances of the lucky numbers encountered in the range [i, j]. This means that G may still contain some lucky numbers. But since we have luckyLocations array, we know exactly where the rightmost of them resides: luckyLocations[i-1]. So, just as mentioned, our candidates for replacing i are in the range [luckyLocations[i-1] + 1, luckyLocations[i] - 1]. (i.e. the interval Z)
What we need to do now is to iterate an index k in interval Z and for each k, compute the total number of left subsequences we can have for interval [k, j]. We already do a similar computation in countForFixedI(i). We maintain currentNum, which is the total number of subsequences of each maximal left interval we maintain in I. However, this time the rightmost one of those, G, is to be only partially included in that counting. We need to act as if G is a maximal left subsequence that ends at index (k-1) and only include in currentNum the subsequences of G that end before k.
After a bit of consideration, one notices that iterating an index k is inefficient. The computation that yields the total number of pairs of eligible left and right subsequences of A such that the right subsequence is [k, j], for all values of k, may be expressed with the following summation formula.
Here lenZ and lenG denote the number of elements in the ranges Z and G, respectively. Basically, for each value of k, we subtract from currentNum everything G contributes and add back only the portion of it that is to the left of our designated k. It takes O(1) time to compute this summation expression, or rather a simplified version of it in terms of currentNum, lenZ and lenG.
As long as we pick an i value for countForFixedI(i) that has a subsequence Z with lenZ > 0, we need to keep computing and adding the value of this sum to currentSum. Computation of this case complies with all the previous modifications we made on countForFixedI(i), as well. For instance, just as we increment currentSum by lenZ * currentNum when we skip some not lucky numbers between luckyLocations[j-1] and luckyLocations[j], we will add lenZ times the value of the summation expression. We also need to act similarly if we have not lucky numbers in the exceptional intervals [luckyLocations[L-1]+1, (N-1)] or [0, luckyLocations - 1].
I understand that the theoretical explanation of the solution may have been a bit excruciating to you. You may probably code almost all the solution if you understood the ideas I tried to convey in the previous section. However we are still not out of the woods yet, and the implementation has a few tricky points to watch out for, as well.
I have been mentioning some set I in countForFixedI(i) since the beginning, in which we will store the eligible maximal left subsegments containing exclusively not lucky numbers. Every time we encounter a lucky number in A[j], for each of its appearances we will split the relevant left subsegment into two.
There are two key things that determined use of data structures while implementing this algorithm. The first is the ability to easily find the relevant left subsegment in the set I efficiently. To achieve this in logarithmic time, I simply used a map which mapped the right endpoints of the intervals to their endpoints. Thanks to STL, a simple call to lower_bound() function of map, which worked on keys, helped me easily find, in logarithmic time, the interval to which some index k was located in.
A second data structure I used was a map from integers to dynamic arrays, which just mapped the lucky numbers in A to the arrays that had the indices they appear at. This helped me just go through the appearances of the relevant lucky number instead of all of the luckyLocations array.
The unfolding of the two critical summation expressions presented above are left to you so that you may at least get a chance to get your hands a bit dirty. The readers who do so will encounter an expression with a term of fourth degree. Since that term is bounded by (N-1), the maximum value it may have is around 1020, which pushes the bounds of even unsigned long long integer type.
Since the solution to this issue is not the focal point of this problem I will not cover it in depth and will just state that I wrote some sort of a greedy multiplication for rational expressions of type which tries to divide the partial result by the terms in the denominator every time it multiplies the partial result with another term from the numerator.
A relatively sanitized version of my passing solution can be viewed here. I tried putting it up here, but due to the template it looks real bad.
This problem is definitely an interesting and tough combinatorics problem to get a grasp of, and I believe anyone with enough time should tackle it. The implementation alone is quite a fascinating process through which I learned a lot.
Also, this writeup is single-handedly the longest and possibly the most rigorous one I have ever done till now. I hope it helps whoever is interested in this problem.
Lastly, I would like to thank Vitaliy once more for his rapid and helpful responses whenever I had an issue with understanding the key concepts of the solution idea.