From 953c32a739eb4a446967e07b51e7eecec6631637 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sat, 26 Oct 2024 22:54:57 -0400 Subject: [PATCH 01/15] On Parallel Binary Search --- src/num_methods/binary_search.md | 138 ++++++++++++++++++++++++++++++- 1 file changed, 137 insertions(+), 1 deletion(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index ae9b2aed1..ffc691a85 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -71,7 +71,7 @@ while (r - l > 1) { During the execution of the algorithm, we never evaluate neither $A_L$ nor $A_R$, as $L < M < R$. In the end, $L$ will be the index of the last element that is not greater than $k$ (or $-1$ if there is no such element) and $R$ will be the index of the first element larger than $k$ (or $n$ if there is no such element). -**Note.** Calculating `m` as `m = (r + l) / 2` can lead to overflow if `l` and `r` are two positive integers, and this error lived about 9 years in JDK as described in the [blogpost](https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html). Some alternative approaches include e.g. writing `m = l + (r - l) / 2` which always works for positive integer `l` and `r`, but might still overflow if `l` is a negative number. If you use C++20, it offers an alternative solution in the form of `m = std::midpoint(l, r)` which always works correctly. +**Note.** Calculating `m` as `m = (r + l) / 2` can lead to overflow if `l` and `r` are two positive integers, and this error lived about 9 years in JDK as described in the [blogpost](https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html). Some alternative approaches include e.g. writing `m = l + (r - l) / 2` which always works for positive integer `l` and `r`, but might still overflow if `l` is a negative number. If you use C++20, it offers an alternative solution in the form of `m = midpoint(l, r)` which always works correctly. 
## Search on arbitrary predicate @@ -138,6 +138,134 @@ Another noteworthy way to do binary search is, instead of maintaining an active This paradigm is widely used in tasks around trees, such as finding lowest common ancestor of two vertices or finding an ancestor of a specific vertex that has a certain height. It could also be adapted to e.g. find the $k$-th non-zero element in a Fenwick tree. +## Parallel Binary Search + +[^1] Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in some sorted 0-indexed array $A$. Naturally, each query can be answered using binary search. + +Specifally, let us consider the following array $A$: + +| $A_0$ | $A_1$ | $A_2$ | $A_3$ | $A_4$ | $A_5$ | $A_6$ | $A_7$ | +|-------|-------|-------|-------|-------|-------|-------|-------| +| 1 | 3 | 5 | 7 | 9 | 9 | 13 | 15 | + +with queries: $X = [8,11,4,5]$. We can use binary search for each query sequentially. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| query | $ X_1 = 8 $ | $ X_2 = 11 $ | $ X_3 = 4 $ | $ X_4 = 5 $ |
|-------|-------------|--------------|-------------|-------------|
| step 1 | answer in $[0,7]$ | answer in $[0,7]$ | answer in $[0,7]$ | answer in $[0,7]$ |
| | check $ A_4 $ | check $ A_4 $ | check $ A_4 $ | check $ A_4 $ |
| | $ X_1 < A_4 = 9 $ | $ X_2 \geq A_4 = 9 $ | $ X_3 < A_4 = 9 $ | $ X_4 < A_4 = 9 $ |
| step 2 | answer in $[0,3]$ | answer in $[4,7]$ | answer in $[0,3]$ | answer in $[0,3]$ |
| | check $ A_2 $ | check $ A_6 $ | check $ A_2 $ | check $ A_2 $ |
| | $ X_1 \geq A_2 = 5 $ | $ X_2 < A_6 = 13 $ | $ X_3 < A_2 = 5 $ | $ X_4 \geq A_2 = 5 $ |
| step 3 | answer in $[2,3]$ | answer in $[4,5]$ | answer in $[0,1]$ | answer in $[2,3]$ |
| | check $ A_3 $ | check $ A_5 $ | check $ A_1 $ | check $ A_3 $ |
| | $ X_1 \geq A_3 = 7 $ | $ X_2 \geq A_5 = 9 $ | $ X_3 \geq A_1 = 3 $ | $ X_4 < A_3 = 7 $ |
| step 4 | answer in $[3,3]$ | answer in $[5,5]$ | answer in $[1,1]$ | answer in $[2,2]$ |
| | $ index = 3 $ | $ index = 5 $ | $ index = 1 $ | $ index = 2 $ |
+ + +We generally process this table by columns (queries), but notice that in each row we often repeat access to certain values of our array. To limit access to the values, we can process the table by rows (steps). This does not make huge difference in our small example problem (as we can access all elements in $\mathcal{O}(1)$), but in more complex problems, where computing these values is more complicated, this might be essential to solve these problems efficiently. Moreover, note that we can arbitrarily choose the order in which we answer questions in a single row. Let us look at the code implementing this approach. + +```cpp +// Computes the index of the largest value in table A less than or equal to $X_i$ for all $i$. +vector<int> ParallelBinarySearch(vector<int>& A, vector<int>& X) { + int N = A.size(); + int M = X.size(); + vector<int> P(M, -1); + vector<int> Q(M, N-1); + + for (int step = 1; step <= ceil(log2(N)); ++step) { + // Map to store indices of queries asking for this value. + unordered_map<int, vector<int>> important_values; + + // Calculate mid and populate the important_values map. + for (int i = 0; i < M; ++i) { + int mid = (P[i] + Q[i]) / 2; + important_values[mid].push_back(i); + } + + // Process each value in important_values. 
+ for (const auto& [mid, queries]: important_values) { + for (int query : queries) { + if (A[mid] > X[query]) { + Q[query] = mid; + } else { + P[query] = mid; + } + } + } + } + return P; +} +``` + ## Practice Problems * [LeetCode - Find First and Last Position of Element in Sorted Array](https://leetcode.com/problems/find-first-and-last-position-of-element-in-sorted-array/) @@ -154,3 +282,11 @@ This paradigm is widely used in tasks around trees, such as finding lowest commo * [Codeforces - GukiZ hates Boxes](https://codeforces.com/problemset/problem/551/C) * [Codeforces - Enduring Exodus](https://codeforces.com/problemset/problem/645/C) * [Codeforces - Chip 'n Dale Rescue Rangers](https://codeforces.com/problemset/problem/590/B) + +### Parallel Binary Search + +* [Szkopul - Meteors](https://szkopul.edu.pl/problemset/problem/7JrCYZ7LhEK4nBR5zbAXpcmM/site/?key=statement) +* [AtCoder - Stamp Rally](https://atcoder.jp/contests/agc002/tasks/agc002_d) + + +[^1]: Note that this section is following the description in [Sports programming in practice](https://kostka.dev/sp/). \ No newline at end of file From ab50edc2d39b2c29386ce3d555a917fb9f67efbe Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sat, 26 Oct 2024 23:09:57 -0400 Subject: [PATCH 02/15] Small fixes. --- src/num_methods/binary_search.md | 129 +++++++------------------------ 1 file changed, 28 insertions(+), 101 deletions(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index ffc691a85..854b220d2 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -71,7 +71,7 @@ while (r - l > 1) { During the execution of the algorithm, we never evaluate neither $A_L$ nor $A_R$, as $L < M < R$. In the end, $L$ will be the index of the last element that is not greater than $k$ (or $-1$ if there is no such element) and $R$ will be the index of the first element larger than $k$ (or $n$ if there is no such element). 
-**Note.** Calculating `m` as `m = (r + l) / 2` can lead to overflow if `l` and `r` are two positive integers, and this error lived about 9 years in JDK as described in the [blogpost](https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html). Some alternative approaches include e.g. writing `m = l + (r - l) / 2` which always works for positive integer `l` and `r`, but might still overflow if `l` is a negative number. If you use C++20, it offers an alternative solution in the form of `m = midpoint(l, r)` which always works correctly. +**Note.** Calculating `m` as `m = (r + l) / 2` can lead to overflow if `l` and `r` are two positive integers, and this error lived about 9 years in JDK as described in the [blogpost](https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html). Some alternative approaches include e.g. writing `m = l + (r - l) / 2` which always works for positive integer `l` and `r`, but might still overflow if `l` is a negative number. If you use C++20, it offers an alternative solution in the form of `m = std::midpoint(l, r)` which always works correctly. ## Search on arbitrary predicate @@ -138,126 +138,56 @@ Another noteworthy way to do binary search is, instead of maintaining an active This paradigm is widely used in tasks around trees, such as finding lowest common ancestor of two vertices or finding an ancestor of a specific vertex that has a certain height. It could also be adapted to e.g. find the $k$-th non-zero element in a Fenwick tree. -## Parallel Binary Search +## Parallel Binary Search -[^1] Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in some sorted 0-indexed array $A$. Naturally, each query can be answered using binary search. +Note that this section is following the description in [Sports programming in practice](https://kostka.dev/sp/). 
-Specifally, let us consider the following array $A$: - -| $A_0$ | $A_1$ | $A_2$ | $A_3$ | $A_4$ | $A_5$ | $A_6$ | $A_7$ | -|-------|-------|-------|-------|-------|-------|-------|-------| -| 1 | 3 | 5 | 7 | 9 | 9 | 13 | 15 | +Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in some sorted 0-indexed array $A$. Naturally, each query can be answered using binary search. +Specifally, let us consider the following array $A = [1,3,5,7,9,9,13,15]$ with queries: $X = [8,11,4,5]$. We can use binary search for each query sequentially. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| query | $ X_1 = 8 $ | $ X_2 = 11 $ | $ X_3 = 4 $ | $ X_4 = 5 $ |
|-------|-------------|--------------|-------------|-------------|
| step 1 | answer in $[0,7]$ | answer in $[0,7]$ | answer in $[0,7]$ | answer in $[0,7]$ |
| | check $ A_4 $ | check $ A_4 $ | check $ A_4 $ | check $ A_4 $ |
| | $ X_1 < A_4 = 9 $ | $ X_2 \geq A_4 = 9 $ | $ X_3 < A_4 = 9 $ | $ X_4 < A_4 = 9 $ |
| step 2 | answer in $[0,3]$ | answer in $[4,7]$ | answer in $[0,3]$ | answer in $[0,3]$ |
| | check $ A_2 $ | check $ A_6 $ | check $ A_2 $ | check $ A_2 $ |
| | $ X_1 \geq A_2 = 5 $ | $ X_2 < A_6 = 13 $ | $ X_3 < A_2 = 5 $ | $ X_4 \geq A_2 = 5 $ |
| step 3 | answer in $[2,3]$ | answer in $[4,5]$ | answer in $[0,1]$ | answer in $[2,3]$ |
| | check $ A_3 $ | check $ A_5 $ | check $ A_1 $ | check $ A_3 $ |
| | $ X_1 \geq A_3 = 7 $ | $ X_2 \geq A_5 = 9 $ | $ X_3 \geq A_1 = 3 $ | $ X_4 < A_3 = 7 $ |
| step 4 | answer in $[3,3]$ | answer in $[5,5]$ | answer in $[1,1]$ | answer in $[2,2]$ |
| | $ index = 3 $ | $ index = 5 $ | $ index = 1 $ | $ index = 2 $ |
- +| query | \( X_1 = 8 \) | \( X_2 = 11 \) | \( X_3 = 4 \) | \( X_4 = 5 \) | +|--------|------------------------|------------------------|-----------------------|-----------------------| +| **step 1** | answer in \([0,8)\) | answer in \([0,8)\) | answer in \([0,8)\) | answer in \([0,8)\) | +| | check \( A_4 \) | check \( A_4 \) | check \( A_4 \) | check \( A_4 \) | +| | \( X_1 < A_4 = 9 \) | \( X_2 \geq A_4 = 9 \) | \( X_3 < A_4 = 9 \) | \( X_4 < A_4 = 9 \) | +| **step 2** | answer in \([0,4)\) | answer in \([4,8)\) | answer in \([0,4)\) | answer in \([0,4)\) | +| | check \( A_2 \) | check \( A_6 \) | check \( A_2 \) | check \( A_2 \) | +| | \( X_1 \geq A_2 = 5 \) | \( X_2 < A_6 = 13 \) | \( X_3 < A_2 = 5 \) | \( X_4 \geq A_2 = 5 \) | +| **step 3** | answer in \([2,4)\) | answer in \([4,6)\) | answer in \([0,2)\) | answer in \([2,4)\) | +| | check \( A_3 \) | check \( A_5 \) | check \( A_1 \) | check \( A_3 \) | +| | \( X_1 \geq A_3 = 7 \) | \( X_2 \geq A_5 = 9 \) | \( X_3 \geq A_1 = 3 \) | \( X_4 < A_3 = 7 \) | +| **step 4** | answer in \([3,4)\) | answer in \([5,6)\) | answer in \([1,2)\) | answer in \([2,3)\) | +| | \( index = 3 \) | \( index = 5 \) | \( index = 1 \) | \( index = 2 \) | We generally process this table by columns (queries), but notice that in each row we often repeat access to certain values of our array. To limit access to the values, we can process the table by rows (steps). This does not make huge difference in our small example problem (as we can access all elements in $\mathcal{O}(1)$), but in more complex problems, where computing these values is more complicated, this might be essential to solve these problems efficiently. Moreover, note that we can arbitrarily choose the order in which we answer questions in a single row. Let us look at the code implementing this approach. -```cpp +```{.cpp file=parallel-binary-search} // Computes the index of the largest value in table A less than or equal to $X_i$ for all $i$. 
vector<int> ParallelBinarySearch(vector<int>& A, vector<int>& X) { int N = A.size(); int M = X.size(); - vector<int> P(M, -1); - vector<int> Q(M, N-1); + vector<int> left(M, -1); + vector<int> right(M, N-1); for (int step = 1; step <= ceil(log2(N)); ++step) { // Map to store indices of queries asking for this value. - unordered_map<int, vector<int>> important_values; + unordered_map<int, vector<int>> mid_to_queries; // Calculate mid and populate the important_values map. for (int i = 0; i < M; ++i) { - int mid = (P[i] + Q[i]) / 2; - important_values[mid].push_back(i); + int mid = (left[i] + right[i]) / 2; + mid_to_queries[mid].push_back(i); } // Process each value in important_values. - for (const auto& [mid, queries]: important_values) { + for (const auto& [mid, queries]: mid_to_queries) { for (int query : queries) { if (A[mid] > X[query]) { - Q[query] = mid; + right[query] = mid; } else { - P[query] = mid; + left[query] = mid; } } } @@ -286,7 +216,4 @@ vector<int> ParallelBinarySearch(vector<int>& A, vector<int>& X) { ### Parallel Binary Search * [Szkopul - Meteors](https://szkopul.edu.pl/problemset/problem/7JrCYZ7LhEK4nBR5zbAXpcmM/site/?key=statement) -* [AtCoder - Stamp Rally](https://atcoder.jp/contests/agc002/tasks/agc002_d) - - -[^1]: Note that this section is following the description in [Sports programming in practice](https://kostka.dev/sp/). \ No newline at end of file +* [AtCoder - Stamp Rally](https://atcoder.jp/contests/agc002/tasks/agc002_d) \ No newline at end of file From 5b033469c92e788e5f729c6293b4a2be90e01d85 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sat, 26 Oct 2024 23:11:01 -0400 Subject: [PATCH 03/15] Remove latex in comments. --- src/num_methods/binary_search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index 854b220d2..71cf12587 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -164,7 +164,7 @@ with queries: $X = [8,11,4,5]$. 
We can use binary search for each query sequenti We generally process this table by columns (queries), but notice that in each row we often repeat access to certain values of our array. To limit access to the values, we can process the table by rows (steps). This does not make huge difference in our small example problem (as we can access all elements in $\mathcal{O}(1)$), but in more complex problems, where computing these values is more complicated, this might be essential to solve these problems efficiently. Moreover, note that we can arbitrarily choose the order in which we answer questions in a single row. Let us look at the code implementing this approach. ```{.cpp file=parallel-binary-search} -// Computes the index of the largest value in table A less than or equal to $X_i$ for all $i$. +// Computes the index of the largest value in table A less than or equal to X_i for all i. vector<int> ParallelBinarySearch(vector<int>& A, vector<int>& X) { int N = A.size(); int M = X.size(); From 4111f24c23a2c3f661a0746770409f104f6496d2 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sat, 26 Oct 2024 23:15:34 -0400 Subject: [PATCH 04/15] Fix naming in comments. --- src/num_methods/binary_search.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index 71cf12587..a30a834b5 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -175,13 +175,13 @@ vector<int> ParallelBinarySearch(vector<int>& A, vector<int>& X) { // Map to store indices of queries asking for this value. unordered_map<int, vector<int>> mid_to_queries; - // Calculate mid and populate the important_values map. + // Calculate mid and populate the mid_to_queries map. for (int i = 0; i < M; ++i) { int mid = (left[i] + right[i]) / 2; mid_to_queries[mid].push_back(i); } - // Process each value in important_values. + // Process each value in mid_to_queries. 
for (const auto& [mid, queries]: mid_to_queries) { for (int query : queries) { if (A[mid] > X[query]) { From f98586324ec093890d1ea86a8d75280d634b2086 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sat, 26 Oct 2024 23:16:56 -0400 Subject: [PATCH 05/15] Unify sides in conditions. --- src/num_methods/binary_search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index a30a834b5..a9d4c7165 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -184,7 +184,7 @@ vector<int> ParallelBinarySearch(vector<int>& A, vector<int>& X) { // Process each value in mid_to_queries. for (const auto& [mid, queries]: mid_to_queries) { for (int query : queries) { - if (A[mid] > X[query]) { + if (X[query] < A[mid]>) { right[query] = mid; } else { left[query] = mid; From 48d87ff436af517ec2fd7b117f2f803d176d9de1 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sat, 26 Oct 2024 23:17:05 -0400 Subject: [PATCH 06/15] Typo. --- src/num_methods/binary_search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index a9d4c7165..71afdbe02 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -184,7 +184,7 @@ vector<int> ParallelBinarySearch(vector<int>& A, vector<int>& X) { // Process each value in mid_to_queries. for (const auto& [mid, queries]: mid_to_queries) { for (int query : queries) { - if (X[query] < A[mid]>) { + if (X[query] < A[mid]) { right[query] = mid; } else { left[query] = mid; From bdc32787e6894303e9602aeea706a2cac07ee05b Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sat, 26 Oct 2024 23:18:08 -0400 Subject: [PATCH 07/15] Tenses. 
--- src/num_methods/binary_search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index 71afdbe02..a1e250c17 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -140,7 +140,7 @@ This paradigm is widely used in tasks around trees, such as finding lowest commo ## Parallel Binary Search -Note that this section is following the description in [Sports programming in practice](https://kostka.dev/sp/). +Note that this section follows the description in [Sports programming in practice](https://kostka.dev/sp/). Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in some sorted 0-indexed array $A$. Naturally, each query can be answered using binary search. From f821353a40e215c28a4bc8aa76b40ade62a14db2 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sun, 27 Oct 2024 04:16:50 +0000 Subject: [PATCH 08/15] Some fixes. --- src/num_methods/binary_search.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index a1e250c17..a5a3bde4d 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -142,9 +142,9 @@ This paradigm is widely used in tasks around trees, such as finding lowest commo Note that this section follows the description in [Sports programming in practice](https://kostka.dev/sp/). -Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in some sorted 0-indexed array $A$. Naturally, each query can be answered using binary search. +Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in a sorted 0-indexed array $A$. Naturally, each query can be answered using binary search. 
-Specifally, let us consider the following array $A = [1,3,5,7,9,9,13,15]$ +Specifically, let us consider the following array $A = [1,3,5,7,9,9,13,15]$ with queries: $X = [8,11,4,5]$. We can use binary search for each query sequentially. | query | \( X_1 = 8 \) | \( X_2 = 11 \) | \( X_3 = 4 \) | \( X_4 = 5 \) | @@ -161,11 +161,11 @@ with queries: $X = [8,11,4,5]$. We can use binary search for each query sequenti | **step 4** | answer in \([3,4)\) | answer in \([5,6)\) | answer in \([1,2)\) | answer in \([2,3)\) | | | \( index = 3 \) | \( index = 5 \) | \( index = 1 \) | \( index = 2 \) | -We generally process this table by columns (queries), but notice that in each row we often repeat access to certain values of our array. To limit access to the values, we can process the table by rows (steps). This does not make huge difference in our small example problem (as we can access all elements in $\mathcal{O}(1)$), but in more complex problems, where computing these values is more complicated, this might be essential to solve these problems efficiently. Moreover, note that we can arbitrarily choose the order in which we answer questions in a single row. Let us look at the code implementing this approach. +We generally process this table by columns (queries), but notice that in each row we often repeat access to certain values of the array. To limit access to these values, we can process the table by rows (steps). This does not make huge difference in our small example problem (as we can access all elements in $\mathcal{O}(1)$), but in more complex problems, where computing these values is more complicated, this might be essential to solve these problems efficiently. Moreover, note that we can arbitrarily choose the order in which we answer questions in a single row. Let us look at the code implementing this approach. ```{.cpp file=parallel-binary-search} -// Computes the index of the largest value in table A less than or equal to X_i for all i. 
-vector<int> ParallelBinarySearch(vector<int>& A, vector<int>& X) { +// Computes the index of the largest value in a sorted array A less than or equal to X_i for all i. +vector<int> parallel_binary_search(vector<int>& A, vector<int>& X) { int N = A.size(); int M = X.size(); vector<int> left(M, -1); @@ -192,7 +192,7 @@ vector<int> ParallelBinarySearch(vector<int>& A, vector<int>& X) { } } } - return P; + return left; } ``` From c37c3522cbaae02051d350d0aef976d7109e0fdf Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sun, 27 Oct 2024 04:24:09 +0000 Subject: [PATCH 09/15] Fixes 2. --- src/num_methods/binary_search.md | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index a5a3bde4d..f23f8f0c1 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -168,31 +168,30 @@ We generally process this table by columns (queries), but notice that in each ro vector<int> parallel_binary_search(vector<int>& A, vector<int>& X) { int N = A.size(); int M = X.size(); - vector<int> left(M, -1); - vector<int> right(M, N-1); + vector<int> l(M, -1), r(M, N-1); for (int step = 1; step <= ceil(log2(N)); ++step) { // Map to store indices of queries asking for this value. - unordered_map<int, vector<int>> mid_to_queries; + unordered_map<int, vector<int>> m_to_queries; - // Calculate mid and populate the mid_to_queries map. + // Calculate middle point and populate the m_to_queries map. for (int i = 0; i < M; ++i) { - int mid = (left[i] + right[i]) / 2; - mid_to_queries[mid].push_back(i); + int m = (l[i] + r[i]) / 2; + m_to_queries[m].push_back(i); } - // Process each value in mid_to_queries. 
+ for (const auto& [m, queries]: m_to_queries) { for (int query : queries) { - if (X[query] < A[mid]) { - right[query] = mid; + if (X[query] < A[m]) { + r[query] = m; } else { - left[query] = mid; + l[query] = m; } } } } - return left; + return l; } ``` From cc161af1441727918959a3f5a30328cec803d5a4 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sun, 27 Oct 2024 04:25:25 +0000 Subject: [PATCH 10/15] N-1 to N. --- src/num_methods/binary_search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index f23f8f0c1..6a8f4ed24 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -168,7 +168,7 @@ We generally process this table by columns (queries), but notice that in each ro vector<int> parallel_binary_search(vector<int>& A, vector<int>& X) { int N = A.size(); int M = X.size(); - vector<int> l(M, -1), r(M, N-1); + vector<int> l(M, -1), r(M, N); for (int step = 1; step <= ceil(log2(N)); ++step) { // Map to store indices of queries asking for this value. From e2c9ab576ae4e6ae2e07dfad2b93579cf4945d2a Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sun, 10 Aug 2025 20:40:14 -0400 Subject: [PATCH 11/15] Address comments. --- src/num_methods/binary_search.md | 99 +++++++++++++++++++------------- 1 file changed, 60 insertions(+), 39 deletions(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index 6a8f4ed24..4b19755ac 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -28,9 +28,9 @@ When it is impossible to pick $M$, that is, when $R = L + 1$, we directly compa Since in the worst case we will always reduce to larger segment of $[L, M]$ and $[M, R]$. Thus, in the worst case scenario the reduction would be from $R-L$ to $\max(M-L, R-M)$. To minimize this value, we should pick $M \approx \frac{L+R}{2}$, then -$$ +$ M-L \approx \frac{R-L}{2} \approx R-M. 
-$$ +$ In other words, from the worst-case scenario perspective it is optimal to always pick $M$ in the middle of $[L, R]$ and split it in half. Thus, the active segment halves on each step until it becomes of size $1$. So, if the process needs $h$ steps, in the end it reduces the difference between $R$ and $L$ from $R-L$ to $\frac{R-L}{2^h} \approx 1$, giving us the equation $2^h \approx R-L$. @@ -77,9 +77,9 @@ During the execution of the algorithm, we never evaluate neither $A_L$ nor $A_R$ Let $f : \{0,1,\dots, n-1\} \to \{0, 1\}$ be a boolean function defined on $0,1,\dots,n-1$ such that it is monotonously increasing, that is -$$ +$ f(0) \leq f(1) \leq \dots \leq f(n-1). -$$ +$ The binary search, the way it is described above, finds the partition of the array by the predicate $f(M)$, holding the boolean value of $k < A_M$ expression. It is possible to use arbitrary monotonous predicate instead of $k < A_M$. It is particularly useful when the computation of $f(k)$ requires too much time to actually compute it for every possible value. @@ -104,21 +104,21 @@ while (r - l > 1) { Such situation often occurs when we're asked to compute some value, but we're only capable of checking whether this value is at least $i$. For example, you're given an array $a_1,\dots,a_n$ and you're asked to find the maximum floored average sum -$$ +$ \left \lfloor \frac{a_l + a_{l+1} + \dots + a_r}{r-l+1} \right\rfloor -$$ +$ among all possible pairs of $l,r$ such that $r-l \geq x$. One of simple ways to solve this problem is to check whether the answer is at least $\lambda$, that is if there is a pair $l, r$ such that the following is true: -$$ +$ \frac{a_l + a_{l+1} + \dots + a_r}{r-l+1} \geq \lambda. 
-$$ +$ Equivalently, it rewrites as -$$ +$ (a_l - \lambda) + (a_{l+1} - \lambda) + \dots + (a_r - \lambda) \geq 0, -$$ +$ so now we need to check whether there is a subarray of a new array $a_i - \lambda$ of length at least $x+1$ with non-negative sum, which is doable with some prefix sums. @@ -140,49 +140,70 @@ This paradigm is widely used in tasks around trees, such as finding lowest commo ## Parallel Binary Search -Note that this section follows the description in [Sports programming in practice](https://kostka.dev/sp/). +When we are faced with multiple queries that can each be solved with a binary search, it is sometimes too slow to solve them one by one. Parallel Binary Search is a technique that allows us to solve all of these queries simultaneously, often leading to a significant performance improvement. The main idea is to perform the binary search for all queries at the same time, step by step. This is particularly effective when the check function for the binary search is costly and can be optimized by processing queries in batches. -Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in a sorted 0-indexed array $A$. Naturally, each query can be answered using binary search. +### Motivation + +Consider a scenario where we have $Q$ queries, and for each query $q$, we need to find the smallest value $x$ that satisfies a certain condition $P(q, x)$. If $P(q, x)$ is monotonic on $x$, we can use binary search for each query. This would result in a total complexity of $O(Q \cdot \log(\text{range}) \cdot T_{check})$, where $T_{check}$ is the time to evaluate $P(q, x)$. + +Parallel binary search optimizes this by changing the order of operations. Instead of processing each query independently, we process all queries simultaneously, step by step. In each step of the binary search, we compute the middle points $m_i$ for all queries $q_i$ and group the queries by their middle point. 
This is particularly powerful if the check function $P(q, x)$ has a structure that allows for efficient batching or updates. + +Specifically, the major performance gain comes from two scenarios: +1. **Batching expensive checks:** If multiple queries need to check the same value $m$ in a given step, we can perform the expensive part of the check only once and reuse the result. +2. **Efficiently updatable checks:** Often, the check for a value $m$ (e.g., "process the first $m$ events") can be performed much faster if we have already computed the state for $m-1$. By processing the check values $m$ in increasing order, we can update the state from one check to the next, instead of recomputing from scratch each time. This is a very common pattern in problems involving time or a sequence of updates. + +This "offline" processing of queries, where we collect all queries and answer them together in a way that is convenient for our data structures, is the core idea behind parallel binary search. + +### Example Application: Meteors + +A classic example is the "Meteors" problem, which is listed in the practice problems. We are given $N$ countries, and for each country, a target number of meteors to collect. We are also given a sequence of $K$ meteor showers, each affecting a range of countries. The goal is to find, for each country, the earliest time (i.e., which meteor shower) they reach their target. + +For a single country, we could binary search for the answer from $1$ to $K$. The check for a given time $t$ would involve summing up the meteors from the first $t$ showers for that country. A naive check takes $O(t)$ time, leading to an overall complexity of $O(K \log K)$ for one country, and $O(N \cdot K \log K)$ for all, which is too slow. + +With parallel binary search, we search for the answer for all $N$ countries at once. In each of the $O(\log K)$ steps, we have a set of check values $t_i$ for the countries. We can process these $t_i$ in increasing order. 
To perform the check for time $t$, we can use a data structure like a Fenwick tree or a segment tree to maintain the meteor counts for all countries. When moving from checking time $t_i$ to $t_{i+1}$, we only need to add the effects of showers from $t_i+1$ to $t_{i+1}$ to our data structure. This "update" approach is much faster than recomputing from scratch. The total complexity becomes something like $O((N+K)\log N \log K)$, a significant improvement. + +### Implementation + +Now, let's go back to the simple problem to see the implementation structure. Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in a sorted 0-indexed array $A$. Naturally, each query can be answered using binary search. Specifically, let us consider the following array $A = [1,3,5,7,9,9,13,15]$ with queries: $X = [8,11,4,5]$. We can use binary search for each query sequentially. -| query | \( X_1 = 8 \) | \( X_2 = 11 \) | \( X_3 = 4 \) | \( X_4 = 5 \) | -|--------|------------------------|------------------------|-----------------------|-----------------------| -| **step 1** | answer in \([0,8)\) | answer in \([0,8)\) | answer in \([0,8)\) | answer in \([0,8)\) | -| | check \( A_4 \) | check \( A_4 \) | check \( A_4 \) | check \( A_4 \) | -| | \( X_1 < A_4 = 9 \) | \( X_2 \geq A_4 = 9 \) | \( X_3 < A_4 = 9 \) | \( X_4 < A_4 = 9 \) | -| **step 2** | answer in \([0,4)\) | answer in \([4,8)\) | answer in \([0,4)\) | answer in \([0,4)\) | -| | check \( A_2 \) | check \( A_6 \) | check \( A_2 \) | check \( A_2 \) | -| | \( X_1 \geq A_2 = 5 \) | \( X_2 < A_6 = 13 \) | \( X_3 < A_2 = 5 \) | \( X_4 \geq A_2 = 5 \) | -| **step 3** | answer in \([2,4)\) | answer in \([4,6)\) | answer in \([0,2)\) | answer in \([2,4)\) | -| | check \( A_3 \) | check \( A_5 \) | check \( A_1 \) | check \( A_3 \) | -| | \( X_1 \geq A_3 = 7 \) | \( X_2 \geq A_5 = 9 \) | \( X_3 \geq A_1 = 3 \) | \( X_4 < A_3 = 7 \) | -| **step 4** | 
answer in \([3,4)\) | answer in \([5,6)\) | answer in \([1,2)\) | answer in \([2,3)\) | -| | \( index = 3 \) | \( index = 5 \) | \( index = 1 \) | \( index = 2 \) | - -We generally process this table by columns (queries), but notice that in each row we often repeat access to certain values of the array. To limit access to these values, we can process the table by rows (steps). This does not make huge difference in our small example problem (as we can access all elements in $\mathcal{O}(1)$), but in more complex problems, where computing these values is more complicated, this might be essential to solve these problems efficiently. Moreover, note that we can arbitrarily choose the order in which we answer questions in a single row. Let us look at the code implementing this approach. +| query | \( X_1 = 8 \) | \( X_2 = 11 \) | \( X_3 = 4 \) | \( X_4 = 5 \) | +|:-----:|:------------------------------------------------------------------:|:-------------------------------------------------------------------:|:------------------------------------------------------------------:|:------------------------------------------------------------------:| +| **step 1** | answer in \([0,8)\)<br>check \( A_4 \)<br>\( X_1 < A_4 = 9 \) | answer in \([0,8)\)<br>check \( A_4 \)<br>\( X_2 \geq A_4 = 9 \) | answer in \([0,8)\)<br>check \( A_4 \)<br>\( X_3 < A_4 = 9 \) | answer in \([0,8)\)<br>check \( A_4 \)<br>\( X_4 < A_4 = 9 \) | +| **step 2** | answer in \([0,4)\)<br>check \( A_2 \)<br>\( X_1 \geq A_2 = 5 \) | answer in \([4,8)\)<br>check \( A_6 \)<br>\( X_2 < A_6 = 13 \) | answer in \([0,4)\)<br>check \( A_2 \)<br>\( X_3 < A_2 = 5 \) | answer in \([0,4)\)<br>check \( A_2 \)<br>\( X_4 \geq A_2 = 5 \) | +| **step 3** | answer in \([2,4)\)<br>check \( A_3 \)<br>\( X_1 \geq A_3 = 7 \) | answer in \([4,6)\)<br>check \( A_5 \)<br>\( X_2 \geq A_5 = 9 \) | answer in \([0,2)\)<br>check \( A_1 \)<br>\( X_3 \geq A_1 = 3 \) | answer in \([2,4)\)<br>check \( A_3 \)<br>\( X_4 < A_3 = 7 \) | +| **step 4** | answer in \([3,4)\)<br>\( index = 3 \) | answer in \([5,6)\)<br>\( index = 5 \) | answer in \([1,2)\)<br>\( index = 1 \) | answer in \([2,3)\)<br>\( index = 2 \) | + +We generally process this table by columns (queries), but notice that in each row we often repeat access to certain values of the array. To limit access to these values, we can process the table by rows (steps). This does not make huge difference in our small example problem (as we can access all elements in $O(1)$), but in more complex problems, where computing these values is more complicated, this might be essential to solve these problems efficiently. Moreover, note that we can arbitrarily choose the order in which we answer questions in a single row. Let us look at the code implementing this approach. ```{.cpp file=parallel-binary-search} // Computes the index of the largest value in a sorted array A less than or equal to X_i for all i. vector<int> parallel_binary_search(vector<int>& A, vector<int>& X) { int N = A.size(); - int M = X.size(); - vector<int> l(M, -1), r(M, N); + int Z = X.size(); + vector<int> l(Z, -1), r(Z, N); for (int step = 1; step <= ceil(log2(N)); ++step) { - // Map to store indices of queries asking for this value. - unordered_map<int, vector<int>> m_to_queries; - - // Calculate middle point and populate the m_to_queries map. - for (int i = 0; i < M; ++i) { - int m = (l[i] + r[i]) / 2; - m_to_queries[m].push_back(i); + // A vector of vectors to store indices of queries for each middle point. + // This is generally faster and safer than std::unordered_map in competitive programming. + vector<vector<int>> m_to_queries(N); + + // Group queries by their middle point. + for (int i = 0; i < Z; ++i) { + if (l[i] < r[i] - 1) { + int m = l[i] + (r[i] - l[i]) / 2; + m_to_queries[m].push_back(i); + } } - // Process each value in m_to_queries. - for (const auto& [m, queries]: m_to_queries) { - for (int query : queries) { + // Process each group of queries.
+ for (int m = 0; m < N; ++m) { + if (m_to_queries[m].empty()) { + continue; + } + for (int query : m_to_queries[m]) { if (X[query] < A[m]) { r[query] = m; } else { From 72fa65d8ebd7be4767ce9045142696c58c39d4f7 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sun, 10 Aug 2025 20:45:17 -0400 Subject: [PATCH 12/15] Address comments 2. --- src/num_methods/binary_search.md | 58 +++++++++++++++----------------- 1 file changed, 27 insertions(+), 31 deletions(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index 4b19755ac..285ca32f3 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -28,9 +28,9 @@ When it is impossible to pick $M$, that is, when $R = L + 1$, we directly compa Since in the worst case we will always reduce to larger segment of $[L, M]$ and $[M, R]$. Thus, in the worst case scenario the reduction would be from $R-L$ to $\max(M-L, R-M)$. To minimize this value, we should pick $M \approx \frac{L+R}{2}$, then -$ +$$ M-L \approx \frac{R-L}{2} \approx R-M. -$ +$$ In other words, from the worst-case scenario perspective it is optimal to always pick $M$ in the middle of $[L, R]$ and split it in half. Thus, the active segment halves on each step until it becomes of size $1$. So, if the process needs $h$ steps, in the end it reduces the difference between $R$ and $L$ from $R-L$ to $\frac{R-L}{2^h} \approx 1$, giving us the equation $2^h \approx R-L$. @@ -77,9 +77,9 @@ During the execution of the algorithm, we never evaluate neither $A_L$ nor $A_R$ Let $f : \{0,1,\dots, n-1\} \to \{0, 1\}$ be a boolean function defined on $0,1,\dots,n-1$ such that it is monotonously increasing, that is -$ +$$ f(0) \leq f(1) \leq \dots \leq f(n-1). -$ +$$ The binary search, the way it is described above, finds the partition of the array by the predicate $f(M)$, holding the boolean value of $k < A_M$ expression. It is possible to use arbitrary monotonous predicate instead of $k < A_M$. 
It is particularly useful when the computation of $f(k)$ requires too much time to actually compute it for every possible value. @@ -104,21 +104,21 @@ while (r - l > 1) { Such situation often occurs when we're asked to compute some value, but we're only capable of checking whether this value is at least $i$. For example, you're given an array $a_1,\dots,a_n$ and you're asked to find the maximum floored average sum -$ +$$ \left \lfloor \frac{a_l + a_{l+1} + \dots + a_r}{r-l+1} \right\rfloor -$ +$$ among all possible pairs of $l,r$ such that $r-l \geq x$. One of simple ways to solve this problem is to check whether the answer is at least $\lambda$, that is if there is a pair $l, r$ such that the following is true: -$ +$$ \frac{a_l + a_{l+1} + \dots + a_r}{r-l+1} \geq \lambda. -$ +$$ Equivalently, it rewrites as -$ +$$ (a_l - \lambda) + (a_{l+1} - \lambda) + \dots + (a_r - \lambda) \geq 0, -$ +$$ so now we need to check whether there is a subarray of a new array $a_i - \lambda$ of length at least $x+1$ with non-negative sum, which is doable with some prefix sums. @@ -140,13 +140,11 @@ This paradigm is widely used in tasks around trees, such as finding lowest commo ## Parallel Binary Search -When we are faced with multiple queries that can each be solved with a binary search, it is sometimes too slow to solve them one by one. Parallel Binary Search is a technique that allows us to solve all of these queries simultaneously, often leading to a significant performance improvement. The main idea is to perform the binary search for all queries at the same time, step by step. This is particularly effective when the check function for the binary search is costly and can be optimized by processing queries in batches. - -### Motivation +When we are faced with multiple queries that can each be solved with a binary search, it is sometimes too slow to solve them one by one. 
Parallel Binary Search is a technique that allows us to solve all of these queries simultaneously, often leading to a significant performance improvement. The main idea is to perform the binary search for all queries at the same time (in parallel), step by step. This is particularly effective when the check function for the binary search is costly and can be optimized by processing queries in batches. Consider a scenario where we have $Q$ queries, and for each query $q$, we need to find the smallest value $x$ that satisfies a certain condition $P(q, x)$. If $P(q, x)$ is monotonic on $x$, we can use binary search for each query. This would result in a total complexity of $O(Q \cdot \log(\t{range}) \cdot T_{check})$, where $T_{check}$ is the time to evaluate $P(q, x)$. -Parallel binary search optimizes this by changing the order of operations. Instead of processing each query independently, we process all queries simultaneously, step by step. In each step of the binary search, we compute the middle points $m_i$ for all queries $q_i$ and group the queries by their middle point. This is particularly powerful if the check function $P(q, x)$ has a structure that allows for efficient batching or updates. +Parallel binary search optimizes this by changing the order of operations. Instead of processing each query independently, we process all queries simultaneously, step by step. In each step of the binary search, we compute the middle points $m_i$ for all queries and group the queries by their middle point. This is particularly powerful if the check function $P(q, x)$ has a structure that allows for efficient batching or updates. Specifically, the major performance gain comes from two scenarios: 1. **Batching expensive checks:** If multiple queries need to check the same value $m$ in a given step, we can perform the expensive part of the check only once and reuse the result. 
@@ -154,27 +152,18 @@ Specifically, the major performance gain comes from two scenarios: This "offline" processing of queries, where we collect all queries and answer them together in a way that is convenient for our data structures, is the core idea behind parallel binary search. -### Example Application: Meteors - -A classic example is the "Meteors" problem, which is listed in the practice problems. We are given $N$ countries, and for each country, a target number of meteors to collect. We are also given a sequence of $K$ meteor showers, each affecting a range of countries. The goal is to find, for each country, the earliest time (i.e., which meteor shower) they reach their target. - -For a single country, we could binary search for the answer from $1$ to $K$. The check for a given time $t$ would involve summing up the meteors from the first $t$ showers for that country. A naive check takes $O(t)$ time, leading to an overall complexity of $O(K \log K)$ for one country, and $O(N \cdot K \log K)$ for all, which is too slow. - -With parallel binary search, we search for the answer for all $N$ countries at once. In each of the $O(\log K)$ steps, we have a set of check values $t_i$ for the countries. We can process these $t_i$ in increasing order. To perform the check for time $t$, we can use a data structure like a Fenwick tree or a segment tree to maintain the meteor counts for all countries. When moving from checking time $t_i$ to $t_{i+1}$, we only need to add the effects of showers from $t_i+1$ to $t_{i+1}$ to our data structure. This "update" approach is much faster than recomputing from scratch. The total complexity becomes something like $O((N+K)\log N \log K)$, a significant improvement. - ### Implementation -Now, let's go back to the simple problem to see the implementation structure. Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in a sorted 0-indexed array $A$. 
Naturally, each query can be answered using binary search. +Imagine that we want to answer $Z$ queries about the index of the largest value less than or equal to some $X_i$ (for $i=1,2,\ldots,Z$) in a sorted 0-indexed array $A$. Naturally, each query can be answered using binary search. Specifically, let us consider the following array $A = [1,3,5,7,9,9,13,15]$ with queries: $X = [8,11,4,5]$. We can use binary search for each query sequentially. - -| query | \( X_1 = 8 \) | \( X_2 = 11 \) | \( X_3 = 4 \) | \( X_4 = 5 \) | -|:-----:|:------------------------------------------------------------------:|:-------------------------------------------------------------------:|:------------------------------------------------------------------:|:------------------------------------------------------------------:| -| **step 1** | answer in \([0,8)\)<br>check \( A_4 \)<br>\( X_1 < A_4 = 9 \) | answer in \([0,8)\)<br>check \( A_4 \)<br>\( X_2 \geq A_4 = 9 \) | answer in \([0,8)\)<br>check \( A_4 \)<br>\( X_3 < A_4 = 9 \) | answer in \([0,8)\)<br>check \( A_4 \)<br>\( X_4 < A_4 = 9 \) | -| **step 2** | answer in \([0,4)\)<br>check \( A_2 \)<br>\( X_1 \geq A_2 = 5 \) | answer in \([4,8)\)<br>check \( A_6 \)<br>\( X_2 < A_6 = 13 \) | answer in \([0,4)\)<br>check \( A_2 \)<br>\( X_3 < A_2 = 5 \) | answer in \([0,4)\)<br>check \( A_2 \)<br>\( X_4 \geq A_2 = 5 \) | -| **step 3** | answer in \([2,4)\)<br>check \( A_3 \)<br>\( X_1 \geq A_3 = 7 \) | answer in \([4,6)\)<br>check \( A_5 \)<br>\( X_2 \geq A_5 = 9 \) | answer in \([0,2)\)<br>check \( A_1 \)<br>\( X_3 \geq A_1 = 3 \) | answer in \([2,4)\)<br>check \( A_3 \)<br>\( X_4 < A_3 = 7 \) | -| **step 4** | answer in \([3,4)\)<br>\( index = 3 \) | answer in \([5,6)\)<br>\( index = 5 \) | answer in \([1,2)\)<br>\( index = 1 \) | answer in \([2,3)\)<br>\( index = 2 \) | +| Query | \( X_1 = 8 \) | \( X_2 = 11 \) | \( X_3 = 4 \) | \( X_4 = 5 \) | +|--------|:----------------------------------------:|:-----------------------------------------:|:------------------------------------------:|:------------------------------------------:| +| **Step 1** | Answer in \([0,8)\)<br>Check \( A_4 \)<br>\( X_1 < A_4 = 9 \) | Answer in \([0,8)\)<br>Check \( A_4 \)<br>\( X_2 \geq A_4 = 9 \) | Answer in \([0,8)\)<br>Check \( A_4 \)<br>\( X_3 < A_4 = 9 \) | Answer in \([0,8)\)<br>Check \( A_4 \)<br>\( X_4 < A_4 = 9 \) | +| **Step 2** | Answer in \([0,4)\)<br>Check \( A_2 \)<br>\( X_1 \geq A_2 = 5 \) | Answer in \([4,8)\)<br>Check \( A_6 \)<br>\( X_2 < A_6 = 13 \) | Answer in \([0,4)\)<br>Check \( A_2 \)<br>\( X_3 < A_2 = 5 \) | Answer in \([0,4)\)<br>Check \( A_2 \)<br>\( X_4 \geq A_2 = 5 \) | +| **Step 3** | Answer in \([2,4)\)<br>Check \( A_3 \)<br>\( X_1 \geq A_3 = 7 \) | Answer in \([4,6)\)<br>Check \( A_5 \)<br>\( X_2 \geq A_5 = 9 \) | Answer in \([0,2)\)<br>Check \( A_1 \)<br>\( X_3 \geq A_1 = 3 \) | Answer in \([2,4)\)<br>Check \( A_3 \)<br>\( X_4 < A_3 = 7 \) | +| **Step 4** | Answer in \([3,4)\)<br>\( index = 3 \) | Answer in \([5,6)\)<br>\( index = 5 \) | Answer in \([1,2)\)<br>\( index = 1 \) | Answer in \([2,3)\)<br>\( index = 2 \) | We generally process this table by columns (queries), but notice that in each row we often repeat access to certain values of the array. To limit access to these values, we can process the table by rows (steps). This does not make huge difference in our small example problem (as we can access all elements in $O(1)$), but in more complex problems, where computing these values is more complicated, this might be essential to solve these problems efficiently. Moreover, note that we can arbitrarily choose the order in which we answer questions in a single row. Let us look at the code implementing this approach. @@ -187,7 +176,6 @@ vector<int> parallel_binary_search(vector<int>& A, vector<int>& X) { for (int step = 1; step <= ceil(log2(N)); ++step) { // A vector of vectors to store indices of queries for each middle point. - // This is generally faster and safer than std::unordered_map in competitive programming. vector<vector<int>> m_to_queries(N); // Group queries by their middle point. @@ -216,6 +204,14 @@ vector<int> parallel_binary_search(vector<int>& A, vector<int>& X) { } ``` +### Example Problem: Meteors + +A pretty well known problem using this method is called "Meteors", and is listed in the practice problems. We are given $N$ countries, and for each country, a target number of meteors to collect. We are also given a sequence of $K$ meteor showers, each affecting a range of countries. The goal is to find, for each country, the earliest time (i.e., which meteor shower) they reach their target. + +For a single country, we could binary search for the answer from $1$ to $K$. The check for a given time $t$ would involve summing up the meteors from the first $t$ showers for that country. A naive check takes $O(t)$ time, leading to an overall complexity of $O(K \log K)$ for one country, and $O(N \cdot K \log K)$ for all, which is too slow. + +With parallel binary search, we search for the answer for all $N$ countries at once.
In each of the $O(\log K)$ steps, we have a set of check values $t_i$ for the countries. We can process these $t_i$ in increasing order. To perform the check for time $t$, we can use a data structure like a Fenwick tree or a segment tree to maintain the meteor counts for all countries. When moving from checking time $t_i$ to $t_{i+1}$, we only need to add the effects of showers from $t_i+1$ to $t_{i+1}$ to our data structure. This "update" approach is much faster than recomputing from scratch. The total complexity becomes something like $O((N+K)\log N \log K)$, a significant improvement. + ## Practice Problems * [LeetCode - Find First and Last Position of Element in Sorted Array](https://leetcode.com/problems/find-first-and-last-position-of-element-in-sorted-array/) From 59c381b121175995567617f38cd954e347373a34 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sun, 10 Aug 2025 20:48:28 -0400 Subject: [PATCH 13/15] Remove "something like". --- src/num_methods/binary_search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index 285ca32f3..aebff1cf4 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -210,7 +210,7 @@ A pretty well known problem using this method is called "Meteors", and is listed For a single country, we could binary search for the answer from $1$ to $K$. The check for a given time $t$ would involve summing up the meteors from the first $t$ showers for that country. A naive check takes $O(t)$ time, leading to an overall complexity of $O(K \log K)$ for one country, and $O(N \cdot K \log K)$ for all, which is too slow. -With parallel binary search, we search for the answer for all $N$ countries at once. In each of the $O(\log K)$ steps, we have a set of check values $t_i$ for the countries. We can process these $t_i$ in increasing order. 
To perform the check for time $t$, we can use a data structure like a Fenwick tree or a segment tree to maintain the meteor counts for all countries. When moving from checking time $t_i$ to $t_{i+1}$, we only need to add the effects of showers from $t_i+1$ to $t_{i+1}$ to our data structure. This "update" approach is much faster than recomputing from scratch. The total complexity becomes something like $O((N+K)\log N \log K)$, a significant improvement. +With parallel binary search, we search for the answer for all $N$ countries at once. In each of the $O(\log K)$ steps, we have a set of check values $t_i$ for the countries. We can process these $t_i$ in increasing order. To perform the check for time $t$, we can use a data structure like a Fenwick tree or a segment tree to maintain the meteor counts for all countries. When moving from checking time $t_i$ to $t_{i+1}$, we only need to add the effects of showers from $t_i+1$ to $t_{i+1}$ to our data structure. This "update" approach is much faster than recomputing from scratch. The total complexity becomes $O((N+K)\log N \log K)$. ## Practice Problems From 3eb485192e86de743b319578314e30fd6fca2c05 Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sun, 10 Aug 2025 20:56:14 -0400 Subject: [PATCH 14/15] Fix math mode in one equation. --- src/num_methods/binary_search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index aebff1cf4..695d0679c 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -142,7 +142,7 @@ This paradigm is widely used in tasks around trees, such as finding lowest commo When we are faced with multiple queries that can each be solved with a binary search, it is sometimes too slow to solve them one by one. Parallel Binary Search is a technique that allows us to solve all of these queries simultaneously, often leading to a significant performance improvement. 
The main idea is to perform the binary search for all queries at the same time (in parallel), step by step. This is particularly effective when the check function for the binary search is costly and can be optimized by processing queries in batches. -Consider a scenario where we have $Q$ queries, and for each query $q$, we need to find the smallest value $x$ that satisfies a certain condition $P(q, x)$. If $P(q, x)$ is monotonic on $x$, we can use binary search for each query. This would result in a total complexity of $O(Q \cdot \log(\t{range}) \cdot T_{check})$, where $T_{check}$ is the time to evaluate $P(q, x)$. +Consider a scenario where we have $Q$ queries, and for each query $q$, we need to find the smallest value $x$ that satisfies a certain condition $P(q, x)$. If $P(q, x)$ is monotonic on $x$, we can use binary search for each query. This would result in a total complexity of $O(Q \cdot \log(range) \cdot T_{check})$, where $T_{check}$ is the time to evaluate $P(q, x)$. Parallel binary search optimizes this by changing the order of operations. Instead of processing each query independently, we process all queries simultaneously, step by step. In each step of the binary search, we compute the middle points $m_i$ for all queries and group the queries by their middle point. This is particularly powerful if the check function $P(q, x)$ has a structure that allows for efficient batching or updates. From 8dfc8cb174db38fc6e069be0115651f6a4c6535f Mon Sep 17 00:00:00 2001 From: Bartosz Kostka Date: Sun, 10 Aug 2025 21:12:50 -0400 Subject: [PATCH 15/15] Fix the table. 
--- src/num_methods/binary_search.md | 1 + 1 file changed, 1 insertion(+) diff --git a/src/num_methods/binary_search.md b/src/num_methods/binary_search.md index 695d0679c..0ccb53478 100644 --- a/src/num_methods/binary_search.md +++ b/src/num_methods/binary_search.md @@ -158,6 +158,7 @@ Imagine that we want to answer $Z$ queries about the index of the largest value Specifically, let us consider the following array $A = [1,3,5,7,9,9,13,15]$ with queries: $X = [8,11,4,5]$. We can use binary search for each query sequentially. + | Query | \( X_1 = 8 \) | \( X_2 = 11 \) | \( X_3 = 4 \) | \( X_4 = 5 \) | |--------|:----------------------------------------:|:-----------------------------------------:|:------------------------------------------:|:------------------------------------------:| | **Step 1** | Answer in \([0,8)\)<br>Check \( A_4 \)<br>\( X_1 < A_4 = 9 \) | Answer in \([0,8)\)<br>Check \( A_4 \)<br>\( X_2 \geq A_4 = 9 \) | Answer in \([0,8)\)<br>Check \( A_4 \)<br>\( X_3 < A_4 = 9 \) | Answer in \([0,8)\)<br>Check \( A_4 \)<br>\( X_4 < A_4 = 9 \) |