1

Gvien two arrays with identical entries but different orders, is there a way to find the index mapping which would rearrange one array such that it is equivalent to the other?

i.e.

x1 = np.array([[1,2],
               [3,4],
               [5,6],
               [7,8],
               [9,10]])

x2 = np.array([[3,4],
               [7,8],
               [1,2],
               [5,6],
               [9,10]])

The mapping from x1 to x2 would be np.array([1,3,0,2,4]), and the mapping from x2 to x1 would be np.array([2,0,3,1,4])

Note, in this example x1 is sorted, however I do not wish to make this assumption in general.

2
  • In addition to my review of possible ways to solve this problem in numpy, you might like to check out numpy_indexed module. It has a method called indices dedicated for solution of problems like yours. Commented Feb 25, 2022 at 5:33
  • There are also quite a lot related questions like this if you use npi.indexed as a keyword in SO search. Commented Feb 25, 2022 at 5:43

4 Answers 4

1

Assuming the first array's elements are sorted as in the example, an efficient option would be to view the two-column arrays as 1d arrays of tuples, and use np.searchsorted to find the indices where these tuples are in x2_view:

x1_view = x1.view(dtype='i,i').ravel()
x2_view = x2.view(dtype='i,i').ravel()

np.searchsorted(x1_view, x2_view)
# array([1, 3, 0, 2, 4], dtype=int64)

In the case the first array is not sorted, you can pass np.searchsorted a sorter:

x1_sorted_view = np.argsort(x1_view)
np.searchsorted(x1_view, x2_view, sorter=x1_sorted_view)

For the second case, you only need np.argsort on x2:

np.argsort(x2_view)
# array([2, 0, 3, 1, 4], dtype=int64)
Sign up to request clarification or add additional context in comments.

7 Comments

Thanks. Perhaps I used a poor example because I don't want to assume either is sorted... I could just sort both using argsort/lexsort but I'm trying to avoid that.
Then this solution could work if you used argsort for the first array, since searchsorted requires the first array to be sorted. But there aren't many efficient alternatives to this without using broadcasting @jonnybolton16
np.searchsorted it's not working for me as expected. It returns array of 10 indices instead of 5
@jonnybolton16 Try using sorter in searchsorted
Should work if your using the views @mathfux
|
0

Using array comprehension this is what I do:

x1_in_x2 = [x2.index(e) for e in x1]

I do not know if this is what you expect. Please confirm If I understood you correctly.

1 Comment

This works but only for lists; I guess I could use the tolist() method on the two arrays, and then turn the answer back into an array. But I'll wait to see if there is a numpy solution before accepting this answer. Thanks!
0

There are several ways to manipulate index sets of x1 and x2 in order to arrive at the mappings you need. A use case could be summarised as follows:

def find_mappings(sidx1, sidx2):
    '''Input: two set of indices that sorts arrays x1 and x2
       Output: both maps x1->x2 and x2->x1'''
    x1_to_x2 = np.empty(len(x1), dtype=x1.dtype)
    x2_to_x1 = np.empty(len(x1), dtype=x1.dtype)
    x1_to_x2[sidx2] = sidx1
    x2_to_x1[sidx1] = sidx2
    return x1_to_x2, x2_to_x1

I will review the most common ways to pass two index sets sidx1 and sidx2

Way 1 using np.lexsort as sorting indices:

sidx1 = np.lexsort(x1.T[::-1])
sidx2 = np.lexsort(x2.T[::-1])

find_mappings(sidx1, sidx2)
>>> (array([1, 3, 0, 2, 4]), array([2, 0, 3, 1, 4]))

Way 2 using np.argsort of reduced dimensions as sorting indices:

x1_dimreduce = np.ravel_multi_index(x1.T, np.max(x1, axis=0)+1)
x2_dimreduce = np.ravel_multi_index(x2.T, np.max(x2, axis=0)+1)
sidx1 = np.argsort(x1_dimreduce)
sidx2 = np.argsort(x2_dimreduce)

find_mappings(sidx1, sidx2)
>>> (array([1, 3, 0, 2, 4]), array([2, 0, 3, 1, 4]))

Another variation of way 2 (reducing dimensions by hands works faster and could be optimised further in numba)

x1_dimreduce = x1[:,0] * (np.max(x1[:,1])+1) + x1[:,1]
x2_dimreduce = x2[:,0] * (np.max(x2[:,1])+1) + x2[:,1]

Way 3 using np.searchsorted in order to find x1_to_x2 and then find inverse mapping x2_to_x1 automatically (credits to @yatu as well):

x1_view = x1.view(dtype=np.dtype([('', x1.dtype), ('', x1.dtype)])).ravel()
x2_view = x2.view(dtype=np.dtype([('', x2.dtype), ('', x2.dtype)])).ravel()

argidx = np.argsort(x1_view)
sidx = np.searchsorted(x1_view, x2_view, sorter=argidx)

sidx1 = argidx[sidx]
sidx2 = np.arange(len(sidx))

find_mappings(sidx1, sidx2)
>>> (array([1, 3, 0, 2, 4]), array([2, 0, 3, 1, 4]))

Another variation of way 3 (simplification of way 3 using sidx and argidx arguments instead of sidx1 and sidx2)

def find_mappings2(sidx, argidx):
    '''reduces some excessive commands met in `find_mapping`'''
    x1_to_x2 = argidx[sidx]
    x2_to_x1 = np.empty(len(x1), dtype=x1.dtype)
    x2_to_x1[x1_to_x2] = np.arange(len(sidx))
    return x1_to_x2, x2_to_x1

x1_view = x1.view(dtype=np.dtype([('', x1.dtype), ('', x1.dtype)])).ravel()
x2_view = x2.view(dtype=np.dtype([('', x2.dtype), ('', x2.dtype)])).ravel()

argidx = np.argsort(x1_view)
sidx = np.searchsorted(x1_view, x2_view, sorter=argidx)

np.array_equal(find_mapping(sidx1, sidx2), find_mapping2(sidx, argidx))
>>> True

Note that every of these ways could be minimized in case any of arrays x1 and x2 preserves order.

Use case for better understanding

x1_to_x2, x2_to_x1 = find_mapping(sidx1, sidx2)
>>> np.array_equal(x1[x1_to_x2], x2)
True
>>> np.array_equal(x2[x2_to_x1], x1)
True

Runtime

Variant of Way 2 appears to be a winner. Moreover, it's open for further optimizations:

import benchit
%matplotlib inline
benchit.setparams(rep=5)

sizes = [3, 10, 30, 100, 300, 900, 3000, 9000, 30000, 90000, 300000, 900000]

def _generate_x1_x2(N):
    '''return two randomly ordered arrays x1 and x2 of length N'''
    arr = np.transpose(np.unravel_index(np.arange(N), (int(N**0.5)+1, int(N**0.5)+1)))
    rng = np.random.default_rng()
    return arr[rng.permutation(N)], arr[rng.permutation(N)]

def _find_mappings(sidx1, sidx2):
    x1_to_x2, x2_to_x1 = np.empty(len(sidx1), dtype=sidx1.dtype), np.empty(len(sidx2), dtype=sidx2.dtype)
    x1_to_x2[sidx2] = sidx1
    x2_to_x1[sidx1] = sidx2
    return x1_to_x2, x2_to_x1

def _find_mappings2(sidx, argidx):
    x1_to_x2 = argidx[sidx]
    x2_to_x1 = np.empty(len(x1_to_x2), dtype=x1_to_x2.dtype)
    x2_to_x1[x1_to_x2] = np.arange(len(sidx))
    return x1_to_x2, x2_to_x1

def lexsort_map(x1, x2):
    sidx1 = np.lexsort(x1.T[::-1])
    sidx2 = np.lexsort(x2.T[::-1])
    return _find_mappings(sidx1, sidx2)

def argsort_map(x1, x2):
    x1_dimreduce = np.ravel_multi_index(x1.T, np.max(x1, axis=0)+1)
    x2_dimreduce = np.ravel_multi_index(x2.T, np.max(x2, axis=0)+1)
    sidx1 = np.argsort(x1_dimreduce)
    sidx2 = np.argsort(x2_dimreduce)
    return _find_mappings(sidx1, sidx2)

def argsort_map_v2(x1, x2):
    x1_dimreduce = x1[:,0] * (np.max(x1[:,1])+1) + x1[:,1]
    x2_dimreduce = x2[:,0] * (np.max(x2[:,1])+1) + x2[:,1]
    sidx1 = np.argsort(x1_dimreduce)
    sidx2 = np.argsort(x2_dimreduce)
    return _find_mappings(sidx1, sidx2)

def searchsort_map(x1, x2):
    x1_view = x1.view(dtype=np.dtype([('', x1.dtype), ('', x1.dtype)])).ravel()
    x2_view = x2.view(dtype=np.dtype([('', x2.dtype), ('', x2.dtype)])).ravel()
    argidx = np.argsort(x1_view)
    sidx = np.searchsorted(x1_view, x2_view, sorter=argidx)
    sidx1 = argidx[sidx]
    sidx2 = np.arange(len(sidx))
    return _find_mappings(sidx1, sidx2)

def searchsort_map_v2(x1, x2):
    x1_view = x1.view(dtype=np.dtype([('', x1.dtype), ('', x1.dtype)])).ravel()
    x2_view = x2.view(dtype=np.dtype([('', x2.dtype), ('', x2.dtype)])).ravel()
    argidx = np.argsort(x1_view)
    sidx = np.searchsorted(x1_view, x2_view, sorter=argidx)
    return _find_mappings2(sidx, argidx)

fns = [lexsort_map, argsort_map, argsort_map_v2, searchsort_map, searchsort_map_v2]
in_ = {s: generate_x1_x2(s) for s in sizes}
t = benchit.timings(fns, in_, multivar=True, input_name='Length of x1 and x2')
t.plot(logx=True, figsize=(12, 6), fontsize=14)

enter image description here

Update on 02/25/2022

There is a nice module numpy_indexed by @EelcoHoogendoorn (pip install numpy-indexed) written on top of numpy that extends to more advanced actions of indexing, grouping and set manipulations. You might like also to check out numpy_indexed.indices, Line 115 method:

numpy_indices as npi
x1_to_x2 = npi.indices(x1, x2)
x2_to_x1 = np.empty_like(x1_to_x2)
x2_to_x1[x1_to_x2] = np.arange(len(x2_to_x1))

2 Comments

I think you've misundersood the question; I want to return np.array([1,3,0,2,4]) rather than use it to begin with
@jonnybolton16 Yes, I see. I need to fix it
-1

I don't know much about numpy, but here's one janky way to do it with just Python lists:

x1 = [[1, 2],
      [3, 4],
      [5, 6],
      [7, 8],
      [9, 10]]

x2 = [[3, 4],
      [7, 8],
      [1, 2],
      [5, 6],
      [9, 10]]

map_1_to_2 = []
map_2_to_1 = []

for element_1, element_2 in zip(x1, x2):
    map_1_to_2.append(x2.index(element_1))
    map_2_to_1.append(x1.index(element_2))

print(map_1_to_2)
print(map_2_to_1)

1 Comment

Indeed this works (for lists); but @RubyLearning answer does it much more succinctly. Thanks though :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.