Numpy concatenate + merge 1D arrays

Question

I need to concatenate arrays but also merge the end of A with the start of B if they are overlapping.

[1, 2, 4] + [2, 4, 5] -> [1, 2, 4, 5]
[1, 2, 4] + [2, 5, 4] -> [1, 2, 4, 2, 5, 4]
[1, 2, 4] + [1, 2, 4, 5] -> [1, 2, 4, 5]

Note: Order of elements must be preserved, [4, 5] is not the same as [5, 4].

Note 2: The question can be understood like this too: We need the shortest possible extension of A such that the output ends with B.

Of course I can iterate over the second array and compare element-by element, but I am looking for a nice Numpy solution.

Are your elements properly comparable? E.g., are they always integers? — 9769953
– 9769953, Commented Aug 30, 2019 at 7:21
Essentially, you want to merge the "overlapping" parts: the end of the first array and the start of the second array. Is that correct? — 9769953
– 9769953, Commented Aug 30, 2019 at 7:24
Seems i misunderstood, so I removed my answer. I'll see if i can come up with a good solution — Artog
– Artog, Commented Aug 30, 2019 at 7:27
Should [1, 2, 4, 2] and [2, 4, 5] combined return [1, 2, 4, 2, 4, 5]? There's still a double sequence in the final result, but the arrays only match on a single 2 at their end and start. — 9769953
– 9769953, Commented Aug 30, 2019 at 11:56

Artog · Accepted Answer · 2019-08-30 07:47:46Z

2

Originally misunderstood the problem. The problem, is from my understanding:

Two item suffix of A matches 2 item prefix of B:
[1, 2, 4] +
   [2, 4, 5] =>
[1, 2, 4, 5]

No suffix of A matches a prefix of B:
[1, 2, 4] + 
         [2, 5, 4] -> 
[1, 2, 4, 2, 5, 4]

Then we can use this terribly inefficient function:

def merge(A,B):
    i = 0
    m = 0
    # Find largest suffix of A that matches the prefix of B with the same length
    while i <= len(A):
        if A[-i:] == B[:i] and i > m:
            m = i
        i += 1
    return A + B[m:]

edited Aug 30, 2019 at 7:47

answered Aug 30, 2019 at 7:21

Artog

1,1321 gold badge13 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Artog Over a year ago

Its not very nice, and its like O(n^2). Slicing the array makes a copy every time..

Vidak Over a year ago

I also couldn't come up with anything faster than this, and also your function is more elegant than what I have! I still hope there is some magical numpy function to use :D One note, I think i should be initialised with 1 instead of 0 :)

Vidak Over a year ago

Another note, it's enough to do while i <= min(len(A), len(B))

Vidak Over a year ago

I added my own iterative solution, should be o(n) :)

9769953 · Accepted Answer · 2019-08-30 12:23:47Z

1

Below is a solution using NumPy. It's not ideal, since it requires a (possibly unneeded) sort, and an iteration. Both the sorting and iteration should be over a relatively small array (or even a single element).

import numpy as np

def merge(left, right):
    """Concatenating two arrays, merging the overlapping end and start of
    the left and right array"""

    # We can limit the search to the maximum possible overlap between
    # the arrays, which is the minimum of the two lengths
    l = min(len(left), len(right))

    # Find all indices in `right` where the element matches the last element of `left`.
    # Need to sort, since the `nonzero` documentation doesn't
    # explicitly state whether the returned indices follow the order
    # as in `right`
    # As long as there are few matches, sorting will not be a showstopper
    # Need to reverse the sorted array, to start from the back of the
    # right array, work towards the front, until there is a proper match
    for i in np.sort(np.nonzero(right[:l] == left[-1])[0])[::-1]:
        # Check if the subarrays are equal
        if np.all(left[-i-1:] == right[:i+1]):
            return np.concatenate([left, right[i+1:]])
    # No match
    return np.concatenate([left, right])


a = np.array([1, 2, 4])
b = np.array([2, 4, 5])
c = np.array([2, 5, 4])
d = np.array([1, 2, 4, 5])
e = np.array([1, 2, 4, 2])
f = np.array([2, 4, 2, 5])

print(merge(a, b))
print(merge(a, c))
print(merge(a, d))
print(merge(e, b))
print(merge(e, f))

which yields

[1 2 4 5]
[1 2 4 2 5 4]
[1 2 4 5]
[1 2 4 2 4 5]
[1 2 4 2 5]

answered Aug 30, 2019 at 12:23

9769953

12.5k3 gold badges31 silver badges46 bronze badges

2 Comments

Vidak Over a year ago

Nice approach, but it does seem a bit too much :D I was kind of hoping we could come up with some fancy indexing/convolution/correlation vectorized solution. I have added an answer with my own solution without numpy (should be o(n))

9769953 Over a year ago

@Vidak I would also compare the various solutions given in the answers, for the overall size of your lists, then see which one is the fastest. NumPy may win out with large arrays with large overlaps, but perhaps that isn't relevant for your case.

Vidak · Accepted Answer · 2019-09-03 06:33:12Z

1

I have an O(n) solution, albeit without Numpy:

def merge(a, b):
    n_a = len(a)
    n = min(n_a, len(b))
    m = 0
    for i in range(1, n + 1):
        if b[n - i] == a[n_a - 1 - m]:
            m += 1
        else:
            m = 0
    return a + b[m:]

answered Sep 3, 2019 at 6:33

Vidak

1,11315 silver badges30 bronze badges

Comments

JakobVinkas · Accepted Answer · 2019-08-30 07:28:53Z

0

You could do it like this.

def concatenate(a,b):
    ret = a.copy()
    for element in b:
        if not element in ret:
            ret.append(element)
    return ret

This keeps the order in a + b formation.

edited Aug 30, 2019 at 7:28

answered Aug 30, 2019 at 7:21

JakobVinkas

1,07311 silver badges26 bronze badges

5 Comments

Vidak Over a year ago

this fails for example 2 :)

JakobVinkas Over a year ago

Your qustions seems a bit wierd, how long must the matching sub-list be to not be copied to the final list?

Vidak Over a year ago

all sub-arrays of length 1 until len(b) need to be checked, e.g. [1,2,4] + [4] = [1,2,4], and [1,2,4]+[2,4]=[1,2,4], and finally [1,2,4]+[1,2,4]=[1,2,4]

JakobVinkas Over a year ago

But then you second example does not follow that rule, just as [1,2,4] + [4] = [1,2,4], [1, 2, 4] + [2] = [1,2,4] and then [1, 2, 4] + [2, 5, 4] = [1,2,4,5]. Or am I missing something?

Vidak Over a year ago

the order of elements is important, so in your example the correct output would be [1,2,4,2,5,4]

Collectives™ on Stack Overflow

Numpy concatenate + merge 1D arrays

4 Answers 4

4 Comments

2 Comments

Comments

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

4 Comments

2 Comments

Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related