7

I need to concatenate arrays but also merge the end of A with the start of B if they are overlapping.

[1, 2, 4] + [2, 4, 5] -> [1, 2, 4, 5]
[1, 2, 4] + [2, 5, 4] -> [1, 2, 4, 2, 5, 4]
[1, 2, 4] + [1, 2, 4, 5] -> [1, 2, 4, 5]

Note: Order of elements must be preserved, [4, 5] is not the same as [5, 4].

Note 2: The question can be understood like this too: We need the shortest possible extension of A such that the output ends with B.

Of course I can iterate over the second array and compare element-by element, but I am looking for a nice Numpy solution.

12
  • 1
    @Artog yes, order is important :) I updated the question Commented Aug 30, 2019 at 7:20
  • 1
    Are your elements properly comparable? E.g., are they always integers? Commented Aug 30, 2019 at 7:21
  • 2
    Essentially, you want to merge the "overlapping" parts: the end of the first array and the start of the second array. Is that correct? Commented Aug 30, 2019 at 7:24
  • 1
    Seems i misunderstood, so I removed my answer. I'll see if i can come up with a good solution Commented Aug 30, 2019 at 7:27
  • 1
    Should [1, 2, 4, 2] and [2, 4, 5] combined return [1, 2, 4, 2, 4, 5]? There's still a double sequence in the final result, but the arrays only match on a single 2 at their end and start. Commented Aug 30, 2019 at 11:56

4 Answers 4

2

Originally misunderstood the problem. The problem, is from my understanding:

Two item suffix of A matches 2 item prefix of B:
[1, 2, 4] +
   [2, 4, 5] =>
[1, 2, 4, 5]

No suffix of A matches a prefix of B:
[1, 2, 4] + 
         [2, 5, 4] -> 
[1, 2, 4, 2, 5, 4]

Then we can use this terribly inefficient function:

def merge(A,B):
    i = 0
    m = 0
    # Find largest suffix of A that matches the prefix of B with the same length
    while i <= len(A):
        if A[-i:] == B[:i] and i > m:
            m = i
        i += 1
    return A + B[m:]
Sign up to request clarification or add additional context in comments.

4 Comments

Its not very nice, and its like O(n^2). Slicing the array makes a copy every time..
I also couldn't come up with anything faster than this, and also your function is more elegant than what I have! I still hope there is some magical numpy function to use :D One note, I think i should be initialised with 1 instead of 0 :)
Another note, it's enough to do while i <= min(len(A), len(B))
I added my own iterative solution, should be o(n) :)
1

Below is a solution using NumPy. It's not ideal, since it requires a (possibly unneeded) sort, and an iteration. Both the sorting and iteration should be over a relatively small array (or even a single element).

import numpy as np

def merge(left, right):
    """Concatenating two arrays, merging the overlapping end and start of
    the left and right array"""

    # We can limit the search to the maximum possible overlap between
    # the arrays, which is the minimum of the two lengths
    l = min(len(left), len(right))

    # Find all indices in `right` where the element matches the last element of `left`.
    # Need to sort, since the `nonzero` documentation doesn't
    # explicitly state whether the returned indices follow the order
    # as in `right`
    # As long as there are few matches, sorting will not be a showstopper
    # Need to reverse the sorted array, to start from the back of the
    # right array, work towards the front, until there is a proper match
    for i in np.sort(np.nonzero(right[:l] == left[-1])[0])[::-1]:
        # Check if the subarrays are equal
        if np.all(left[-i-1:] == right[:i+1]):
            return np.concatenate([left, right[i+1:]])
    # No match
    return np.concatenate([left, right])


a = np.array([1, 2, 4])
b = np.array([2, 4, 5])
c = np.array([2, 5, 4])
d = np.array([1, 2, 4, 5])
e = np.array([1, 2, 4, 2])
f = np.array([2, 4, 2, 5])

print(merge(a, b))
print(merge(a, c))
print(merge(a, d))
print(merge(e, b))
print(merge(e, f))

which yields

[1 2 4 5]
[1 2 4 2 5 4]
[1 2 4 5]
[1 2 4 2 4 5]
[1 2 4 2 5]

2 Comments

Nice approach, but it does seem a bit too much :D I was kind of hoping we could come up with some fancy indexing/convolution/correlation vectorized solution. I have added an answer with my own solution without numpy (should be o(n))
@Vidak I would also compare the various solutions given in the answers, for the overall size of your lists, then see which one is the fastest. NumPy may win out with large arrays with large overlaps, but perhaps that isn't relevant for your case.
1

I have an O(n) solution, albeit without Numpy:

def merge(a, b):
    n_a = len(a)
    n = min(n_a, len(b))
    m = 0
    for i in range(1, n + 1):
        if b[n - i] == a[n_a - 1 - m]:
            m += 1
        else:
            m = 0
    return a + b[m:]

Comments

0

You could do it like this.

def concatenate(a,b):
    ret = a.copy()
    for element in b:
        if not element in ret:
            ret.append(element)
    return ret

This keeps the order in a + b formation.

5 Comments

this fails for example 2 :)
Your qustions seems a bit wierd, how long must the matching sub-list be to not be copied to the final list?
all sub-arrays of length 1 until len(b) need to be checked, e.g. [1,2,4] + [4] = [1,2,4], and [1,2,4]+[2,4]=[1,2,4], and finally [1,2,4]+[1,2,4]=[1,2,4]
But then you second example does not follow that rule, just as [1,2,4] + [4] = [1,2,4], [1, 2, 4] + [2] = [1,2,4] and then [1, 2, 4] + [2, 5, 4] = [1,2,4,5]. Or am I missing something?
the order of elements is important, so in your example the correct output would be [1,2,4,2,5,4]

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.