
I have two object arrays:

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]

I want to do an inner join for these two arrays a and b, and create a third array like this (if the position property is not present, then it becomes null):

var result = [
  {id: 4, name: 'Greg', position: null},
  {id: 1, name: 'David', position: null},
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]

My approach:

function innerJoinAB(a, b) {
    var result = [];
    a.forEach(function (obj) {
        // For each object in the first array...
        b.forEach(function (obj2) {
            // ...scan the second array for a matching id,
            // and push the merged object if found.
            if (obj.id === obj2.id) {
                result.push(Object.assign({}, obj, obj2));
            }
        });
    });
    return result;
}

But the time complexity is O(N^2). How can I do it in O(N)? My friend told me that we can use reducers and Object.assign.

I'm not able to figure this out. Please help.

  • You have two arrays of objects. Seems like you need to copy all the values of one array into a new one, then merge the second (and subsequent) arrays into it. Array.prototype.reduce might be a good start. What is the primary key, id? Since you're using an array to hold objects, you likely also want to build an index from id to array position so you can look up ids without iterating over the array each time. Commented Feb 24, 2017 at 1:08
  • PS: "inner join" might not be the right term, as that (from my understanding) only gives a result set where there was a match in both sets (so your example would only give the lines with ids 2 and 3). This is more of a typical merge. Commented Feb 24, 2017 at 1:10
  • @NicholasSmith This is JS, not JSON. Commented Feb 24, 2017 at 1:30
  • Possible duplicate of Merge two array of objects based on a key. Commented May 20, 2019 at 20:25
  • What you want, based on your output example, is a full outer join, not an inner join. Commented Mar 17, 2020 at 12:17

8 Answers


I don't know how reduce would help here, but you could use a Map to accomplish the same task in O(n):

const a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'}];

const b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'}];

var m = new Map();
// Insert all entries keyed by ID into the Map, filling in placeholder
// 'position' since the Array 'a' lacks 'position' entirely:
a.forEach(function(x) { x.position = null; m.set(x.id, x); });

// For values in 'b', insert them if missing, otherwise, update existing values:
b.forEach(function(x) {
    var existing = m.get(x.id);
    if (existing === undefined)
        m.set(x.id, x);
    else
        Object.assign(existing, x);
});

// Extract resulting combined objects from the Map as an Array
var result = Array.from(m.values());

console.log(JSON.stringify(result));

Because Map accesses and updates are O(1) on average (hash collisions and rehashing can make individual operations slower), this makes the whole task O(n+m), where n and m are the lengths of a and b respectively; the naive solution you gave would be O(n*m), using the same meanings for n and m.
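Since the question specifically mentioned reduce and Object.assign, the same O(n+m) Map-based merge can also be phrased with Array.prototype.reduce. This is just a sketch of that alternative phrasing, not part of the original answer:

```javascript
const a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'}];

const b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'}];

// Seed the Map from 'a' (filling in a null position), then fold 'b' into it.
// Map.prototype.set returns the Map, so the reducer can return it directly.
const merged = b.reduce(
    (m, x) => m.set(x.id, Object.assign({}, m.get(x.id), x)),
    a.reduce((m, x) => m.set(x.id, Object.assign({position: null}, x)), new Map())
);

const result = Array.from(merged.values());
console.log(result);
```

The logic is identical to the forEach version; reduce just threads the Map through as the accumulator.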


4 Comments

One issue with this: If position is set in array a, it will be lost if it is not duplicated in b.
@Gerrit0: Yar, I noted that assumption (that a always lacks a position) in the comments. You could easily make the setting of x.position conditional when processing a, but the OP's provided inputs indicated that a never had position and b always did. Similarly, this assumes that id is unique on its own (no need for name as part of the key, on the assumption that name will match if id does).
This looks like a left join to me, not an inner join.
@phil: Agreed. The OP asked for inner join, but their desired output was a left join. I provided an answer that would produce their desired output, as it seems clear they were using the wrong terminology.

One of the ways to solve it:

const a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];

const b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

const r = a.filter(({ id: idv }) => b.every(({ id: idc }) => idv !== idc));
const newArr = b.concat(r).map((v) => v.position ? v : { ...v, position: null });

console.log(JSON.stringify(newArr));

5 Comments

Note that the time complexity of this is still O(N^2) (technically O(N*M) where N and M are the lengths of your two arrays)
Also, it uses name as primary key which may not be intended as names may be non-unique
@FelixDombek Could you give me more info?
In the line that filters, I think you should be comparing ids, not names.
@FelixDombek Agreed, but in this particular case it didn't make any difference (: Anyway, I've changed it.

If you drop the null criterion (many in the community say using null is bad), then there's a very simple solution:

let a = [1, 2, 3];
let b = [2, 3, 4];

a.filter(x => b.includes(x)) 

// [2, 3]
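As an aside, Array.prototype.includes is itself a linear scan, so the one-liner above is O(n*m). Converting b to a Set first gives the same result in O(n+m); a minimal sketch:

```javascript
const a = [1, 2, 3];
const b = [2, 3, 4];

// Set membership checks are O(1) on average, so the whole
// intersection costs O(n + m) rather than O(n * m).
const bSet = new Set(b);
const intersection = a.filter(x => bSet.has(x));

console.log(intersection); // [2, 3]
```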

2 Comments

What about 1 and 4?
Like a full outer join? It should be [].concat( a.filter(x => !b.includes(x)), b.filter(x => !a.includes(x)) )

To reduce the time complexity, using more memory is inevitable.

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]     

var s = new Set();
var result = [];
b.forEach(function(e) {
    result.push(Object.assign({}, e));
    s.add(e.id);
});
a.forEach(function(e) {
    if (!s.has(e.id)) {
      var temp = Object.assign({}, e);
      temp.position = null;
      result.push(temp);
    }
});
console.log(result);

Update

As @Blindman67 mentioned: "You do not reduce the problem's complexity by moving a search into the native code." I consulted the ECMAScript® 2016 Language Specification about the internal procedures of Set.prototype.has() and Map.prototype.get(); unfortunately, it seemed that they both iterate through all the elements they hold.

Set.prototype.has ( value )#

The following steps are taken:

    Let S be the this value.
    If Type(S) is not Object, throw a TypeError exception.
    If S does not have a [[SetData]] internal slot, throw a TypeError exception.
    Let entries be the List that is the value of S's [[SetData]] internal slot.
    Repeat for each e that is an element of entries,
        If e is not empty and SameValueZero(e, value) is true, return true.
    Return false. 

http://www.ecma-international.org/ecma-262/7.0/#sec-set.prototype.has

Map.prototype.get ( key )#

The following steps are taken:

    Let M be the this value.
    If Type(M) is not Object, throw a TypeError exception.
    If M does not have a [[MapData]] internal slot, throw a TypeError exception.
    Let entries be the List that is the value of M's [[MapData]] internal slot.
    Repeat for each Record {[[Key]], [[Value]]} p that is an element of entries,
        If p.[[Key]] is not empty and SameValueZero(p.[[Key]], key) is true, return p.[[Value]].
    Return undefined. 

http://www.ecma-international.org/ecma-262/7.0/#sec-map.prototype.get

Perhaps we can instead use an Object, which can access its properties directly by name, like a hash table or associative array. For example:

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]     

var s = {};
var result = [];
b.forEach(function(e) {
    result.push(Object.assign({}, e));
    s[e.id] = true;
});
a.forEach(function(e) {
    if (!s[e.id]) {
      var temp = Object.assign({}, e);
      temp.position = null;
      result.push(temp);
    }
});
console.log(result);

9 Comments

You misunderstand the spec. The Set section is describing the required logic, not an actual implementation strategy. get and has are required to be sublinear, so at worst they're O(log n), and the suggested implementation is a hash table, O(1) on average. Read earlier in the overall Set docs:
"Set objects must be implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection. The data structures used in this Set objects specification is only intended to describe the required observable semantics of Set objects. It is not intended to be a viable implementation model."
One question @Y.C: why did you use result.push(Object.assign({}, e))? Why can't we just do result.push(e)? It would give the same result.
@TechnoCorner Object.assign() performs a shallow clone: it copies the values of all enumerable own properties from one or more source objects to a target object. So if you change the objects in the result array, it won't affect the originals, because they're different objects with the same content. But if you don't mind making changes to the original objects, then obviously result.push(e) is more efficient.
@ShadowRanger So you can't be sure which browser uses which implementation or mechanism, but you do know that objects in JS can be used as hash tables or associative arrays.

You do not reduce the problem's complexity by moving a search into the native code. The search must still be done.

Also, the need to set an undefined property to null is one of the many reasons I dislike using null.

So without the null the solution would look like

var a = [
  {id: 4, name: 'Greg',position: '7'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]


function join (indexName, ...arrays) {
    const map = new Map();
    arrays.forEach((array) => {
        array.forEach((item) => {
            map.set(
                item[indexName],
                Object.assign(item, map.get(item[indexName]))
            );
        })
    })
    return [...map.values()];
}

It is called with:

const joinedArray = join("id", a, b);

Joining with a default is a little more complex, but it should prove handy, as it can join any number of arrays and automatically set missing properties to a provided default.

Testing for the defaults is done after the join to save a little time.

function join (indexName, defaults, ...arrays) {
    const map = new Map();
    arrays.forEach((array) => {
        array.forEach((item) => {
            map.set(
                item[indexName], 
                Object.assign( 
                    item, 
                    map.get(item[indexName])
                )
            );
        })
    })
    return [...map.values()].map(item => Object.assign({}, defaults, item));

}

To use

const joinedArray = join("id", {position : null}, a, b);

You could add...

    arrays.shift().forEach((item) => {  // first array is a special case.
        map.set(item[indexName], item);
    });

...at the start of the function to save a little time, but I feel it's more elegant without the extra code.
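Assembled, that optimized variant would look like the following. This is my combination of the two snippets in this answer, not the author's verbatim code; like the original, it mutates the input objects via Object.assign(item, …):

```javascript
function join(indexName, defaults, ...arrays) {
    const map = new Map();
    arrays.shift().forEach((item) => {  // first array is a special case
        map.set(item[indexName], item);
    });
    arrays.forEach((array) => {
        array.forEach((item) => {
            // Earlier-seen values win, matching the merge order of the original.
            map.set(
                item[indexName],
                Object.assign(item, map.get(item[indexName]))
            );
        });
    });
    // Apply defaults last, copying into fresh objects.
    return [...map.values()].map(item => Object.assign({}, defaults, item));
}

const a = [
    {id: 4, name: 'Greg'},
    {id: 1, name: 'David'},
    {id: 2, name: 'John'},
    {id: 3, name: 'Matt'},
];
const b = [
    {id: 5, name: 'Mathew', position: '1'},
    {id: 6, name: 'Gracia', position: '2'},
    {id: 2, name: 'John', position: '2'},
    {id: 3, name: 'Matt', position: '2'},
];

const joined = join("id", {position: null}, a, b);
console.log(joined);
```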



Here is an attempt at a more generic version of a join, which accepts N arrays and merges their objects based on a primary id key.

If performance is critical, you are better off using a specific version like the one provided by ShadowRanger, which doesn't need to dynamically build a list of all property keys.

This implementation assumes that any missing properties should be set to null and that every object in each input array has the same properties (though properties can differ between arrays).

var a = [
    {id: 4, name: 'Greg'},
    {id: 1, name: 'David'},
    {id: 2, name: 'John'},
    {id: 3, name: 'Matt'},
];
var b = [
    {id: 5, name: 'Mathew', position: '1'},
    {id: 600, name: 'Gracia', position: '2'},
    {id: 2, name: 'John', position: '2'},
    {id: 3, name: 'Matt', position: '2'},
];

console.log(genericJoin(a, b));

function genericJoin(...input) {
    //Get all possible keys
    let template = new Set();
    input.forEach(arr => {
        if (arr.length) {
            Object.keys(arr[0]).forEach(key => {
                template.add(key);
            });
        }
    });

    // Merge arrays
    input = input.reduce((a, b) => a.concat(b));

    // Merge items with duplicate ids
    let result = new Map();
    input.forEach(item => {
        result.set(item.id, Object.assign((result.get(item.id) || {}), item));
    });

    // Convert the map back to an array of objects
    // and set any missing properties to null
    return Array.from(result.values(), item => {
        template.forEach(key => {
            item[key] = item[key] || null;
        });
        return item;
    });
}



Here's a generic O(n*m) solution, where n is the number of records and m is the number of keys. This will only work for valid object keys. You can convert any value to base64 and use that if you need to.

const join = ( keys, ...lists ) =>
    lists.reduce(
        ( res, list ) => {
            list.forEach( ( record ) => {
                let hasNode = keys.reduce(
                    ( idx, key ) => idx && idx[ record[ key ] ],
                    res[ 0 ].tree
                )
                if( hasNode ) {
                    const i = hasNode.i
                    Object.assign( res[ i ].value, record )
                    res[ i ].found++
                } else {
                    let node = keys.reduce( ( idx, key ) => {
                        if( idx[ record[ key ] ] )
                            return idx[ record[ key ] ]
                        else
                            idx[ record[ key ] ] = {}
                        return idx[ record[ key ] ]
                    }, res[ 0 ].tree )
                    node.i = res[ 0 ].i++
                    res[ node.i ] = {
                        found: 1,
                        value: record
                    }
                }
            } )
            return res
        },
        [ { i: 1, tree: {} } ]
         )
         .slice( 1 )
         .filter( node => node.found === lists.length )
         .map( n => n.value )

join( [ 'id', 'name' ], a, b )

This is essentially the same as Blindman67's answer, except that it adds an index object to identify the records to join. The records are stored in an array, and the index stores the position of the record for the given key set along with the number of lists it has been found in.

Each time the same key set is encountered, the node is found in the tree, the element at its index is updated, and the number of times it has been found is incremented.

Finally, the idx object is removed from the array with the slice, and any elements that weren't found in every list are filtered out. This makes it an inner join; you could remove this filter and have a full outer join.

Each remaining element is then mapped to its value, and you have the merged array.



From the extensive research I've done on this, there is no way to reduce the joining of two lists below O(n*m).

The classical solution, which I understand most databases use, is to create an index from the smaller list and then scan that index. This essentially pushes the O(n*m) "work" as far down the interpreter chain as possible; i.e., your OS/processor probably has a very optimized way of orchestrating list compares, so you get a performance boost from letting it do the job. This technically makes it O(n*m + n), but it should still be the most efficient.

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]

const idx = a.reduce((prev, _) => {
    prev[_.id] = _
    return prev
}, {})

const result = b.reduce((prev, _) => {
  if( idx[_.id] !== undefined ){
    prev.push([_, idx[_.id]])
  }
  return prev
}, [])

Again, as far as I understand, this is the classic solution to this problem. Would love to be wrong.
