I may be going about this all wrong, so any corrections are welcome.

I have created a web scraper in Node that scrapes a list of jobs off our org's website, stores them as an array of objects, then compares that array against a previous scrape stored as a stringified array of objects in a JSON file. I am using JSON.parse() when fetching the stored array so the comparison is objects against objects. My end goal is to email which jobs have been added and removed in real-time, so I need to find the differences between the arrays.

I'm stuck on how to find the differences. My array structure is below.

I have been reading that it is impossible to accurately compare arrays of objects without a deep comparison, but I'm not sure what else to do (my knowledge is weak here). Would something like the approach in "Compare array of objects to array of ids" be the right path?

[
    {
        job_id: "xxxxx",
        title: "Job 1",
        description: "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
        department: "Department: Lorem ipsum dolor sit amet",
        location: "Location: Lorem ipsum dolor sit amet"
    },
    {
        job_id: "xxxxx",
        title: "Job 2",
        description: "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
        department: "Department: Lorem ipsum dolor sit amet",
        location: "Location: Lorem ipsum dolor sit amet"
    }
]

  • What should the result look like? Commented Jan 22, 2017 at 20:55
  • I guess a new array of all new and removed objects (jobs) with a new property that identifies each object as "new" or "removed". I can take it from there. Commented Jan 22, 2017 at 21:01
  • That's a bit weak for more than one element in an array. Commented Jan 22, 2017 at 21:06

3 Answers

lodash's _.some could help you here. From its documentation:

Checks if predicate returns truthy for any element of collection. Iteration is stopped once predicate returns truthy.

Suppose you have sample-t1.js:

var jobsT1 = [
    {
        job_id: "1",
        title: "Job 1",
        description: "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
        department: "Department: Lorem ipsum dolor sit amet",
        location: "Location: Lorem ipsum dolor sit amet"
    },
    {
        job_id: "2",
        title: "Job 2",
        description: "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
        department: "Department: Lorem ipsum dolor sit amet",
        location: "Location: Lorem ipsum dolor sit amet"
    }
];

module.exports = {jobsT1};

and sample-t2.js:

var jobsT2 = [
    {
        job_id: "1",
        title: "Job 1",
        description: "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
        department: "Department: Lorem ipsum dolor sit amet",
        location: "Location: Lorem ipsum dolor sit amet"
    },
    {
        job_id: "3",
        title: "Job 3",
        description: "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
        department: "Department: Lorem ipsum dolor sit amet",
        location: "Location: Lorem ipsum dolor sit amet"
    }
];

module.exports = {jobsT2};

By calling _.some twice, you can identify the new and the removed jobs just by matching on their job_id.

const _ = require('lodash');

var {jobsT1} = require('./sample-t1');
var {jobsT2} = require('./sample-t2');

var newJobs = [];
var removedJobs = [];

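// A job in the newer scrape (jobsT2) with no matching job_id in the older one is new.
_.forEach(jobsT2, function (job) {
  if (!_.some(jobsT1, {'job_id': job.job_id})) {
    newJobs.push(job);
  }
});

// A job in the older scrape (jobsT1) with no matching job_id in the newer one was removed.
_.forEach(jobsT1, function (job) {
  if (!_.some(jobsT2, {'job_id': job.job_id})) {
    removedJobs.push(job);
  }
});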

console.log('New jobs:', JSON.stringify(newJobs, undefined, 2));
console.log('========');
console.log('Removed jobs:', JSON.stringify(removedJobs, undefined, 2));

With this result:

New jobs: [
  {
    "job_id": "3",
    "title": "Job 3",
    "description": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
    "department": "Department: Lorem ipsum dolor sit amet",
    "location": "Location: Lorem ipsum dolor sit amet"
  }
]
========
Removed jobs: [
  {
    "job_id": "2",
    "title": "Job 2",
    "description": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
    "department": "Department: Lorem ipsum dolor sit amet",
    "location": "Location: Lorem ipsum dolor sit amet"
  }
]
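
If you would rather end up with the single array described in the question's comments (every added or removed job tagged as "new" or "removed"), the same id matching can be done without lodash. This is only a minimal sketch: `diffJobs` and the `status` property are names made up for illustration, the sample data is trimmed, and it assumes job_id is unique within a scrape and stays stable between scrapes.

```javascript
// Hypothetical helper: returns the jobs that appear in only one of the two
// scrapes, each copied with a `status` of 'new' or 'removed'.
// Assumes job_id is unique within a scrape and stable between scrapes.
function diffJobs(previousJobs, currentJobs) {
  const previousIds = new Set(previousJobs.map((job) => job.job_id));
  const currentIds = new Set(currentJobs.map((job) => job.job_id));

  // In the current scrape but not in the previous one: added.
  const added = currentJobs
    .filter((job) => !previousIds.has(job.job_id))
    .map((job) => ({ ...job, status: 'new' }));

  // In the previous scrape but not in the current one: removed.
  const removed = previousJobs
    .filter((job) => !currentIds.has(job.job_id))
    .map((job) => ({ ...job, status: 'removed' }));

  return [...added, ...removed];
}

// Trimmed sample data (descriptions and other fields omitted for brevity).
const previousJobs = [
  { job_id: '1', title: 'Job 1' },
  { job_id: '2', title: 'Job 2' },
];
const currentJobs = [
  { job_id: '1', title: 'Job 1' },
  { job_id: '3', title: 'Job 3' },
];

const changes = diffJobs(previousJobs, currentJobs);
console.log(changes);
```

Building a Set of ids first keeps each lookup cheap, which may matter if the job list grows; for a few dozen jobs the nested _.some version should be just as good.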

1 Comment

This is exactly what I was looking for. Thank you so much!

You can try Underscore's difference function: http://underscorejs.org/#difference

However, I'm not sure that this function works with arrays of objects. If it doesn't, you can filter your parsed JSON objects using Array.prototype.filter and then compare their ids.
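
As a rough illustration of the "compare ids" idea in plain JavaScript: as far as I know, objects parsed from JSON are distinct instances, so an equality-based difference on the objects themselves is unlikely to match anything, whereas the id strings compare fine. `idDifference` is a hypothetical helper name and the sample arrays are trimmed to just the ids.

```javascript
// Hypothetical helper: job_ids that are present in `a` but missing from `b`.
function idDifference(a, b) {
  const idsInB = b.map((job) => job.job_id);
  return a.map((job) => job.job_id).filter((id) => !idsInB.includes(id));
}

// Trimmed sample data: only the ids matter for this comparison.
const oldScrape = [{ job_id: '1' }, { job_id: '2' }];
const newScrape = [{ job_id: '1' }, { job_id: '3' }];

const addedIds = idDifference(newScrape, oldScrape);
const removedIds = idDifference(oldScrape, newScrape);

console.log('Added ids:', addedIds);
console.log('Removed ids:', removedIds);
```

Once you have the ids, you can look the full job objects back up in whichever scrape contains them.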

1 Comment

I tried this, and if you change .filter() to .map() to find the IDs, then use underscore's .difference() then this works! I accepted @mə'SHēn 's answer instead because it was more complete and handled everything within the example. Thank you so much for replying and showing me underscorejs!

You can use lodash isEqual() to do a deep comparison between objects.

https://lodash.com/docs/4.17.4#isEqual

If you need to know exactly which keys are different, you would need to loop over the keys of one object and compare each value with the other object's, which you can also use isEqual() for. In that case I would first use isEqual() on the whole objects to see whether they are equal. If they are not, loop through the keys to find exactly which ones differ.
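
A minimal sketch of that two-step check is below. To keep it dependency-free it swaps lodash's isEqual() for Node's built-in util.isDeepStrictEqual, which is a substitution on my part and not identical in every edge case; `changedKeys` is a hypothetical helper name and the two sample objects are invented for illustration.

```javascript
const { isDeepStrictEqual } = require('util');

// Hypothetical helper: names of the keys whose values differ between two objects.
function changedKeys(before, after) {
  // Step 1: cheap overall check; identical objects have no changed keys.
  if (isDeepStrictEqual(before, after)) {
    return [];
  }
  // Step 2: compare key by key, covering keys that exist on either side.
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...keys].filter((key) => !isDeepStrictEqual(before[key], after[key]));
}

// Invented sample: the same job scraped twice, with an edited title.
const before = { job_id: '1', title: 'Job 1', location: 'Location: A' };
const after = { job_id: '1', title: 'Job 1 (updated)', location: 'Location: A' };

const differing = changedKeys(before, after);
console.log(differing);
```

This tells you what changed inside a job that exists in both scrapes; it does not by itself tell you which jobs were added or removed.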

1 Comment

Deep comparison is not the right way to proceed in this case. What if the owner of the job offer changes only the description? The item would show up in the diff even though the job was neither added nor removed.
