
I'm having a problem with an async process in Node.js.

I'm fetching data from a remote JSON and adding it to my array. The JSON has some duplicated values, so I need to check whether an item already exists in my array before adding it, to avoid duplicates.

My problem is that when I loop over the JSON values, the loop moves on to the next value before the previous one has finished being processed, so my array ends up filled with duplicated data instead of keeping only one item per type.

Here is my current code:

BookRegistration.prototype.process_new_books_list = function(data, callback) {
    var i    = 0,
        self = this;
    _.each(data, function(book) {
      i++;
      console.log('\n\n ------------------------------------------------------------ \n\n');
      console.log('BOOK: ' + book.volumeInfo.title);
      self.process_author(book, function() { console.log('in author'); });
      console.log('\n\n ------------------------------------------------------------');
      if(i == data.length) callback();
    })
  }

BookRegistration.prototype.process_author = function(book, callback) {
  if(book.volumeInfo.authors) {
    var author = { name: book.volumeInfo.authors[0].toLowerCase() };
    if(!this.in_array(this.authors, author)) {
      this.authors.push(author);
      callback();
    }
  }
}

BookRegistration.prototype.in_array = function(list, obj) {
  for(i in list) { if(list[i] === obj) return true; }
  return false;
} 

The result is:

[{name: author1 }, {name: author2}, {name: author1}]

And I need:

[{name: author1 }, {name: author2}]

UPDATED:

The solution suggested by @Zub works fine with arrays, but not with Sequelize and a MySQL database.

When I try to save my authors list to the database, the data is duplicated, because the system starts saving another array element before it finishes saving the previous one.

What is the correct pattern in this case?

My code using the database is:

BookRegistration.prototype.process_author = function(book, callback) {
  if(book.volumeInfo.authors) {
    var author = { name: book.volumeInfo.authors[0].toLowerCase() };
    var self   = this;
    models.Author.count({ where: { name: book.volumeInfo.authors[0].toLowerCase() }}).success(function(count) {
      if(count < 1) { 
        models.Author.create(author).success(function(author) {
          console.log('SALVANDO AUTHOR');
          self.process_publisher({ book:book, author:author }, callback);
        });
      } else {
        models.Author.find({where: { name: book.volumeInfo.authors[0].toLowerCase() }}).success(function(author) {
          console.log('FIND AUTHOR');
          self.process_publisher({ book:book, author:author }, callback);
        });        
      }
    });
    // if(!this.in_array(this.authors, 'name', author)) {
    //   this.authors.push(author);
    //   console.log('AQUI NO AUTHOR');
    //   this.process_publisher(book, callback);
    // }
  }
}
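
To make the timing clearer, here is a contrived sketch of the race (plain arrays and setTimeout standing in for the real Sequelize calls, so the names here are made up): both count checks run before either insert finishes, so both see zero and the same author is inserted twice.

var rows = [];

function countByName(name, cb) {
  setTimeout(function() {                 // simulate database latency
    cb(rows.filter(function(r) { return r.name === name; }).length);
  }, 50);
}

function insertRow(row, cb) {
  setTimeout(function() { rows.push(row); cb(); }, 50);
}

function saveAuthor(name) {
  countByName(name, function(count) {
    if(count < 1) insertRow({ name: name }, function() {});
  });
}

saveAuthor('author1');
saveAuthor('author1');
// both counts see 0, so rows ends up with two 'author1' entries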

How can I avoid data duplication in an async process?

1 Answer


This is because you are comparing different objects, so the result is always false.

Just as an experiment, type this in the console:

var obj1 = {a:1};
var obj2 = {a:1};
obj1 == obj2;    //false

When comparing objects (and arrays), the result is only true when obj1 and obj2 reference the same object:

var obj1 = {a:1};
var obj2 = obj1;
obj1 == obj2;    //true

Since you create a new author object in each process_author call, the comparison always returns false.

In your case the solution would be to compare the name property for each book:

BookRegistration.prototype.in_array = function(list, obj) {
  for(i in list) { if(list[i].name === obj.name) return true; }
  return false;
}
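
As a quick check (reg here is just a stand-in for a BookRegistration instance, and the author names are placeholders), two distinct objects with the same name now count as duplicates:

reg.authors = [{ name: 'author1' }, { name: 'author2' }];

// a brand new object, but with a name that is already in the list
reg.in_array(reg.authors, { name: 'author1' });   // true
reg.in_array(reg.authors, { name: 'author3' });   // false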


EDIT (related to your comment question):

I would rewrite process_new_books_list method as follows:

BookRegistration.prototype.process_new_books_list = function(data, callback) {
    var i = 0,
        self = this;
    (function nextBook() {
        var book = data[i];
        if (!book) {
            callback();
            return;
        }
        self.process_author(book, function() {
            i++;
            nextBook();
        });
    })();
}

In this case the next process_author is not called immediately (as with _.each), but only after the callback has executed, so your program runs sequentially.

Not sure if this works, though.
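
If you would rather not hand-roll the recursion, the async module gives the same serial behaviour (a sketch assuming you have async installed and that process_author always calls its callback; everything else is the code from the question):

var async = require('async');

BookRegistration.prototype.process_new_books_list = function(data, callback) {
  var self = this;
  // eachSeries only moves on to the next book after the previous callback fires,
  // so any database save inside process_author finishes before the next one starts
  async.eachSeries(data, function(book, done) {
    self.process_author(book, done);
  }, callback);
};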

Sorry for my English, I'm not a native English speaker.


1 Comment

Hi, it works fine, but now I have another related problem: when I do the same process using a database, the data is duplicated. I think it's because the latency of the database operations is higher, and the loop starts to process a new author before it finishes processing the last one. I've updated my question with the code using the database. Can you please have a look?
