
I have a block of code that looks like this:

# Query all top-level entities (parent_id == 0)
entities = self.session.query(Entities).filter(Entities.parent_id == 0)

index_data = {}
for entity in entities:
    # Start from the ORM instance's attribute dict, then attach the
    # related collections as plain dicts
    data = entity.__dict__
    data['addresses'] = [address.__dict__ for address in entity.addresses]
    data['relationships'] = [relationship.__dict__ for relationship in entity.relationships_parent]
    index_data[data['id']] = data

I am reading the data, processing it a little as shown above, and storing it in a NoSQL database (only the reading part is shown here).

I noticed that this loop takes a very long time to complete. I suspect the cause is the amount of data fetched by the first line, perhaps network latency, since memory utilization looks fine.

I need a solution that uses SQLAlchemy.

I am thinking this could be solved by reading the data in chunks. How can I do that? If the problem is something else, how can I solve it?

  • entity.relationships_parent probably sends another query for every entity. Try using a different load strategy: docs.sqlalchemy.org/en/14/orm/… Commented Nov 25, 2021 at 15:15
  • The thing is, I tried running this on a small dataset and it was pretty fast. However, when I ran it on a larger dataset, it became really slow. What could be the reason for that? Commented Nov 25, 2021 at 15:46
  • This really helped. Maybe you can add this as an answer. The 'selectin' strategy gave me a real boost in speed (see the sketch after these comments). Commented Dec 17, 2021 at 5:12
