
Is there a way to add data to a pandas DataFrame in Python when multiple threads call a function that appends rows to it?

My code scrapes data from a URL, and I was using df.loc[index]... to add each scraped row to the DataFrame.

I have since made the scraper multithreaded, with each URL assigned to its own thread, so many pages are being scraped at once.

How do I append those rows into the dataframe?

1 Answer


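Adding rows to a DataFrame one at a time is not recommended: it is slow, because the frame's data generally has to be reallocated as it grows. I suggest having each thread return its data as plain Python objects (for example, a list of dicts), combining those at the end, and calling the DataFrame constructor only once on the full data set.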

Example:

# help from http://stackoverflow.com/a/28463266/3393459
# and http://stackoverflow.com/a/2846697/3393459

from multiprocessing.dummy import Pool as ThreadPool
import requests
import pandas as pd


# called by each thread; returns one row as a dict
def get_web_data(url):
    return {'col1': 'something', 'request_data': requests.get(url).text}


urls = ["http://google.com", "http://yahoo.com"]

pool = ThreadPool(4)  # pool of 4 worker threads
results = pool.map(get_web_data, urls)  # one dict per URL, in the same order as urls
pool.close()
pool.join()

print(results)
# build the DataFrame once, from the complete list of rows
print(pd.DataFrame(results))
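
If each page yields several rows rather than one, the "combine those lists at the end" step could look roughly like the sketch below. It makes no network calls: `scrape_rows` is a hypothetical stand-in for a real scraper, and `scrape_all`, the column names, and the example URLs are assumptions made for illustration only.

```python
from itertools import chain
from multiprocessing.pool import ThreadPool

import pandas as pd


def scrape_rows(url):
    """Hypothetical stand-in for a real scraper.

    A real version would fetch and parse the page at `url`; this one just
    fabricates three placeholder rows so the example runs offline.
    """
    return [{"url": url, "row_number": i, "value": f"{url}#{i}"} for i in range(3)]


def scrape_all(urls, n_threads=4):
    # Each thread returns its own list, so no shared object is mutated.
    with ThreadPool(n_threads) as pool:
        per_url_rows = pool.map(scrape_rows, urls)
    # Flatten the list of lists, then call the DataFrame constructor once.
    all_rows = list(chain.from_iterable(per_url_rows))
    return pd.DataFrame(all_rows)


if __name__ == "__main__":
    print(scrape_all(["http://example.com/a", "http://example.com/b"]))
```

Because every worker only returns a value, no locking is needed, and the DataFrame is created a single time regardless of how many pages are scraped.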

3 Comments

Thank you, that's certainly an idea. How would I index each list, though? Any thread can produce its list at any time, so starting from one index and incrementing it by one may not be the right choice...
Not sure what you mean. I posted example code so we can talk more concretely. With a thread pool there is no guarantee about the order in which the individual tasks finish, although pool.map does return its results in the same order as the inputs. If you want to post your code, that might also be helpful.
I just took your list advice: I appended all the data to a list, converted it to a pandas DataFrame at the end, and it worked perfectly for my case! Thanks a lot :)
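
On the ordering question raised in the comments, here is a small sketch that may help. It compares `pool.map`, which returns results in input order, with `pool.imap_unordered`, which yields them as they finish. The worker `slow_echo` and the helper `run_both` are hypothetical names used only for this illustration; the sleeps stand in for pages that take different amounts of time to download.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_echo(delay):
    """Hypothetical worker: sleeps for `delay` seconds, then returns it."""
    time.sleep(delay)
    return delay


def run_both(delays):
    with ThreadPool(len(delays)) as pool:
        # map() blocks until every task is done and keeps the input order,
        # even if later inputs finish before earlier ones.
        ordered = pool.map(slow_echo, delays)
        # imap_unordered() yields each result as soon as it is ready, so the
        # order depends on timing and should not be relied on.
        as_completed = list(pool.imap_unordered(slow_echo, delays))
    return ordered, as_completed


if __name__ == "__main__":
    ordered, as_completed = run_both([0.2, 0.1, 0.0])
    print("map:", ordered)
    print("imap_unordered:", as_completed)
```

So with `pool.map` there is no need to hand out an index to each thread: position in the result list already matches position in the input list. Storing the URL in each row is another simple way to keep rows identifiable whatever order they arrive in.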
