I have what seems like a rather simple question, but I can't wrap my head around it.

I have a pandas dataframe for Tweets. The location of the users is registered in a variable named "Location" in various ways:

When the location is well recorded, I often get:

{'country_code': 'tr', 'state': 'Central Anatolia Region', 'county': 'Çankaya', 'city': 'Ankara'}

or

{'country_code': 'tr', 'state': 'Black Sea Region', 'city': 'Trabzon'}

But sometimes, all I get is:

{'country_code': 'tr'}

or

{'country_code': 'tr', 'state': 'Batman'}

and often, there's nothing and all that's registered is this:

{}

I want to write a script that can create new variables in my pandas dataframe for these individual values. In other words, if country_code is registered for a specific row, then I want the value in question to be recorded in a variable named country_code. And so on for state, county, and city. If nothing is there, it can simply input a blank or an NA for all the missing variables in question (county, state, city).

The end result should be four new variables in my dataframe: country_code, state, county, and city, filled from the values registered in the "Location" variable, or left empty where nothing is registered.

Can someone help by any chance?

Thank you so much!

  • I am confused because you describe a DataFrame but show a dict. Is it a list of dicts that you are referring to? Commented Nov 10, 2020 at 2:44
  • Thanks for the reply Inyoung! The variable Location in my pandas dataframe has these values--they seem to be registered as a series: type(newdf2['Location']) Out[31]: pandas.core.series.Series Commented Nov 10, 2020 at 3:10
  • pandas will automatically fill missing variables with NULL. Try printing some rows from newdf2. Commented Nov 10, 2020 at 5:14
  • I understand, thanks Inyoung. But the problem is that I want to create four new variables based on the values registered for country_code, city, county, and state in the variable "Location". Commented Nov 10, 2020 at 12:43

1 Answer

I was able to fix the problem by working with the original JSON file directly. All I did was store the location data in the different categories I was looking for, using a for loop with if checks, similar to what others suggest here. I did this instead of trying to use pandas-specific functions to split the data registered in the "Location" variable into separate variables in my dataset.
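
The loop described above might look roughly like the sketch below. It is only an illustration, not the original script: the JSON layout (a list of tweet objects, each with a "Location" key), the sample data, and the names LOCATION_KEYS and extract_location_fields are all assumptions.

```python
import json

# Assumed layout: a JSON array of tweets, each carrying a "Location" object.
# In practice this text would come from the original JSON file instead.
raw = '''
[
  {"Location": {"country_code": "tr", "state": "Central Anatolia Region", "county": "Çankaya", "city": "Ankara"}},
  {"Location": {"country_code": "tr", "state": "Black Sea Region", "city": "Trabzon"}},
  {"Location": {"country_code": "tr"}},
  {"Location": {}}
]
'''

# The four fields to pull out of each location dict.
LOCATION_KEYS = ["country_code", "state", "county", "city"]


def extract_location_fields(tweets):
    """Return one dict per tweet with every key in LOCATION_KEYS present."""
    rows = []
    for tweet in tweets:
        # Treat a missing or empty "Location" as an empty dict.
        location = tweet.get("Location") or {}
        row = {}
        for key in LOCATION_KEYS:
            if key in location:
                row[key] = location[key]
            else:
                row[key] = None  # placeholder for a missing value
        rows.append(row)
    return rows


rows = extract_location_fields(json.loads(raw))
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows) should then give one column per key, with the None entries treated as missing values. Those columns can be attached to the existing dataframe as long as the row order matches.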
