changing data types of multiple columns at once in python/pandas

Question

I have a DataFrame with 30 columns. When I load the data with pd.read_csv(), every column's dtype is set to object by default.

I want to change col-1 and col-5 to int and the rest of the columns to category.

My question is: how can I set the remaining columns to category all at once?

I know I can do something cumbersome like this:

    df['col-1'] = df['col-1'].astype('int')
    df['col-2'] = df['col-2'].astype('category')
    ...
    df['col-5'] = df['col-5'].astype('int')
    ...
    df['col-29'] = df['col-29'].astype('category')
    df['col-30'] = df['col-30'].astype('category')

Is there any way to do something like the following while reading the CSV?

pd.read_csv('myfile.csv', dtype={('col-1','col-5') : int, 'rest' : category})

Is this possible?

cs95 · Accepted Answer · 2019-06-06 23:18:14Z

5

Initialise a dictionary mapping column names to the required types, then pass the dictionary to DataFrame.astype:

dtypes = {c: 'category' for c in df}
dtypes.update({c: 'int' for c in ('col1', 'col5')})

out = df.astype(dtypes)

Note that you'll still need to explicitly enumerate every column — there currently isn't any scope for specifying contiguous slices to astype.
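
For a runnable illustration of the dictionary approach, here is a minimal sketch on a small made-up frame. The data is invented for the example, and 'int64' is used in place of 'int' only so the resulting dtype is the same on every platform:

```python
import pandas as pd

# Toy stand-in for the 30-column frame; every column starts out as object.
df = pd.DataFrame(
    {
        "col1": ["1", "2", "3"],
        "col2": ["a", "b", "a"],
        "col3": ["x", "y", "x"],
        "col5": ["10", "20", "30"],
    },
    dtype="object",
)

# Default every column to category, then override the integer columns.
dtypes = {c: "category" for c in df}
dtypes.update({c: "int64" for c in ("col1", "col5")})

# A single astype call applies the whole mapping and returns a new frame.
out = df.astype(dtypes)
print(out.dtypes)
```

Because astype returns a new frame, df itself keeps its object columns unless you assign the result back.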

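Alternatively, you could do

int64_cols = ['col1', 'col5']
cat_cols = df.columns.difference(int64_cols)

# Assign with df[...] rather than df.loc[:, ...]: recent pandas versions
# may keep the old dtype when columns are set through .loc.
df[cat_cols] = df[cat_cols].astype('category')

# Select columns with df[...]; df.loc[int64_cols] would look up row labels.
df[int64_cols] = df[int64_cols].astype(int)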

This takes two calls to astype instead of one.
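
The question also asks whether the types can be set while reading the CSV. One possible way, which is not part of the answer above, is to read just the header first, build the same kind of dictionary from it, and pass that to the dtype parameter of read_csv. In this sketch the CSV text and column names are made up, and io.StringIO stands in for 'myfile.csv':

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'myfile.csv'.
csv_text = "col1,col2,col3,col5\n1,a,x,10\n2,b,y,20\n3,a,x,30\n"

int_cols = ["col1", "col5"]

# First pass: nrows=0 parses only the header, giving the column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Category by default, int64 for the chosen columns.
dtypes = {c: "category" for c in header}
dtypes.update({c: "int64" for c in int_cols})

# Second pass: parse the full file with the dtypes applied at load time.
loaded = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(loaded.dtypes)
```

With a real file you would pass the path to both read_csv calls. The header pass is cheap because no data rows are parsed.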

3 Comments

Tommy Over a year ago

Thank you very much, I really appreciate your time. Since I'm a beginner in Python/pandas, it was easier for me to understand @Erfan's answer, and I was able to tweak it for other situations, so I accepted his answer. No doubt your answer looks more pro. I hope I can dissect it to pick up some advanced tips.

cs95 Over a year ago

@Tommy I understand. What you could do instead is upvote both of our answers, even if you can only accept one.

Tommy Over a year ago

Done. I'm able to now. A while ago I couldn't, because I had less than 15 reputation.

Erfan · Accepted Answer · 2019-06-06 23:20:42Z

1

Another way would be to use astype in a for loop.

cat_cols = [col for col in df.columns if col not in ['col1', 'col5']]

for col in cat_cols:
    df[col] = df[col].astype('category')
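
The loop above only handles the category columns. As a sketch of how the same loop could cover the integer columns too (toy data, and 'int64' as the target type, are assumptions for illustration):

```python
import pandas as pd

# Small made-up frame whose columns all start out as object.
df = pd.DataFrame(
    {
        "col1": ["1", "2"],
        "col2": ["a", "b"],
        "col5": ["3", "4"],
    },
    dtype="object",
)

int_cols = ["col1", "col5"]

for col in df.columns:
    # Integer columns become int64; every other column becomes category.
    df[col] = df[col].astype("int64" if col in int_cols else "category")

print(df.dtypes)
```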
