2

I have a DataFrame with 30 columns. When I load the data with pd.read_csv(), every column's dtype defaults to object.

I want to change col-1 and col-5 to int and the rest of the columns to category.

My question is: how can I set the remaining columns to category all at once?

I know I can do something cumbersome like this:

    df['col-1'] = df['col-1'].astype('int')
    df['col-2'] = df['col-2'].astype('category')
    ...
    df['col-5'] = df['col-5'].astype('int')
    ...
    df['col-29'] = df['col-29'].astype('category')
    df['col-30'] = df['col-30'].astype('category')

Is there any way I could do something like the following while reading the CSV?

pd.read_csv('myfile.csv', dtype={('col-1','col-5') : int, 'rest' : category})

Is this possible?

2 Answers

5

Initialise a dictionary mapping column names to the required types, then pass the dictionary to DataFrame.astype:

dtypes = {c: 'category' for c in df}
dtypes.update({c: 'int' for c in ('col1', 'col5')})

out = df.astype(dtypes)

Note that you'll still need to explicitly enumerate every column; astype currently has no way to specify contiguous slices of columns.
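Since the question asks about doing this while reading the CSV, the same dictionary idea can be sketched at read time. This is a minimal example, not a definitive recipe: it assumes reading the header twice is acceptable, and it uses an in-memory `csv_text` string as a hypothetical stand-in for 'myfile.csv', with only five columns for brevity.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'myfile.csv'; column names follow the question.
csv_text = (
    "col-1,col-2,col-3,col-4,col-5\n"
    "1,a,x,p,10\n"
    "2,b,y,q,20\n"
    "3,a,x,p,30\n"
)

int_cols = ["col-1", "col-5"]

# Read only the header (nrows=0) to learn the column names without loading data.
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Default every column to 'category', then override the integer columns.
dtypes = {c: "category" for c in header}
dtypes.update({c: "int64" for c in int_cols})

# Second pass: load the data with the dtypes applied while parsing.
df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(df.dtypes)
```

With a real file, both `read_csv` calls would take the file path instead of `io.StringIO(csv_text)`.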


Alternatively, you could do:

int64_cols = ['col1', 'col5']
cat_cols = df.columns.difference(int64_cols)

# Assign with plain [] rather than .loc so the columns are replaced
# and the new dtypes are kept.
df[cat_cols] = df[cat_cols].astype('category')
df[int64_cols] = df[int64_cols].astype(int)

This takes two calls to astype instead of one.


3 Comments

Thank you very much, I really appreciate your time. Since I'm a beginner in Python/pandas, it was easier for me to understand @Erfan's answer, and I was able to tweak it for other situations, so I accepted his answer. No doubt your answer looks more pro. I hope I can dissect it to pick up some advanced tips.
@Tommy I understand. What you could do instead is upvote both of our answers, even if you can only accept one.
Done. I'm able to now. A while ago I couldn't, because I had fewer than 15 reputation points.
1

Another way would be to use astype in a for loop.

cat_cols = [col for col in df.columns if col not in ['col1', 'col5']]

for col in cat_cols:
    df[col] = df[col].astype('category')
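For completeness, here is a small end-to-end sketch of the loop on a toy frame, including the integer columns that the snippet above leaves out. The sample data and the `int_cols` name are made up for illustration.

```python
import pandas as pd

# Toy frame whose values are all strings, as if freshly loaded from a CSV.
df = pd.DataFrame({
    "col1": ["1", "2", "3"],
    "col2": ["a", "b", "a"],
    "col3": ["x", "y", "x"],
    "col5": ["10", "20", "30"],
})

int_cols = ["col1", "col5"]
cat_cols = [col for col in df.columns if col not in int_cols]

# Convert every non-integer column to category, one column at a time.
for col in cat_cols:
    df[col] = df[col].astype("category")

# Convert the remaining columns to integers.
for col in int_cols:
    df[col] = df[col].astype("int64")

print(df.dtypes)
```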
