20

Is it possible to store arbitrary NumPy arrays as the values of a single column in a pandas DataFrame?

The arrays are all 2-dimensional, and I intend to use them to calculate values for other columns in the same dataframe.

To provide some context of what I'm trying to do here:

Each array is the adjacency matrix of a network, and for each network I want to calculate various characteristics (e.g. density, centralities, clustering coefficient), which are stored as other columns in the same dataframe.

4
  • Maybe it's just me, but I think the question is not clear enough. Commented Oct 24, 2013 at 19:11
  • FWIW I don't think your intention sits well with your request. While you can store arbitrary objects as values, you can't really do much with them in a vectorized fashion. Commented Oct 24, 2013 at 19:13
  • @DSM, the arrays are really just adjacency matrices of different graphs, and the other columns in the same dataframe are various network characteristics calculated based on each matrix. Do you suggest that I should decompose the matrix and store each row of the matrix in a separate column? Commented Oct 24, 2013 at 19:16
  • @RomanPekar, I've edited and provided more info in my question. Commented Oct 24, 2013 at 19:25

2 Answers

16

Store them as elements as you would do for any other data:

import numpy as np
import pandas as pd
a = np.arange(10).reshape(2,5)
b = np.arange(10, 20).reshape(2,5)
pd.DataFrame({'foo':[42,51], 'arr':[a,b]})
Out[10]: 
                                            arr  foo
0            [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]   42
1  [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]   51

Note that what you are trying to do sounds more like a use case for a Panel.
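As a rough sketch of the use case in the question, derived columns can be computed from the stored arrays with Series.apply. The two adjacency matrices, the column names, and the density helper below are made up for illustration and assume undirected graphs without self-loops:

```python
import numpy as np
import pandas as pd

# Two small, made-up adjacency matrices (undirected, no self-loops).
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])

# A list of 2-D arrays becomes an object-dtype column: one whole array per cell.
df = pd.DataFrame({'name': ['triangle', 'path'], 'adj': [triangle, path]})

def density(adj):
    """Edge density of an undirected simple graph given its adjacency matrix."""
    n = adj.shape[0]
    edges = adj.sum() / 2  # each undirected edge appears twice in the matrix
    return edges / (n * (n - 1) / 2)

# apply() calls the function once per cell; this is a Python-level loop,
# not a vectorized operation.
df['density'] = df['adj'].apply(density)
df['n_nodes'] = df['adj'].apply(lambda m: m.shape[0])

print(df[['name', 'n_nodes', 'density']])
```

Each cell of 'adj' is still a numpy.ndarray when read back (for example via df.loc[0, 'adj']); it is the column as a whole that is a Series of dtype object.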


3 Comments

Note, Panel is now deprecated.
This df cannot be stored in PyTables table format; df.to_hdf('test.hdf', key='xxx', format='table') will fail.
@Boud, this way the numpy arrays get stored as a Series. Is there any way I can save them as numpy.ndarray in the dataframe?
0

What do you mean store arbitrary numpy arrays as the values of a column in a dataframe of Pandas?

Something like this?

import numpy as np
import pandas as pd


x = np.random.randn(50, 25)
random_frame = pd.DataFrame(x)

This will store the array x in a DataFrame where the column names are 0, 1, 2, 3, and so on. Could you clarify? I think this is more of a comment, but I don't know if I can comment yet.
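To make the distinction concrete, here is a small sketch (the variable names spread and cells, and the column name 'arr', are arbitrary choices for illustration). Passing a 2-D array directly to pd.DataFrame spreads its elements over rows and columns, whereas wrapping whole arrays in a list keeps each array in a single cell:

```python
import numpy as np
import pandas as pd

x = np.random.randn(50, 25)

# Passing the 2-D array directly: one DataFrame cell per array element.
spread = pd.DataFrame(x)
print(spread.shape)               # (50, 25)

# Wrapping arrays in a list: each whole array sits in one cell of column 'arr'.
cells = pd.DataFrame({'arr': [x, 2 * x]})
print(cells.shape)                # (2, 1)
print(cells.loc[0, 'arr'].shape)  # (50, 25)
```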

2 Comments

I want to store arrays as values of a single column inside a dataframe, if possible.
I'm guessing that the 2-dimensional array is n x m, and not n x 1, right? I don't know if you can store an n x m array in a single column of a dataframe. It's possible that I just haven't seen it. I would love to see how, though.
