I have a pandas DataFrame with multiple columns, and each cell holds a NumPy array with a varying number of elements. I would like to plot all the elements of the array for every cell within a column.

I have tried

plt.plot(df['column'])
plt.plot(df['column'][0:])

Both give a ValueError: setting an array element with a sequence.

It is very important that these values are plotted at their corresponding index, as the index represents linear time in this dataframe. I would really appreciate it if someone showed me how to do this properly. Perhaps there is a package other than matplotlib.pyplot that is better suited for this?

Thank you

2 Answers

plt.plot needs a list of x-coordinates together with an equally long list of y-coordinates. As you seem to want the index of the dataframe as the x-coordinate and the contents of each cell as the y-coordinates, you need to repeat each x-value as many times as there are y-values in that cell.

Note that this format doesn't suit a line plot, as connecting subsequent points would create some strange vertical lines. plt.plot accepts a marker as its third parameter, for example '.' to draw a simple dot at each position.

A code example:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

N = 30
df = pd.DataFrame({f'column{c}':
                       [np.random.normal(np.random.uniform(10, 100), 1, np.random.randint(3, 11)) for _ in range(N)]
                   for c in range(1, 6)})
legend_handles = []
colors = plt.cm.Set1.colors
desired_columns = df.columns
for column, color in zip(desired_columns, colors):
    for ind, cell in df[column].items():  # Series.iteritems() was removed in pandas 2.0
        if len(cell) > 0:
            plotted, = plt.plot([ind] * len(cell), cell, '.', color=color)
    legend_handles.append(plotted)
plt.legend(legend_handles, desired_columns)
plt.show()

example plot
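
As a possible alternative to looping over cells, the repetition of x-values described above can be done once per column with np.repeat, followed by np.concatenate for the y-values. This is only a sketch: the helper name flatten_column and the small dataframe are made up for illustration, and it assumes every cell holds a one-dimensional array and the column is not empty.

```python
import numpy as np
import pandas as pd


def flatten_column(series):
    """Return (x, y): each index label repeated once per element of its cell.

    Hypothetical helper; assumes a non-empty Series of 1-D arrays.
    """
    lengths = [len(cell) for cell in series]
    x = np.repeat(series.index.to_numpy(), lengths)
    y = np.concatenate([np.asarray(cell, dtype=float) for cell in series])
    return x, y


# Tiny illustrative dataframe with cells of different lengths.
df = pd.DataFrame({'column1': [np.array([1.0, 2.0]),
                               np.array([3.0]),
                               np.array([4.0, 5.0, 6.0])]})

x, y = flatten_column(df['column1'])
print(x)  # [0 0 1 2 2 2]
print(y)  # [1. 2. 3. 4. 5. 6.]
```

The resulting pair can then be drawn with a single call per column, for example plt.plot(x, y, '.'), which also gives exactly one legend handle per column.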

Note that pandas really isn't meant to store complete arrays inside cells. The preferred way is to create a dataframe in "long" form, with each value in a separate row (with the "index" repeated). Most pandas and seaborn functions do not handle arrays inside cells.

Here's a way to create a long-form dataframe that can be plotted with Seaborn:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

N = 30
df = pd.DataFrame({f'column{c}':
                       [np.random.normal(np.random.uniform(10, 100), 1, np.random.randint(3, 11)) for _ in range(N)]
                   for c in range(1, 6)})

desired_columns = df.columns
df_long_data = []
for column in desired_columns:
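    for ind, cell in df[column].items():  # Series.iteritems() was removed in pandas 2.0
        for val in cell:
            # one row per array element; named `row` to avoid shadowing the built-in dict
            row = {'timestamp': ind, 'column_name': column, 'value': val}
            df_long_data.append(row)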
df_long = pd.DataFrame(df_long_data)
sns.scatterplot(x='timestamp', y='value', hue='column_name', data=df_long)
plt.show()

seaborn example
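
The same long form can probably also be built without the nested loops by using Series.explode, which turns each element of a list-like cell into its own row and repeats the index. The following is a sketch under the assumption that every cell is a one-dimensional array; the small dataframe is invented for illustration, and the column names timestamp, column_name and value simply mirror the ones used above.

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for the dataframe in the question.
df = pd.DataFrame({
    'column1': [np.array([1.0, 2.0]), np.array([3.0])],
    'column2': [np.array([4.0]), np.array([5.0, 6.0, 7.0])],
})

parts = []
for column in df.columns:
    part = (df[column]
            .explode()                 # one row per array element, index repeated
            .rename('value')
            .rename_axis('timestamp')  # name the index so reset_index() yields 'timestamp'
            .reset_index())
    part['column_name'] = column
    parts.append(part)

df_long = pd.concat(parts, ignore_index=True)
df_long['value'] = df_long['value'].astype(float)  # explode() leaves object dtype
print(df_long)
```

The resulting df_long can be passed to sns.scatterplot in the same way as in the example above.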


As I understand your problem, each cell holds a NumPy array that you want to plot. Passing the whole column to plt.plot() fails because what you pass is a sequence of arrays rather than a single array. The plot() method does accept a single NumPy array, so you might need to pass every cell individually. This might help:

for column in df.columns:
    for cell in df[column]:
        plt.plot(cell)
        plt.show()

2 Comments

Hi, I gave that a try. It outputs a separate empty plot for each cell. I need one single graph for the whole column.
You can use the subplot functionality of Matplotlib. This will help: python-course.eu/matplotlib_subplots.php
