Convert from matplotlib to ggplot2 within python

Question

I have built a framework for algorithm evaluation, with methods that compute metrics such as RMSE@K, NDCG@K, and MAE@K from the data I pass in.

import numpy as np
import matplotlib.pyplot as plt

ndcg = []
rmse = []
mae = []
for i in range(11):
    results = generate_metrics(data_file, i)
    ndcg.append(np.mean(results['ndcg']))
    rmse.append(np.mean(results['rmse']))
    mae.append(np.mean(results['mae']))
plt.plot(ndcg)
plt.plot(rmse)
plt.plot(mae)
plt.show()

I want to use ggplot within Python to plot these on one graph: the x axis is the @k values (0 to 10) and the y axis is the corresponding value from each list.

How can I convert the lists above into a data frame like this:

   at_k      ndcg      rmse       mae
1     1 0.4880583 0.3438043 0.3400933
2     2 0.4880583 0.3438043 0.3400933
3     3 0.4880583 0.3438043 0.3400933
4     4 0.4880583 0.3438043 0.3400933
5     5 0.4880583 0.3438043 0.3400933
6     6 0.4880583 0.3438043 0.3400933
7     7 0.4880583 0.3438043 0.3400933
8     8 0.4880583 0.3438043 0.3400933
9     9 0.4880583 0.3438043 0.3400933
10   10 0.4880583 0.3438043 0.3400933

and plot it using ggplot?

Carsten · Accepted Answer · 2024-05-27 22:04:41Z

Please note that this answer uses yhat's ggpy, a Python port of ggplot. Other Python grammar-of-graphics implementations exist, such as plotnine, and this answer does not work with them.

After generating some random data in the same form as your dataset using

import numpy as np
ndcg, rmse, mae = [], [], []
for i in range(11):
    # three random values per step, one for each metric
    rand = np.random.sample(3)
    ndcg.append(rand[0])
    rmse.append(rand[1])
    mae.append(rand[2])

I can create a Pandas DataFrame from it:

import pandas as pd
at_k = range(1, 12)
df = pd.DataFrame({"at_k": at_k, "ndcg": ndcg, "rmse": rmse, "mae": mae})
print(df)

This outputs

    at_k       mae      ndcg      rmse
0      1  0.153102  0.546553  0.794357
1      2  0.882718  0.342260  0.762997
2      3  0.153298  0.695626  0.581455
3      4  0.073772  0.491996  0.384631
4      5  0.014066  0.369490  0.606842
5      6  0.892553  0.818312  0.396829
6      7  0.143114  0.739370  0.812050
7      8  0.847054  0.323221  0.932366
8      9  0.122838  0.613340  0.393237
9     10  0.645705  0.486312  0.138259
10    11  0.339063  0.223995  0.115242
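The columns above come out alphabetically, which is how older pandas versions ordered a dict input; newer versions keep the insertion order instead. A minimal sketch, assuming made-up random data as above, that pins the order asked for in the question by passing `columns=` explicitly:

```python
import numpy as np
import pandas as pd

# Made-up data: three metrics, eleven values each (seeded for repeatability).
rng = np.random.RandomState(0)
ndcg, rmse, mae = rng.random_sample((3, 11))

# columns= fixes the column order regardless of the pandas version.
df = pd.DataFrame(
    {"at_k": list(range(1, 12)), "ndcg": ndcg, "rmse": rmse, "mae": mae},
    columns=["at_k", "ndcg", "rmse", "mae"],
)
print(df.head())
```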

Yay! But we can't use this for plotting with yhat's ggplot yet. Following this example, we need to transform the data:

df2 = pd.melt(df[['at_k', 'mae', 'ndcg', 'rmse']], id_vars=['at_k'])
print(df2)

Now we've got something like this (truncated):

    at_k variable     value
0      1      mae  0.153102
1      2      mae  0.882718
2      3      mae  0.153298
3      4      mae  0.073772
...
30     9     rmse  0.393237
31    10     rmse  0.138259
32    11     rmse  0.115242
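To make the reshaping step concrete, here is a small sketch of `pd.melt` on a three-row frame with made-up values. The `var_name` and `value_name` arguments are optional; the names `metric` and `score` are only illustrative and replace the default `variable` and `value` headers.

```python
import pandas as pd

# Tiny wide-format frame with made-up numbers.
df = pd.DataFrame({
    "at_k": [1, 2, 3],
    "ndcg": [0.5, 0.6, 0.7],
    "rmse": [0.3, 0.2, 0.1],
})

# Each metric column is stacked under a single pair of columns:
# one holding the metric name, one holding its value.
df_long = pd.melt(
    df,
    id_vars=["at_k"],
    value_vars=["ndcg", "rmse"],
    var_name="metric",
    value_name="score",
)
print(df_long)
```

The long frame has one row per (at_k, metric) pair, so 3 rows and 2 metrics give 6 rows. That is the shape a `colour` mapping needs: one column to group by and one column to plot.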

Now it's time to plot:

from ggplot import *  # yhat's ggpy package

ggplot(aes(x='at_k', y='value', colour='variable'), data=df2) +\
    geom_point()
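For readers on plotnine rather than ggpy, the same long-format frame can probably be plotted along the following lines. This is a rough sketch, not part of the code tested for this answer: it assumes plotnine is installed, `build_metric_plot` is a hypothetical helper name, and the data is made up.

```python
import pandas as pd


def build_metric_plot(df_long):
    """Return a plotnine chart of metric value against k (hypothetical helper)."""
    # Imported lazily so preparing the data does not require plotnine.
    from plotnine import ggplot, aes, geom_line, geom_point

    # plotnine takes the data first and the aesthetic mapping second.
    return (
        ggplot(df_long, aes(x="at_k", y="value", color="variable"))
        + geom_line()
        + geom_point()
    )


# Small made-up frame in the same long format as df2.
df_long = pd.DataFrame({
    "at_k": [1, 2, 3, 1, 2, 3],
    "variable": ["mae", "mae", "mae", "rmse", "rmse", "rmse"],
    "value": [0.15, 0.88, 0.15, 0.79, 0.76, 0.58],
})
```

Calling `build_metric_plot(df_long).save("metrics.png")` should write the figure to disk; the file name is arbitrary.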

edited May 27, 2024 at 22:04

answered Feb 9, 2015 at 21:08

3 Comments

add-semi-colons Over a year ago

This is fantastic, I am going to switch geom_point with geom_line()

kraggle Over a year ago

That's not how you convert from matplotlib.pyplot to plotnine.ggplot. That only makes a data frame for ggplot.

Carsten Over a year ago

@kraggle Sorry I didn't make that more clear. The ggplot port I used was this one. This code will, in all likelihood, not work with plotnine. I've edited my answer to address this uncertainty. (And I've just checked, the python-ggplot tag refers to the library I've linked.)
