I have built a framework for evaluating algorithms. It includes methods that compute metrics such as RMSE@K, NDCG@K and MAE@K from the data I pass in.

import numpy as np
import matplotlib.pyplot as plt

ndcg = []
rmse = []
mae = []
# generate_metrics and data_file come from my own framework
for i in range(11):
    results = generate_metrics(data_file, i)
    ndcg.append(np.mean(results['ndcg']))
    rmse.append(np.mean(results['rmse']))
    mae.append(np.mean(results['mae']))
plt.plot(ndcg)
plt.plot(rmse)
plt.plot(mae)
plt.show()

I want to use ggplot in Python to plot all of this in one graph: the x axis is the @k value (0-10) and the y axis is the corresponding value from each list.

How can I convert the lists above into a data frame like this:

   at_k      ndcg      rmse       mae
1     1 0.4880583 0.3438043 0.3400933
2     2 0.4880583 0.3438043 0.3400933
3     3 0.4880583 0.3438043 0.3400933
4     4 0.4880583 0.3438043 0.3400933
5     5 0.4880583 0.3438043 0.3400933
6     6 0.4880583 0.3438043 0.3400933
7     7 0.4880583 0.3438043 0.3400933
8     8 0.4880583 0.3438043 0.3400933
9     9 0.4880583 0.3438043 0.3400933
10   10 0.4880583 0.3438043 0.3400933

and then plot it using ggplot?

1 Answer

Please note that this answer uses yhat's ggpy, a Python port of ggplot. Other Python grammar-of-graphics implementations exist, such as plotnine, and this answer does not work with them.

After generating some random data in the same form as your dataset using

import numpy as np

ndcg, rmse, mae = [], [], []
# 11 iterations, matching the loop in the question
for i in range(11):
    rand = np.random.sample(3)  # three random floats in [0, 1)
    ndcg.append(rand[0])
    rmse.append(rand[1])
    mae.append(rand[2])

I can create a Pandas DataFrame from it:

import pandas as pd

at_k = range(1, 12)  # one k value per list entry
df = pd.DataFrame({"at_k": at_k, "ndcg": ndcg, "rmse": rmse, "mae": mae})
print(df)

This outputs

    at_k       mae      ndcg      rmse
0      1  0.153102  0.546553  0.794357
1      2  0.882718  0.342260  0.762997
2      3  0.153298  0.695626  0.581455
3      4  0.073772  0.491996  0.384631
4      5  0.014066  0.369490  0.606842
5      6  0.892553  0.818312  0.396829
6      7  0.143114  0.739370  0.812050
7      8  0.847054  0.323221  0.932366
8      9  0.122838  0.613340  0.393237
9     10  0.645705  0.486312  0.138259
10    11  0.339063  0.223995  0.115242

Yay! But we can't use this for plotting with yhat's ggplot yet. Following this example, we need to transform the data:

# reshape from one column per metric to one row per (at_k, metric) pair
df2 = pd.melt(df[['at_k', 'mae', 'ndcg', 'rmse']], id_vars=['at_k'])
print(df2)

Now we've got something like this (truncated):

    at_k variable     value
0      1      mae  0.153102
1      2      mae  0.882718
2      3      mae  0.153298
3      4      mae  0.073772
...
30     9     rmse  0.393237
31    10     rmse  0.138259
32    11     rmse  0.115242
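
To make the reshape step easier to follow, here is a minimal sketch of what pd.melt does on a tiny hand-written frame. The values and the names wide, long_df, metric and score are made up for illustration and are not part of the data above; the var_name and value_name arguments only rename the default variable and value columns.

```python
import pandas as pd

# Tiny wide-format frame: one column per metric (illustrative values only)
wide = pd.DataFrame({
    "at_k": [1, 2, 3],
    "ndcg": [0.5, 0.6, 0.7],
    "rmse": [0.4, 0.3, 0.2],
})

# Long format: one row per (at_k, metric) pair
long_df = pd.melt(wide, id_vars=["at_k"], var_name="metric", value_name="score")

# pivot reverses the reshape, which is a handy sanity check
round_trip = long_df.pivot(index="at_k", columns="metric", values="score")

print(long_df)
print(round_trip)
```

The long frame has one row for every combination of at_k and metric, which is the shape a grammar-of-graphics library needs in order to map the metric name to colour.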

Now it's time to plot:

from ggplot import *  # yhat's ggplot port

ggplot(aes(x='at_k', y='value', colour='variable'), data=df2) +\
    geom_point()

3 Comments

This is fantastic. I am going to swap geom_point() for geom_line().
That's not how you convert from matplotlib.pyplot to plotnine.ggplot. That's only making a data frame for ggplot.
@kraggle Sorry I didn't make that more clear. The ggplot port I used was this one. This code will, in all likelihood, not work with plotnine. I've edited my answer to address this uncertainty. (And I've just checked, the python-ggplot tag refers to the library I've linked.)
