I'm using the code from the official documentation example at https://plotly.com/python/box-plots/ -> Box Plot With Precomputed Quartiles. In this case, Plotly uses q1 as the min and q3 as the max.

import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Box(q1=[ 1, 2, 3 ], median=[ 4, 5, 6 ],
                  q3=[ 7, 8, 9 ], lowerfence=[-1, 0, 1],
                  upperfence=[7, 8, 9], mean=[ 2.2, 2.8, 3.2 ],
                  sd=[ 0.2, 0.4, 0.6 ], notchspan=[ 0.2, 0.4, 0.6 ], name="Precompiled Quartiles"))

fig.show()

[image]

I have already calculated the min and max and want to use them.

This is how the plot looks when I pass the data frame with all records. Plotly calculates q1, q3, max, and min itself, and everything is fine except the performance. I assume this is because every value is displayed on the plot, which makes it very heavy to render.

[image]

I'd like to calculate the aggregates first and use them if possible. So I've calculated q1, q3, max, and min. You can see that max is greater than experience for the BrightData group.

[image]

My expectation is that experience and max can be displayed in one plot with different values.

  • You can update upperfence to put the max value you have. Commented Dec 18, 2024 at 15:47
  • @rehaqds The max value can be greater than upperfence. When I create a plot by passing all the values, Plotly displays it correctly, but the performance is poor. That's why I want to precompute q1, q3, median, max, and min and create the plot from them. There are keyword arguments for everything I need except the min and max. Commented Dec 18, 2024 at 19:20
  • I guess you mean because of outliers. To my knowledge there are no min/max parameters, but maybe this helps: stackoverflow.com/questions/68565475/… Commented Dec 18, 2024 at 20:30
  • I've updated the question and added pictures of the plot when I pass the whole dataset and when I use aggregates and manually add traces to the plot. I hope it helps to explain my expectations. Commented Dec 19, 2024 at 8:36

1 Answer

I ran into a similar problem recently. Too many data points and a slow-loading figure. Try plotting only the points outside of your chosen upper and lower fences, then updating the precomputed IQR values.

import os

import pandas as pd
import plotly.graph_objects as go

# POST_PROC_F is the folder that holds the CSV; it is defined elsewhere
df_orig = pd.read_csv(os.path.join(POST_PROC_F, "ttfb_lat.csv"))
# describe() defaults to percentiles=[.25, .5, .75], but you can include whatever you like
out_box = df_orig.groupby('vendor')['ttfb'].describe()
# Keep only the points outside the chosen fences (here the per-vendor 25th and 75th percentiles)
out_outliers = df_orig[~df_orig['ttfb'].between(df_orig.groupby('vendor')['ttfb'].transform(lambda x: x.quantile(0.25)),
                                               df_orig.groupby('vendor')['ttfb'].transform(lambda x: x.quantile(0.75)))]
fig = go.Figure()
# One box per vendor: precomputed statistics plus only the points kept above
for vendor in out_box.index:
    fig.add_trace(go.Box(x=[vendor],
                         y=out_outliers[out_outliers['vendor']==vendor]['ttfb'],
                         q1=[out_box.loc[vendor, '25%']],
                         median=[out_box.loc[vendor, '50%']],
                         q3=[out_box.loc[vendor, '75%']],
                         mean=[out_box.loc[vendor, 'mean']],
                         sd=[out_box.loc[vendor, 'std']],
                         name=vendor,
                         marker_color='#1f77b4',
                         showlegend=False,
                         orientation='v') )

fig.show()
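If you would rather use fences wider than q1/q3, here is a minimal sketch of the same idea using only the standard library. It assumes the common 1.5 × IQR rule and linear interpolation for the quartiles; box_summary and its whisker parameter are hypothetical names for illustration, not part of pandas or Plotly.

```python
from statistics import quantiles


def box_summary(values, whisker=1.5):
    """Return precomputed box statistics plus the points beyond the fences.

    Hypothetical helper: assumes the 1.5 * IQR rule and at least two values.
    """
    data = sorted(values)
    # method="inclusive" interpolates linearly between the sorted data points
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - whisker * iqr
    high_limit = q3 + whisker * iqr
    # Fences sit on the most extreme data points still inside the limits
    inside = [v for v in data if low_limit <= v <= high_limit]
    # Everything beyond the limits is the small set of points worth plotting
    outliers = [v for v in data if v < low_limit or v > high_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lowerfence": inside[0],
        "upperfence": inside[-1],
        "outliers": outliers,
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```

Each statistic should then fit the go.Box keyword of the same name as a one-item list (for example q1=[summary["q1"]]), with summary["outliers"] passed as y so that only those few points are drawn. The true min and max of the group are then simply the lowest and highest drawn points.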

2 Comments

Thanks! Your pandas skills are too advanced for me at the moment ) I get an error: 'Series' object has no attribute 'within'. I haven't used this function before, so I'll google how to use it properly.
Sorry, I was coding too fast; I meant between. I'll update.