
I've edited my question; I believe it is more didactic this way.

I'm plotting a chart using matplotlib and I'm having trouble with the formatting of the axes. I can't figure out how to force it to always use the same scientific notation: in the example below, I want e4 on both axes (instead of e4 on one and e2 on the other). I would also like to always have two decimals. Any ideas? The documentation on this is not very extensive.

Creating a random DataFrame of data:

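import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(100000)
# one noise value per point (np.random.randn() with no argument returns a single scalar)
y = x * 100 + np.random.randn(100000) * 100

Calculating the linear regression:

df = pd.DataFrame({'x': x, 'y': y})
# pandas.stats.api.ols was removed in pandas 0.20; a degree-1 np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(df['x'], df['y'], 1)
df['yhat'] = df['x'] * slope + intercept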

Plotting:

plt.scatter(df['x'], df['y'])
plt.plot(df['x'], df['yhat'], color='red')
plt.title('Scatter graph with linear regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.ticklabel_format(style='sci', scilimits=(0, 0))
plt.ylim(0)
plt.xlim(0)
plt.show()

[Output plot not preserved: scatter graph with linear regression, with different exponents (e4 and e2) on the two axes.]

Answer


As far as I can tell, matplotlib does not offer exactly this option out of the box. The documentation is indeed sparse (the Ticker API is the place to go). The Formatter classes are responsible for formatting the tick values. Of the ones offered, only ScalarFormatter (the default formatter) supports scientific formatting; however, it does not allow the exponent or the number of significant digits to be fixed. One alternative would be FixedFormatter or FuncFormatter, which essentially let you choose the tick labels freely (the former can be selected indirectly via plt.gca().set_xticklabels). However, neither of them lets you set the so-called offset_string, which is the string displayed at the end of the axis. It is customarily used for a value offset, but ScalarFormatter also uses it for the scientific multiplier.
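For comparison, here is a minimal sketch of the FuncFormatter route. It assumes you are willing to state the multiplier in the axis labels, since FuncFormatter draws no offset string. The helper `fixed_exponent_formatter` and its `order`/`decimals` parameters are hypothetical names for illustration; only `FuncFormatter` is matplotlib's own class. The data mimic the question's (x near 100, y roughly 100 times x).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter


def fixed_exponent_formatter(order, decimals=2):
    """Hypothetical helper: label each tick as value / 10**order with fixed decimals."""
    scale = 10.0 ** order

    def fmt(value, pos):
        # FuncFormatter passes the tick value and its index; the index is unused here
        return f"{value / scale:.{decimals}f}"

    return FuncFormatter(fmt)


# Data similar to the question's
rng = np.random.default_rng(0)
x = 100 + 15 * rng.standard_normal(1000)
y = x * 100 + 100 * rng.standard_normal(1000)

fig, ax = plt.subplots()
ax.scatter(x, y)
# Same exponent (1e4) and two decimals on both axes
ax.xaxis.set_major_formatter(fixed_exponent_formatter(4))
ax.yaxis.set_major_formatter(fixed_exponent_formatter(4))
# No offset string is drawn, so the multiplier goes into the labels instead
ax.set_xlabel("X (×1e4)")
ax.set_ylabel("Y (×1e4)")
fig.canvas.draw()  # tick labels are computed at draw time
```

The trade-off against subclassing ScalarFormatter is exactly the missing offset_string: the multiplier no longer appears at the end of the axis.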

Thus, my best solution is a custom formatter derived from ScalarFormatter that, instead of autodetecting the order of magnitude and the format string, simply uses values fixed by the user:

from matplotlib import rcParams
import matplotlib.ticker

if 'axes.formatter.useoffset' in rcParams:
    # None triggers use of the rcParams value
    useoffsetdefault = None
else:
    # None would raise an exception
    useoffsetdefault = True

class FixedScalarFormatter(matplotlib.ticker.ScalarFormatter):
    def __init__(self, format, orderOfMagnitude=0, useOffset=useoffsetdefault, useMathText=None, useLocale=None):
        super(FixedScalarFormatter, self).__init__(useOffset=useOffset, useMathText=useMathText, useLocale=useLocale)
        self.base_format = format
        self.orderOfMagnitude = orderOfMagnitude

    def _set_orderOfMagnitude(self, range):
        """ Set orderOfMagnitude to best describe the specified data range.

        Does nothing except prevent the parent class from doing something.
        """
        pass

    # Newer matplotlib versions call this renamed private hook, without arguments
    def _set_order_of_magnitude(self):
        pass

    def _set_format(self, *args):
        """ Calculates the most appropriate format string for the range (vmin, vmax).

        We're actually just using a fixed format string. *args accepts both the
        old (vmin, vmax) signature and the newer argument-free one.
        """
        self.format = self.base_format
        if self._usetex:
            self.format = '$%s$' % self.format
        elif self._useMathText:
            # raw string, since '\m' is not a valid escape sequence
            self.format = r'$\mathdefault{%s}$' % self.format

Note that the default value of ScalarFormatter's constructor parameter useOffset changed at some point; the code above tries to guess the right default by checking whether rcParams contains 'axes.formatter.useoffset'.

Attach an instance of this class to one or both axes of your plots as follows (here with the format '%.2f' and a fixed order of magnitude of 4):

plt.gca().xaxis.set_major_formatter(FixedScalarFormatter('%.2f',4))
plt.gca().yaxis.set_major_formatter(FixedScalarFormatter('%.2f',4))
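If only the fixed exponent matters, recent matplotlib releases can handle that part without a subclass: the ticklabel_format documentation states that scilimits=(m, m) with m != 0 fixes the order of magnitude to 10^m. The number of decimals is still chosen automatically, so this covers only the first half of the question. A minimal sketch, assuming such a matplotlib version and data similar to the question's:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = 100 + 15 * rng.standard_normal(1000)
y = x * 100 + 100 * rng.standard_normal(1000)

fig, ax = plt.subplots()
ax.scatter(x, y)
# Equal, nonzero limits (4, 4) pin the multiplier to 1e4 on both axes
ax.ticklabel_format(style="sci", scilimits=(4, 4))
ax.set_xlim(0)
ax.set_ylim(0)
fig.canvas.draw()  # the order of magnitude and offset text are computed at draw time
```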

5 Comments

Nice, although shouldn't it at least emit a warning when the values fall outside the range that this format fits easily?
Should it? You choose the format string: if you want 1e6 printed with 4 digits after the decimal point (i.e. 10 significant digits), then that's what you get :). Seriously, though: that would involve parsing the format string, which is not completely trivial. If you write self.format = '%.'+str(self.sigfigs)+'f' instead of self.base_format (as in the original ScalarFormatter), then of course you could play some games with sigfigs.
Even when I'm writing code for myself, I think "here's enough rope to hang yourself with" is suboptimal code design. Maybe especially when writing for myself. Anyway, I wasn't thinking of automatically parsing a warning out of the format string, but of configuring the warning when one sets the string and the other values.
It works, but that's a lot of lines for "such a simple thing". I'm tempted to simply drop my request for two decimals and just keep 1e6 all the time, using:

yy, locs = plt.yticks()
plt.yticks(yy, [a/1e6 for a in yy])
xx, locs = plt.xticks()
plt.xticks(xx, [a/1e6 for a in xx])
Ah, but the two decimals are the easy part: use FormatStrFormatter, i.e. plt.gca().xaxis.set_major_formatter(mpl.ticker.FormatStrFormatter('%.2f')), with matplotlib imported as mpl. Combined with your manual rescaling, what you'll miss out on is the offset_string, which goes at the end of the axis.
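A minimal sketch of that combination, assuming the rescaling is applied to the data before plotting (rather than by relabelling the ticks with plt.yticks), so that FormatStrFormatter only has to add the two decimals. SCALE is an illustrative constant, not something from the thread; because no offset string is drawn, the multiplier is stated in the axis labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter

SCALE = 1e4  # illustrative fixed multiplier, shared by both axes

rng = np.random.default_rng(0)
x = 100 + 15 * rng.standard_normal(1000)
y = x * 100 + 100 * rng.standard_normal(1000)

fig, ax = plt.subplots()
# Plot the rescaled data, so tick values are already in units of SCALE
ax.scatter(x / SCALE, y / SCALE)
# One formatter instance per axis, each printing two decimals
ax.xaxis.set_major_formatter(FormatStrFormatter("%.2f"))
ax.yaxis.set_major_formatter(FormatStrFormatter("%.2f"))
# The multiplier goes into the labels, since there is no offset string
ax.set_xlabel("X (×1e4)")
ax.set_ylabel("Y (×1e4)")
fig.canvas.draw()  # tick labels are computed at draw time
```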
