How do I refactor a simple commandline script to be object oriented?

Question

I don't have a lot of experience with object-oriented Python and want to refactor a simple command-line tool. My current script imports the libraries it needs, defines a few functions, uses global variables (which I know is bad practice) and uses argparse, all in one .py file, e.g.:

import argparse

valid_values = {}

# Some code to populate valid_values, used both in checking the argument and later in the script

def check_value(value):
    if value not in valid_values:
        raise argparse.ArgumentTypeError("%s is invalid." % value)
    return value

parser = argparse.ArgumentParser(…)
parser.add_argument('…', type=check_value, help='…')

args = parser.parse_args()

# Some other code that uses valid_values

Also, some of my argument parsing uses a function similar to "abc" that modifies a dictionary I need later in the script. How do I make this less ugly? Or is using global variables acceptable for a simple command-line script?

What is xyz, and why are you importing it and not using it? Also, you are using argparse, but you didn't import it. Can you post your actual code? – joel goldstick, Commented Jun 18, 2016 at 17:34

Alex Barry · Accepted Answer · 2016-06-18 19:22:29Z

Let's create a single class that mimics our dict and gets passed to these functions instead of having them operate on a global variable, and you can tell me if this is more what you're looking for. In fact, you are already using a bit of object-oriented programming without realizing it: dict is itself a class.

Step 1: The Dict Class

Object-oriented programming in Python revolves around classes, which can be instantiated one or more times and which define functions (methods) that operate on those instances. We call a method with instance_name.method_name(arguments) and get a return value just as if we had called a plain function. The difference is that a function looked up through an instance is 'bound' to it: Python passes the instance in automatically as the first argument, which by convention is named self.
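A quick way to see what "bound" means: calling a method through an instance is the same as calling the function stored on the class and passing the instance explicitly. A minimal sketch; the Greeter class is hypothetical and only here for illustration.

```python
class Greeter(object):
    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        # self is the instance the method was looked up on
        return "%s, %s!" % (greeting, self.name)


g = Greeter("world")

# Bound method: Python supplies g as self automatically
bound_result = g.greet("Hello")

# Equivalent call through the class, passing the instance explicitly
explicit_result = Greeter.greet(g, "Hello")

print(bound_result)     # Hello, world!
print(explicit_result)  # Hello, world!
```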

import argparse


class ExampleDict(object):

    # Called when a new class instance is created
    def __init__(self, test):
        self.dict = {}
        self.dict["test"] = test

    # Called when len() is called with a class instance as an argument
    def __len__(self):
        return len(self.dict)

    # Called when the internal dict is accessed by instance_name[key]
    def __getitem__(self, key):
        return self.dict[key]

    # Called when a member of the internal dict is set by
    # instance_name[key] = value
    def __setitem__(self, key, value):
        self.dict[key] = value

    # Called when a member of the internal dict is removed by del instance[key]
    def __delitem__(self, key):
        del self.dict[key]

    # Called when "key in instance" is evaluated
    def __contains__(self, key):
        return key in self.dict

    # Return an iterable view of (key, value) pairs
    def get_list(self):
        return self.dict.items()

    # Replace your global function with a method on the class
    def check_value(self, value):
        if value not in self.dict:
            raise argparse.ArgumentTypeError("%s is invalid." % value)
        return value

A few notes:

This class definition must appear before any instances are created
A class instance is created with the following syntax: d = ExampleDict("testing") (__init__ requires the test argument)
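Because the class defines the special methods, instances work with Python's built-in syntax. A trimmed, self-contained sketch of the same class (only the special methods, with __contains__ included so that the in operator checks keys) showing which method each operation triggers:

```python
class ExampleDict(object):
    def __init__(self, test):
        self.dict = {"test": test}

    def __len__(self):
        return len(self.dict)

    def __getitem__(self, key):
        return self.dict[key]

    def __setitem__(self, key, value):
        self.dict[key] = value

    def __delitem__(self, key):
        del self.dict[key]

    def __contains__(self, key):
        return key in self.dict


d = ExampleDict("testing")   # __init__ requires the test argument
d["extra"] = 42              # __setitem__
print(len(d))                # 2, via __len__
print(d["test"])             # testing, via __getitem__
print("extra" in d)          # True, via __contains__
del d["extra"]               # __delitem__
print(len(d))                # 1
```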

Step 2: Removing Global Variables

Now we want to remove the global variables. In many cases, as you see above, we can turn your functions into methods on the class. Where this isn't appropriate, a module-level function can accept the object as an argument instead. Here, we want your function to take the dict it operates on as a parameter rather than reading a global variable directly:

def check_value_global(inp_dict, value):
    # The dict (or dict-like object) arrives as a parameter, not as a global
    if value not in inp_dict:
        raise argparse.ArgumentTypeError("%s is invalid." % value)
    return value
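argparse calls a type= function with a single argument (the string from the command line), so a two-argument function like check_value_global can't be passed to it directly. One way to bind the dict is functools.partial. A sketch under assumptions: the positional argument name key and the sample data are hypothetical stand-ins for your real argument and "some code to populate the dict".

```python
import argparse
import functools


def check_value_global(inp_dict, value):
    if value not in inp_dict:
        raise argparse.ArgumentTypeError("%s is invalid." % value)
    return value


def build_parser(valid_values):
    parser = argparse.ArgumentParser(description="functools.partial demo")
    # partial() fixes inp_dict, leaving a one-argument callable for argparse
    checker = functools.partial(check_value_global, valid_values)
    parser.add_argument("key", type=checker)
    return parser


# Demo only; in a real script these lines would live inside main()
lookup = {"alpha": 1, "beta": 2}  # hypothetical stand-in data
args = build_parser(lookup).parse_args(["alpha"])  # a real script calls parse_args()
print(lookup[args.key])
```

A lambda (type=lambda v: check_value_global(lookup, v)) would work the same way; partial just avoids writing the wrapper by hand.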

Step 3: Declaring Instances of the Class & Using them

Now let's define what happens when we run the script: create an instance of the class, pass it to the module-level function, and call its methods:

if __name__ == "__main__":

    #Declare an instance of the class
    ex = ExampleDict("testing")

    print(ex["test"])
    ex["test2"] = "testing2"
    check_value_global(ex, "test")
    print("At next section")
    ex.check_value("test2")
    print("At final section")

For a more in-depth discussion of the power of object-oriented programming in Python, please see here

Edit Based on Comment

OK, so let's look at argparse in particular. It parses the command-line arguments fed to the script (the lower-level alternative would be reading sys.argv directly).

In theory, we could put the parsing anywhere, but it really belongs either directly inside the if __name__ == "__main__": block or in a function called from it. Why?

Code inside that block runs only when the Python script is executed directly from the command line, not when it is imported as a module. Say you wanted to import your script into another Python script and use it there: in that case there are no command-line arguments meant for it, so you don't want to try to parse them.
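A minimal sketch of that structure, assuming hypothetical names (load_lookup, run, and a positional argument key): the dict is created inside main(), a closure lets the argparse type= check see it without a global, and both objects are then passed on as parameters. Taking argv as a parameter also lets another script or a test call main() with its own argument list.

```python
import argparse


def load_lookup():
    # Hypothetical stand-in for "some code to populate the dict"
    return {"alpha": 1, "beta": 2}


def run(lookup, args):
    # Receives everything it needs as parameters instead of reading globals
    print(lookup[args.key])


def main(argv=None):
    lookup = load_lookup()  # created inside main(), not at module level

    def check_value(value):
        # A closure: it can see lookup without a global variable
        if value not in lookup:
            raise argparse.ArgumentTypeError("%s is invalid." % value)
        return value

    parser = argparse.ArgumentParser()
    parser.add_argument("key", type=check_value)
    args = parser.parse_args(argv)  # argv=None means sys.argv[1:]
    run(lookup, args)
    return 0
```

In the script itself, the only module-level call would then be if __name__ == "__main__": sys.exit(main()) (with import sys), so importing the module parses nothing.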

With all this said, we now know that both the dict object and the argparse object are initialized in the main block (after if __name__ == "__main__":). How do we pass them to the functions and classes defined above it in the program?

We have many options; the ones I use most often are:

Where appropriate, we can redefine the dict class we are using so that these functions become methods called on the instance after initialization
We can pass objects to functions, as shown in step 2 above
We can define a singleton class that takes the dict and the argparse object as arguments and stores them. All major program flow in the relevant area then runs through that object, so these references are always available

Here's an example:

class SingletonExample(object):

    def __init__(self, dict_obj, arg_obj):
        self.dict = dict_obj
        self.args = arg_obj

    def some_script_function(self):
        # Use your self.dict and self.args attributes
        pass
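One convenient property of holding the dict on an object is that a bound method can be handed to argparse as the type= callable, because argparse only ever passes it the argument string. A sketch under assumptions: the App class, its parse() method, the key argument and the sample data are hypothetical, and here the object parses its own arguments rather than receiving an argparse object.

```python
import argparse


class App(object):
    def __init__(self, lookup):
        self.lookup = lookup
        self.args = None

    def check_value(self, value):
        # Bound method: self.lookup replaces the global dict
        if value not in self.lookup:
            raise argparse.ArgumentTypeError("%s is invalid." % value)
        return value

    def parse(self, argv=None):
        parser = argparse.ArgumentParser()
        # self.check_value is already bound, so argparse calls it with one argument
        parser.add_argument("key", type=self.check_value)
        self.args = parser.parse_args(argv)

    def some_script_function(self):
        # Both the dict and the parsed args are available without globals
        return self.lookup[self.args.key]


app = App({"alpha": 1, "beta": 2})
app.parse(["alpha"])  # a real script would call app.parse() to read sys.argv
print(app.some_script_function())  # 1
```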

The fact of the matter is that you are really talking about design patterns, and the way you solve this problem will be dictated by the design you choose. For instance, the third solution here is often loosely called the singleton pattern, although a strict singleton also guarantees that only one instance of the class can ever exist; what matters here is that one object carries the shared state. Decide which pattern best suits the task at hand, research it, and that will tell you how to structure your objects and methods.

Thanks for your detailed answer. Although this will hopefully help someone looking for more general information about classes in Python, I do know what a class is and how to use it. I guess my question is more about argparse itself, since I'm not sure where the ArgumentParser object needs to be defined in order for command-line arguments to be seen. Does it not need to be in the "main()" method? Where should I define a dictionary that is needed both in argument parsing and in the rest of my program without using global variables?

David C. Bishop · Accepted Answer · 2016-06-19 12:29:38Z

You will probably want to concentrate on dependency injection. Basically pass all the things you need to use into your objects/functions.

I like to have an args_to_options() factory function. It returns an Options object (in Python a dict would probably be fine) with the flags and properties set, or raises an error if there is an issue. Make sure it's responsible only for constructing the options and nothing else. You could even have a separate validation function if you wanted.

Note my Python is a bit rusty, so beware.

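import argparse
import sys


class App:
    def __init__(self, options, file_opener):
        self.file_opener = file_opener  # e.g. open(), or a fake in tests
        self.apply_options(options)

    def apply_options(self, options):
        self.thing_dooer = options.do_the_thing
        self.stdout = options.stdout
        # or maybe just self.options = options

    def run(self):
        if self.thing_dooer:
            do_thing(self.stdout)


def do_thing(stdout):
    # Placeholder for the real work; it writes only to the injected stream
    print("Doing the thing", file=stdout)


class Options:
    def __init__(self):
        self.do_the_thing = False
        self.server_address = None
        self.stdout = None
        self.stderr = None

    def validate(self):
        # If it's in a bad state, raise an exception (or return a bool you check)
        if self.server_address is None:
            raise ValueError("a server address is required")


def build_parser():
    # Flag names are illustrative
    parser = argparse.ArgumentParser()
    parser.add_argument("--do-the-thing", action="store_true")
    parser.add_argument("--server-address")
    return parser


def args_to_options(args):
    # Only constructs the options; argparse itself exits on malformed arguments
    parsed = build_parser().parse_args(args)
    options = Options()
    options.do_the_thing = parsed.do_the_thing
    options.server_address = parsed.server_address
    return options


def Main(stdout, stderr, args, file_opener):
    try:
        options = args_to_options(args)
        options.stdout = stdout
        options.stderr = stderr
        options.validate()
    except ValueError as err:
        print("error: %s" % err, file=stderr)
        build_parser().print_help(file=stdout)
        return 1
    app = App(options, file_opener)
    app.run()
    return 0


if __name__ == "__main__":
    sys.exit(Main(sys.stdout, sys.stderr, sys.argv[1:], open))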

I actually make a separate Main function that's called from the real main (in Python, the 'real main' would be the if __name__ == "__main__": check). I pass in stdout, stdin and stderr as needed and treat them as generic reader/writer streams, and possibly a generic local file-system interface (some class that wraps open(), os.link, os.mkdir as needed). In Go I use AferoFS (although it doesn't support linking). If you have a lot of these, use a custom class to hold them all.

Then I never print without specifying the stream to write to, and all my open() calls go through the passed-in interface.
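A small sketch of that injection idea: the function below never touches sys.stdout or the built-in open() directly, so a test can pass an io.StringIO and a fake opener. count_lines, fake_opener and the file name notes.txt are hypothetical.

```python
import io


def count_lines(path, stdout, file_opener):
    # All I/O goes through the injected stream and opener
    with file_opener(path) as handle:
        total = sum(1 for _ in handle)
    print("%s: %d lines" % (path, total), file=stdout)
    return total


def fake_opener(path):
    # Stands in for open() in tests: returns in-memory text instead of a real file
    return io.StringIO("first\nsecond\nthird\n")


captured = io.StringIO()
result = count_lines("notes.txt", captured, fake_opener)
print(captured.getvalue(), end="")  # notes.txt: 3 lines
```

In the real script the same function would be called as count_lines(path, sys.stdout, open).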

A lot of that is probably overkill, especially in Python, where you can monkey-patch the implementations of things like open and stdout, but it's a pattern I use elsewhere and it makes testing easy.

Besides testing, you get the flexibility to change what your code does. For example, it could read files over HTTP if you replaced open().

You could also set the stdout/stderr directly in the options rather than the App class.

It's often better to use higher-level abstractions for the interfaces, though. For example, rather than passing stdout/stderr to the app, a logger might be better. Rather than a low-level file-system interface, use a generic 'DataStore' that treats filenames as keys for a value lookup and returns the actual object. Databases are handled in much the same way.

But the low-level interface might still be needed, for example if your script genuinely does a lot of low-level work such as making directories and links, checking file stats and so on.

Even then, it's worth stepping back and thinking about a more generic concept of what you are doing. Are you making directories, or adding categories to a library that just happens to be stored in directory format? Under the hood it would be implemented the same way, but the architecture will be more generic and cleaner, and it lets you separate your I/O processing and parsing from the logic of your application.

Also, the low-level interfaces can be used to build the higher-level ones, which makes the higher-level ones more testable.
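To make the 'DataStore' idea concrete: the application asks for values by key, one implementation maps keys to files through an injected opener (the low-level interface), and a dict-backed one serves tests. MemoryDataStore, FileDataStore and add_category are hypothetical names, and storing values as JSON files is an assumption made for this sketch.

```python
import json
import os


class MemoryDataStore(object):
    """Keeps values in a dict; handy in tests."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value


class FileDataStore(object):
    """Treats each key as a file name under root and stores values as JSON."""

    def __init__(self, root, file_opener=open):
        self._root = root
        # The low-level interface (open() or a fake) is injected
        self._open = file_opener

    def _path(self, key):
        return os.path.join(self._root, key + ".json")

    def get(self, key):
        with self._open(self._path(key), "r") as handle:
            return json.load(handle)

    def put(self, key, value):
        with self._open(self._path(key), "w") as handle:
            json.dump(value, handle)


def add_category(store, name):
    # Application logic: knows about categories, not directories or files
    store.put("category-" + name, {"name": name})
```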

I think what you'd be looking for in Python here might be pinject.
