2

I've come to a point where I need to rewrite a couple of methods of my class into a C-extension to increase performance (intersection_len, union_len). My C background is very limited and I have never written an extension for Python. In the extension I need to use a list that is an attribute of the class (self.table; it is a list of lists and None-objects). The extension should return an integer. So, the question is, how can I pass a list of lists and None-objects to a C-extension and read it there? Thank you in advance.

P.S. I need a custom hash table, because I use rolling hash algorithm, that is not present in any Python module Google is aware of.

class HashTable:
    def __init__(self, table_length, base):
        self.table_length = table_length
        self.base = base
        self.items = 0
        self.table = [None for _ in xrange(self.table_length)]

    def __iter__(self):
        return iter(self.table)

    def __getitem__(self, hash_value):
        return self.table[hash_value]

    def __len__(self):
        return self.items

    def __str__(self):
        return str(self.table)

    def insert(self, item, hash_value):
        if self.table[hash_value] is None:
            self.table[hash_value] = [item]
            self.items += 1
        else:
            if item not in self.table[hash_value]:
                self.table[hash_value].append(item)
                self.items += 1

    def intersection_len(self, table):
        if table.table_length != self.table_length or table.base != self.base:
            raise ValueError('Tables must have equal length and hash function base')
        result = 0
        for i, items in enumerate(table):
            if items is None or self.table[i] is None:
                continue
            for item in items:
                if item in self.table[i]:
                    result += 1
        return result

    def union_len(self, table):
        if table.table_length != self.table_length or table.base != self.base:
            raise ValueError('Tables must have equal length and hash function base')
        result = 0
        for i in xrange(self.table_length):
            if table[i] is None and self.table[i] is None:
                continue
            elif table[i] is None:
                result += len(self.table[i])
            elif self.table[i] is None:
                result += len(table[i])
            else:
                result += len(table[i])
                for item in table[i]:
                    if item not in self.table[i]:
                        result += 1
        return result

    def dump(self):
        for i in xrange(self.table_length):
            if self.table[i] is not None:
                self.table[i] = None`
11
  • I haven't looked in detail at the methods you'd like to speed up, but can't numpy help here (with, for example, a masked array to cope with the None values)? Commented Dec 25, 2014 at 22:59
  • I assume your actual question here is that you'd like to speed-up those two methods; have you considered Cython? Commented Dec 25, 2014 at 23:01
  • @Evert I've never really used Cython. If I could keep this class in Cython and the rest of my code in Python that would be great. Rewriting all the code into Cython will take a hell lot of time. Commented Dec 25, 2014 at 23:12
  • 1
    @GrayFall9, cython is incredibly easy to use. I found this series of video from scipy 2013 very helpful youtube.com/watch?v=JKCjsRDffXo Commented Dec 26, 2014 at 0:51
  • 1
    After playing about with your code for a while it seems like the only improvement I found you could do in python is to change union_len. It seems to be much slower than intersection_len (6 times slower on my machine). However, you could significantly improve the execution time of union_len by simply rewriting it as return len(self) + len(other) - self.intersection_len(other) Commented Dec 26, 2014 at 4:45

3 Answers 3

4

Simplest way to build a C extension for python is via a setup.py file such as:

from distutils.core import setup, Extension

lolan = Extension('lolan', sources = ['lolanmodule.c'])

setup (name = 'Example',
        version = '1.0',
        description = 'Just an exapmle',
        ext_modules = [lolan])

The C coding proper could be something like:

#include <Python.h>

static PyObject*
count_nones(PyObject* self, PyObject* args)
{
    PyObject* lol;

    if (!PyArg_ParseTuple(args, "O", &lol))
        return NULL;

    Py_ssize_t len = PySequence_Length(lol);
    int result = 0;
    for(Py_ssize_t i=0; i<len; ++i) {
        PyObject* item = PySequence_GetItem(lol, i);
        if(item==Py_None) ++result;
        Py_DECREF(item);
    }
    return Py_BuildValue("i", result);
}

static PyMethodDef LolanMethods[] =
{
     {"count_nones", count_nones, METH_VARARGS, "Count Nones."},
     {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
initlolan(void)
{
     (void) Py_InitModule("lolan", LolanMethods);
}

in lolanmodule.c in the same directory. To try it, cd to that directory and run:

$ python setup.py build_ext --inplace
running build_ext
building 'lolan' extension
cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c lolanmodule.c -o build/temp.macosx-10.9-intel-2.7/lolanmodule.o
cc -bundle -undefined dynamic_lookup -arch x86_64 -arch i386 -Wl,-F. build/temp.macosx-10.9-intel-2.7/lolanmodule.o -o /Users/aleax/lolan.so

(output will differ depending on your platform -- this is in MacOSX 10.9).

Then, try it out interactively:

$ python
Python 2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lolan
>>> l=[None]*3 + [[] for _ in range(4)]
>>> lolan.count_nones(l)
3
>>> lolan.count_nones(l*7)
21
>>> 

Of course you need much better unit tests than this, but this starts to show that function count_nones in module lolan does accept a "list of lists and Nones" (as the module name acronymizes:-) and counts how many of the argument's items are None.

Here I've focused on the "abstract level" functions presented at https://docs.python.org/2/c-api/abstract.html ; with some type-constraints, you may getter slightly better performance with "concrete level" functions as per https://docs.python.org/2/c-api/concrete.html , but it doesn't tend to make a huge difference.

Sign up to request clarification or add additional context in comments.

8 Comments

You seemed to have glossed-over the accessing of any nested lists -- perhaps because you think it's obvious from what you have provided.
Good to hear, @GrayFall9 -- so maybe you can accept the answer?
@martineau, the code I've posted shows that, given a PyObject* x that's a list, PySequence_Length(o) gives you its length, and PySequence_GetItem(o, i) gives you the PyObject* that is its ith item (which you must Py_DECREF when you're done with it). This applies no mater how you've obtained o in the first place -- so, for example, it applies if it's a list obtained as an item of another list. Given all this, it doesn't seem there's any mystery left about "nested lists", does there?-)
I meant simply you didn't show what would have to be done if PySequence_GetItem(lol, i) returned a list object rather than Py_None, and one then wanted to iterate though it's members. As I said, it could be considered fairly obvious...
Depends on the specific data structures and needs, and on whether C or C++ is actually used. C++'s std::vector is typically excellent compared with Python's list, for example; but Python's dict and sort are best-in-class. For a list-of-lists-and-Nones I have no specific experience, so I would rather try several variants and measure performance on the specific platforms and workloads of interest, rather than guess. "You can't manage what you don't measure"...!-)
|
4

Why do you believe rewriting this algorithm in C will give you the performance you require? (What is the performance you require compared to this version?)

I would probably start by writing a performance test suite, then improving the Python algorithm using Python primitives like zip instead of nested looping (for i in xrange(..)), pulling CSEs out of loops, etc...

Before starting on an extension I would run the improved Python version under pypy to get an idea of what you could expect from creating a C extension. Finally, if an extension was called for I would investigate Cython and Boost before using C (since a Python extension is a rather hard learning curve for C -- and not likely to give you something that runs fast).

2 Comments

I tried izip and it resulted in a minor improvement. The thing is I need to compare half a million of hash tables. That's why I need something dramatically faster. Seems like I should go with Cython. Thanks a lot for your help.
I tend to agree with thebjorn...I wouldn't give up on a pure-Python solution without looking for a better algorithm using built-in first. Many of them are already written in C. If you can change algorithms, another possibility might be multiprocessing depending on your hardware and exactly what you're doing.
2

As the size of a list increases, using x in some_list gets slower. The in operation has to sequentially read each item in the list until a match is found. I would try changing your data structure to use a list of sets.

Untested, but I think the only code that needs to change is insert().

def insert(self, item, hash_value):
    if self.table[hash_value] is None:
        self.table[hash_value] = set()
    self.table[hash_value].add(item)
    self.items += 1

4 Comments

Sets should also make the intersection and union methods much quicker too. The for loop to establish the number of items in those methods is currently O(n^2), becomes a much quicker set operation eg. result += len(table(i) & self.table(i)).
This made things slightly worse (because of the overhead of additional hash calculation). Collisions, though present, are quite rare, so comparing two sublists takes mostly O(1). Nevertheless, I agree that your method would be of greater use if my hash function made more collisions.
@Dunes this leads to a significant overhead of building and removing a hell lot of new sets in memory.
Building those sets is done in C, and should be very efficient. Sounds like you're expecting a sparse data set, which does make things quite different.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.