Imagine a 20 MB text file. I read it character by character and extract the useful information. I have two main functions: one reads the file and the other extracts the information. Something like this:

def reader(path):
    # read the whole file into memory; "with" closes it automatically
    with open(path, 'r') as f:
        source = f.read()

    for ch in source:
        # here is where I read char by char and call the function extractor
        extractor(ch)

def extractor(s):
    # here I extract the useful information
    pass

Now, my goal is to keep reading while extractor is working. So basically, my question is: what is the appropriate way to accomplish this?

4 Comments
  • Which version of Python? In 3.2+, I recommend the concurrent.futures module. Commented Sep 5, 2011 at 18:50
  • Are you actually seeing a performance problem without concurrent read and process? Reading 20MB from a modern hard disk should take only a couple of seconds, so gaining that time back is the absolute limit on the potential speedup. Commented Sep 5, 2011 at 18:54
  • Well, actually, I am writing a program that will connect to several websites, so I feel that shaving off even a nanosecond wherever I can would be in my favor. Commented Sep 5, 2011 at 18:56
  • If your program requires connecting to multiple sources/websites and receivers, then you should consider Twisted for event-driven (asynchronous) applications: twistedmatrix.com Commented Sep 5, 2011 at 19:09
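
For reference, the concurrent.futures suggestion from the first comment could look roughly like the sketch below (Python 3.2+). It is only an illustration under assumptions: the digit-filtering `extractor`, the `read_chunks` helper, the chunk size, and the worker count are all hypothetical stand-ins, and an in-memory `io.StringIO` replaces the real file. Note that in CPython the GIL keeps threads from speeding up CPU-bound extraction, so this mainly helps when reading or extracting waits on I/O.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def extractor(chunk):
    # Hypothetical stand-in for the real extraction: keep only the digits.
    return "".join(ch for ch in chunk if ch.isdigit())


def read_chunks(stream, chunk_size):
    # Yield fixed-size pieces of the stream until it is exhausted.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def read_and_extract(stream, chunk_size=4, workers=2):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is submitted as soon as it is read, so worker threads
        # can process earlier chunks while the main thread keeps reading.
        futures = [pool.submit(extractor, chunk)
                   for chunk in read_chunks(stream, chunk_size)]
        # Collect the results in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(read_and_extract(io.StringIO("a1b2c3d4e5")))
```

Collecting the futures in a list keeps the results in the order the chunks were read, regardless of which worker finishes first.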

1 Answer

You can use producer/consumer threads. The threads can be synchronized with a Queue.Queue (queue.Queue in Python 3), which is thread-safe.

EDIT: an example of a producer/consumer system:

from threading import Thread

try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2


def produce(queue, n_items):
    # Producer: put the integers 0 .. n_items - 1 on the queue.
    for d in range(n_items):
        queue.put(d)
        print("put {0} in queue".format(d))

def consume(queue, n_items):
    # You need some sort of stop condition: here the consumer takes
    # exactly n_items items off the queue and then returns.
    for _ in range(n_items):
        d = queue.get()
        print("got {0} from queue".format(d))

def start_producer_and_consumer(wait):
    q = Queue()
    consumer_thread = Thread(target=consume, args=(q, 10))
    producer_thread = Thread(target=produce, args=(q, 10))
    producer_thread.start()
    consumer_thread.start()
    if wait:
        # Block until both threads have finished.
        producer_thread.join()
        consumer_thread.join()

if __name__ == '__main__':
    start_producer_and_consumer(True)

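If you run this, you will see that the items are consumed in the order they were produced, because the queue is first-in, first-out. The "put" and "got" lines printed by the two threads may still interleave in the output.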
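
Applied to the reader/extractor pair from the question, the same pattern might look like the following Python 3 sketch. It is not a definitive implementation: the `_DONE` sentinel, the `run` helper, the chunk size, the bounded queue size, and the upper-casing "extraction" are all assumptions made for illustration, and an in-memory `io.StringIO` stands in for the real file. A sentinel replaces the item count as the stop condition, since the reader usually does not know in advance how many pieces it will produce.

```python
import io
import queue
import threading

_DONE = object()  # sentinel telling the consumer that no more data is coming


def reader(stream, q, chunk_size=4):
    # Producer: read the stream piece by piece and hand each piece over.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        q.put(chunk)
    q.put(_DONE)


def extractor(q, results):
    # Consumer: process pieces in FIFO order until the sentinel arrives.
    while True:
        chunk = q.get()
        if chunk is _DONE:
            break
        results.append(chunk.upper())  # placeholder for the real extraction


def run(stream):
    q = queue.Queue(maxsize=8)  # bounded, so the reader cannot run far ahead
    results = []
    consumer = threading.Thread(target=extractor, args=(q, results))
    producer = threading.Thread(target=reader, args=(stream, q))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run(io.StringIO("abcdefghij")))
```

With a single producer and a single consumer, the results come out in the order the chunks were read. Handing over chunks rather than single characters keeps the per-item queue overhead small.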

2 Comments

I am having problems with threading. For instance, if I put 1, 2, 3, 4, 5, 6, 7, 8, 9 in the queue using threading, I strangely receive a result like 1, 3, 4, 5, 2, 6, 8, 7, 9.
Edited my answer to address that.
