File stream processing in Python

Question

I've got a data file where each "row" is delimited by \n\n\n. My solution is to isolate those rows by first slurping the file, and then splitting rows:

for row in slurped_file.split('\n\n\n'):
    ...

Is there an "awk-like" approach I could take to parse the file as a stream in Python 2.7.9, splitting records on a given string value? Thanks.

Is there a specific reason the file.read(num_bytes) method doesn't work for you? Just trying to better understand the requirements. It seems a lazy generator that reads bytes into a buffer and yields the split strings would be ideal for this.
– aruisdante, Commented Feb 19, 2015 at 17:48
There is a bug/feature request for such a thing to be added to the Python standard library; see also this question, but there is an easier workaround too.
– Antti Haapala, Commented Feb 19, 2015 at 18:06
The \n\n\n delimit large blocks of data (which will fit in memory, but I don't know the size of those blocks in advance).
– user2105469, Commented Feb 19, 2015 at 18:09
Yes, three consecutive line feeds when parsing with od -c.
– user2105469, Commented Feb 24, 2015 at 9:32

Antti Haapala · Accepted Answer · 2015-02-19 18:17:13Z

There is no such thing in the standard library, but we can write a custom generator to iterate over such records:

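def chunk_iterator(iterable):
    # Yield records from an iterable of lines, where records are separated
    # by two consecutive empty lines (a '\n\n\n' sequence in the file).
    chunk = []
    empty_lines = 0
    for line in iterable:
        chunk.append(line)
        if line == '\n':
            empty_lines += 1
            if empty_lines == 2:
                # Drop the two empty lines; the record keeps the newline
                # that terminated its last line.
                yield ''.join(chunk[:-2])
                empty_lines, chunk = 0, []
        else:
            empty_lines = 0

    # Whatever follows the last separator (possibly an empty string).
    yield ''.join(chunk)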

Use as:

with open('filename') as f:
    for chunk in chunk_iterator(f):
        ...

This relies on the per-line iteration of file objects, which is implemented in C in CPython, and should therefore be faster than a general record-separator solution.
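
For comparison, here is a minimal sketch of what a general record-separator generator might look like, along the lines of the buffer-based approach suggested in the comments. The name split_stream and its separator and bufsize parameters are made up for illustration, not part of any library. It assumes the stream was opened in text mode and that a single record plus one buffer fits in memory.

```python
def split_stream(stream, separator='\n\n\n', bufsize=4096):
    # Hypothetical helper: lazily yield the pieces of `stream` that are
    # delimited by `separator`, reading at most `bufsize` characters at a time.
    buf = ''
    while True:
        data = stream.read(bufsize)
        if not data:
            break
        buf += data
        # Everything before the last separator is complete. The tail may
        # still grow (or complete a separator) on the next read, so it
        # stays in the buffer.
        parts = buf.split(separator)
        buf = parts.pop()
        for part in parts:
            yield part
    # The remainder after the final separator (possibly an empty string).
    yield buf
```

Used the same way as above (for chunk in split_stream(f): ...), this should yield the same pieces as slurped_file.split('\n\n\n') without reading the whole file first. Unlike chunk_iterator, the records do not keep the newline that ended their last line, because that newline is part of the separator.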

edited Feb 19, 2015 at 18:17

answered Feb 19, 2015 at 18:11
