Say I have an array of strings:

['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']

And I want to split it by value, in this case by '\n', so it becomes:

[['12.5',  '7', '45'],
 ['13.7', '52', '34.3']]

I don't want to iterate over every element, since that is time-consuming for large inputs. So I wonder whether there are functions or Python tricks that can achieve this easily.

P.S.

I've seen this question, but it doesn't help much, mainly because I don't quite understand how np.where() works with np.split(), and also because I'm working with the str type.

Another thing that might be helpful: my final goal is to generate a matrix of numbers (probably float type), so I'd also be glad to know whether any numpy function can do this.

4
  • Even if you don't want to loop over the elements yourself and prefer "some functions or python tricks that can easily achieve this", the tools you are looking for will use a loop internally. So why not write one yourself for such a basic operation? Commented Jan 22, 2018 at 8:36
  • @IMCoins I learned from some courses that many packages run matrix computations on the GPU, which is faster than implementing them myself with an explicit for loop. Commented Jan 22, 2018 at 8:38
  • @AmarthGûl Unfortunately, most of the packages that do that are 3rd party packages, and a loop is usually your best bet because it is implemented in C. Commented Jan 22, 2018 at 8:45
  • @cᴏʟᴅsᴘᴇᴇᴅ Well, when implementing matrix computations, I found that numpy functions are way faster than operations I wrote myself, so I was hoping numpy could save me again. It seems you're right, though: the answers below still use for loops. Commented Jan 22, 2018 at 8:52

3 Answers

2

You can use itertools.groupby which, of course, does iterate the list, but is highly optimized:

from itertools import groupby

lst = ['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']

[list(g) for k, g in groupby(lst, '\n'.__eq__) if not k]
# [['12.5', '7', '45'], ['13.7', '52', '34.3']]

Or, with float conversion:

[list(map(float, g)) for k, g in groupby(lst, '\n'.__eq__) if not k]
# [[12.5, 7.0, 45.0], [13.7, 52.0, 34.3]]
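To make it clearer what groupby is doing in the one-liners above, here is a small sketch that materializes the intermediate (key, group) pairs first. The names runs and rows are only illustrative.

```python
from itertools import groupby

lst = ['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']

# groupby yields one (key, group) pair per run of consecutive items that
# share the same key. Here the key is True for '\n' and False otherwise.
runs = [(is_sep, list(group)) for is_sep, group in groupby(lst, '\n'.__eq__)]

# Keep only the runs that are not separators.
rows = [group for is_sep, group in runs if not is_sep]
```

Because consecutive separators fall into a single run, two '\n' entries in a row do not produce an empty sublist with this approach.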

3 Comments

Alternatively, one might also use pandas for similar functionality.
Or [list(g) for k, g in groupby(lst, '\n'.__eq__) if not k]
@Kasramvd Very good point. Updated my answer. Maybe slightly less obvious to the beginner's eye, but definitely worth avoiding the lambda.
1

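Using numpy, where arr is the list from the question converted with np.array:

rows = np.split(arr, np.where(arr == '\n')[0] + 1)[:-1]
mat = np.array(rows)[:, :-1].astype(float)  # drop the trailing '\n' column before converting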

Alternatively, if we're sure to be dealing with a matrix, you could simply search for the first occurrence of '\n', reshape, and slice using that.

first = np.argmax(arr == '\n')
mat = arr.reshape(-1, first + 1)[:, 0:first].astype(float)

This might be faster.
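Since the question asks how np.where() works together with np.split(), here is a step-by-step sketch of the first approach. It assumes the input ends with a separator and that every row has the same length; the intermediate names (mask, sep_idx, chunks) are only illustrative.

```python
import numpy as np

arr = np.array(['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n'])

mask = arr == '\n'            # boolean array, True at each separator
sep_idx = np.where(mask)[0]   # integer positions of the separators

# Splitting just after each separator gives chunks that each end in '\n',
# plus one empty chunk after the final separator.
chunks = np.split(arr, sep_idx + 1)

# Drop the empty tail chunk and each row's trailing '\n', then convert.
rows = [chunk[:-1].astype(float) for chunk in chunks[:-1]]
mat = np.array(rows)
```

np.where only locates the separators; np.split then cuts the array at those positions without copying element by element in Python.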

0

I made a thing for this once upon a time: a chunking module. It's made to work similarly to str.split.

pip install chunking

Then

>>> from chunking import split
>>> a_list = ["foo", 'bar', 'SEP', 'bacon', 'eggs']
>>> split(a_list, 'SEP')
[['foo', 'bar'], ['bacon', 'eggs']]

There's also chunking.iter_split, which is a generator variant of that.
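If you'd rather not install anything, the same idea can be sketched as a plain generator. This is not the chunking module's actual implementation; split_by_value is a hypothetical helper written only for illustration.

```python
def split_by_value(items, sep):
    """Yield sublists of items, cutting wherever an element equals sep."""
    chunk = []
    for item in items:
        if item == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(item)
    # Emit the trailing chunk only if it is non-empty, so a separator at
    # the very end of the input does not produce an extra empty sublist.
    if chunk:
        yield chunk


a_list = ["foo", 'bar', 'SEP', 'bacon', 'eggs']
result = list(split_by_value(a_list, 'SEP'))
```

Unlike str.split, this sketch drops an empty trailing piece, which suits input like the question's that ends with '\n'.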
