2

Say I have the following:

my_list = np.array(["abc", "def", "ghi"])

and I'd like to get:

np.array(["ef", "hi"])

I tried:

my_list[1:,1:]

But then I get:

IndexError: too many indices for array

Does Numpy support slicing strings?

5 Answers 5

2

No, you cannot do that. For numpy np.array(["abc", "def", "ghi"]) is a 1D array of strings, therefore you cannot use 2D slicing.

You could either define your array as a 2D array or characters, or simply use list comprehension for slicing,

In [4]: np.asarray([el[1:] for el in my_list[1:]])
Out[4]: 
array(['ef', 'hi'], dtype='|S2')
Sign up to request clarification or add additional context in comments.

1 Comment

Starting in numpy 1.23.0 you can
1

Your array of strings stores the data as a contiguous block of characters, using the 'S3' dtype to divide it into strings of length 3.

In [116]: my_list
Out[116]: 
array(['abc', 'def', 'ghi'], 
      dtype='|S3')

A S1,S2 dtype views each element as 2 strings, with 1 and 2 char each:

In [115]: my_list.view('S1,S2')
Out[115]: 
array([('a', 'bc'), ('d', 'ef'), ('g', 'hi')], 
     dtype=[('f0', 'S1'), ('f1', 'S2')])

select the 2nd field to get an array with the desired characters:

In [114]: my_list.view('S1,S2')[1:]['f1']
Out[114]: 
array(['ef', 'hi'], 
      dtype='|S2')

My first attempt with view was to split the array into single byte strings, and play with the resulting 2d array:

In [48]: my_2dstrings = my_list.view(dtype='|S1').reshape(3,-1)

In [49]: my_2dstrings
Out[49]: 
array([['a', 'b', 'c'],
       ['d', 'e', 'f'],
       ['g', 'h', 'i']], 
      dtype='|S1')

This array can then be sliced in both dimensions. I used flatten to remove a dimension, and to force a copy (to get a new contiguous buffer).

In [50]: my_2dstrings[1:,1:].flatten().view(dtype='|S2')
Out[50]: 
array(['ef', 'hi'], 
      dtype='|S2')

If the strings are already in an array (as opposed to a list) then this approach is much faster than the list comprehension approaches.

Some timings with the 1000 x 64 list that wflynny tests

In [98]: timeit [s[1:] for s in my_list_64[1:]]
10000 loops, best of 3: 173 us per loop   # mine's slower computer

In [99]: timeit np.array(my_list_64).view('S1').reshape(64,-1)[1:,1:].flatten().view('S63')
1000 loops, best of 3: 213 us per loop

In [100]: %%timeit arr =np.array(my_list_64)
   .....: arr.view('S1').reshape(64,-1)[1:,1:].flatten().view('S63')   .....: 
10000 loops, best of 3: 23.2 us per loop

Creating the array from the list is slow, but once created the view approach is much faster.


See my edit history for my earlier notes on np.char.

Comments

0

As per Joe Kington here, python is very good at string manipulations and generator/list comprehensions are fast and flexible for string operations. Unless you need to use numpy later in your pipeline, I would urge against it.

[s[1:] for s in my_list[1:]]

is fast:

In [1]: from string import ascii_lowercase
In [2]: from random import randint, choice
In [3]: my_list_rand = [''.join([choice(ascii_lowercase) 
                                 for _ in range(randint(2, 64))])
                        for i in range(1000)]
In [4]: my_list_64 = [''.join([choice(ascii_lowercase) for _ in range(64)])
                      for i in range(1000)]

In [5]: %timeit [s[1:] for s in my_list_rand[1:]]
10000 loops, best of 3: 47.6 µs per loop
In [6]: %timeit [s[1:] for s in my_list_64[1:]]
10000 loops, best of 3: 45.3 µs per loop

Using numpy just adds overhead.

Comments

0

Starting with numpy 1.23.0, I added a mechanism to change the dtype of views of non-contiguous arrays. That means you can view your array as individual characters, slice it how you like, and then build it back together. Before this would require a copy, as @hpaulj's answer clearly shows.

>>> my_list = np.array(["abc", "def", "ghi"])
>>> my_list[:, None].view('U1')[1:, 1:].view('U2').squeeze()
array(['ef', 'hi'])

I'm working on another layer of abstraction, specifically for string arrays called np.slice_ (currently work-in-progress in PR #20694, but the code is functional). If that should get accepted, you will be able to do

>>> np.char.slice_(my_list[1:], 1)
array(['ef', 'hi'])

Comments

-2

Your slicing is incorrectly syntaxed. You only need to do my_list[1:] to get what you need. If you want to copy the elements twice onto a list, You can do something = mylist[1:].extend(mylist[1:])

1 Comment

Note that I'm talking about multi-dimensional slicing in Numpy, which does allow the syntax that I used. Also, my_list[1:] gets me ['def', 'ghi'], not ['ef', 'hi'].

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.