
I have a Python list ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']. With m = 3, I want to build this list using loops, because m = 3 here could be a larger number such as m = 100.

Since we can have

import numpy as np

m = 3

['a' + str(i) for i in np.arange(1,m+1)]
# ['a1', 'a2', 'a3']

['b' + str(i) for i in np.arange(1,m+1)]
# ['b1', 'b2', 'b3']

then I try to get ['a1', 'b1', 'a2', 'b2', 'a3', 'b3'] using

[ ['a','b'] + str(i) for i in np.arange(1,m+1)]

and get a TypeError: can only concatenate list (not "str") to list

Then I try

[ np.array(['a','b']) + str(i) for i in np.arange(1,m+1)]

and still get an error: UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U1'), dtype('<U1')) -> None.

How can I fix the problem? And going further, how can I get something like ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3'] in a similar way?

1 Comment

Use range instead of np.arange for these iterations. '+' only works for like objects: lists with lists, strings with strings. Commented Jun 17, 2022 at 4:04

4 Answers


A simple combined list comprehension would work, as pointed out in @j1-lee's answer (and later in other answers).

import string


def letter_number_loop(n, m):
    letters = string.ascii_letters[:n]
    numbers = range(1, m + 1)
    return [f"{letter}{number}" for number in numbers for letter in letters]

Similarly, one could use itertools.product(), as shown in Nick's answer, to obtain substantially the same result:

import itertools


def letter_number_it(n, m):
    letters = string.ascii_letters[:n]
    numbers = range(1, m + 1)
    return [
        f"{letter}{number}"
        for number, letter in itertools.product(numbers, letters)]

However, it is possible to write a NumPy-vectorized approach, making use of the fact that when the dtype is object, the operations follow Python semantics.

import numpy as np


def letter_number_np(n, m):
    letters = np.array(list(string.ascii_letters[:n]), dtype=object)
    numbers = np.array([f"{i}" for i in range(1, m + 1)], dtype=object)
    return (letters[None, :] + numbers[:, None]).ravel().tolist()

Note that the final numpy.ndarray.tolist() call could be avoided if whatever consumes the output can deal with the NumPy array itself, saving a relatively small but appreciable amount of time.
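As a quick illustration of the broadcasting trick used above, here is a minimal standalone sketch (variable names chosen for illustration, not taken from the answer):

```python
import numpy as np

# With dtype=object, "+" falls back to Python-level string concatenation,
# so broadcasting builds the full grid of letter/number pairs.
letters = np.array(["a", "b"], dtype=object)
numbers = np.array(["1", "2", "3"], dtype=object)

# Shapes (1, 2) + (3, 1) broadcast to (3, 2): grid[i, j] = letters[j] + numbers[i].
grid = letters[None, :] + numbers[:, None]
print(grid.ravel().tolist())  # ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
```

Raveling in C order walks the grid row by row, which is exactly the number-major, letter-minor ordering the question asks for.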


Inspecting Output

The following indicates that the functions are equivalent:

funcs = letter_number_loop, letter_number_it, letter_number_np

n, m = 2, 3
for func in funcs:
    print(f"{func.__name__!s:>32}  {func(n, m)}")
              letter_number_loop  ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
                letter_number_it  ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
                letter_number_np  ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']

Benchmarks

For larger inputs, this is substantially faster, as evidenced by these benchmarks:

timings = {}
for n in (2, 20):
    for k in range(1, 10):
        m = 2 ** k
        print(f"n = {n}, m = {m}")
        timings[n, m] = []
        base = funcs[0](n, m)
        for func in funcs:
            res = func(n, m)
            is_good = base == res
            timed = %timeit -r 64 -n 64 -q -o func(n, m)
            timing = timed.best * 1e6
            timings[n, m].append(timing if is_good else None)
            print(f"{func.__name__:>24}  {is_good}  {timing:10.3f} µs")

to be plotted with:

import matplotlib.pyplot as plt
import pandas as pd

n_s = (2, 20)
fig, axs = plt.subplots(1, len(n_s), figsize=(12, 4))
for i, n in enumerate(n_s):
    partial_timings = {k[1]: v for k, v in timings.items() if k[0] == n}
    df = pd.DataFrame(data=partial_timings, index=[func.__name__ for func in funcs]).transpose()
    df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ax=axs[i], title=f"n = {n}")

[benchmark plots: best timing vs. input size for n = 2 and n = 20]

These show that the explicitly looped versions (letter_number_loop() and letter_number_it()) are roughly comparable, while the NumPy-vectorized letter_number_np() pulls ahead fairly quickly for larger inputs, with up to a ~2x speed-up.


4 Comments

I do love a good chart! Great answer.
letter_number_np is significantly faster than the other suggestions when the data are "large", e.g., m=20 and 10 letters. However, it is significantly slower than the other methods when the numbers are low, e.g., m=3 and 2 letters. So I guess that if performance is critical, it's good to have an idea of the magnitude of your data before choosing the technique to process them.
It's great and the timing plots are very impressive!
@AlbertWinestein It sure is. The turning point may be between m ~ 50 and m ~ 3 for n in the (2, 20) range. Eventually the preferred method should be picked based on "local" timings anyway. However, it is common in computer science to pay more attention to the limit of large n, because for small n, even if the relative speed difference is high, the absolute difference is small, while for larger n the relative and absolute speedups usually go hand in hand. Obviously, if this is part of a larger problem, the optimization cannot ignore its context.

You need to iterate over both the range of numbers and the list of strings:

In [106]: [s+str(i) for i in range(1,4) for s in ['a','b']]
Out[106]: ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
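The same pattern extends directly to the three-prefix case from the question (a sketch building on the answer above, not part of the original answer):

```python
# Same nested comprehension, now with three prefixes; the inner loop
# (over the letters) varies fastest, giving number-major ordering.
m = 3
out = [s + str(i) for i in range(1, m + 1) for s in ['a', 'b', 'c']]
print(out)  # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']
```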

1 Comment

Very concise and exact code!

You can have more than one for in a list comprehension:

prefixes = ['a', 'b', 'c']
m = 3

output = [f"{prefix}{num}" for num in range(1, m+1) for prefix in prefixes]
print(output) # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']

If you have multiple fors, those will be nested, as in

for num in range(1, m+1):
    for prefix in prefixes:
        ...
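Written out in full, the nested loops build the same list as the comprehension (a sketch using the names from above):

```python
prefixes = ['a', 'b', 'c']
m = 3

# Equivalent imperative form of the nested comprehension: the outer loop
# runs over numbers, the inner loop over prefixes.
output = []
for num in range(1, m + 1):
    for prefix in prefixes:
        output.append(f"{prefix}{num}")
print(output)  # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']
```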

1 Comment

Nice and clearly-structured coding!

You could use itertools.product to get all combinations of the letters and the m range, then join each pair in an f-string (rather than using str.join, since one element is an integer and would first need converting to a string):

[f'{y}{x}' for x, y in itertools.product(range(1, m+1), ['a', 'b'])]

Output:

['a1', 'b1', 'a2', 'b2', 'a3', 'b3']

6 Comments

Very neat, but significantly slower (by ~30%) than a traditional list comprehension (Python 3.10.5 on macOS)
@AlbertWinestein yeah, it comes out about 25% slower on my Win11 machine; but I would expect that as the lists get larger the performance difference ought to decrease
@AlbertWinestein I suspect that is because of the small size of the data
@Nick I do not understand why I get such a significant difference (but I do) when I unpack the return from product like this: [f'{y}{x}' for x, y in itertools.product(range(1, m+1), ['a', 'b'])]
@AlbertWinestein interesting, that is quite a bit (10-15%) faster on my machine too. I guess perhaps it's having to pack the values into a tuple and then unpack them again. Regardless I'll update the answer. Thanks
