
I have a Python list ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']. With m = 3, I want to build this list using loops, because m = 3 here could be a larger number such as m = 100.

Since we can have

import numpy as np

m = 3

['a' + str(i) for i in np.arange(1,m+1)]
# ['a1', 'a2', 'a3']

['b' + str(i) for i in np.arange(1,m+1)]
# ['b1', 'b2', 'b3']

then I try to get ['a1', 'b1', 'a2', 'b2', 'a3', 'b3'] using

[ ['a','b'] + str(i) for i in np.arange(1,m+1)]

and get a TypeError: can only concatenate list (not "str") to list

Then I try

[ np.array(['a','b']) + str(i) for i in np.arange(1,m+1)]

and still get an error: UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U1'), dtype('<U1')) -> None.

How can I fix the problem? And going further, how can I get something like ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3'] in a similar way?

1 Comment

Use range instead of np.arange for these iterations. '+' only works for like objects: lists with lists, strings with strings. Commented Jun 17, 2022 at 4:04

4 Answers


A simple combined list comprehension would work, as pointed out in @j1-lee's answer (and later in other answers).

import string


def letter_number_loop(n, m):
    letters = string.ascii_letters[:n]
    numbers = range(1, m + 1)
    return [f"{letter}{number}" for number in numbers for letter in letters]

Similarly, one could use itertools.product(), as shown in Nick's answer, to obtain substantially the same result:

import itertools


def letter_number_it(n, m):
    letters = string.ascii_letters[:n]
    numbers = range(1, m + 1)
    return [
        f"{letter}{number}"
        for number, letter in itertools.product(numbers, letters)]

However, it is possible to write a NumPy-vectorized approach, making use of the fact that when the dtype is object, the operations follow Python semantics.

import numpy as np


def letter_number_np(n, m):
    letters = np.array(list(string.ascii_letters[:n]), dtype=object)
    numbers = np.array([f"{i}" for i in range(1, m + 1)], dtype=object)
    return (letters[None, :] + numbers[:, None]).ravel().tolist()

Note that the final numpy.ndarray.tolist() call could be avoided if whatever consumes the output can deal with the NumPy array itself, saving a relatively small but appreciable amount of time.
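As a quick illustration of the broadcasting trick used above, here is a minimal standalone sketch (variable names chosen for illustration, not taken from the answer):

```python
import numpy as np

# With dtype=object, "+" falls back to Python-level string concatenation,
# so broadcasting builds the full grid of letter/number pairs.
letters = np.array(["a", "b"], dtype=object)
numbers = np.array(["1", "2", "3"], dtype=object)

# Shapes (1, 2) + (3, 1) broadcast to (3, 2): grid[i, j] = letters[j] + numbers[i].
grid = letters[None, :] + numbers[:, None]
print(grid.ravel().tolist())  # ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
```

Raveling in C order walks the grid row by row, which is exactly the number-major, letter-minor ordering the question asks for.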


Inspecting Output

The following indicates that the functions are equivalent:

funcs = letter_number_loop, letter_number_it, letter_number_np

n, m = 2, 3
for func in funcs:
    print(f"{func.__name__!s:>32}  {func(n, m)}")
              letter_number_loop  ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
                letter_number_it  ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
                letter_number_np  ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']

Benchmarks

For larger inputs, this is substantially faster, as evidenced by these benchmarks:

timings = {}
for n in (2, 20):
    for k in range(1, 10):
        m = 2 ** k
        print(f"n = {n}, m = {m}")
        timings[n, m] = []
        base = funcs[0](n, m)
        for func in funcs:
            res = func(n, m)
            is_good = base == res
            timed = %timeit -r 64 -n 64 -q -o func(n, m)
            timing = timed.best * 1e6
            timings[n, m].append(timing if is_good else None)
            print(f"{func.__name__:>24}  {is_good}  {timing:10.3f} µs")

to be plotted with:

import matplotlib.pyplot as plt
import pandas as pd

n_s = (2, 20)
fig, axs = plt.subplots(1, len(n_s), figsize=(12, 4))
for i, n in enumerate(n_s):
    partial_timings = {k[1]: v for k, v in timings.items() if k[0] == n}
    df = pd.DataFrame(data=partial_timings, index=[func.__name__ for func in funcs]).transpose()
    df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ax=axs[i], title=f"n = {n}")

[benchmark plots: best timing vs. input size for n = 2 and n = 20]

These show that the explicitly looped versions (letter_number_loop() and letter_number_it()) are roughly comparable, while the NumPy-vectorized letter_number_np() pulls ahead fairly quickly for larger inputs, with up to a ~2x speed-up.


4 Comments

I do love a good chart! Great answer.
letter_number_np is significantly faster than the other suggestions when the data are "large", e.g., m=20 and 10 letters. However, it is significantly slower than the other methods when the numbers are low, e.g., m=3 and 2 letters. So I guess that if performance is critical, it's good to have an idea of the magnitude of your data before choosing the technique to process them.
It's great and the timing plots are very impressive!
@AlbertWinestein It sure is. The turning point may be between m ~ 50 and m ~ 3 for n in the (2, 20) range. Eventually the preferred method should be picked based on "local" timings anyway. However, it is common in computer science to pay more attention to the limit of large n, because for small n, even if the relative speed difference is high, the absolute difference is small, while for larger n the relative and absolute speedups usually go hand in hand. Obviously, if this is part of a larger problem, the optimization cannot ignore its context.

You need to iterate over both the range of numbers and the list of strings:

In [106]: [s+str(i) for i in range(1,4) for s in ['a','b']]
Out[106]: ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
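The same pattern extends directly to the three-prefix case from the question (a sketch building on the answer above, not part of the original answer):

```python
# Same nested comprehension, now with three prefixes; the inner loop
# (over the letters) varies fastest, giving number-major ordering.
m = 3
out = [s + str(i) for i in range(1, m + 1) for s in ['a', 'b', 'c']]
print(out)  # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']
```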

1 Comment

Very concise and exact code!

You can have more than one for in a list comprehension:

prefixes = ['a', 'b', 'c']
m = 3

output = [f"{prefix}{num}" for num in range(1, m+1) for prefix in prefixes]
print(output) # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']

If you have multiple fors, those will be nested, as in

for num in range(1, m+1):
    for prefix in prefixes:
        ...
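Written out in full, the nested loops build the same list as the comprehension (a sketch using the names from above):

```python
prefixes = ['a', 'b', 'c']
m = 3

# Equivalent imperative form of the nested comprehension: the outer loop
# runs over numbers, the inner loop over prefixes.
output = []
for num in range(1, m + 1):
    for prefix in prefixes:
        output.append(f"{prefix}{num}")
print(output)  # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']
```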

1 Comment

Nice and clearly-structured coding!

You could use itertools.product to get all combinations of the letters and the m range, then join each pair in an f-string (rather than using str.join, since one element is an integer and would first need converting to a string):

[f'{y}{x}' for x, y in itertools.product(range(1, m+1), ['a', 'b'])]

Output:

['a1', 'b1', 'a2', 'b2', 'a3', 'b3']

6 Comments

Very neat, but significantly slower (by ~30%) than a traditional list comprehension (Python 3.10.5 on macOS)
@AlbertWinestein yeah, it comes out about 25% slower on my Win11 machine; but I would expect that as the lists get larger the performance difference ought to decrease
@AlbertWinestein I suspect that is because of the small size of the data
@Nick I do not understand why I get such a significant difference (but I do) when I unpack the return from product like this: [f'{y}{x}' for x, y in itertools.product(range(1, m+1), ['a', 'b'])]
@AlbertWinestein interesting, that is quite a bit (10-15%) faster on my machine too. I guess perhaps it's having to pack the values into a tuple and then unpack them again. Regardless I'll update the answer. Thanks
