To generate test data, I need to quickly create large files of random text. I have one solution, taken from here and given below:

import random
import string

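n = 1024 ** 2  # 1 MiB of text (one byte per ASCII letter)
# string.letters exists only in Python 2; on Python 3 use string.ascii_letters.
chars = ''.join([random.choice(string.letters) for i in range(n)])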

with open('textfile.txt', 'w+') as f:
    f.write(chars)

My problem is that this takes 653 ms to run, which is far too slow for my purposes.

Is there a faster way to generate text files with random text?

  • I'm curious, what is the use case here? Commented Jul 15, 2017 at 20:54
  • Possible duplicate of Generating random text strings of a given pattern Commented Jul 15, 2017 at 20:54
  • You can put random.choice(string.letters) for i in range(n) into a generator and use yield to make it faster Commented Jul 15, 2017 at 20:54
  • Use something like the Faker library for Python Commented Jul 15, 2017 at 20:55
  • 1
    @JonasAdler I timed an approach with numpy that gets this down to 370ms. Is that still too slow? Commented Jul 15, 2017 at 21:01

1 Answer

Create a numpy array of letters:

In [662]: letters = np.array(list(chr(ord('a') + i) for i in range(26))); letters
Out[662]: 
array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
       'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'],
      dtype='<U1')

Use np.random.choice to draw n random elements from letters (equivalent to generating random indices between 0 and 25 and indexing letters with them):

np.random.choice(letters, n)

Timings:

In [664]: n = 1024 ** 2

In [701]: %timeit np.random.choice(letters, n)
100 loops, best of 3: 15.1 ms per loop

Alternatively,

In [705]: %timeit np.random.choice(np.fromstring(letters, dtype='<U1'), n)
100 loops, best of 3: 14.1 ms per loop
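
For completeness, here is a minimal end-to-end sketch that also writes the file. It assumes Python 3 and NumPy 1.17 or newer (for np.random.default_rng), which is not the setup the timings above were taken on, so I make no claim about how its speed compares. The helper names random_letter_bytes and write_random_text are made up for illustration. It swaps in a slightly different technique from the snippets above: it draws random indices into a uint8 view of string.ascii_letters and writes the resulting bytes in binary mode, so no join is needed.

```python
import string

import numpy as np


def random_letter_bytes(n, seed=None):
    """Return n random ASCII letters as a bytes object (hypothetical helper)."""
    # View the alphabet as an array of one-byte character codes.
    alphabet = np.frombuffer(string.ascii_letters.encode("ascii"), dtype=np.uint8)
    rng = np.random.default_rng(seed)
    # Draw n random positions in [0, len(alphabet)) and look up the letter codes.
    indices = rng.integers(0, len(alphabet), size=n)
    return alphabet[indices].tobytes()


def write_random_text(path, n, seed=None):
    """Write n random ASCII letters to path (hypothetical helper)."""
    # Binary mode: the data is already bytes, one byte per letter.
    with open(path, "wb") as f:
        f.write(random_letter_bytes(n, seed))
```

Calling write_random_text('textfile.txt', 1024 ** 2) should then produce a 1 MiB file of random upper- and lower-case letters. Passing a seed makes the output reproducible, which can be handy for test data.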

5 Comments

I am able to modify this somewhat and get an order of magnitude better performance: np.random.choice(np.fromstring(string.letters, dtype='S1'), n), total time 17 ms. Could you update the answer to that and I'll accept that answer?
@JonasAdler That gives you a list of chars, right? You'll want to join them together.
It seems f.write accepts char arrays. The result looks alright and writing is basically instant.
@JonasAdler I got you a bit faster, if you don't mind the fact they're not binary strings.
@JonasAdler Glad to help :)
