1

I want to transform a folder of text documents in the following format:

texts = ['text of document 1', 'text of document 2', 'text of document 3',...]

in order to apply text mining methods.

So far my code is the following:

import os
file= "*.txt"
path = "C:\\"
texts=[]

for files in os.listdir(path):
     with open(path + files) as f:
         for x in f:
             texts.append(x)

Unfortunately, the outcome differs from the wanted one:

texts = ['line 1 of document 1', 'line 2 of document 1', …]

What am I doing wrongly? Can anybody suggest an improvement for my code?

2
  • So you want to read all the txt files in a folder and store their content in a list? Commented Mar 19, 2019 at 7:32
  • Yes, I already used f.read(), but then the list has empty entries: texts = ['','','',...] Commented Mar 19, 2019 at 7:32

1 Answer 1

3

for line in file: (or in your case, for x in f:) iterates over the lines in a file.

Use the .read() method instead. That will read the entire file into a string:

for files in os.listdir(path):
     with open(path + files) as f:
         texts.append(f.read())

Edit: I just saw your comment about empty entries. If your directory contains empty files, you can prevent them from being added:

for files in os.listdir(path):
     with open(path + files) as f:
         contents = f.read()
         if contents.strip(): # will also remove files that contain only whitespace
             texts.append(f.read())
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.