I want a Python program that reads a list of words from a text file and prints the content of the file as two lists. The data in the text file is in this form:

A Alfa
B Betta
C Charlie

The program should print one list with A, B, C and one with Alfa, Betta, Charlie.

This is what I've written:

english2german = open('english2german.txt', 'r')
englist = []
gerlist = []

for i, line in enumerate(english2german):
    englist[i:], gerlist[i:] = line.split()

This creates two lists, but only the first letter of each word gets printed. How can I change my code so that it prints the whole word?

6 Answers

6

You want something like this:

english2german = open("english2german.txt")
englist = []
gerlist = []

for line in english2german:
    (e, g) = line.split()
    englist.append(e)
    gerlist.append(g)

The problem with your original code is that englist[i:] is a slice of the list, not a single index. Assigning to a slice replaces that part of the list with the items of whatever iterable is on the right-hand side, and a string is iterable character by character, so you were stuffing one letter into each of several indices. In other words, something like gerlist[0:] = "alfa" results in gerlist = ['a', 'l', 'f', 'a'].
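
A minimal sketch of the difference, using an in-memory list of lines as a stand-in for the file (the variable names sliced, appended and lines are only for illustration):

```python
# Slice assignment replaces part of a list with the items of the iterable
# on the right-hand side. A string iterates character by character.
sliced = []
sliced[0:] = "Alfa"
print(sliced)    # ['A', 'l', 'f', 'a']

# append() adds the whole string as a single element.
appended = []
appended.append("Alfa")
print(appended)  # ['Alfa']

# The loop from the question, run on the three sample lines.
lines = ["A Alfa", "B Betta", "C Charlie"]
englist, gerlist = [], []
for i, line in enumerate(lines):
    englist[i:], gerlist[i:] = line.split()

print(englist)   # ['A', 'B', 'C']
print(gerlist)   # ['A', 'B', 'C', 'h', 'a', 'r', 'l', 'i', 'e']
```

Each pass overwrites everything from index i onwards, so only the first letter of each earlier word survives, followed by the letters of the last word.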

6

And even shorter than amo-ej1's answer, and likely faster:

In [1]: english2german = open('english2german.txt')
In [2]: eng, ger = zip(*( line.split() for line in english2german ))
In [3]: eng
Out[3]: ('A', 'B', 'C')
In [4]: ger
Out[4]: ('Alfa', 'Betta', 'Charlie')

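In Python 3, zip returns a lazy iterator; in Python 2 you get the same behaviour with from future_builtins import zip or with izip from itertools. Be aware that this saves little memory here: the * unpacking reads every split line into memory before zip is even called, and eng and ger end up holding every word anyway.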
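
To make the idiom less cryptic, here is a small sketch of what zip(*...) does, with a hypothetical in-memory list standing in for the file:

```python
# Stand-in for the lines of english2german.txt.
lines = ["A Alfa\n", "B Betta\n", "C Charlie\n"]

# One [english, german] pair per line.
pairs = [line.split() for line in lines]
print(pairs)  # [['A', 'Alfa'], ['B', 'Betta'], ['C', 'Charlie']]

# zip(*pairs) is the same call as zip(pairs[0], pairs[1], pairs[2]):
# it groups all first items together and all second items together.
eng, ger = zip(*pairs)
print(eng)    # ('A', 'B', 'C')
print(ger)    # ('Alfa', 'Betta', 'Charlie')

# The results are tuples; convert them if real lists are required.
englist, gerlist = list(eng), list(ger)
```

Note that the two-name unpacking only works when every line splits into exactly two words.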

3 Comments

That's.. horrible. It might be faster, but I really doubt it's "usefully-faster", and it's far harder to read (the * especially)
It's the 'unzip' operation, a fairly common idiom for regrouping pairs of things.
I've benchmarked the zip method against the code in mipadi's answer. zip is slightly slower with a small set of data, but slightly quicker with 10,000 lines, though the difference is only about 0.05 seconds in each case.
3

Just an addition: you're working with files, so please close them :) or use the with construct:

with open('english2german.txt') as english2german:
  englist, gerlist = zip(*(line.split() for line in english2german))
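
A self-contained sketch of the same idea, which writes a throwaway sample file first (the temporary file is an assumption made so the example can run anywhere) and then checks that the with block really closed it:

```python
import os
import tempfile

# Create a small sample file so the sketch does not depend on
# english2german.txt being present.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("A Alfa\nB Betta\nC Charlie\n")
    path = tmp.name

with open(path) as english2german:
    eng, ger = zip(*(line.split() for line in english2german))

# The file is closed as soon as the with block ends, even on errors.
print(english2german.closed)  # True

# zip gives tuples; convert them if lists are wanted.
englist, gerlist = list(eng), list(ger)
print(englist)  # ['A', 'B', 'C']
print(gerlist)  # ['Alfa', 'Betta', 'Charlie']

os.remove(path)  # clean up the sample file
```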

1

You mean like this:

english2german = open('k.txt', 'r')
englist = []
gerlist = []

for line in english2german:
    words = line.split()  # split once, then pick each column
    englist.append(words[0])
    gerlist.append(words[1])

english2german.close()

print(englist)
print(gerlist)

which generates:

['A', 'B', 'C']
['Alfa', 'Betta', 'Charlie']

1

The solutions already posted are fine if none of the words contain spaces (i.e. each line has a single space). If I understand correctly, you are trying to build a dictionary, so consider that you may also have definitions that are multi-word expressions. In that case, you'd be better off using some character other than a space to separate the word from its definition, something like "|", which cannot appear in a word.

Then you can do something like this:

for line in english2german:
    # strip the trailing newline first; split("|") would otherwise leave it on g
    (e, g) = line.rstrip("\n").split("|")
    englist.append(e)
    gerlist.append(g)
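
If changing the file format is not an option, one possible alternative is to split only at the first space with str.partition. This is a sketch with made-up sample lines, and it assumes the first column is always a single word:

```python
# Hypothetical data: the second line has a two-word definition.
lines = ["A Alfa\n", "B Betta Bravo\n", "C Charlie\n"]

englist, gerlist = [], []
for line in lines:
    # partition(" ") splits at the first space only and returns
    # (text before, the separator, text after).
    e, _sep, g = line.rstrip("\n").partition(" ")
    englist.append(e)
    gerlist.append(g)

print(englist)  # ['A', 'B', 'C']
print(gerlist)  # ['Alfa', 'Betta Bravo', 'Charlie']
```

This does not help when the word being defined can itself contain spaces; in that case a dedicated separator such as "|" is still needed.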

2 Comments

-1: changing the file format. Use partition instead of split: same effect, no change to the file format.
Oh well, I didn't say he has to change the file format! I just suggested. I don't really see how partition can fix the problem I described, anyway.
1

Slightly meta-answer(?) to Autoplectic's suggestion of using zip()

With 3 lines in the input file (from the supplied data in the question):

The zip() method takes an average of 0.404729390144 seconds, compared to 0.341339087486 with the simple for loop constructing two lists (the code from mipadi's currently accepted answer).

With 10,000 lines in the input file (randomly generated words of 3-12 characters; I reduced the timeit.repeat() values to 100 executions, repeated twice):

zip() took an average of 1.43965339661 seconds, compared to 1.52318406105 with the for loop.

Both benchmarks were done using Python version 2.5.1

Hardly a huge difference. Given how much more readable the simple for loop is, I would recommend using it. The zip code might be a bit quicker with large files, but the difference is only about 0.083 seconds with 10,000 lines, and that is the total over 100 executions.

Benchmarking code:

import timeit

# https://stackoverflow.com/questions/743248/something-wrong-with-output-from-list-in-python/743313#743313
code_zip = """english2german = open('english2german.txt')
eng, ger = zip(*( line.split() for line in english2german ))
"""

# https://stackoverflow.com/questions/743248/something-wrong-with-output-from-list-in-python/743268#743268
code_for = """english2german = open("english2german.txt")
englist = []
gerlist = []

for line in english2german:
    (e, g) = line.split()
    englist.append(e)
    gerlist.append(g)
"""

for code in [code_zip, code_for]:
    t = timeit.Timer(stmt=code)
    try:
        # 10 repeats of 10000 executions; each entry is a total in seconds
        times = t.repeat(10, 10000)
    except Exception:
        t.print_exc()
    else:
        print("Code:")
        print(code)
        print("Time:")
        print(times)
        print("Average:")
        print(sum(times) / len(times))
        print("-" * 20)
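
For anyone who wants to repeat the comparison on a current Python 3, here is a rough sketch. It uses generated in-memory lines (an assumption, not the original test file) and so measures only the splitting, not file I/O; the numbers will not be comparable to the ones above.

```python
import timeit

# Hypothetical stand-in for a 10,000-line file.
lines = ["word%d wort%d\n" % (n, n) for n in range(10000)]


def with_zip():
    """Unzip the split lines in one expression."""
    eng, ger = zip(*(line.split() for line in lines))
    return list(eng), list(ger)


def with_loop():
    """Build both lists with a plain for loop."""
    englist, gerlist = [], []
    for line in lines:
        e, g = line.split()
        englist.append(e)
        gerlist.append(g)
    return englist, gerlist


for func in (with_zip, with_loop):
    # 3 repeats of 20 executions each; report the best total in seconds.
    times = timeit.repeat(func, repeat=3, number=20)
    print(func.__name__, min(times))
```

Taking the minimum rather than the average is a common way to reduce noise from other processes on the machine.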
