Python loop improvements

Question

I am wondering how to efficiently calculate the distribution of words in one array based on the words from another array.

We are given an array of words, test. The task is to aggregate the occurrences of the words from test in a second array, s:

for word in test:
    if word not in s:
        mydict[s.count(word)] = 0
    else:           
        mydict[s.count(word)] += 1

This code is very slow, partly because it has not been optimized and partly because iteration in Python is inherently slow.

What is the best way to improve the above code?

alko · Accepted Answer · 2013-12-03 12:10:59Z

1

You repeat the count iteration over s for every word in test, and the if word not in s check adds another scan of s each time. An improvement is to calculate the counts once:

from collections import Counter
counts = Counter(s)

then build the histogram in a second pass:

distribution = Counter(counts[v] for v in set(test))

Demo:

>>> test = list('abcdef')
>>> s = list('here comes the sun')
>>> counts = Counter(s)
>>> distribution = Counter(counts[v] for v in set(test))
>>> distribution
Counter({0: 4, 1: 1, 4: 1})
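
The two passes above can be wrapped in a small helper. This is only a minimal sketch: the function name count_distribution and the sample word lists are hypothetical, not part of the original answer. It relies on the fact that a Counter returns 0 for a missing key, so the if word not in s check from the question is not needed.

```python
from collections import Counter


def count_distribution(test, s):
    """Map each occurrence count to the number of distinct words
    from `test` that appear that many times in `s`.

    Hypothetical helper name, used here only for illustration.
    """
    counts = Counter(s)  # single pass over s
    # Counter returns 0 for words that never occur in s,
    # so no explicit membership test is required.
    return Counter(counts[word] for word in set(test))


# Illustrative data (assumed, not from the question)
test = ["sun", "moon", "the", "sea"]
s = "here comes the sun and the sun goes".split()

distribution = count_distribution(test, s)
print(distribution)
```

Here "sun" and "the" each occur twice in s, while "moon" and "sea" do not occur at all, so the result maps 2 to 2 and 0 to 2. The cost is roughly one pass over s plus one pass over the distinct words of test, instead of one s.count call per word.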

answered Dec 3, 2013 at 12:10


thefourtheye · Accepted Answer · 2013-12-03 12:13:10Z

1

You can use Counter; this is exactly what it is for:

from collections import Counter
print(Counter(Counter(test).values()))

For example,

from collections import Counter
test = ["the", "sun", "rises", "in", "the", "sun"]
print(Counter(test))
print(Counter(Counter(test).values()))

Output (on Python 3.7+; entries with equal counts may print in a different order on older versions)

Counter({'the': 2, 'sun': 2, 'rises': 1, 'in': 1})
Counter({2: 2, 1: 2})
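
To read the result, each key of the outer Counter is an occurrence count and each value is the number of distinct words with that count. A minimal sketch of looking values up, using the same test list (the variable names per_word and distribution are assumptions added for illustration):

```python
from collections import Counter

test = ["the", "sun", "rises", "in", "the", "sun"]

per_word = Counter(test)                   # word -> occurrences in test
distribution = Counter(per_word.values())  # occurrences -> number of distinct words

print(distribution[2])  # how many distinct words occur exactly twice
print(distribution[1])  # how many distinct words occur exactly once
print(distribution[3])  # a count that never happens; Counter yields 0
```

Note that this counts occurrences within test itself; no second array is involved.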

edited Dec 3, 2013 at 12:13

answered Dec 3, 2013 at 12:06
