1

I'd like to count specific things in a file, for example how many times "--undefined--" appears. Here is a piece of the file's content:

"jo:ns  76.434
pRE     75.417
zi:     75.178
dEnt    --undefined--
ba      --undefined--

I tried something like this, but it doesn't work:

with open("v3.txt", 'r') as infile:
    data = infile.readlines().decode("UTF-8")

    count = 0
    for i in data:
        if i.endswith("--undefined--"):
            count += 1
    print count

Do I have to implement, say, a dictionary of tuples to tackle this, or is there an easier solution?

EDIT:

The word in question appears only once in a line.

3
  • Can the word in question appear more than once in a line? If so, how should it be counted? And how would an entry like "see:--undefined--" be counted? Commented Feb 20, 2018 at 13:12
  • @MrT No, the word can only appear once in a line. Commented Feb 20, 2018 at 13:16
  • You should always edit your question, when you add information. People tend not to read comments. Commented Feb 20, 2018 at 13:38

5 Answers

3

You can read all the data into one string, split that string into a list, and count the occurrences of the word in that list.

with open('afile.txt', 'r') as myfile:
    data = myfile.read()

# split() with no argument splits on any run of whitespace, newlines included
print(data.split().count("--undefined--"))

Or count directly in the string:

data.count("--undefined--")
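The two counts are not always equal: the split-based one matches only whole whitespace-separated tokens, while str.count() matches the substring anywhere. A minimal sketch of the difference, using an inline string in place of the file (the sample text, including the "see:--undefined--" entry from the comments, is made up for illustration):

```python
# Hypothetical sample standing in for the file's contents.
sample = (
    "zi:     75.178\n"
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
    "note    see:--undefined--\n"
)

# Whole-token count: only tokens exactly equal to the word are counted.
token_count = sample.split().count("--undefined--")

# Substring count: also matches inside "see:--undefined--".
substring_count = sample.count("--undefined--")

print(token_count, substring_count)  # 2 3
```

Which of the two is appropriate depends on whether embedded occurrences should count.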

1

readlines() returns the list of lines, but they are not stripped (i.e. they still contain the newline character). Either strip them first:

data = [line.strip() for line in data]

or check for --undefined--\n:

if line.endswith("--undefined--\n"):

Alternatively, consider string's .count() method:

file_contents.count("--undefined--")
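To see why the original endswith() test never matches, here is a small sketch on a single line as readlines() would return it (the line content is taken from the question's sample; the variable names are only illustrative):

```python
# A line as readlines() yields it: the trailing newline is still attached.
line = "dEnt    --undefined--\n"

raw_match = line.endswith("--undefined--")               # the line actually ends with "\n"
stripped_match = line.strip().endswith("--undefined--")  # newline removed before testing

print(raw_match, stripped_match)  # False True
```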


1

Or don't limit yourself to .endswith(), use the in operator.

count = 0

with open('v3.txt', 'r') as infile:
    data = infile.readlines()

# 'in' matches the word anywhere in the line, so the trailing newline doesn't matter
for line in data:
    if '--undefined--' in line:
        count += 1

print(count)


1

When reading a file line by line, each line ends with the newline character:

>>> with open("blookcore/models.py") as f:
...    lines = f.readlines()
... 
>>> lines[0]
'# -*- coding: utf-8 -*-\n'
>>> 

So your endswith() test can't work as written; you have to strip the line first:

if i.strip().endswith("--undefined--"):
    count += 1

Reading a whole file into memory is more often than not a bad idea: even if the file fits in memory, it still eats resources for no good reason. Python's file objects are iterable, so you can just loop over your file. Finally, you can specify which encoding should be used when opening the file (instead of decoding manually), using the codecs module (Python 2) or directly with open() (Python 3):

# py3
with open("your/file.text", encoding="utf-8") as f:

# py2:
import codecs
with codecs.open("your/file.text", encoding="utf-8") as f:

Then just use the builtin sum() and a generator expression:

result = sum(line.strip().endswith("whatever") for line in f)

This relies on the fact that booleans are integers with values 0 (False) and 1 (True).
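Put together, the pieces above could look like the following Python 3 sketch. The helper name count_lines_ending_with and the temporary demo file are assumptions made for illustration; they are not part of the question:

```python
import os
import tempfile


def count_lines_ending_with(path, suffix, encoding="utf-8"):
    """Count lines whose stripped text ends with `suffix`, streaming the file."""
    with open(path, encoding=encoding) as f:
        # Each test yields True or False, which sum() adds up as 1 or 0.
        return sum(line.strip().endswith(suffix) for line in f)


# Demo on a throwaway file holding a few lines from the question's sample.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("zi:     75.178\ndEnt    --undefined--\nba      --undefined--\n")
    demo_path = tmp.name

result = count_lines_ending_with(demo_path, "--undefined--")
os.remove(demo_path)
print(result)  # 2
```

Only one line is held in memory at a time, so the same function works for files of any size.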


1

Quoting Raymond Hettinger, "There must be a better way":

from collections import Counter

counter = Counter()
words = ('--undefined--', 'otherword', 'onemore')

with open("v3.txt", 'r') as f:
    lines = f.readlines()
    for line in lines:
        for word in words:
            if word in line:
                counter.update((word,))  # note the single element tuple

print(counter)  # the parenthesized form works in both Python 2 and Python 3
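For a quick check without a file on disk, here is a sketch of the same loop run over an in-memory sample. The sample lines are made up, 'otherword' and 'onemore' are the placeholder words from the snippet above, and counter[word] += 1 is used as an equivalent of the single-element update:

```python
from collections import Counter
import io

words = ("--undefined--", "otherword", "onemore")

# Hypothetical file contents; io.StringIO iterates line by line like a file.
sample = io.StringIO(
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
    "foo     otherword\n"
)

counter = Counter()
for line in sample:
    for word in words:
        if word in line:
            counter[word] += 1  # same effect as counter.update((word,))

# Words that were never seen simply report 0.
print(counter["--undefined--"], counter["otherword"], counter["onemore"])  # 2 1 0
```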

5 Comments

I hardly see how this is "a better way".
Not my words, they are Raymond's. :-) He says Counter() is a better way to count things and I agree.
Depends on what you want to count, really. Counter is great when you have different things to count, in this case it's just plain overkill.
I'm assuming he is not just counting that specific word but many, as he says "I'd like to count specific things", so Counter() is a better option as far as I can tell.
Edited to make it more generic and count a list of words instead of just one.
