I'm doing automatic language detection in Python using stopwords, but I get a KeyError when I test the code. This is the code:

import nltk
from nltk.corpus import stopwords

def scoreFunction(wholetext):
    dictiolist={}
    scorelist={}
    NLTKlanguages = ["dutch","finnish","german","italian","portuguese","spanish","turkish","danish","english","french","hungarian","norwegian","russian","swedish"]
    FREElanguages = [""]
    languages= NLTKlanguages + FREElanguages
    for lang in NLTKlanguages:
        dictiolist[lang]=stopwords.words(lang)
        tokens=nltk.tokenize.word_tokenize(wholetext)
        tokens=[t.lower() for t in tokens]
        freq_dist=nltk.FreqDist(tokens)
    for lang in languages:
        scorelist[lang]=0
    for word in freq_dist.keys()[0:20]:
        if word in dictiolist[lang]:
            scorelist[lang]+=1
    return scorelist

def whichLanguage(scorelist):
    maximum=0
    for item in scorelist:
        value = scorelist[item]
        if maximum<value:
            maximum = value
            lang = item
    return lang

When I run scoreFunction("hillo my name is osfar and i'm genius") I get this error:

Traceback (most recent call last):
  File "", line 1, in
    scoreFunction("hello my name is osfar and i'm very genius")
  File "C:/Users/osama1/Desktop/fun-test", line 17, in scoreFunction
    if word in dictiolist[lang]:
KeyError: ''
    Add all relevant information to your actual post, not in comments. Commented Apr 24, 2013 at 8:40

1 Answer

Your problem is in the following block of code:

for word in freq_dist.keys()[0:20]:
    if word in dictiolist[lang]:
        scorelist[lang]+=1

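You're using the variable lang in this for loop, but nothing in the loop assigns it. In Python a loop variable stays bound after its loop ends, so lang still holds the last value from your previous for loop (for lang in languages:), which is "" (the empty string, the only entry of FREElanguages). dictiolist was only filled for the entries of NLTKlanguages, so dictiolist[""] raises KeyError: ''.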
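The leftover loop variable can be reproduced in isolation. This is only a minimal sketch with a shortened, made-up language list and a plain dict in place of dictiolist; it is not the asker's code:

```python
# Shortened, illustrative list; the last entry is the empty string,
# like FREElanguages in the question.
languages = ["dutch", "english", ""]

# Stand-in for dictiolist: only the real languages have entries.
dictiolist = {"dutch": ["de", "het"], "english": ["the", "is"]}

for lang in languages:
    pass  # the loop body does not matter here

# After the loop, lang is still bound to the last item it saw.
print(repr(lang))

try:
    dictiolist[lang]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", repr(err.args[0]))
```

The lookup fails for the same reason as in the traceback: the key is the empty string left over from the earlier loop.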

What you apparently meant to do is:

for word in freq_dist.keys()[0:20]:
    # Loop over NLTKlanguages: languages also contains "", which has no entry in dictiolist
    for lang in NLTKlanguages:
        if word in dictiolist[lang]:
            scorelist[lang]+=1

By the way, there's an easier way to do what you're trying to do: use a Counter. See http://docs.python.org/2.7/library/collections.html#counter-objects for more information.
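
One way the Counter suggestion might look is sketched below. It makes several assumptions that are not in the original post: the small STOPWORDS sets are hand-picked stand-ins for stopwords.words(lang), str.split replaces nltk.tokenize.word_tokenize so the example runs without NLTK data, and the function names score_languages and which_language are hypothetical.

```python
from collections import Counter

# Hypothetical, hand-picked samples standing in for
# nltk.corpus.stopwords.words(lang); the real lists are much longer.
STOPWORDS = {
    "english": {"my", "is", "and", "the", "i", "a", "of"},
    "german": {"ich", "und", "ist", "der", "die", "das", "mein"},
    "spanish": {"mi", "es", "y", "el", "la", "de", "yo"},
}

def score_languages(text, stopword_sets=STOPWORDS):
    # Count each lowercase token once, then sum the counts of the
    # tokens that belong to each language's stopword set.
    counts = Counter(text.lower().split())
    scores = Counter()
    for lang, words in stopword_sets.items():
        scores[lang] = sum(counts[w] for w in words)
    return scores

def which_language(scores):
    # most_common(1) returns [(language, score)] for the top entry.
    lang, score = scores.most_common(1)[0]
    return lang if score > 0 else None

scores = score_languages("hello my name is osfar and i'm very genius")
print(which_language(scores))
```

Because a Counter returns 0 for missing keys, no language needs to be initialised to 0 beforehand, and returning None when nothing matched avoids the unbound lang that whichLanguage would hit when every score is 0.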
