Error using langdetect in python: "No features in text"

Question

Hey, I have a CSV with multilingual text. All I want is a column appended with the detected language. So I coded it as below:

from langdetect import detect
import csv
with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvinput:
    with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)

        all = []
        row = next(reader)
        row.append('Lang')
        all.append(row)

        for row in reader:
            row.append(detect(row[0]))
            all.append(row)

        writer.writerows(all)

But I am getting the error LangDetectException: No features in text.

The traceback is as follows

runfile('C:/Users/dell/.spyder2-py3/temp.py', wdir='C:/Users/dell/.spyder2-py3')
Traceback (most recent call last):

  File "<ipython-input-25-5f98f4f8be50>", line 1, in <module>
    runfile('C:/Users/dell/.spyder2-py3/temp.py', wdir='C:/Users/dell/.spyder2-py3')

  File "C:\Users\dell\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)

  File "C:\Users\dell\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/dell/.spyder2-py3/temp.py", line 21, in <module>
    row.append(detect(row[0]))

  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector_factory.py", line 130, in detect
    return detector.detect()

  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 136, in detect
    probabilities = self.get_probabilities()

  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 143, in get_probabilities
    self._detect_block()

  File "C:\Users\dell\Anaconda3\lib\site-packages\langdetect\detector.py", line 150, in _detect_block
    raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')

LangDetectException: No features in text.

This is what my CSV looks like:

1) skunkiest smokiest yummiest strain pain killer and mood lifter
2) Relaxation, euphorique, surélevée, somnolence, concentré, picotement, une augmentation de l’appétit, soulager la douleur Giggly, physique, esprit sédation
3) Reduzierte Angst, Ruhe, gehobener Stimmung, zerebrale Energie, Körper Sedierung
4) Calmante, relajante muscular, Relajación Mental, disminución de náuseas
5) 重いフルーティーな幸せ非常に強力な頭石のバースト

Please help me with this.

You cannot read and write the same file! Use another file for output!
– Jean-François Fabre ♦, Commented Nov 24, 2016 at 10:10
Could you include the first few lines of the CSV so we could take a look?
– Haroldo_OK, Commented Nov 24, 2016 at 10:11
Perhaps you could also add line counting to the loop, so that when the error happens you can tell exactly which row caused it.
– Haroldo_OK, Commented Nov 24, 2016 at 10:13
@Jean-FrançoisFabre I tried with a different file as output, but that doesn't work!
– user7140275, Commented Nov 24, 2016 at 10:17
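
The advice about using a separate output file can be sketched roughly as below. This is only an illustration, not the asker's code: the function name `add_language_column`, its parameters, and the `"unknown"` placeholder are hypothetical, and `detect_fn` stands for any callable that maps text to a language code (such as `langdetect.detect`). It is passed in as a parameter so the sketch does not depend on a particular library.

```python
import csv


def add_language_column(src_path, dst_path, detect_fn,
                        error_types=(Exception,), placeholder="unknown"):
    """Copy src_path to dst_path, appending a 'Lang' column.

    Hypothetical helper: detect_fn is any callable taking a string and
    returning a language code. error_types defaults to Exception only
    because the detector is unknown here; narrow it for a real library.
    """
    # Read from one file and write to a *different* one, opened with "w".
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")

        # The first row is assumed to be a header.
        header = next(reader)
        writer.writerow(header + ["Lang"])

        for row in reader:
            try:
                lang = detect_fn(row[0])
            except error_types:
                # Assumption: a failed detection is recorded, not fatal.
                lang = placeholder
            writer.writerow(row + [lang])
```

With langdetect installed, a call might look like `add_language_column('stdlang.csv', 'stdlang_out.csv', detect, error_types=(LangDetectException,))`; the output file name here is made up. Writing row by row also avoids holding the whole file in a list.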

Mark Cramer · Accepted Answer · 2017-10-14 04:30:04Z

18

You can use something like this to detect which line in your file is throwing the error:

for row in reader:
    try:
        language = detect(row[0])
    except:
        language = "error"
        print("This row throws an error:", row[0])
    row.append(language)
    all.append(row)

What you're going to see is that it probably fails at "重いフルーティーな幸せ非常に強力な頭石のバースト". My guess is that detect() isn't able to 'identify' any characters to analyze in that row, which is what the error implies.

Other things, like when the input is only a URL, also cause this error.
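
Combining this with the suggestion in the comments to count lines, a minimal sketch that records which rows fail might look like the following. The names `detect_rows`, `error_types`, and `first_line` are hypothetical, and `detect_fn` again stands in for `langdetect.detect` or any similar callable. The reported number is a CSV row number, which only equals the physical line number when no field contains an embedded newline.

```python
def detect_rows(rows, detect_fn, error_types=(Exception,), first_line=2):
    """Return (languages, failures) for an iterable of CSV rows.

    languages has one entry per row (None where detection failed);
    failures is a list of (row_number, text) pairs. first_line=2 assumes
    a header occupied row 1 and has already been consumed.
    """
    languages, failures = [], []
    for line_no, row in enumerate(rows, start=first_line):
        # Treat a completely empty row as empty text.
        text = row[0] if row else ""
        try:
            languages.append(detect_fn(text))
        except error_types:
            languages.append(None)
            failures.append((line_no, text))
    return languages, failures
```

After the loop, printing `failures` shows every offending row at once instead of stopping at the first one.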

answered Oct 14, 2017 at 4:30


2 Comments

user3242036 Over a year ago

Isn't the point of this package to detect languages that do not contain letters like Japanese, Chinese, etc? I don't get an error when running detect('重いフルーティーな幸せ非常に強力な頭石のバースト')

Mark Cramer Over a year ago

This is an old post, so maybe the package has been updated? When I ran it in 2017, that's where it failed.

Rola · Accepted Answer · 2020-07-14 18:56:07Z

10

The error occurs when the text passed to detect() contains no letters. At least one letter must be present.

To reproduce, run any of the commands below:

detect('.')
detect(' ')
detect('5')
detect('/')

So you may want to apply some text pre-processing first to drop records in which the row[0] value is an empty string, a null value, whitespace, a number, a special character, or otherwise contains no letters.
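
One way to sketch such a pre-filter is shown below. The helper name `has_letters` is hypothetical, and treating "letter" as "anything `str.isalpha()` accepts" is an assumption. The thread does not say how langdetect itself decides what counts as a feature, so passing this check probably does not guarantee that detect() will succeed.

```python
def has_letters(text):
    """True if text is a string with at least one alphabetic character.

    str.isalpha() is Unicode-aware, so Japanese or accented text counts,
    while digits, punctuation and whitespace do not.
    """
    return isinstance(text, str) and any(ch.isalpha() for ch in text)


# The failing inputs from above, plus None and two ordinary texts.
samples = [".", " ", "5", "/", "", None, "pain killer", "重いフルーティーな幸せ"]
detectable = [s for s in samples if has_letters(s)]
```

Only the last two samples survive the filter; the rest would be dropped before detect() is ever called.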

answered Jul 14, 2020 at 18:56


1 Comment

Pedram Over a year ago

This was my case. You can check the string with bool(re.match('^(?=.*[a-zA-Z])', your_string)) (after import re) to see whether it contains at least one ASCII letter.

Max Kleiner · Accepted Answer · 2018-09-07 12:44:43Z

5

The problem is null text or something like ' ' with no value. Check for this in a condition and loop over your reader in a list comprehension, using either of these:

from langdetect import detect   
textlang = [detect(elem) for elem in textlist if len(elem) > 50]

textlang = [detect(elem) if len(elem) > 50 else elem == 'no' for elem in textlist]

or with a loop:

texl70 = df5['Titletext']
langdet = []

for i in range(len(df5)):
    try:
        lang = detect(texl70[i])
    except:
        lang = 'no'
        print("This row throws error:", texl70[i])
    langdet.append(lang)

edited Sep 7, 2018 at 12:44

answered Sep 6, 2018 at 19:15


3 Comments

Max Kleiner Over a year ago

>>> textlang = [detect(elem) if len(elem) > 50 else elem == 'no' for elem in textlist]
It's better to keep your list in sync with the input and add a False where no language is detected.

Oleg Melnikov Over a year ago

This error is not just for empty strings, but for text containing only an email address, URL, a sequence of numbers, etc. Checking these is a bit tricky, but error handling shown above should be helpful.

Max Kleiner Over a year ago

Right, it makes little sense to detect the language of an email address or URL. Thanks.
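
For inputs like the ones described in these comments, one possible pre-processing step is to strip URLs and email addresses before deciding whether anything is left to detect. The sketch below is deliberately rough: the function name and both regular expressions are illustrative assumptions, not a complete URL or email grammar, and nothing in the thread confirms that cleaned text will always be detectable.

```python
import re

# Simplified patterns, assumed good enough for a pre-filter only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")


def strip_urls_and_emails(text):
    """Remove URL-like and email-like tokens and collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    return " ".join(text.split())
```

A row whose cleaned text is empty can be skipped or given a placeholder, and the try/except handling shown above still covers whatever slips through.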

Stranger16 · Accepted Answer · 2022-08-04 07:36:09Z

0

The error occurs when the string has no letters. If you want to ignore that row and continue processing:

for i in range(len(df)):
    text = df.iloc[i, 1]
    try:
        lang = detect(text)
    except:
        continue

edited Aug 4, 2022 at 7:36

answered Aug 4, 2022 at 7:30


Marcin · Accepted Answer · 2023-07-05 22:56:10Z

0

It is bad practice to catch all possible exceptions. Let me propose something more complete, more readable and safer:

import re
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

rx_letters = re.compile("[a-z]+", re.I)

for row in reader:
    try:
        if rx_letters.search(row[0]) is not None:
            row.append(detect(row[0]))
        else:
            # No ASCII letters at all: skip detection but keep the column aligned
            row.append("?")
    except LangDetectException:
        row.append("?")
        print(f"Lang detect failed for: '{row[0]}'")

The rx_letters check can be skipped, but I find it more elegant to check for the most basic condition first.

answered Jul 5, 2023 at 22:56
