In the following code, I am trying to build training examples in the format expected by a spaCy NER model (this happens in the 9th line of the code, the element = ... line).

from ast import literal_eval
import re

train_data_list = []

for i in range(len(train_data)):
    a = re.search(train_data.subtext[i], train_data.text[i])
    if a is not None:
        element = '("' +train_data.text[i] + '"' + ', {"entities": [(' + 
        str(a.start()) + ',' + str(a.end()) + ',"SKILL")]})'
        train_data_list.append(literal_eval(element))

But I am encountering the following error:

 SyntaxError: EOL while scanning string literal

Thanks in advance.

  • Look at the text value of element at the time of literal_eval, and fix the code to ensure it is valid: I suspect it might be .. 'funky'. Commented Nov 1, 2018 at 6:17
  • The text values in train_data consist of continuous text. I encounter the problem only in a few cases (i.e., only while processing certain text values). Commented Nov 1, 2018 at 6:22
  • Exactly! Some of those values result in a string that cannot be parsed with literal_eval. Once a specific example is identified, the problem should be 'clear'. Include the specific value of element for such a failing case in the question, so that proper solutions can be suggested. Commented Nov 1, 2018 at 6:25
  • One example where the code fails is when the text value is as follows: \ncreate asset tracking database used for gain/loss profits, facility overhead, and finance research, including\nassisting in the implementation of sap business one. email correspondence, and proposal correspondence (both the\ncreation and assessment of). contract negotiations from customer/client to third party vendors and facilities.\nbuilt solid, transparent client /vendor relationships, with high client/vendor retention. It worked fine even in cases where the text contains a ". Commented Nov 1, 2018 at 6:28
  • That's not the full text of element, which would be something like ("...", {"entities": [(...,"SKILL")]}) where the ...'s are "some data". (I was wrong on the " bit - that would be a different error if it manifested ^_^.) Commented Nov 1, 2018 at 6:29

2 Answers


You cannot split a long line into multiple lines just by hitting Enter (outside of parentheses, brackets or braces). Either put your element = line on a single line, like this:

element = '("' +train_data.text[i] + '"' + ', {"entities": [(' + str(a.start()) + ',' + str(a.end()) + ',"SKILL")]})'

or add a backslash (\) at the end of the line to continue it:

element = '("' +train_data.text[i] + '"' + ', {"entities": [(' + \
        str(a.start()) + ',' + str(a.end()) + ',"SKILL")]})'
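A third option, not mentioned above, is implicit line continuation: Python joins lines automatically inside parentheses, so wrapping the whole expression in (...) avoids the backslash. A small sketch, where text, start and end are hypothetical stand-ins for train_data.text[i], a.start() and a.end():

```python
# Hypothetical stand-ins for train_data.text[i], a.start() and a.end().
text = "built solid, transparent client/vendor relationships"
start, end = 13, 24

# Implicit line continuation: inside (...), Python joins the lines itself,
# so no trailing backslash is needed.
element = ('("' + text + '"' + ', {"entities": [('
           + str(start) + ',' + str(end) + ',"SKILL")]})')

# The same expression using an explicit backslash continuation.
element_backslash = '("' + text + '"' + ', {"entities": [(' + \
    str(start) + ',' + str(end) + ',"SKILL")]})'

print(element == element_backslash)  # True
```

Both forms produce the same string; the parenthesized version is usually preferred because a stray space after a backslash is itself a syntax error.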

4 Comments

The code I am writing in my notebook is actually on one line.
Are you copying your code from the notebook into your local IDLE?
No, I am running it in the notebook itself.
I have no problem pasting that line into my Jupyter notebook. Double-check the line that you have in your notebook.

One (or more) of the element strings supplied to literal_eval cannot be parsed by literal_eval. That is, the program's own syntax is valid (otherwise it would fail before running anything!); it is one or more of the element values supplied to literal_eval that are not valid Python!
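One plausible way to hit this exact error (an assumption, since the failing element was never posted): if the text contains a real newline character, the built string has a line break inside a "..." literal, which Python refuses to parse. A minimal reproduction with made-up sample texts; build_element is a hypothetical helper wrapping the question's string-building line:

```python
from ast import literal_eval


def build_element(text, start, end):
    # Hypothetical helper: same string-building approach as the question.
    return ('("' + text + '"' + ', {"entities": [('
            + str(start) + ',' + str(end) + ',"SKILL")]})')


# Made-up sample texts; the second one contains a real newline character.
ok = build_element("assisting in the implementation of sap business one", 0, 9)
bad = build_element("create asset tracking database\nassisting in the implementation", 0, 6)

print(literal_eval(ok))  # parses into a (text, dict) tuple

failed = False
try:
    literal_eval(bad)
except SyntaxError as exc:
    # Older Pythons report "EOL while scanning string literal";
    # 3.10+ reports it as an unterminated string literal.
    failed = True
    print("SyntaxError:", exc.msg)
```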

The first step is to identify some of the 'invalid' values, e.g.:

from ast import literal_eval
import re

train_data_list = []

for i in range(len(train_data)):
    a = re.search(train_data.subtext[i], train_data.text[i])
    if a is not None:
        element = '("' +train_data.text[i] + '"' + ', {"entities": [(' + str(a.start()) + ',' + str(a.end()) + ',"SKILL")]})'
        try:
            data = literal_eval(element)
            train_data_list.append(data)
        except (SyntaxError, ValueError):
            # literal_eval raises these when element is not a valid Python literal
            print("Failed to parse element as a Python literal!")
            print(">>")
            print(repr(element))
            print("<<")

If the above "runs" (for some value of "runs") and prints the failing elements, then the proposed hypothesis holds and the non-relevant answers can be ignored ;-)

Anyway, the solution is to not use literal_eval at all. Instead, build the (text, annotations) tuple directly:

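import re

train_data_list = []

for i in range(len(train_data)):
    a = re.search(train_data.subtext[i], train_data.text[i])
    if a is not None:
        # Build the (text, annotations) tuple directly: nothing is parsed,
        # so quotes or newlines in the text cannot break it.
        # Keep the offsets as ints, as the literal_eval version produced
        # and as spaCy expects for entity character offsets.
        data = (train_data.text[i],
                {"entities": [(a.start(), a.end(), "SKILL")]})
        train_data_list.append(data)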
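A self-contained sketch of the same idea, runnable without the original DataFrame. The texts and subtexts lists are hypothetical sample data standing in for the train_data.text and train_data.subtext columns, and re.escape is an addition not in the original code (it makes characters such as "+", "." or "(" in subtext match literally rather than as regex syntax):

```python
import re

# Hypothetical sample rows standing in for train_data.text / train_data.subtext.
texts = [
    "create asset tracking database\nused for gain/loss profits",
    'built solid, "transparent" client/vendor relationships',
    "no skill mentioned here",
]
subtexts = ["asset tracking", "transparent", "sap"]

train_data_list = []
for text, subtext in zip(texts, subtexts):
    # re.escape (an addition): treat subtext as plain text, not a regex.
    match = re.search(re.escape(subtext), text)
    if match is not None:
        # Integer offsets, no string building: the newline in the first
        # text and the quotes in the second cause no trouble.
        train_data_list.append(
            (text, {"entities": [(match.start(), match.end(), "SKILL")]})
        )

for item in train_data_list:
    print(item)
```

With a pandas DataFrame, the same loop can iterate over zip(train_data.text, train_data.subtext) instead of indexing by position.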

Now, if values of train_data.text[i] contain a \n - that is, the literal two-character '\' and 'n' escape sequence - there may be additional work required to turn those into newline characters .. but one step at a time. And no step should be backward! :D
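If the text really does contain the two-character sequence (an assumption; the comments do not settle which case applies), a plain str.replace is one simple way to turn it into real newlines. Doing this before re.search matters, because each replacement shortens the string by one character and shifts every offset after it. The raw value below is made-up sample data:

```python
# Hypothetical text containing the two characters "\" and "n", not a newline.
raw = "create asset tracking database\\nassisting in the implementation"
print(len("\\n"))  # 2

# Replace each two-character escape with a real newline character.
text = raw.replace("\\n", "\n")

print("\n" in raw, "\n" in text)  # False True

# Offsets shift after the replacement, so search the converted text.
print(raw.find("assisting"), text.find("assisting"))  # 32 31
```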
