0

So I am new to Python and I want to do the following.

I have a file with a bunch of sentences that looks like this:

- [frank bora three](noun) [go](action) level [three hundred sixty](value)
- [jack blad four](noun) [stay](action) level [two hundred eleven](value)

I want to be able to produce a file that looks like this:

text:'frank bora three', entityType:'noun'
text:'jack blad four', entityType:'noun'   
text:'go', entityType:'action'    
text:'stay', entityType:'action'
text:'three hundred sixty', entityType:'value'
text:'two hundred eleven', entityType:'value'

What I need is to delete the leading hyphen, treat everything between square brackets as a text, and use whatever is between the round brackets that follow it as its entityType. The other thing is that some words are not between brackets, and those should be ignored.

Approach: The first thing I tried was to put all the sentences in an array:

import re
with open('new_file.txt') as f1:
    lines = f1.readlines()
array_length = len(lines)
for i in range(array_length):
    lines[i]=re.sub(r"\b/-\w+", "", lines[i])
print (lines[0])

After that I tried to remove the hyphen using re, but it's not working for me: the hyphens were still there when I printed the array.

I hope my question is clear.

Thank you in advance,

  • Please post the re code that you tried, and in what way it did not work. This is the real crux of your question. Commented Mar 9, 2020 at 14:38
  • Add this important info by editing your question - people don't always scan comments for additional info. Commented Mar 9, 2020 at 14:47
  • Okay done, thank you. Commented Mar 9, 2020 at 14:57

2 Answers

1

It's often easier, when parsing a complex string like this, to take a two-stage approach. First, split each string on ')':

temp = foo.split(')')[0:3]  # foo holds one line of the file

For the first string, this gives a list of substrings:

temp = ['- [frank bora three](noun', ' [go](action', ' level [three hundred sixty](value']

Now we can write simpler regexes to pull out the desired text from each substring:

import re

re_text = re.compile(r'\[.+\]')   # the bracketed text, e.g. '[go]'
re_entity = re.compile(r'\(.+')   # the entity after '(', e.g. '(action'
mynouns = []
myentities = []
for target in temp:
    mynouns.append(re_text.search(target).group().strip('[]'))
    myentities.append(re_entity.search(target).group().strip('()'))

So now you have two lists:

mynouns = ['frank bora three', 'go', 'three hundred sixty']
myentities = ['noun', 'action', 'value']

Zip them together and make a new list of tuple pairs:

result = list(zip(mynouns, myentities))

which looks like this:

[('frank bora three', 'noun'),
 ('go', 'action'),
 ('three hundred sixty', 'value')]

And now you can format these pairs into output strings. (To group the strings as in your desired output, collect them in a list and sort it by entity type, the last word of each string, before writing it to a file.)
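
A minimal sketch of that last step, assuming the pairs for both sample lines have already been collected into one list. The helper name `format_grouped` is hypothetical. Note that it groups by the order in which each entity type first appears rather than sorting alphabetically, since an alphabetical sort would put 'action' before 'noun', which is not the order shown in the question:

```python
# Pairs as produced by the zip step, here for both sample lines from the question.
result = [
    ('frank bora three', 'noun'), ('go', 'action'), ('three hundred sixty', 'value'),
    ('jack blad four', 'noun'), ('stay', 'action'), ('two hundred eleven', 'value'),
]

def format_grouped(pairs):
    """Hypothetical helper: group pairs by entity type, keeping first-seen order."""
    order = []
    for _, entity in pairs:
        if entity not in order:
            order.append(entity)
    # sorted() is stable, so pairs with the same entity keep their original order.
    grouped = sorted(pairs, key=lambda pair: order.index(pair[1]))
    return ["text:'{}', entityType:'{}'".format(text, entity) for text, entity in grouped]

output_lines = format_grouped(result)
print("\n".join(output_lines))
```

The resulting strings could then be written out with an ordinary `open(..., 'w')` call; the output file name is up to you.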

1 Comment

Just noticed I had a typo in that list(zip) statement, now fixed
1

You don't really need a regex:

Just slice the string between the brackets :)

s = "- [frank bora three]asdasd(noun) [go](action) level [three hundred sixty](value)"

print(s[s.find("[")+1:s.find("]")]) #text inside []
print(s[s.find("(")+1:s.find(")")]) #noun inside ()

Now you need to read in your file, split it into lines, and loop over them:

stringfile = """- [frank bora three](noun) [go](action) level [three hundred sixty](value)
- [jack blad four](noun) [stay](action) level [two hundred eleven](value)"""


for s in stringfile.splitlines():
    text = s[s.find("[")+1:s.find("]")]
    noun = s[s.find("(")+1:s.find(")")]

    print(text)
    print(noun)
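
The loop above only picks up the first bracket pair on each line. If every pair is needed, the same `find` idea can be repeated from a moving start position. This is only a sketch: the helper name `extract_all` is hypothetical, and it assumes each `[...]` is eventually followed by a `(...)` on the same line:

```python
def extract_all(line):
    """Hypothetical helper: collect every [text](entity) pair in a line using str.find."""
    pairs = []
    pos = 0
    while True:
        open_sq = line.find("[", pos)
        if open_sq == -1:
            break  # no more bracketed text on this line
        close_sq = line.find("]", open_sq)
        if close_sq == -1:
            break  # unbalanced square bracket, stop here
        open_par = line.find("(", close_sq)
        close_par = line.find(")", open_par) if open_par != -1 else -1
        if open_par == -1 or close_par == -1:
            break  # no matching (entity) after the text
        pairs.append((line[open_sq + 1:close_sq], line[open_par + 1:close_par]))
        pos = close_par + 1  # continue searching after this pair
    return pairs

sample = "- [frank bora three](noun) [go](action) level [three hundred sixty](value)"
for text, entity in extract_all(sample):
    print(text, entity)
```

Words outside the brackets, such as "level" or the leading hyphen, are skipped because the search always jumps to the next "[".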

1 Comment

Thank you for your answer. I have accepted the other one because in my example there are entities other than noun (action, value, ...), but it still answers the problem.
