So I am new to Python and I want to do the following.
I have a file with a bunch of sentences that looks like this:
- [frank bora three](noun) [go](action) level [three hundred sixty](value)
- [jack blad four](noun) [stay](action) level [two hundred eleven](value)
I want to be able to produce a file that looks like this:
text:'frank bora three', entityType:'noun'
text:'jack blad four', entityType:'noun'
text:'go', entityType:'action'
text:'stay', entityType:'action'
text:'three hundred sixty', entityType:'value'
text:'two hundred eleven', entityType:'value'
What I need is to delete the leading hyphen, treat everything between square brackets as a text, and use whatever is in the round brackets immediately following it as its entityType. The other thing is that some words are not in brackets at all (such as "level" in the example), and those should be ignored.
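To make the requirement concrete, here is a minimal sketch of one way the extraction could work, assuming every bracket pair is written exactly as `[text](type)` with no nested brackets. The names `PAIR`, `extract_entities` and `format_entity` are made up for this illustration. Results are grouped by entityType in order of first appearance, which is how the desired output above is ordered.

```python
import re
from collections import defaultdict

# Matches "[some text](label)" and captures the text and the label.
# Assumes no nested brackets inside either part.
PAIR = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def extract_entities(lines):
    """Return (text, entity_type) pairs, grouped by entity type."""
    grouped = defaultdict(list)  # keeps insertion order on Python 3.7+
    for line in lines:
        # Words outside brackets (e.g. "level") and the leading hyphen
        # never match the pattern, so they are skipped automatically.
        for text, entity_type in PAIR.findall(line):
            grouped[entity_type].append(text)
    return [(text, etype) for etype, texts in grouped.items() for text in texts]

def format_entity(text, entity_type):
    """Format one pair the way the desired output file shows it."""
    return f"text:'{text}', entityType:'{entity_type}'"

sample = [
    "- [frank bora three](noun) [go](action) level [three hundred sixty](value)",
    "- [jack blad four](noun) [stay](action) level [two hundred eleven](value)",
]
for text, etype in extract_entities(sample):
    print(format_entity(text, etype))
```

With this approach the hyphen does not need to be removed separately, because only the bracketed pairs are ever captured.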
Approach: The first thing I tried was to put all the sentences in an array:
    import re
    with open('new_file.txt') as f1:
        lines = f1.readlines()
    array_length = len(lines)
    for i in range(array_length):
        lines[i] = re.sub(r"\b/-\w+", "", lines[i])
    print(lines[0])
After that I tried to remove the hyphen using re, but it's not working for me: the hyphens were still there when I printed the array.
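A likely reason the substitution does nothing: the pattern `\b/-\w+` requires a literal `/` right before the hyphen, and the lines contain no slash, so it never matches. Even without the slash, `-\w+` would not match here, because the hyphen is followed by a space and a `[` rather than a word character. Below is a small sketch of a pattern that should strip a leading hyphen, assuming the hyphen is always the first non-space character on the line. The helper name `strip_leading_hyphen` is hypothetical.

```python
import re

def strip_leading_hyphen(line):
    # ^\s*  optional whitespace at the start of the line
    # -     the hyphen itself
    # \s*   any whitespace that follows it
    return re.sub(r"^\s*-\s*", "", line)

lines = ["- [go](action) level\n", "no hyphen here\n"]
cleaned = [strip_leading_hyphen(line) for line in lines]
print(cleaned[0])
```

Lines that do not start with a hyphen are returned unchanged, since the pattern is anchored to the start of the string.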
I hope my question is clear.
Thank you in advance,