I want to assign part of a DataFrame I created, the column that contains the values I need, to a new object. However, when I run my Python file I get an error saying that a 'str' object has no attribute with the name of that column. I'm not sure what is wrong.

AttributeError: 'str' object has no attribute 'vowel_map'

The training.txt file is:

COED:K OW1 EH2 D
PURVIEW:P ER1 V Y UW2
HEHIR:HH EH1 HH IH0 R
MUSCLING:M AH1 S AH0 L IH0 NG
NONPOISONOUS:N AA0 N P OY1 Z AH0 N AH0 S
LAVECCHIA:L AA0 V EH1 K IY0 AH0
BUCKLED:B AH1 K AH0 L D
EATEN:IY1 T AH0 N
SCIMED:S AY1 M EH2 D
MORTIS:M AO1 R T IH0 S
CONSERVATOR:K AH0 N S ER1 V AH0 T ER0

The python file I'm running is:

import pandas as pd
import string

vowels = ('AA','AE','AH','AO','AW','AY','EH','ER','EY','IH','IY','OW','OY','UH','UW')

def remove_stress(string):
    if type(string) in [list, tuple]:
        string = ' '.join(string)
    return ''.join([i for i in string if not i.isdigit()]).split()

def phoneme_map(phon_list, phoneme_list):
    return [1 if phoneme in phoneme_list else 0 for phoneme in phon_list]

def get_words(file_path):
    words = pd.read_csv(file_path, sep=':', names = ['word', 'string_of_phon'])
    words['phon_list'] = words.string_of_phon.apply(str.split)
    words['stressless_phon_list'] = words.string_of_phon.apply(remove_stress)
    words['vowel_map'] = words.stressless_phon_list.apply(phoneme_map, args = (vowels,))

    return words

if __name__ == '__main__':
    data_loc = 'training.txt'
    words = get_words(data_loc)

    word_vowels = [word.vowel_map for word in words]
  • What are you trying to achieve? It's not quite clear... Could you post your desired data set? Commented May 13, 2017 at 9:55
  • I just want to be able to assign the vowel_map data to word_vowels, then do this later: df = pd.DataFrame(word_vowels, columns= vowel_map) Commented May 13, 2017 at 9:58
  • do you mean: word_vowels = words.vowel_map? It's hard to understand not being able to see your desired data set... Commented May 13, 2017 at 10:01
  • The problem is in the list comprehension in your last line. words is a DataFrame object, whose __iter__ yields the column names (strings), so word.vowel_map is looked up on a str. I'm guessing you actually want to iterate through the rows, selecting only the 'vowel_map' values. If that's what you want, then @MaxU's suggestion is the way to go. If in the future you would like to iterate through the rows of a DataFrame, you could use df.iterrows(). Commented May 13, 2017 at 12:59
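
The behaviour described in the last comment can be reproduced with a small sketch. The two-row frame below is a made-up stand-in for the one returned by get_words, not the asker's actual data.

```python
import pandas as pd

# Hypothetical two-row stand-in for the DataFrame built by get_words().
words = pd.DataFrame({
    'word': ['COED', 'EATEN'],
    'vowel_map': [[0, 1, 1, 0], [1, 0, 1, 0]],
})

# Iterating a DataFrame directly yields its column labels (plain strings),
# which is why word.vowel_map raised AttributeError on a 'str'.
column_labels = [item for item in words]

# Selecting the column gives a Series; tolist() turns it into a plain list.
word_vowels = words['vowel_map'].tolist()

# iterrows() yields (index, row) pairs when row-wise iteration is really needed.
row_vowels = [row['vowel_map'] for _, row in words.iterrows()]
```

Selecting the column is usually the simpler choice; iterrows() is shown only because the comment mentions it.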

2 Answers

If you want to encode how many times each vowel occurs in a word (similar to one-hot encoding, but with counts):

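import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

vowels = ['AA','AE','AH','AO','AW','AY','EH','ER','EY','IH','IY','OW','OY','UH','UW']

file_path = 'training.txt'  # the file from the question
df = pd.read_csv(file_path, sep=':', names=['word', 'string_of_phon'])
# restrict the vocabulary to the lowercased vowels; all other tokens are ignored
vect = CountVectorizer(vocabulary=[v.lower() for v in vowels])
# strip the stress digits first; regex=True is required in pandas >= 2.0
X = vect.fit_transform(df['string_of_phon'].str.replace(r'\d+', '', regex=True))
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
r = pd.DataFrame(X.toarray(), columns=vect.get_feature_names_out(), index=df.index)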

yields

In [138]: r
Out[138]:
    ao  er  uw  eh  oy  ey  ow  ih  uh  ah  ay  iy  ae  aw  aa
0    0   0   0   1   0   0   1   0   0   0   0   0   0   0   0
1    0   1   1   0   0   0   0   0   0   0   0   0   0   0   0
2    0   0   0   1   0   0   0   1   0   0   0   0   0   0   0
3    0   0   0   0   0   0   0   1   0   2   0   0   0   0   0
4    0   0   0   0   1   0   0   0   0   2   0   0   0   0   1
5    0   0   0   1   0   0   0   0   0   1   0   1   0   0   1
6    0   0   0   0   0   0   0   0   0   2   0   0   0   0   0
7    0   0   0   0   0   0   0   0   0   1   0   1   0   0   0
8    0   0   0   1   0   0   0   0   0   0   1   0   0   0   0
9    1   0   0   0   0   0   0   1   0   0   0   0   0   0   0
10   0   2   0   0   0   0   0   0   0   2   0   0   0   0   0

You can join it with the original DataFrame:

In [139]: df.join(r)
Out[139]:
            word               string_of_phon  ao  er  uw  eh  oy  ey  ow  ih  uh  ah  ay  iy  ae  aw  aa
0           COED                  K OW1 EH2 D   0   0   0   1   0   0   1   0   0   0   0   0   0   0   0
1        PURVIEW                P ER1 V Y UW2   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0
2          HEHIR              HH EH1 HH IH0 R   0   0   0   1   0   0   0   1   0   0   0   0   0   0   0
3       MUSCLING         M AH1 S AH0 L IH0 NG   0   0   0   0   0   0   0   1   0   2   0   0   0   0   0
4   NONPOISONOUS  N AA0 N P OY1 Z AH0 N AH0 S   0   0   0   0   1   0   0   0   0   2   0   0   0   0   1
5      LAVECCHIA        L AA0 V EH1 K IY0 AH0   0   0   0   1   0   0   0   0   0   1   0   1   0   0   1
6        BUCKLED              B AH1 K AH0 L D   0   0   0   0   0   0   0   0   0   2   0   0   0   0   0
7          EATEN                  IY1 T AH0 N   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0
8         SCIMED                S AY1 M EH2 D   0   0   0   1   0   0   0   0   0   0   1   0   0   0   0
9         MORTIS              M AO1 R T IH0 S   1   0   0   0   0   0   0   1   0   0   0   0   0   0   0
10   CONSERVATOR    K AH0 N S ER1 V AH0 T ER0   0   2   0   0   0   0   0   0   0   2   0   0   0   0   0
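
If strict 0/1 indicators are wanted rather than counts, one possible pandas-only sketch uses Series.str.get_dummies. This is a different technique from the CountVectorizer approach above, and the two-row sample frame is invented for illustration.

```python
import pandas as pd

vowels = ['AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'EH', 'ER',
          'EY', 'IH', 'IY', 'OW', 'OY', 'UH', 'UW']

# Hypothetical sample built from two lines of training.txt.
df = pd.DataFrame({
    'word': ['COED', 'MUSCLING'],
    'string_of_phon': ['K OW1 EH2 D', 'M AH1 S AH0 L IH0 NG'],
})

# Remove the stress digits, then build one indicator column per phoneme.
stressless = df['string_of_phon'].str.replace(r'\d+', '', regex=True)
indicators = stressless.str.get_dummies(sep=' ')

# Keep only the vowel columns, in a fixed order; vowels absent from the data become 0.
one_hot = indicators.reindex(columns=vowels, fill_value=0)
```

Here a vowel that occurs twice in a word (AH in MUSCLING) is still recorded as 1, unlike the count matrix above.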

IIUC:

In [85]: vowels = set(vowels)

In [86]: words['vowel_map'] =  \
            words['string_of_phon'].str.replace(r'\d+', '', regex=True).str.split() \
                 .apply(lambda x: [int(i in vowels) for i in x])

In [87]: words
Out[87]:
            word               string_of_phon                       vowel_map
0           COED                  K OW1 EH2 D                    [0, 1, 1, 0]
1        PURVIEW                P ER1 V Y UW2                 [0, 1, 0, 0, 1]
2          HEHIR              HH EH1 HH IH0 R                 [0, 1, 0, 1, 0]
3       MUSCLING         M AH1 S AH0 L IH0 NG           [0, 1, 0, 1, 0, 1, 0]
4   NONPOISONOUS  N AA0 N P OY1 Z AH0 N AH0 S  [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
5      LAVECCHIA        L AA0 V EH1 K IY0 AH0           [0, 1, 0, 1, 0, 1, 1]
6        BUCKLED              B AH1 K AH0 L D              [0, 1, 0, 1, 0, 0]
7          EATEN                  IY1 T AH0 N                    [1, 0, 1, 0]
8         SCIMED                S AY1 M EH2 D                 [0, 1, 0, 1, 0]
9         MORTIS              M AO1 R T IH0 S              [0, 1, 0, 0, 1, 0]
10   CONSERVATOR    K AH0 N S ER1 V AH0 T ER0     [0, 1, 0, 0, 1, 0, 1, 0, 1]

Now you can assign the calculated column to another object:

In [88]: word_vowels = words.vowel_map

In [89]: word_vowels
Out[89]:
0                       [0, 1, 1, 0]
1                    [0, 1, 0, 0, 1]
2                    [0, 1, 0, 1, 0]
3              [0, 1, 0, 1, 0, 1, 0]
4     [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
5              [0, 1, 0, 1, 0, 1, 1]
6                 [0, 1, 0, 1, 0, 0]
7                       [1, 0, 1, 0]
8                    [0, 1, 0, 1, 0]
9                 [0, 1, 0, 0, 1, 0]
10       [0, 1, 0, 0, 1, 0, 1, 0, 1]
Name: vowel_map, dtype: object
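
If the goal is then to spread these lists over columns, as the asker's follow-up comment suggests, note that the lists have different lengths. A minimal sketch, assuming one column per phoneme position is acceptable (the two-element Series is a made-up sample):

```python
import pandas as pd

# Hypothetical sample: the vowel maps for COED and PURVIEW.
word_vowels = pd.Series(
    [[0, 1, 1, 0], [0, 1, 0, 0, 1]],
    name='vowel_map',
)

# One row per word, one column per phoneme position.
# Shorter words are padded with NaN on the right.
positions = pd.DataFrame(word_vowels.tolist(), index=word_vowels.index)
```

Because of the NaN padding, the padded columns are stored as floats; whether that layout is useful depends on what the later step expects.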
