Try removing non-word characters with replace (the regex \W matches anything that is not a letter, digit, or underscore), then apply pd.unique to each string's characters:
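import numpy as np
import pandas as pd
df = pd.DataFrame({
    'id': [1, 2, 3],
    'q': ['hello?', 'helloWorld', 'hi']
})
# strip non-word characters, then keep each lowercase character once, in order of first appearance
# (pd.unique gets an ndarray because newer pandas versions deprecate passing a plain list)
df['k'] = df['q'].replace(r'\W', '', regex=True) \
    .apply(lambda x: pd.unique(np.array(list(x.lower()))))
print(df)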
df:
   id           q                      k
0   1      hello?           [h, e, l, o]
1   2  helloWorld  [h, e, l, o, w, r, d]
2   3          hi                 [h, i]
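Because \W only strips non-word characters, digits and underscores would still end up in k. If you want letters only, one alternative (a sketch, not the approach above) is a plain-Python helper that filters with str.isalpha and removes repeats with dict.fromkeys, which keeps first-seen order just like pd.unique. The helper name unique_letters is made up for this example, it returns lists rather than NumPy arrays, and str.isalpha also accepts non-ASCII letters such as é.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'q': ['hello?', 'helloWorld', 'hi']
})

def unique_letters(text):
    """Hypothetical helper: lowercase letters of `text`, first-seen order, no repeats."""
    # str.isalpha() keeps letters only, so digits and '_' are dropped (unlike \W)
    letters = (ch for ch in text.lower() if ch.isalpha())
    # dict keys preserve insertion order, so this dedupes while keeping the order
    return list(dict.fromkeys(letters))

df['k'] = df['q'].apply(unique_letters)
print(df)
```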
Or, if order doesn't matter, a set is an option (the character order in each array is arbitrary and can differ between runs):
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'id': [1, 2, 3],
    'q': ['hello?', 'helloWorld', 'hi']
})
# a set removes duplicates but does not preserve the original character order
df['k'] = df['q'].replace(r'\W', '', regex=True) \
    .apply(lambda x: np.array([*set(x.lower())]))
print(df)
df:
   id           q                      k
0   1      hello?           [h, o, l, e]
1   2  helloWorld  [o, e, r, d, h, l, w]
2   3          hi                 [i, h]
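If you like the set approach but want identical output on every run, sorting the set gives a stable alphabetical order. Note this is a different order from the first-appearance order that pd.unique keeps, and this sketch returns plain Python lists instead of NumPy arrays:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'q': ['hello?', 'helloWorld', 'hi']
})

# set() drops duplicates; sorted() then imposes alphabetical order,
# so the result no longer depends on set iteration order
df['k'] = df['q'].replace(r'\W', '', regex=True) \
    .apply(lambda x: sorted(set(x.lower())))
print(df)
```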
kValue = ['h', 'e', 'l', 'o'], and I want to insert it with something like existing_dataframe['k'][0] = kValue
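For writing a list into a single cell, a sketch using .at, assuming the target column has object dtype so a cell can hold a Python list (df['k'] = None creates such a column). Chained indexing like existing_dataframe['k'][0] = kValue is discouraged in pandas, and with copy-on-write enabled it does not modify the original frame; .at sets one cell by row label and column label in a single step:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'q': ['hello?', 'helloWorld', 'hi']
})
# an all-None column gets object dtype, so each cell can store an arbitrary object
df['k'] = None

kValue = ['h', 'e', 'l', 'o']
# .at addresses exactly one cell (row label 0, column 'k') and stores the list as-is
df.at[0, 'k'] = kValue
print(df)
```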