I have a pandas dataframe with very long strings in the 'Page' column, and I am trying to extract a substring from each of them:
Example string: /ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0
Using regex, I am having a hard time working out how to extract the string between the first two ampersands and discard the rest of the larger string.
So far, my code looks like this:
import pandas as pd
import re
dataset = pd.read_excel(r'C:\Users\example.xlsx')
dataframe = pd.DataFrame(dataset)
dataframe['Page'] = format = re.search(r'&(.*)&',str(dataframe['Page']))
dataframe.to_excel(r'C:\Users\output.xlsx')
The code above runs but does not output anything to my new spreadsheet.
Thank you in advance.
Please post df.head() in a code block in your question.
dataframe.Page.str.split("&").str[1]?
dataframe['Page'].str.extract(r'&([^&]+)&') will do.
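A minimal sketch of the two suggestions above, run against the example string from the question. The in-memory dataframe, the second sample row, and the new column names (Query, QuerySplit, Value) are assumptions made for illustration; the real data would come from the Excel file. The last pattern assumes only the value after search_query= is wanted, which the question does not state.

```python
import pandas as pd

# Sample data built from the example string in the question.
# The second row is a made-up string with no ampersands.
url = ("/ex/search/!tu/p/z1/zVJdb4IwFP0r88HH0Sp-hK/dz/d5/"
       "L2dBISEvZ0FBIS9nQSEh/?s&search_query=example one&y=0&x=0")
dataframe = pd.DataFrame({"Page": [url, "/no/ampersands/here"]})

# str.extract applies the regex to each row separately.
# [^&]+ stops at the next ampersand, unlike the greedy .* in the question.
# expand=False returns a Series, so it can be assigned to a single column.
dataframe["Query"] = dataframe["Page"].str.extract(r"&([^&]+)&", expand=False)

# Alternative without regex: split on "&" and take the second piece.
dataframe["QuerySplit"] = dataframe["Page"].str.split("&").str[1]

# Assumption: if only the parameter value is needed, anchor on its name.
dataframe["Value"] = dataframe["Page"].str.extract(
    r"[?&]search_query=([^&]*)", expand=False
)

print(dataframe[["Query", "QuerySplit", "Value"]])
```

Rows without a match come back as NaN rather than raising an error. The original attempt most likely fails because str(dataframe['Page']) converts the whole column into its printed representation, so re.search runs once on that text and returns a single match object or None instead of one result per row.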