
I would like to extract the first URL from the list shown below. I want to use Python 3 with regular expressions, but I can't get my pattern to match the string.

This is what I tried:

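import re

import pandas as pd

# Pattern meant to match the first list entry; a raw string keeps "\[" and "\S"
# from being read as (invalid) string escape sequences.
reg = r"\['\S*"

myDataFrame = pd.read_csv('Refactored_Test_1.csv')

# read_csv loads each cell of the "image" column as a plain string
imageColumn = myDataFrame.loc[:, "image"]
print(imageColumn)

for element in imageColumn:
    print(element)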

The "image" column holds values like this:

['https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T1?defaultImage=asdagroceries/noImage&resMode=sharp2&id=nHnSx1&fmt=jpg&fit=constrain,1&wid=188&hei=188', 'https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T2?defaultImage=asdagroceries/noImage&resMode=sharp2&id=PS8Sl2&fmt=jpg&fit=constrain,1&wid=188&hei=188']
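A minimal sketch of what the pattern from the question actually matches, assuming each cell is a plain string like the sample value (the variable name cell is only for illustration). The pattern does find something, but \S* runs on until the first whitespace, so the match still carries the leading [' and the trailing ', around the URL:

```python
import re

# Sample cell value from the question, held as one string (not a Python list).
cell = (
    "['https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T1?defaultImage=asdagroceries/noImage&resMode=sharp2&id=nHnSx1&fmt=jpg&fit=constrain,1&wid=188&hei=188', "
    "'https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T2?defaultImage=asdagroceries/noImage&resMode=sharp2&id=PS8Sl2&fmt=jpg&fit=constrain,1&wid=188&hei=188']"
)

# The pattern from the question, written as a raw string.
reg = r"\['\S*"

match = re.search(reg, cell)
# \S* is greedy and stops only at the space after the first entry,
# so the match includes the "['" prefix and the "'," suffix.
print(match.group(0))
```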



1 Answer


You could use a capturing group that repeats the non-whitespace character one or more times, and then match the ' that comes after it.

\['(\S+)'
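A minimal Python sketch of the capture-group approach, run against the sample value from the question (assumed to be a plain string; the name cell is illustrative). The greedy \S+ first runs up to the space after the first entry, then backtracks until the closing ' can match, so group 1 is exactly the first URL:

```python
import re

# Sample cell value from the question, held as one string.
cell = (
    "['https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T1?defaultImage=asdagroceries/noImage&resMode=sharp2&id=nHnSx1&fmt=jpg&fit=constrain,1&wid=188&hei=188', "
    "'https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T2?defaultImage=asdagroceries/noImage&resMode=sharp2&id=PS8Sl2&fmt=jpg&fit=constrain,1&wid=188&hei=188']"
)

# \[' matches the literal "['", (\S+) captures the URL, and the final '
# forces \S+ to stop just before the quote that closes the first entry.
match = re.search(r"\['(\S+)'", cell)
first_url = match.group(1) if match else None
print(first_url)
```

Writing the pattern as a double-quoted raw string means the single quotes inside it need no escaping.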


If you want only the match itself, without a capture group, you could use lookarounds:

(?<=\[')\S+(?=')
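A minimal Python sketch of the lookaround version, again using the sample value from the question as a plain string (the name cell is illustrative). Because the lookbehind and lookahead are not part of the match, group(0) is already the bare URL:

```python
import re

# Sample cell value from the question, held as one string.
cell = (
    "['https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T1?defaultImage=asdagroceries/noImage&resMode=sharp2&id=nHnSx1&fmt=jpg&fit=constrain,1&wid=188&hei=188', "
    "'https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T2?defaultImage=asdagroceries/noImage&resMode=sharp2&id=PS8Sl2&fmt=jpg&fit=constrain,1&wid=188&hei=188']"
)

# (?<=\[') requires "['" right before the match; (?=') requires a quote right
# after it. Neither is consumed, so the whole match is just the URL.
match = re.search(r"(?<=\[')\S+(?=')", cell)
url = match.group(0) if match else None
print(url)
```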



4 Comments

To extract it, should it be this? imageColumn = imageColumn.str.extract(r'\['(\S+)')
Like that, I think, but you have to escape the quotes: r'\[\'(\S+)\''
I get "SyntaxError: unexpected character after line continuation character" with imageColumn = imageColumn.str.extract(r'(?<=\[')\S+(?=')')
Try escaping both single quotes. Note that str.extract needs at least one capture group, so wrap \S+ in one: imageColumn = imageColumn.str.extract(r'(?<=\[\')(\S+)(?=\')')
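A sketch of the whole pandas step, assuming the "image" column contains strings like the sample value in the question. The one-row Series below is a hypothetical stand-in for myDataFrame["image"]. It shows that a double-quoted raw string avoids escaping the quotes altogether, and that Series.str.extract rejects a pattern with no capture group:

```python
import pandas as pd

# Hypothetical one-row stand-in for myDataFrame["image"]; each cell is text.
image_column = pd.Series([
    "['https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T1?defaultImage=asdagroceries/noImage&resMode=sharp2&id=nHnSx1&fmt=jpg&fit=constrain,1&wid=188&hei=188', "
    "'https://ui.assets-asda.com/dm/asdagroceries/8000500217078_T2?defaultImage=asdagroceries/noImage&resMode=sharp2&id=PS8Sl2&fmt=jpg&fit=constrain,1&wid=188&hei=188']"
])

# Double-quoted raw string: the single quotes need no backslashes.
# expand=False returns a Series (one URL per row) instead of a DataFrame.
first_urls = image_column.str.extract(r"\['(\S+)'", expand=False)
print(first_urls.iloc[0])

# Lookaround variant: \S+ is still wrapped in a capture group for str.extract.
first_urls_lookaround = image_column.str.extract(r"(?<=\[')(\S+)(?=')", expand=False)

# With no capture group at all, str.extract raises ValueError.
try:
    image_column.str.extract(r"(?<=\[')\S+(?=')")
    no_group_error = None
except ValueError as err:
    no_group_error = str(err)
print(no_group_error)
```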
