7

I have a dataframe where one of the columns which is in string format looks like this

    filename
 0  Machine02-2022-01-28_00-21-45.blf.424
 1  Machine02-2022-01-28_00-21-45.blf.425
 2  Machine02-2022-01-28_00-21-45.blf.426
 3  Machine02-2022-01-28_00-21-45.blf.427
 4  Machine02-2022-01-28_00-21-45.blf.428

I want my column to look like this

      filename
 0    2022-01-28 00-21-45 424
 1    2022-01-28 00-21-45 425
 2    2022-01-28 00-21-45 426
 3    2022-01-28 00-21-45 427
 4    2022-01-28 00-21-45 428

I tried this code

df['filename'] = df['filename'].str.extract(r"(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3")

I am getting this error, unsupported operand type(s) for &: 'str' and 'int'.
Can anyone please tell me where I am doing wrong ?

1
  • Not sure why you're getting this error but here is what it means: & is the so-called "bit-wise and operator" which applies "AND" bit by bit (thus the name). Python converts ints to binary on the fly for thisoperator, but for strings this is not possible. You get a similar error when you try + a int and string. Commented Mar 23, 2022 at 8:01

4 Answers 4

5

Use str.replace and add .*- to remove strings like Machine02-:

df['filename'] = df['filename'].str.replace(r".*-(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3")
print(df)

# Output
                  filename
0  2022-01-28 00-21-45 424
1  2022-01-28 00-21-45 425
2  2022-01-28 00-21-45 426
3  2022-01-28 00-21-45 427
4  2022-01-28 00-21-45 428
Sign up to request clarification or add additional context in comments.

7 Comments

I tried this, it didn't worked
I don't understand, it seems to work with your example, no?
Its not working with my example
@Vinay_S this does exactly what you want, as advertised, I have tried it out. What error are you facing?
@Corralien Thanks for this input. I had leading space, that's why it was not working. Now its working. Thanks once again
|
4

please try this:

df['filename'] = df['filename'].str.split('-',1).apply(lambda x:' '.join(x[1].split('_')).replace('.blf.',' '))

Comments

4

Use replace

df['filename']=df['filename'].str.replace('Machine|\.blf\.',' ',regex=True).str.strip().str.replace('^\d+\-','',regex=True)



 filename
0  2022-01-28_00-21-45 424
1  2022-01-28_00-21-45 425
2  2022-01-28_00-21-45 426
3  2022-01-28_00-21-45 427
4  2022-01-28_00-21-45 428

or

Extract values between e02 and .blf

df['filename']=df['filename'].str.extract('((?<=[e02])[\w|\-]+(?=[.blf]))')



    filename
0  02-2022-01-28_00-21-45
1  02-2022-01-28_00-21-45
2  02-2022-01-28_00-21-45
3  02-2022-01-28_00-21-45
4  02-2022-01-28_00-21-45

1 Comment

I think you missed the suffix (424, 425, 426, ...)
1

Regex are nice, but sometimes is easier and more readable to make a replace, if the arguments won't ever change:

df['filename'] = df['filename'].str.replace('Machine02-','',regex=False)
df['filename'] = df['filename'].str.replace('.blf.',' ',regex=False)

1 Comment

Thanks for this answer. I am currently using it and it will work only if string has 'Machine02-' . Sometimes I may have random names like M2, Mach2, etc. I have a huge data and I can't keep looking like that. So I am trying another method which I explained in question

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.