0

I have written some Python code whose final output ends with a '_' character. I want to remove this character.

import re
from itertools import groupby

file = ["meta_data_02154.csv", "meta_data_021694.csv", "meta_data_loop_02365.csv", "meta_data_loops_0256365.csv", "output.csv"]

f = [list(i) for j, i in groupby(file, lambda a : re.split(r'\d*.csv$', a)[0])]
print(f)

for pattern in f:
        #print(pattern)
        print((re.split(r'\d*.csv$', pattern[0]))[0])

Output:

[['meta_data_02154.csv', 'meta_data_021694.csv'], ['meta_data_loop_02365.csv'], ['meta_data_loops_0256365.csv'], ['output.csv']]
meta_data_
meta_data_loop_
meta_data_loops_
output

Desired Output:

[['meta_data_02154.csv', 'meta_data_021694.csv'], ['meta_data_loop_02365.csv'], ['meta_data_loops_0256365.csv'], ['output.csv']]
meta_data
meta_data_loop
meta_data_loops
output
4
  • use rstrip('_') Commented Sep 13, 2019 at 9:40
  • print((re.split(r'_?\d*.csv$', pattern[0]))[0])? Commented Sep 13, 2019 at 9:43
  • @shivam patel, both your desired and gotten output are the same. Commented Sep 13, 2019 at 9:49
  • print((re.split(r'\d*.csv$', pattern[0]))[0].rstrip('_')) Commented Sep 13, 2019 at 9:52

5 Answers

2

Use rstrip('_') on the result of the split:

import re
from itertools import groupby

file = ["meta_data_02154.csv", "meta_data_021694.csv", "meta_data_loop_02365.csv", "meta_data_loops_0256365.csv", "output.csv"]

f = [list(i) for j, i in groupby(file, lambda a : re.split(r'\d*.csv$', a)[0])]
print(f)

for pattern in f:
        #print(pattern)
        print((re.split(r'\d*.csv$', pattern[0]))[0].rstrip('_'))
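If the grouped file names are needed together with the cleaned prefix, the same idea can be sketched as a dictionary. The helper name prefix_key and the escaped dot (\.) are assumptions added here, not part of the original answer, and the sketch assumes that names with the same prefix are adjacent in the list, because groupby only groups consecutive items.

```python
import re
from itertools import groupby

def prefix_key(name):
    # Hypothetical helper: cut off the digits and ".csv", then drop trailing underscores.
    return re.split(r'\d*\.csv$', name)[0].rstrip('_')

file = ["meta_data_02154.csv", "meta_data_021694.csv", "meta_data_loop_02365.csv",
        "meta_data_loops_0256365.csv", "output.csv"]

# Map each cleaned prefix to the consecutive file names that share it.
groups = {key: list(items) for key, items in groupby(file, prefix_key)}

for key, names in groups.items():
    print(key, names)
```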

1

Use rstrip()

val = "sad_"
print(val.rstrip('_'))
Output: sad

Description

rstrip() returns a copy of the string with the given trailing characters removed from its right end.

Alternatively, print(val[:-1]) gives the same result in this case, but it always drops the last character, whether or not it is an underscore.
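A small sketch of how the two approaches differ on other inputs. The function names and sample strings are made up for illustration.

```python
def strip_trailing_underscores(text):
    # rstrip('_') removes every trailing underscore and nothing else.
    return text.rstrip('_')

def drop_last_char(text):
    # Slicing removes exactly one character, whatever it is.
    return text[:-1]

samples = ["sad_", "sad__", "sad"]
rstrip_results = [strip_trailing_underscores(s) for s in samples]
slice_results = [drop_last_char(s) for s in samples]

print(rstrip_results)  # ['sad', 'sad', 'sad']
print(slice_results)   # ['sad', 'sad_', 'sa']
```

For the file names in the question, rstrip('_') is the safer choice because 'output' has no trailing underscore to remove.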


0

Try the pattern r'_?\d*.csv$', which also consumes an optional underscore before the digits.

Ex:

import re
from itertools import groupby

file = ["meta_data_02154.csv", "meta_data_021694.csv", "meta_data_loop_02365.csv", "meta_data_loops_0256365.csv", "output.csv"]

f = [list(i) for j, i in groupby(file, lambda a : re.split(r'\d*.csv$', a)[0])]
print(f)

for pattern in f:
    #print(pattern)
    print((re.split(r'_?\d*.csv$', pattern[0]))[0])
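One caveat with this pattern is that the unescaped dot matches any character, not only a literal ".". The sketch below compares it with a version that escapes the dot. The names LOOSE, STRICT and prefix_of, and the sample name "report_12acsv", are assumptions made up for illustration.

```python
import re

LOOSE = re.compile(r'_?\d*.csv$')    # pattern from the answer: "." matches any character
STRICT = re.compile(r'_?\d*\.csv$')  # assumed tightening: "\." matches only a literal dot

def prefix_of(pattern, filename):
    # Everything before the first match of the suffix pattern.
    return pattern.split(filename)[0]

names = ["meta_data_02154.csv", "meta_data_loop_02365.csv",
         "meta_data_loops_0256365.csv", "output.csv"]
prefixes = [prefix_of(STRICT, n) for n in names]
print(prefixes)

# A name that merely ends in "csv" is still cut by the loose pattern.
loose_result = prefix_of(LOOSE, "report_12acsv")
strict_result = prefix_of(STRICT, "report_12acsv")
print(loose_result, strict_result)
```

Both patterns give the same prefixes for the file names in the question, so the escaped dot only matters for unusual names.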


0

You may use print((re.split(r'\d*.csv$', pattern[0]))[0].rstrip('_')), but you might as well use a better regex and .search instead of split.

I'm not sure what you used groupby for.

import re

file = ["meta_data_02154.csv", "meta_data_021694.csv", "meta_data_loop_02365.csv", "meta_data_loops_0256365.csv", "output.csv"]

for pattern in file:
    # The lazy group (.+?) stops before the optional underscore and the digits,
    # so group(1) holds only the prefix.
    print(re.search(r'(.+?)_?\d*\.csv$', pattern).group(1))

Outputs

meta_data
meta_data
meta_data_loop
meta_data_loops
output


0

You could use a one-liner, just splitting the filenames:

file = ["meta_data_02154.csv", "meta_data_021694.csv", "meta_data_loop_02365.csv", "meta_data_loops_0256365.csv", "output.csv"]
filePatterns = set([f.rsplit('_', 1)[0].rsplit('.csv')[0] for f in file])
print(filePatterns)

Prints:

{'meta_data_loops', 'meta_data', 'meta_data_loop', 'output'}
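Because a set is unordered, the printed order can differ between runs. If the first-seen order matters, one possible variation is to deduplicate with dict.fromkeys, which keeps insertion order in Python 3.7 and later. The variable names prefixes and unique_prefixes are made up for this sketch.

```python
file = ["meta_data_02154.csv", "meta_data_021694.csv", "meta_data_loop_02365.csv",
        "meta_data_loops_0256365.csv", "output.csv"]

# Same splitting as above: cut at the last underscore, then remove ".csv" if still present.
prefixes = [f.rsplit('_', 1)[0].rsplit('.csv')[0] for f in file]

# dict keys are unique and keep insertion order, so this deduplicates without reordering.
unique_prefixes = list(dict.fromkeys(prefixes))
print(unique_prefixes)  # ['meta_data', 'meta_data_loop', 'meta_data_loops', 'output']
```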
