0

I have a string like this:

"Cat/Wheat , Com, Ogl/oyher Face Express/Star,"

I want to get this:

["Cat,Wheat,Com,Ogl,oyher,Face,Express,Star"]

Basically split at "," and "/"

I tried using the split function, but that required a nested for loop, which is not very efficient.

I did some research and came across regex

re.split('\W+',string , 1)

but this is not working. What should I change so that it splits on every separator?

3
  • I think your expected output is incorrect. Commented Aug 1, 2019 at 18:20
  • Are you really trying to get a single string in a list, like "Cat,Wheat,Com,Ogl,oyher,Face,Express,Star"? Or do you want a list of individual items, like ['Cat', 'Wheat', ...]? Commented Aug 1, 2019 at 18:32
  • Why do you limit it to one split? Use re.split(r'\W+', data) and you'll get ['Cat', 'Wheat', 'Com', 'Ogl', 'oyher', 'Face', 'Express', 'Star', '']. This can be transformed to the expected result quite easily (if you really want the result you showed in the question). Commented Aug 1, 2019 at 18:33

3 Answers

3

It's not clear why you are adding the maxsplit argument of 1 to your split() – that prevents it from splitting everything you want.

Without it you get:

> import re
> s = "Cat/Wheat , Com, Ogl/oyher Face Express/Star,"
> re.split(r'\W+', s)
['Cat', 'Wheat', 'Com', 'Ogl', 'oyher', 'Face', 'Express', 'Star', '']

That's pretty close except for the soul-crushing empty string at the end. You can filter that out, but you might be happier using re.findall() to match what you want rather than splitting on what you don't:

> import re
> s = "Cat/Wheat , Com, Ogl/oyher Face Express/Star,"
> re.findall(r'\w+', s)
['Cat', 'Wheat', 'Com', 'Ogl', 'oyher', 'Face', 'Express', 'Star']
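
If you would rather stay with re.split(), the trailing empty string can be dropped with a list comprehension. This is only a minimal sketch using the same string; the name parts is chosen here for illustration:

```python
import re

s = "Cat/Wheat , Com, Ogl/oyher Face Express/Star,"

# re.split() yields '' when the string ends with a separator,
# so keep only the non-empty pieces.
parts = [p for p in re.split(r'\W+', s) if p]
print(parts)
# ['Cat', 'Wheat', 'Com', 'Ogl', 'oyher', 'Face', 'Express', 'Star']
```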

To get a single comma-separated string (if that's what you want) you can join:

> import re
> s = "Cat/Wheat , Com, Ogl/oyher Face Express/Star,"
> ",".join(re.findall(r'\w+', s))
'Cat,Wheat,Com,Ogl,oyher,Face,Express,Star'
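
For reference, this is roughly what the maxsplit argument from the question does: it stops after the first separator and leaves the rest of the string untouched. A small sketch on the same string; the variable name limited is an assumption, and maxsplit is passed by keyword so it cannot be mistaken for flags:

```python
import re

s = "Cat/Wheat , Com, Ogl/oyher Face Express/Star,"

# maxsplit=1 performs at most one split, at the first run of non-word characters.
limited = re.split(r'\W+', s, maxsplit=1)
print(limited)
# ['Cat', 'Wheat , Com, Ogl/oyher Face Express/Star,']
```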

1
>>> import re
>>> data = "Cat/Wheat , Com, Ogl/oyher Face Express/Star,"
>>> words = re.findall(r"[\w']+", data)
>>> print(words)
['Cat', 'Wheat', 'Com', 'Ogl', 'oyher', 'Face', 'Express', 'Star']
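
The character class [\w'] differs from plain \w only in that it also keeps apostrophes inside a word. The question's string contains none, so both patterns give the same result there. A small sketch with a made-up input (the text "don't stop" is an assumption, not part of the question) shows the difference:

```python
import re

text = "don't stop"  # hypothetical input, chosen only to contain an apostrophe

plain = re.findall(r"\w+", text)        # apostrophe acts as a separator
with_quote = re.findall(r"[\w']+", text)  # apostrophe stays inside the word

print(plain)       # ['don', 't', 'stop']
print(with_quote)  # ["don't", 'stop']
```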


1

If you are after speed, you are probably better off with a series of plain Python string operations:

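def multisplit(s, splits=('/', ','), base_split=' '):
    # Replace every separator with one common separator, then split once.
    for split in splits:
        s = s.replace(split, base_split)
    # For a whitespace base_split, a bare split() already drops empty strings.
    return s.split() if not base_split.split() else list(filter(bool, s.split(base_split)))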

or, even faster (for slightly larger inputs):

import functools

def multisplit2(s, splits=('/', ','), base_split=' '):
    # Fold over the separators: t is the string so far, r the separator to replace.
    s = functools.reduce(lambda t, r: t.replace(r, base_split), splits, s)
    return s.split() if not base_split.split() else list(filter(bool, s.split(base_split)))
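
The return line depends on base_split: with a whitespace base_split, a bare split() also swallows the spaces around each word, while with any other character only that character is split on and the surrounding spaces survive. The following is a self-contained sketch of the loop-based variant with balanced parentheses; the '|' separator is an assumption made for illustration:

```python
def multisplit(s, splits=('/', ','), base_split=' '):
    # Replace every separator with one common separator, then split once.
    for split in splits:
        s = s.replace(split, base_split)
    if not base_split.split():
        # base_split is whitespace: split() drops empty strings and extra spaces.
        return s.split()
    # Otherwise split on base_split only and discard empty strings.
    return list(filter(bool, s.split(base_split)))


s = "Cat/Wheat , Com, Ogl/oyher Face Express/Star,"

by_space = multisplit(s)
by_pipe = multisplit(s, base_split='|')

print(by_space)
# ['Cat', 'Wheat', 'Com', 'Ogl', 'oyher', 'Face', 'Express', 'Star']
print(by_pipe)
# ['Cat', 'Wheat ', ' Com', ' Ogl', 'oyher Face Express', 'Star']
```

With the default base_split the result matches the regex answers; with a non-whitespace one, the pieces would still need stripping and further splitting on spaces.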

A quick comparison with the re-based solutions indicates a 5x to 10x speedup for the proposed approach:

import re


def re_findall(s):
    return re.findall(r"[\w']+", s)

def re_split(s):
    return list(filter(bool, re.split(r'\W+', s)))


s = "Cat/Wheat , Com, Ogl/oyher Face Express/Star,"
print(re_findall(s))
print(re_split(s))
print(multisplit(s))
# ['Cat', 'Wheat', 'Com', 'Ogl', 'oyher', 'Face', 'Express', 'Star']

%timeit re_findall(s)
# 2.54 µs ± 9.14 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit re_split(s)
# 3.05 µs ± 6.54 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit multisplit(s)
# 631 ns ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit multisplit2(s)
# 908 ns ± 12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit re_findall(s * 1000)
# 1.55 ms ± 5.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit re_split(s * 1000)
# 1.96 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit multisplit(s * 1000)
# 222 µs ± 7.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit multisplit2(s * 1000)
# 149 µs ± 1.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
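
The %timeit lines above only work inside IPython or Jupyter. Outside of them, a similar measurement can be sketched with the standard timeit module. The loop count and the simplified multisplit (which assumes a whitespace base_split) are choices made for this example, and the timings will differ from machine to machine:

```python
import re
import timeit


def re_findall(s):
    return re.findall(r"[\w']+", s)


def multisplit(s, splits=('/', ','), base_split=' '):
    # Simplified variant: assumes base_split is whitespace.
    for split in splits:
        s = s.replace(split, base_split)
    return s.split()


s = "Cat/Wheat , Com, Ogl/oyher Face Express/Star,"
number = 100_000  # arbitrary loop count for this sketch

for func in (re_findall, multisplit):
    # timeit.timeit returns the total time in seconds for `number` calls.
    seconds = timeit.timeit(lambda: func(s), number=number)
    print(f"{func.__name__}: {seconds / number * 1e9:.0f} ns per call")
```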
