Find string between two substrings [duplicate]

Question

How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?

My current method is like this:

>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis

However, this seems very inefficient and un-pythonic. What is a better way to do something like this?

Forgot to mention: The string might not start and end with start and end. They may have more characters before and after.

Your additional information makes it almost necessary to use regexes for maximum correctness.
– Jesse Dhillon, Commented Jul 30, 2010 at 6:39
What's wrong with your own solution? I actually prefer it to the one you accepted.
– reubano, Commented Nov 10, 2014 at 12:06
I was trying to do this as well, but for multiple instances. Using *? for a non-greedy search and then cutting off the string with s[s.find(end)] worked for tracking multiple instances.
– lathomas64, Commented Jan 9, 2019 at 23:07
@reubano: one feature/bug of this code is that it does not raise an exception when the end text does not occur in the original text. The accepted answer fixes this.
– Kasper Dokter, Commented Jan 19, 2022 at 14:50
Just a note: s[1:-1] will also do what you had, though I like .group(1) or the non-greedy (.*?) from below better.
– alchemy, Commented Oct 30, 2022 at 23:04

Demodave · Accepted Answer · 2024-04-17 21:41:03Z

525

import re

s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))

# returns 'iwantthis'
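
As the comments under this answer point out, the greedy `(.*)` runs to the last occurrence of the end marker. The sketch below contrasts it with the non-greedy `(.*?)`, using the sample string from one of those comments; the variable names `greedy` and `lazy` are only illustrative.

```python
import re

# Sample string from the comments: the end marker occurs twice.
s = 'asdf=5;I_WANT_ONLY_THIS123jasdNOT_THIS123jasd'

# Greedy: (.*) extends to the LAST '123jasd'.
greedy = re.search(r'asdf=5;(.*)123jasd', s)
# Non-greedy: (.*?) stops at the FIRST '123jasd'.
lazy = re.search(r'asdf=5;(.*?)123jasd', s)

print(greedy.group(1))  # I_WANT_ONLY_THIS123jasdNOT_THIS
print(lazy.group(1))    # I_WANT_ONLY_THIS
```

Which of the two is right depends on whether the end marker may legitimately appear inside the text you want.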

edited Apr 17, 2024 at 21:41

Demodave

answered Jul 30, 2010 at 5:59

Nikolaus Gradwohl

15 Comments

jdd Over a year ago

@Jesse Dhillon -- what about @Tim McNamara's suggestion of something like ''.join(start,test,end) in a_string?

user5713018 Over a year ago

What if I need to find between 2 substrings and the second one is repeated after the first one? Something like this: s = 'asdf=5;I_WANT_ONLY_THIS123jasdNOT_THIS123jasd'

do-ic Over a year ago

Add ? to make it non-greedy: result = re.search('asdf=5;(.*?)123jasd', s)

Sql_Pete_Belfast Over a year ago

How can this be amended to select data between start/end if the start/end is duplicated? E.g. say I wanted to select both strings between <> separately: from 'i would like to send <message> to <name>', return result1 = 'message' and result2 = 'name'.

Lenka Pitonakova Over a year ago

This however extracts the string between the first and the LAST occurrence of the 2nd string, which may be incorrect, especially when parsing HTML. Unfortunately, this question appears closed so I cannot post my answer.


cji · Accepted Answer · 2010-07-30 06:27:01Z

189

s = "123123STRINGabcabc"

def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""

def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""


print(find_between( s, "123", "abc" ))
print(find_between_r( s, "123", "abc" ))

gives:

123STRING
STRINGabc

I thought it should be noted: depending on what behavior you need, you can mix index and rindex calls or go with one of the above versions (the choice is similar in spirit to choosing between the regex groups (.*) and (.*?)).

edited Jul 30, 2010 at 6:27

answered Jul 30, 2010 at 5:58

cji

7 Comments

Jesse Dhillon Over a year ago

He said that he wanted a way that was more Pythonic, and this is decidedly less so. I'm not sure why this answer was picked, even OP's own solution is better.

jdd Over a year ago

Agreed. I'd use the solution by @Tim McNamara , or the suggestion by the same of something like start+test+end in substring

cji Over a year ago

Right, so it's less pythonic, ok. Is it less efficient than regexps too? And there's also @Prabhu's answer you need to downvote, as it suggests the same solution.

Ida Over a year ago

+1 too, for a more generic and reusable (by import) solution.

reubano Over a year ago

+1 since it works better than the other solutions in the case where end is found more than once. But I do agree that the OP's solution is simpler.


David Arenburg · Accepted Answer · 2017-01-03 12:29:39Z

147

start = 'asdf=5;'
end = '123jasd'
s = 'asdf=5;iwantthis123jasd'
print(s[s.find(start)+len(start):s.rfind(end)])

gives

iwantthis

edited Jan 3, 2017 at 12:29

David Arenburg

answered Sep 13, 2013 at 15:54

ansetou

4 Comments

Kenny Powers Over a year ago

I upvoted this because it works regardless of input string size. Some of the other methods assumed you'd know the length ahead of time.

Kevin Crum Over a year ago

Yes, it works regardless of input size; however, it does assume the substrings exist.

Lenka Pitonakova Over a year ago

This however extracts the string between the first and the LAST occurrence of the 2nd string, which may be incorrect, especially when parsing HTML. Unfortunately, this question appears closed so I cannot post my answer.

Paul Sumpner Over a year ago

That's python! No need for regular expressions :)

Tim McNamara · Accepted Answer · 2010-07-30 05:56:47Z

63

s[len(start):-len(end)]

answered Jul 30, 2010 at 5:56

Tim McNamara

2 Comments

jdd Over a year ago

This is very nice, assuming start and end are always at the start and end of the string. Otherwise, I would probably use a regex.

Tim McNamara Over a year ago

I went for the most Pythonic answer to the original question I could think of. Testing using the in operator would probably be faster than regexp.

3 revs, 2 users 88% · Accepted Answer · 2023-07-12 14:07:46Z

49

Just converting the OP's own solution into an answer:

def find_between(s, start, end):
    return s.split(start)[1].split(end)[0]
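
One thing to be aware of with the split version: if start is missing, the `[1]` index raises IndexError, and if end is missing it silently returns everything after start (as noted in a comment on the question). The sketch below is one possible stricter variant; it swaps split for str.partition, and the name `find_between_strict` and the error messages are assumptions made for illustration.

```python
def find_between_strict(s, start, end):
    """Return the text between start and end; raise ValueError if either is missing."""
    # partition returns an empty separator when the delimiter is not found.
    _, sep, rest = s.partition(start)
    if not sep:
        raise ValueError(f'start delimiter {start!r} not found')
    result, sep, _ = rest.partition(end)
    if not sep:
        raise ValueError(f'end delimiter {end!r} not found after start')
    return result


print(find_between_strict('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
```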

edited Jul 12, 2023 at 14:07

community wiki

3 revs, 2 users 88%
reubano

Ooker · Accepted Answer · 2015-08-27 14:31:52Z

39

String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be amended as desired.

import re

s = 'asdf=5;iwantthis123jasd'
start = 'asdf=5;'
end = '123jasd'

result = re.search('%s(.*)%s' % (start, end), s).group(1)
print(result)
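
Two caveats apply to the interpolated pattern: start and end are inserted as regex syntax, so characters such as `[` or `(` change the pattern's meaning, and re.search returns None when nothing matches, so calling `.group(1)` directly fails. A minimal sketch that handles both, assuming you want None back when there is no match (the function name `find_between` is only illustrative):

```python
import re


def find_between(s, start, end):
    """Return the text between start and end, or None if there is no match."""
    # re.escape makes the delimiters match literally, even if they
    # contain regex metacharacters such as '[' or '('.
    pattern = re.escape(start) + '(.*)' + re.escape(end)
    match = re.search(pattern, s)
    return match.group(1) if match else None


print(find_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
print(find_between('total (net) = [42]', '[', ']'))                   # 42
print(find_between('no delimiters here', '<', '>'))                   # None
```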

edited Aug 27, 2015 at 14:31

Ooker

answered Jul 30, 2010 at 7:47

Tim McNamara

4 Comments

Dentrax Over a year ago

I'm getting this: 'NoneType' object has no attribute 'group'

Tim McNamara Over a year ago

That means a match wasn't found. Check your regular expression.

cwhisperer Over a year ago

@Dentrax is right: should return nothing not an error

MTay Over a year ago

I think Tim means that the search should return None as there were no matches. Since the search returned 'None', applying of .group(1) at the end causes the error.

Fernando Wittmann · Accepted Answer · 2020-09-17 21:56:08Z

36

If you don't want to import anything, try the string method .index():

text = 'I want to find a string between two substrings'
left = 'find a '
right = ' between two'  # leading space so the result has no trailing space

# Output: 'string'
print(text[text.index(left)+len(left):text.index(right)])
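
If the right-hand marker can also occur before the left-hand one, the one-liner above picks up the wrong position. A small sketch of one way around that, passing a start offset to the second .index() call; the sample text and the names `start_pos` and `end_pos` are made up for illustration:

```python
# Hypothetical input where the right marker also appears BEFORE the left one.
text = '] noise [payload] tail'
left = '['
right = ']'

start_pos = text.index(left) + len(left)
# Search for the right marker only from start_pos onwards.
end_pos = text.index(right, start_pos)

print(text[start_pos:end_pos])  # payload
```

Both .index() calls still raise ValueError when a marker is missing.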

edited Sep 17, 2020 at 21:56

answered Jul 21, 2018 at 13:32

Fernando Wittmann

5 Comments

pbount Over a year ago

I am loving it. simple, single-line, clear enough, no additional imports and works out of the box. I have no idea what is the deal with the over-engineered answers above.

AndreFeijo Over a year ago

This is not checking whether the "right" text is actually at the right side of the text. If there are any occurrences of "right" before the text it won't work.

Fernando Wittmann Over a year ago

@AndreFeijo I agree with you, this was my first solution when trying to extract texts and I wanted to avoid regex weird syntax. However, in situations as you mentioned, I would use regex instead.

ericksho Over a year ago

in that case (not all of cases) you could find left and then right, although it's a two line code text = text[text.index(left)+len(left):len(role)] text = text[0:text.index(right)]

Arun Mohan Over a year ago

Hi Fernando, for this text "ADRIANOPICCININIC216186162022-07-27 09:36:33Z" i am looking to extract only "C21618616", how can i do that?

tstoev · Accepted Answer · 2013-09-24 11:23:28Z

16

source='your token _here0@df and maybe _here1@df or maybe _here2@df'
start_sep='_'
end_sep='@df'
result=[]
tmp=source.split(start_sep)
for par in tmp:
  if end_sep in par:
    result.append(par.split(end_sep)[0])

print(result)

prints: ['here0', 'here1', 'here2']

A regex is arguably better, but it requires importing the re module (part of the standard library), and you may prefer to stick to plain string methods.
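
For comparison, the regex alternative mentioned above might look like the following sketch. It uses re.findall with a non-greedy group so that each start/end pair yields its own result; the helper name `find_all_between` is hypothetical.

```python
import re


def find_all_between(source, start_sep, end_sep):
    """Return every substring found between start_sep and end_sep."""
    # Non-greedy (.*?) stops at the first end_sep after each start_sep.
    pattern = re.escape(start_sep) + '(.*?)' + re.escape(end_sep)
    return re.findall(pattern, source)


source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
print(find_all_between(source, '_', '@df'))  # ['here0', 'here1', 'here2']
print(find_all_between('i would like to send <message> to <name>', '<', '>'))  # ['message', 'name']
```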

answered Sep 24, 2013 at 11:23

tstoev

2 Comments

Sterex Over a year ago

This worked for me. Thank you for extending the solution for multiple occurrences.

ohsoifelse Over a year ago

I was exactly looking for this, It helps for multiple occurrences, This post needs more upvotes :p.

John La Rooy · Accepted Answer · 2010-07-30 06:03:29Z

15

Here is one way to do it:

_, _, rest = s.partition(start)
result, _, _ = rest.partition(end)
print(result)

Another way, using a regexp:

import re
print(re.findall(re.escape(start) + "(.*)" + re.escape(end), s)[0])

or

print(re.search(re.escape(start) + "(.*)" + re.escape(end), s).group(1))

edited Jul 30, 2010 at 6:03

answered Jul 30, 2010 at 5:58

John La Rooy

Mnyikka · Accepted Answer · 2018-01-19 08:37:03Z

6

Here is a function I wrote that returns a list of the string(s) found between string1 and string2.

def GetListOfSubstrings(stringSubject,string1,string2):
    MyList = []
    intstart=0
    strlength=len(stringSubject)
    continueloop = 1

    while(intstart < strlength and continueloop == 1):
        intindex1=stringSubject.find(string1,intstart)
        if(intindex1 != -1): #The substring was found, lets proceed
            intindex1 = intindex1+len(string1)
            intindex2 = stringSubject.find(string2,intindex1)
            if(intindex2 != -1):
                subsequence=stringSubject[intindex1:intindex2]
                MyList.append(subsequence)
                intstart=intindex2+len(string2)
            else:
                continueloop=0
        else:
            continueloop=0
    return MyList


# Usage examples
mystring = "s123y123o123pp123y6"
List = GetListOfSubstrings(mystring, "1", "y68")
for x in List:
    print(x)
output: (nothing is printed, because "y68" never occurs in mystring)

mystring = "s123y123o123pp123y6"
List = GetListOfSubstrings(mystring, "1", "3")
for x in List:
    print(x)
output:
2
2
2
2

mystring = "s123y123o123pp123y6"
List = GetListOfSubstrings(mystring, "1", "y")
for x in List:
    print(x)
output:
23
23o123pp123

answered Jan 19, 2018 at 8:37

Mnyikka

1 Comment

Abhishek Singh Over a year ago

Extraordinary answer. I'd hire a guy like you

Reinstate Monica - Goodbye SE · Accepted Answer · 2014-04-24 09:21:35Z

5

To extract STRING, try:

myString = '123STRINGabc'
startString = '123'
endString = 'abc'

mySubString=myString[myString.find(startString)+len(startString):myString.find(endString)]

edited Apr 24, 2014 at 9:21

answered Feb 20, 2013 at 11:51

Reinstate Monica - Goodbye SE

thecollinsprogram · Accepted Answer · 2017-01-15 10:28:53Z

4

You can simply use this code or copy the function below. All neatly in one line.

def substring(whole, sub1, sub2):
    return whole[whole.index(sub1) : whole.index(sub2)]

If you run the function as follows:

print(substring("5+(5*2)+2", "(", ")"))

You will probably be left with the output:

(5*2

rather than

5*2

If you want to have the sub-strings on the end of the output the code must look like below.

return whole[whole.index(sub1) : whole.index(sub2) + 1]

But if you don't want the substrings on the end the +1 must be on the first value.

return whole[whole.index(sub1) + 1 : whole.index(sub2)]
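
Note that the +1 adjustments above only work when both delimiters are a single character. A possible generalisation, sketched here with len() in place of the hard-coded 1; the `include_delimiters` flag is an assumption added for illustration, not part of the original function:

```python
def substring(whole, sub1, sub2, include_delimiters=False):
    """Return the text between sub1 and sub2, optionally keeping the delimiters."""
    start = whole.index(sub1)
    # Look for sub2 only after the end of sub1.
    end = whole.index(sub2, start + len(sub1))
    if include_delimiters:
        return whole[start:end + len(sub2)]
    return whole[start + len(sub1):end]


print(substring('5+(5*2)+2', '(', ')'))                           # 5*2
print(substring('5+(5*2)+2', '(', ')', include_delimiters=True))  # (5*2)
print(substring('a<<b>>c', '<<', '>>'))                           # b
```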

answered Jan 15, 2017 at 10:28

thecollinsprogram

Wesley Kitlasten · Accepted Answer · 2016-05-19 18:51:28Z

These solutions assume the start string and final string are different. Here is a solution I use for an entire file when the initial and final indicators are the same, assuming the entire file is read using readlines():

def extractstring(line, flag='$'):
    string = None  # returned as-is when the line has no pair of flags
    if flag in line:  # $ is the flag
        dex1 = line.index(flag)
        subline = line[dex1 + len(flag):]  # everything after the first flag
        if flag in subline:
            dex2 = subline.index(flag)
            string = subline[0:dex2].strip()  # does not include last flag, strip whitespace
    return string

Example:

lines=['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
    'afafoaltat $I GOT BETTER!$ derpity derp derp']
for line in lines:
    string=extractstring(line,flag='$')
    print(string)

Gives:

A NEWT?
I GOT BETTER!

Tony Veijalainen · Accepted Answer · 2010-07-30 07:16:36Z

This I posted before as code snippet in Daniweb:

# picking up piece of string between separators
# function using partition, like partition, but drops the separators
def between(left,right,s):
    before,_,a = s.partition(left)
    a,_,after = a.partition(right)
    return before,a,after

s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
print(between('<a>','</a>',s))
print(between('(',')',s))
print(between("'","'",s))

""" Output:
('bla bla blaa ', 'data', " lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
"""

Love and peace - Joe Codeswell · Accepted Answer · 2015-01-10 20:01:16Z

This is essentially cji's answer - Jul 30 '10 at 5:58. I changed the try except structure for a little more clarity on what was causing the exception.

import sys

def find_between( inputStr, firstSubstr, lastSubstr ):
    '''
    find between firstSubstr and lastSubstr in inputStr  STARTING FROM THE LEFT
        http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
            above also has a func that does this FROM THE RIGHT
    '''
    start, end = (-1,-1)
    try:
        start = inputStr.index( firstSubstr ) + len( firstSubstr )
    except ValueError:
        # report which substring was missing, then carry on
        print('    ValueError: ', end=' ')
        print("firstSubstr=%s  -  "%( firstSubstr ), end=' ')
        print(sys.exc_info()[1])

    try:
        end = inputStr.index( lastSubstr, start )
    except ValueError:
        print('    ValueError: ', end=' ')
        print("lastSubstr=%s  -  "%( lastSubstr ), end=' ')
        print(sys.exc_info()[1])

    return inputStr[start:end]

AXO · Accepted Answer · 2017-02-05 05:59:11Z

from timeit import timeit
from re import search, DOTALL


def partition_find(string, start, end):
    return string.partition(start)[2].rpartition(end)[0]


def re_find(string, start, end):
    # applying re.escape to start and end would be safer
    return search(start + '(.*)' + end, string, DOTALL).group(1)


def index_find(string, start, end):
    return string[string.find(start) + len(start):string.rfind(end)]


# The wikitext of the "Alan Turing law" article from English Wikipedia
# https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
string = """..."""
start = '==Proposals=='
end = '==Rival bills=='

assert index_find(string, start, end) \
       == partition_find(string, start, end) \
       == re_find(string, start, end)

print('index_find', timeit(
    'index_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

print('partition_find', timeit(
    'partition_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

print('re_find', timeit(
    're_find(string, start, end)',
    globals=globals(),
    number=100_000,
))

Result:

index_find 0.35047444528454114
partition_find 0.5327825636197754
re_find 7.552149639286381

re_find was more than 20 times slower than index_find in this example.

josh · Accepted Answer · 2010-07-30 06:20:22Z

1

My method will be to do something like,

find index of start string in s => i
find index of end string in s => j

substring = substring(i+len(start) to j-1)
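
The pseudocode above could be turned into Python roughly as follows. This is only a sketch: the function name `between_indices`, the None return for missing markers, and the choice to search for end only after start are assumptions, not part of the original answer.

```python
def between_indices(s, start, end):
    """Return the text between start and end, or None if either is missing."""
    i = s.find(start)                 # index of start string in s
    if i == -1:
        return None
    begin = i + len(start)
    j = s.find(end, begin)            # index of end string, searched after start
    if j == -1:
        return None
    # A Python slice excludes index j, which matches "to j-1" in the pseudocode.
    return s[begin:j]


print(between_indices('xx asdf=5;iwantthis123jasd yy', 'asdf=5;', '123jasd'))  # iwantthis
```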

edited Jul 30, 2010 at 6:20

answered Jul 30, 2010 at 5:56

josh

Matthew Dunn · Accepted Answer · 2017-10-05 00:39:20Z

1

Parsing text with delimiters from different email platforms posed a larger-sized version of this problem. They generally have a START and a STOP. Delimiter characters for wildcards kept choking regex. The problem with split is mentioned here & elsewhere - oops, delimiter character gone. It occurred to me to use replace() to give split() something else to consume. Chunk of code:

import logging

nuke = '~~~'  # sentinel; must not occur in the input text
start = '|*'
stop = '*|'
# textIn is the raw input text, defined elsewhere
julien = (textIn.replace(start,nuke + start).replace(stop,stop + nuke).split(nuke))
keep = [chunk for chunk in julien if start in chunk and stop in chunk]
logging.info('keep: %s',keep)

answered Oct 5, 2017 at 0:39

Matthew Dunn

Akshay · Accepted Answer · 2018-04-18 09:34:18Z

0

Following on from Nikolaus Gradwohl's answer, I needed to get the version number (i.e., 0.0.2) between 'ui:' and '-' from the file content below (filename: docker-compose.yml):

version: '3.1'
services:
  ui:
    image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
    #network_mode: host
    ports:
      - 443:9999
    ulimits:
      nofile:test

and this is how it worked for me (python script):

import re

# Read the whole file; the with block closes it automatically.
with open('docker-compose.yml', 'r') as f:
    lines = f.read()
# '.' does not match newlines, so the match stays on the image line.
result = re.search('ui:(.*)-', lines)
print(result.group(1))


Result:
0.0.2

edited Apr 18, 2018 at 9:34

answered Apr 18, 2018 at 9:29

Akshay

3 Comments

Dmitry Bubnenkov Over a year ago

Using Docker for simple task is bad practice.

Akshay Over a year ago

@DmitryBubnenkov what does the above post has to do anything with Docker usage/implementation? It's all about finding a string between two substrings in a file.

digitaluniverse Over a year ago

I thought this use case was great. My use case was a css file with encoded base64 text it just shows not every text file needs to be .txt

Chris Martin · Accepted Answer · 2017-04-11 02:53:41Z

-3

This seems much more straightforward to me:

import re

s = 'asdf=5;iwantthis123jasd'
x= re.search('iwantthis',s)
print(s[x.start():x.end()])

answered Apr 11, 2017 at 2:53

Chris Martin

1 Comment

Korzak Over a year ago

This requires you to know the string you're looking for, it doesn't find whatever string is between the two substrings, as the OP requested. The OP wants to be able to get the middle no matter what it is, and this answer would require you to know the middle before you start.
