10

I'd like to test whether a regex matches part of a string at a specific index (and only starting at that index). For example, given the string "one two 3 4 five", I'd like to know that, at index 8, the regular expression [0-9]+ matches "3". Regex.IsMatch and Regex.Match both take a starting index, but both will search the entire rest of the string for a match if necessary.

string text="one two 3 4 five";
Regex num=new Regex("[0-9]+");

//unfortunately num.IsMatch(text,0) also finds a match and returns true
Console.WriteLine("{0} {1}",num.IsMatch(text, 8),num.IsMatch(text,0));

Obviously, I could check if the resulting match starts at the index I am interested in, but I will be doing this a large number of times on large strings, so I don't want to waste time searching for matches later on in the string. Also, I won't know in advance what regular expressions I will actually be testing against the string.

I don't want to:

  1. split the string on some boundary like whitespace because in my situation I won't know in advance what a suitable boundary would be
  2. have to modify the input string in any way (like getting the substring at index 8 and then using ^ in the regex)
  3. search the rest of the string for a match or do anything else that wouldn't be performant for a large number of tests against a large string.

I would like to parse a potentially large user supplied body of text using an arbitrary user supplied grammar. The grammar will be defined in a BNF or PEG like syntax, and the terminals will either be string literals or regular expressions. Thus I will need to check if the next part of the string matches any of the potential terminals as driven by the grammar.

2
  • Can you explain what you're trying to do in a broader sense? Your restrictions on what you don't want to do are confusing. Commented Aug 11, 2009 at 20:34
  • I added a brief description of what I am doing. Also, the requirements really boil down to: I don't want to do anything slow and I don't have in depth knowledge of what I am trying to parse up front. Commented Aug 11, 2009 at 23:34

4 Answers

14

How about calling Regex.IsMatch(string, int) with a regular expression that starts with \G (meaning "start of last match")?

That appears to work:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        string text="one two 3 4 five";
        Regex num=new Regex(@"\G[0-9]+");

        Console.WriteLine("{0} {1}",
                          num.IsMatch(text, 8), // True
                          num.IsMatch(text, 0)); // False
    }
}
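
Since the patterns are not known in advance, the same idea can be wrapped in a small helper. The following is only a sketch: AnchoredMatcher, Anchor and MatchLengthAt are hypothetical names invented for illustration, and it assumes the user-supplied pattern can safely be wrapped in a non-capturing group and does not rely on options such as RegexOptions.RightToLeft. It returns the length of the match so that a parser could advance past the terminal.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of .NET): anchors arbitrary patterns with \G.
static class AnchoredMatcher
{
    // Prefix the pattern with \G so a match may only begin at the start index.
    // The non-capturing group keeps a top-level alternation such as "a|b" intact.
    public static Regex Anchor(string pattern)
    {
        return new Regex(@"\G(?:" + pattern + ")");
    }

    // Returns the length of the match that begins exactly at index, or -1 if none.
    public static int MatchLengthAt(Regex anchored, string text, int index)
    {
        Match m = anchored.Match(text, index);
        return m.Success ? m.Length : -1;
    }
}

class AnchoredDemo
{
    static void Main()
    {
        string text = "one two 3 4 five";
        Regex num = AnchoredMatcher.Anchor("[0-9]+");
        Regex word = AnchoredMatcher.Anchor("[a-z]+");

        Console.WriteLine(AnchoredMatcher.MatchLengthAt(num, text, 8));   // 1  ("3")
        Console.WriteLine(AnchoredMatcher.MatchLengthAt(num, text, 0));   // -1 (no digit at index 0)
        Console.WriteLine(AnchoredMatcher.MatchLengthAt(word, text, 0));  // 3  ("one")
        Console.WriteLine(AnchoredMatcher.MatchLengthAt(word, text, 12)); // 4  ("five")
    }
}
```

Building each anchored Regex once and reusing it for every position avoids re-parsing the pattern on each test.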

3 Comments

Interesting, if there's a way to artificially set the last match position then this might work out. Otherwise I don't think it will help because I will be jumping between different regular expressions and different locations.
I had a chance to try this out and it seems to do exactly what I want. It treats the passed start index as the "start of last match" regardless of where the last match actually was. Perfect, thanks!
And just to add a bit of info for anyone else that has this problem, regular-expressions.info/continue.html describes the \G anchor. It appears to mean either "start of last match" or "start of match attempt" depending on the implementation. In some implementations it probably won't solve this problem, but it appears to be "start of match attempt" in C# and works nicely for matching at a specific location.
2

If you only want to search a substring of the text, grab that substring before the regex.

myRegex.Match(myString.Substring(8, 10));

5 Comments

Doesn't look like this modifies the input string, so +1. If point 2 isn't just about changing the input string, it needs to be edited.
Well, it's modifying the input to the regular expression. Given the "doing this a large number of times on large strings" I wouldn't have thought a substring was an ideal solution.
It seems like he wants to match against a specific series of characters in a string. Why doesn't substring make sense?
@Rob: Because it will involve copying large amounts of data repeatedly - and unnecessarily, given that you can tell the regex engine where to start looking for a match.
This would be too slow because I don't have any max length other than the size of the string, which could be tens to hundreds of megabytes.
1

I'm not sure I fully understand the question, but it seems to me that you can simply make the position part of the regular expression, e.g.

^.{8}[\d]

which will match if there are 8 characters between the start of the string and a digit.

1 Comment

This isn't ideal, because it would involve modifying the regex for each position I want to test against. It would also depend on the regex being smart enough to optimize ^.{8} into something that jumps immediately to position 8.
0

If you know the maximum length of a potential match, you can pass it to Match to limit how much of the string is scanned.

If you're only checking for numbers this is probably easier than if you check for arbitrary expressions. The nature of Regex is to scan until the end in order to find a match. If you want to prevent scanning you need to include a length, or use something other than Regex.

string text = "one two 3 4 five";
Regex num = new Regex("[0-9]+");
int indexToCheck = 8;
int maxMatchLength = ...;
Match m = num.Match(text, indexToCheck, maxMatchLength);

Do you know anything about what types of expressions might be run against the strings, and will scanning the entire string be too much of an overhead?

num.Match will return the first hit if it exists, and then stop scanning. If you want more matches you would call m.NextMatch() to continue the scanning of matches.
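
As a rough sketch of how the bounded overload could be combined with a position check: WindowedMatcher and IsMatchAt are hypothetical names, and the window size of 4 used below is an arbitrary value chosen for the demo, not a recommendation. Match(string, int, int) searches only the given range but reports the match index relative to the whole input, so comparing it with the start index rejects matches found later in the window.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of .NET): bounded search plus a position check.
static class WindowedMatcher
{
    public static bool IsMatchAt(Regex regex, string text, int index, int maxMatchLength)
    {
        // Clamp the window so it never runs past the end of the text.
        int window = Math.Min(maxMatchLength, text.Length - index);

        // Only text[index .. index + window) is searched.
        Match m = regex.Match(text, index, window);

        // Accept only a match that begins exactly at the requested index.
        return m.Success && m.Index == index;
    }
}

class WindowDemo
{
    static void Main()
    {
        string text = "one two 3 4 five";
        Regex num = new Regex("[0-9]+");

        Console.WriteLine(WindowedMatcher.IsMatchAt(num, text, 8, 4)); // True:  "3" starts at index 8
        Console.WriteLine(WindowedMatcher.IsMatchAt(num, text, 6, 4)); // False: "3" is found at 8, not 6
        Console.WriteLine(WindowedMatcher.IsMatchAt(num, text, 0, 4)); // False: no digit in "one "
    }
}
```

Note that the engine still scans up to the whole window before giving up, and a window that is too small can cut a longer match short, so this only helps when a realistic upper bound is known.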

2 Comments

Unfortunately I don't know what the regular expressions will be beforehand and cannot provide a max length other than the rest of the string.
The expression to find could have a varying length, depending on whitespace e.g. new-lines and indented paragraph starts, or whatever.
