
I have a regex pattern defined as

var pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";

and I am trying to split some CSV-like strings into fields.

Some example strings that WORK with this regex are:

_input[0] = ""; // expected single blank field
_input[1] = "A,B,C"; // expected three individual fields
_input[2] = "\"A,B\",C"; // expected two fields 'A,B' and C
_input[3] = "\"ABC\"\",\"Text with,\""; // expected two fields, 'ABC"', 'Text with,'
_input[4] = "\"\",ABC\",\"next_field\""; // expected two fields, '",ABC', 'next_field'

However, this one does not work:

_input[5] = "\"\"\",ABC\",\"next_field\"";

I am expecting three fields

'"', 'ABC"', 'next_field'

But I am getting two fields

'"",ABC', 'next_field'

Can anybody help with this regex?

I think the strange part is that the second column doesn't have quotes at the start and end of the value, just at the end. So the first column's value is empty, and the second column is ABC"

Thanks, Rob

  • You can try your regex strings at www.regexhero.net/tester/ Commented Dec 10, 2012 at 8:02

1 Answer


I think you need to be even more specific about how the double quotes should be treated, as it appears that your requirements conflict with each other.

My quick version, which I think comes closest to what you are trying to achieve, is below. Please note two things: 1) the double quotes are not escaped, because I am using an external tool to validate the regex, and 2) I have changed how the matched values are retrieved; see the bottom for an example.

(?<Match>(?:"[^"]*"+|[^,])*)(?:,(?<Match>(?:"[^"]*"+|[^,])*))*

It has the following logic:

  • If there is a double quote, then include everything in it, until an end double quote is hit.
  • When reaching an end double quote, double quotes following immediately after will also be included.
  • If the next character is anything but a comma, it is included, and the above is tested again.
  • If it is a comma, the current match is concluded and a new one begins after the comma.
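
For readers who find the regex hard to follow, the four rules above can also be sketched as a small hand-written scanner. This is only an illustration, not the code from this answer: `QuoteAwareSplitter` and `SplitFields` are hypothetical names, and treating a quote with no closing partner as an ordinary character is an assumption that mirrors how the pattern falls back to `[^,]` in that case.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the original answer): applies the four
// rules listed above with an explicit loop instead of a regex.
public static class QuoteAwareSplitter
{
    public static List<string> SplitFields(string input)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        int i = 0;

        while (i < input.Length)
        {
            char c = input[i];

            if (c == '"')
            {
                int close = input.IndexOf('"', i + 1);
                if (close >= 0)
                {
                    // Rule 1: include everything up to and including the closing quote.
                    int end = close + 1;
                    // Rule 2: quotes immediately after the closing quote are included too.
                    while (end < input.Length && input[end] == '"')
                    {
                        end++;
                    }
                    current.Append(input, i, end - i);
                    i = end;
                    continue;
                }
                // Assumption: an unclosed quote is treated like any other character.
            }

            if (c == ',')
            {
                // Rule 4: a comma outside quotes ends the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                // Rule 3: any other character belongs to the current field.
                current.Append(c);
            }
            i++;
        }

        // The last field (possibly empty) has no trailing comma.
        fields.Add(current.ToString());
        return fields;
    }

    public static void Main()
    {
        // Input 5 from the question.
        foreach (string field in SplitFields("\"\"\",ABC\",\"next_field\""))
        {
            Console.WriteLine("'" + field + "'");
        }
    }
}
```

Like the regex, this sketch keeps the surrounding quotes in each field; stripping them would be a separate step.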

However, the above logic conflicts with what you expect for indexes 4 and 5, because I get:

[4] = '""' and 'ABC","next_field"'
[5] = '"""' and 'ABC","next_field"'

If you could point out why the above logic is wrong for your needs/expectations, I'll edit my answer with a fully working regex.

To retrieve your values, you could do it like this:

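using System;
using System.Linq;
using System.Text.RegularExpressions;

// In a verbatim string (@"..."), "" stands for one literal double quote.
string pattern = @"(?<Match>(?:""[^""]*""+|[^,])*)(?:,(?<Match>(?:""[^""]*""+|[^,])*))*";

string[] testCases = new[]{
  @"",
  @"A,B,C",
  @"A,B"",C",
  @"ABC"",""Text with,",
  @""",ABC"",""next_field""",
  @""""",ABC"",""next_field"""
};

foreach(string testCase in testCases){
  var match = Regex.Match(testCase, pattern);
  // Both groups are named "Match", so each field is one capture of that group.
  string[] matchedValues = match.Groups["Match"].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();
  // Print the fields of this test case, separated by " | ".
  Console.WriteLine(string.Join(" | ", matchedValues));
}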

5 Comments

I slightly updated the original post; hopefully it is a little clearer now.
Hmm, that pattern doesn't work at all for me. I am using .NET regex: var pattern = "(?<Match>(?:\"[^\"]*\"+|[^,])*)(?:,(?<Match>(?:\"[^\"]*\"|[^,])*))*"; var result = Regex.Split(textToSplit, pattern); Even calling that with an empty string results in three empty strings.
@QldRobbo Sorry, I also changed how the capturing is done. Please see my edited answer for an example of how to extract the values.
Thanks for your help. Because of your response and how you are doing it, I've realised that the format of the CSV is wrong and I will need to get the provider to change it. I wouldn't have worked this out without your help, so your time was not wasted! Thanks again.
You're welcome. :) Please do note, though, that even if regex can handle the CSV format you are using, there are a lot of CSV parsers that you would probably be better off using, e.g. Csv Helper or A Fast CSV Reader. If you do decide to stick with regex, you should probably remove the two '+' quantifiers from the pattern.
