
Using C#, I need to parse a CSV string that doesn't come from a file. I've found a great deal of material on parsing CSV files, but virtually nothing on strings. It seems as though this should be simple, yet thus far I can come up only with inefficient methods, such as this:

using System.IO;
using Microsoft.VisualBasic.FileIO;

var csvParser = new TextFieldParser(new StringReader(strCsvLine));
csvParser.SetDelimiters(new string[] { "," });
csvParser.HasFieldsEnclosedInQuotes = true;
string[] fields = csvParser.ReadFields(); // the fields of strCsvLine

Is there a way to make this more efficient and less ugly? I will be processing huge volumes of strings, so I don't want to pay the cost of creating all of the above for every single one.
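If the strings can be handed over as one stream (for example, as a single multi-line string wrapped in one StringReader), a single TextFieldParser can read every record, so the StringReader and TextFieldParser are created once rather than once per line. The following is a minimal sketch under that assumption; `BatchCsv` and `ParseAll` are hypothetical names, while `TextFieldParser`, `SetDelimiters`, `HasFieldsEnclosedInQuotes`, `EndOfData` and `ReadFields` are the existing Microsoft.VisualBasic.FileIO members.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class BatchCsv
{
    // Hypothetical helper: reads every CSV record from one TextReader with a single
    // TextFieldParser, instead of creating a StringReader + TextFieldParser per line.
    public static List<string[]> ParseAll(TextReader reader)
    {
        var records = new List<string[]>();

        // Disposing the parser also disposes the underlying reader.
        using (var parser = new TextFieldParser(reader))
        {
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                // ReadFields returns the fields of the next record.
                records.Add(parser.ReadFields());
            }
        }

        return records;
    }
}
```

For example, `BatchCsv.ParseAll(new StringReader("a,b\nc,\"d,e\""))` yields two records, the second being `c` and `d,e`. Whether this helps depends on the strings being available together; if they arrive one at a time, a custom TextReader or a hand-written parser is the alternative.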

  • See stackoverflow.com/questions/2081418/… Commented Oct 10, 2014 at 1:56
  • Why do you consider the solution you have at hand inefficient? What efficiency are you expecting from a different solution? Commented Oct 10, 2014 at 2:00
  • Thanks. Yes, I saw that SO entry, but it's generally about files, not strings. As for efficiency, I don't think I want to create a new TextFieldParser and a new StringReader for every single string, since this seems hugely wasteful. Still, I'm starting to believe it may not be so bad after all, given the Pandora's Box I've managed to open. Commented Oct 10, 2014 at 2:04
  • You have a valid CSV string? Split on Environment.NewLine, then on commas. What's the problem? Commented Oct 10, 2014 at 2:38
  • @Jonesy: I'm gonna guess, from the example the OP has given, that they have commas enclosed within quotes that shouldn't be split. Still, it's only slightly more complicated. Commented Oct 10, 2014 at 2:48
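The objection in the last comment can be seen by running the split-based approach on a line whose quoted field contains a comma. This is a minimal illustration; `NaiveSplitDemo` and `SplitNaively` are hypothetical names, and the input is assumed to use the platform's Environment.NewLine as its line separator.

```csharp
using System;
using System.Linq;

public static class NaiveSplitDemo
{
    // The approach from the comment: split into lines, then split each line on every comma.
    // Quoting is ignored, so a comma inside a quoted field also ends the field.
    public static string[][] SplitNaively(string csv) =>
        csv.Split(new[] { Environment.NewLine }, StringSplitOptions.None)
           .Select(line => line.Split(','))
           .ToArray();
}
```

For the single line `x,"y,w",z` this returns four pieces (`x`, `"y`, `w"`, `z`), whereas a quote-aware parser should return three fields (`x`, `y,w`, `z`).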

1 Answer


Here is a lightly tested parser for a single line that handles quoted fields, including commas and escaped ("") quotes inside them:

using System.Collections.Generic;
using System.Text;

public List<string> Parse(string line)
{
    var columns = new List<string>();
    var sb = new StringBuilder();
    bool isQuoted = false;

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];

        // If the current character is a double quote
        if (c == '"')
        {
            // An opening quote is only recognized at the start of a field
            if (!isQuoted && sb.Length == 0)
            {
                isQuoted = true;
            }
            else if (isQuoted && i + 1 < line.Length && line[i + 1] == '"') // Check for escaped double quotes
            {
                sb.Append('"');
                i++; // Skip the next quote
            }
            else if (isQuoted) // A quote not followed by another quote closes the quoted section
            {
                isQuoted = false;
            }
            else // A quote inside an unquoted field (or after a closed quoted section) is kept literally
            {
                sb.Append('"');
            }
            continue;
        }

        // If the current character is a comma and we're not inside a quoted section, add the column and clear the StringBuilder
        if (!isQuoted && c == ',')
        {
            columns.Add(sb.ToString());
            sb.Clear();
            continue;
        }

        // Append the character to the current column
        sb.Append(c);
    }

    // Add the last column
    columns.Add(sb.ToString());

    return columns;
}
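Since the question is about processing huge volumes of lines, one possible refinement (not part of the answer above) is to keep the StringBuilder and the result list alive between calls instead of allocating new ones for every line. The following is a minimal sketch; `ReusableCsvLineParser` and `ParseInto` are hypothetical names, and the quote handling is intended to match Parse above.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical reusable variant: the instance keeps its StringBuilder between calls and
// the caller supplies the output list, so per-line allocations are mostly the field strings.
// Not thread-safe: use one instance per thread.
public sealed class ReusableCsvLineParser
{
    private readonly StringBuilder _field = new StringBuilder();

    public void ParseInto(string line, List<string> columns)
    {
        columns.Clear();   // the list is overwritten on every call
        _field.Clear();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (!inQuotes && _field.Length == 0)
                {
                    inQuotes = true;                  // opening quote at the start of a field
                }
                else if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    _field.Append('"');               // "" inside quotes is an escaped quote
                    i++;
                }
                else if (inQuotes)
                {
                    inQuotes = false;                 // closing quote
                }
                else
                {
                    _field.Append('"');               // stray quote in an unquoted field is kept
                }
            }
            else if (c == ',' && !inQuotes)
            {
                columns.Add(_field.ToString());       // field boundary
                _field.Clear();
            }
            else
            {
                _field.Append(c);
            }
        }

        columns.Add(_field.ToString());               // last field
    }
}
```

Each field is still a new string; the saving is the per-call List and StringBuilder. Because ParseInto clears the list it is given, a caller that needs to keep a line's fields must copy them before parsing the next line.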

1 Comment

I tested against var examples = new [] { "x,y", "x,\"y\"", "x,\"y\",z", "x,\"y,w\",z", "x,\"y,\"\"w\",z", }; and it works pretty well. Well done!
