
I'm using a SqlDataReader to retrieve the results of a SELECT query from a DBMS. So far I read the rows of the result set one by one with SqlDataReader.Read() and process them one at a time. When the result set is huge (millions of rows by hundreds of columns), iterating with .Read() is very slow. My question: is there a way to do a "block" read from SqlDataReader, i.e. something like SqlDataReader.Read(100) that returns an array of the next 100 rows of the result set?

I thought about using DataTable.Load() to load the whole result set into memory, but since the table is several gigabytes in size, it wouldn't fit in memory.

What would you recommend? Thanks a lot

Example code:

TdConnection conn;
TdCommand cmd;
TdDataReader reader;
IAsyncResult res;

conn = new TdConnection(@"xxxx;");
conn.Open();

cmd = new TdCommand(q,conn);
res = cmd.BeginExecuteReader();

while (!res.IsCompleted); // busy-wait until the asynchronous ExecuteReader completes

reader = cmd.EndExecuteReader(res);
if (reader.HasRows)
{
    string sb;
    string currentout = @"C:\file"; // verbatim string so "\f" is not treated as an escape sequence
    string[] row = new string[reader.FieldCount];
    sb = "";
    for (int i = 0; i < reader.FieldCount; i++)
        row[i] = reader.GetName(i);

    sb = String.Concat(sb,String.Join("\t",row),"\r\n");

    File.WriteAllText(currentout,sb);

    sb = "";

    /* With a test query, the following commented "while" takes 5 minutes
     * to iterate over a dataset with 639967 rows x 63 columns (about 300MB):
     * while (reader.Read());
     */

    /* With the same test query, the following "while" block takes 6 minutes
     * to iterate over the same dataset AND write it to a text file.
     * I conclude that I/O writing to the text file is fast, and .Read() is very slow.
     */
    while(reader.Read())
    {
        for (int i = 0; i < reader.FieldCount; i++)
                row[i] = reader.GetValue(i).ToString();

        sb = String.Concat(sb,String.Join("\t",row),"\r\n");

        if (sb.Length > 100000)
        {
            File.AppendAllText(currentout,sb);
            sb = "";
        }
    }
    File.AppendAllText(currentout,sb);
}

reader.Close();
reader.Dispose();
cmd.Dispose();
conn.Close();

The "Td" components are the Teradata DBMS interface for .NET (but they behave just like "SQL" components).

  • The real problem here is: why do you need to load all that data? Commented Feb 2, 2016 at 17:30
  • Is the Read slow, or your processing? Did you profile the code to find the bottleneck? Commented Feb 2, 2016 at 17:36
  • Maybe you can transfer your program logic to a SQL query and execute that from the program. Commented Feb 2, 2016 at 19:19
  • @Steve because I'm writing a sort of custom "data extractor", so whatever the query is, I need to execute it and export the result set to a custom tab-separated file. So basically my program does not need the data itself, it just needs to write it to a text file. Commented Feb 3, 2016 at 8:13
  • I can only think of an async operation that moves all your processing to the background, leaving your interface responsive. You can read a lot about this here on MSDN: Asynchronous Programming Commented Feb 3, 2016 at 8:17

1 Answer


What's slow here is the quadratic cost of the string concatenation in the loop:

sb = String.Concat(sb,String.Join("\t",row),"\r\n");

Each pass copies the entire accumulated string, so the total work grows with the square of the output size. This is such a glaring performance problem that I'm submitting it as an answer, as it probably solves your problem.

If your app is slow, profile it to see what is slow.

Unfortunately, ADO.NET is indeed quite CPU heavy when reading data. Nothing you can do about it.
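For illustration, here is a minimal sketch of the same export loop with the concatenation removed: each row is built in a reused StringBuilder and written through a buffered StreamWriter, so no string ever holds more than one row. The Teradata.Client.Provider namespace and the connection string, query and output path are assumptions here; only the loop structure is the point.

using System.IO;
using System.Text;
using Teradata.Client.Provider;   // assumed namespace for the TdConnection/TdCommand classes

class TsvExporter
{
    // Writes the result set of a query as a tab-separated text file.
    public static void Export(string connectionString, string query, string outputPath)
    {
        using (var conn = new TdConnection(connectionString))
        {
            conn.Open();

            using (var cmd = new TdCommand(query, conn))
            using (var reader = cmd.ExecuteReader())
            using (var writer = new StreamWriter(outputPath, false, Encoding.UTF8, 1 << 20))  // 1 MB write buffer
            {
                var line = new StringBuilder();

                // Header row: tab-separated column names.
                for (int i = 0; i < reader.FieldCount; i++)
                {
                    if (i > 0) line.Append('\t');
                    line.Append(reader.GetName(i));
                }
                writer.WriteLine(line.ToString());

                // Data rows: the StringBuilder is cleared and reused for each row,
                // so its cost does not grow with the number of rows already written.
                while (reader.Read())
                {
                    line.Clear();
                    for (int i = 0; i < reader.FieldCount; i++)
                    {
                        if (i > 0) line.Append('\t');
                        line.Append(reader.GetValue(i));
                    }
                    writer.WriteLine(line.ToString());
                }
            }
        }
    }
}

With this shape the cost per row stays constant, and the StreamWriter's internal buffer batches the disk writes, so there is no need to accumulate output in a growing string before calling File.AppendAllText.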
