I'm trying to request a large amount of data and then parse it into a report. The problem is that the data I'm requesting has 27 million records, each with 6 joins, and loading it via Entity Framework uses all of the server's RAM. I've implemented a pagination system to buffer the processing into smaller chunks, as you would with an I/O operation.

I request 10,000 records, write them to a file stream (on disk), and then try to clear those 10,000 records from memory, as they're no longer needed.

I'm having trouble getting the database context garbage collected. I've tried disposing the context, nulling the reference, and then creating a new context for the next batch of 10,000 records, but this doesn't seem to work. (This was recommended by one of the EF Core devs: https://github.com/aspnet/EntityFramework/issues/5473)

The only other alternative I see is to use a raw SQL query to achieve what I want. I'm trying to build the system to handle any request size, with the only variable being the time it takes to produce the report. Is there something I can do with the EF context to release loaded entities?

private void ProcessReport(ZipArchive zip, int page, int pageSize)
{
    using (var context = new DBContext(_contextOptions))
    {
        var batch = GetDataFromIndex(page, pageSize, context).ToArray();
        if (!batch.Any())
        {
            return;
        }

        var file = zip.CreateEntry("file_" + page + ".csv");
        using (var entryStream = file.Open())
        using (var streamWriter = new StreamWriter(entryStream))
        {
            foreach (var reading in batch)
            {
                try
                {
                    streamWriter.WriteLine("write data from record here.");
                }
                catch (Exception e)
                {
                    // handle error
                }
            }
        }
        batch = null;
    }
    ProcessReport(zip, page + 1, pageSize);
}

private IEnumerable<Reading> GetDataFromIndex(int page, int pageSize, DBContext context)
{
    var batches = (from rb in context.Reading.AsNoTracking()
                   // some joins
                   select rb)
                  .Skip((page - 1) * pageSize)
                  .Take(pageSize);

    return batches
        .Include(x => x.Something);
}
  • What do you mean by "data"? If you use a projection query to some sort of DTO object, or a no-tracking query, the DbContext won't store anything internally. Also, don't use ToList, ToArray, etc.; simply enumerate the result. Commented Aug 5, 2017 at 15:04
  • I've turned off change tracking on the query using .AsNoTracking(), and I've also removed the ToArray(), yet the garbage collector still isn't freeing up any of the contexts: prntscr.com/g4q0h3 As for what I mean by "data", I literally just mean getting records from the database into a C# model to use temporarily and then dispose of. I have a feeling the objects aren't being removed from memory because of the recursive loop? Commented Aug 5, 2017 at 15:30
  • 1
    Don't use EF Core for it, it's not the suitable case. Use a raw query where you can always read a subset of the data, i.e. using the data reader Commented Aug 5, 2017 at 16:57
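
For reference, a minimal sketch of that data-reader approach, assuming a Reading table with Id and Value columns and an already-open StreamWriter (all names here are illustrative):

// requires System.Data.SqlClient
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("SELECT Id, Value FROM Reading", connection))
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        // Rows are materialized one at a time; nothing accumulates in memory.
        while (reader.Read())
        {
            streamWriter.WriteLine($"{reader.GetInt32(0)},{reader.GetString(1)}");
        }
    }
}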

2 Answers


Apart from your memory-management issue, you're going to have a bad time using paging for this: the paging queries get more and more expensive on the server as the page offset grows. You don't need to page. Just iterate the query results (i.e. don't call ToList() or ToArray()).

Also, when paging you must add ordering to the queries, or SQL may return overlapping rows or leave gaps between pages. For SQL Server, see: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-order-by-clause-transact-sql EF Core doesn't enforce this, as some providers might guarantee that paging queries always read rows in the same order.
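
For example, if you did keep the paging approach, a deterministic order (here by the key, assuming Reading.Id is the primary key; adjust to your schema) would have to come before Skip/Take:

var batch = context.Reading.AsNoTracking()
    .OrderBy(r => r.Id)   // stable, unique order => no gaps or overlaps between pages
    .Skip((page - 1) * pageSize)
    .Take(pageSize);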

Here's an example of EF Core (1.1 on .NET Core) plowing through a huge result set without increasing memory usage:

using Microsoft.EntityFrameworkCore;
using System.Linq;
using System;
using System.ComponentModel.DataAnnotations.Schema;

namespace efCoreTest
{
    [Table("SomeEntity")]
    class SomeEntity
    {

        public int Id { get; set; }
        public string Name { get; set; }
        public string Description { get; set; }

        public DateTime CreatedOn { get; set; }
        public int A { get; set; }
        public int B { get; set; }
        public int C { get; set; }
        public int D { get; set; }

        public virtual Address Address { get; set; }
        public int AddressId { get; set; }

    }

    [Table("Address")]
    class Address
    {
        [DatabaseGenerated(DatabaseGeneratedOption.None)]
        public int Id { get; set; }
        public string Line1 { get; set; }
        public string Line2 { get; set; }
        public string Line3 { get; set; }

    }
    class Db : DbContext
    {
        public DbSet<SomeEntity> SomeEntities { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseSqlServer("Server=.;Database=efCoreTest;Integrated Security=true");
        }

    }
    class Program
    {
        static void Main(string[] args)
        {
            using (var db = new Db())
            {
                db.Database.EnsureDeleted();
                db.Database.EnsureCreated();

                db.Database.ExecuteSqlCommand("alter database EfCoreTest set recovery simple;");

                var LoadAddressesSql = @"

with N as
(
   select top (10) cast(row_number() over (order by (select null)) as int) i
   from sys.objects o, sys.columns c, sys.columns c2
)
insert into Address(Id, Line1, Line2, Line3)
select i Id, 'AddressLine1' Line1,'AddressLine2' Line2,'AddressLine3' Line3
from N;
";

                var LoadEntitySql = @"

with N as
(
   select top (1000000) cast(row_number() over (order by (select null)) as int) i
   from sys.objects o, sys.columns c, sys.columns c2
)
insert into SomeEntity (Name, Description, CreatedOn, A,B,C,D, AddressId)
select  concat('EntityName',i) Name,
        concat('Entity Description which is really rather long for Entity whose ID happens to be ',i) Description,
        getdate() CreatedOn,
        i A, i B, i C, i D, 1+i%10 AddressId
from N

";
                Console.WriteLine("Generating Data ...");
                db.Database.ExecuteSqlCommand(LoadAddressesSql);
                Console.WriteLine("Loaded Addresses");

                for (int i = 0; i < 10; i++)
                {
                    var rows = db.Database.ExecuteSqlCommand(LoadEntitySql);
                    Console.WriteLine($"Loaded Entity Batch {rows} rows");
                }


                Console.WriteLine("Finished Generating Data");

                var results = db.SomeEntities.AsNoTracking().Include(e => e.Address).AsEnumerable();

                int batchSize = 10 * 1000;
                int ix = 0;
                foreach (var r in results)
                {
                    ix++;

                    if (ix % batchSize == 0)
                    {
                        Console.WriteLine($"Read Entity {ix} with name {r.Name}.  Current Memory: {GC.GetTotalMemory(false) / 1024}kb GC's Gen0:{GC.CollectionCount(0)} Gen1:{GC.CollectionCount(1)} Gen2:{GC.CollectionCount(2)}");

                    }

                }

                Console.WriteLine($"Done.  Current Memory: {GC.GetTotalMemory(false)/1024}kb");

                Console.ReadKey();
            }
        }
    }
}

Outputs

Generating Data ...
Loaded Addresses
Loaded Entity Batch 1000000 rows
Loaded Entity Batch 1000000 rows
. . .
Loaded Entity Batch 1000000 rows
Finished Generating Data
Read Entity 10000 with name EntityName10000.  Current Memory: 2854kb GC's Gen0:7 Gen1:1 Gen2:0
Read Entity 20000 with name EntityName20000.  Current Memory: 4158kb GC's Gen0:14 Gen1:1 Gen2:0
Read Entity 30000 with name EntityName30000.  Current Memory: 2446kb GC's Gen0:22 Gen1:1 Gen2:0
. . .
Read Entity 9990000 with name EntityName990000.  Current Memory: 2595kb GC's Gen0:7429 Gen1:9 Gen2:1
Read Entity 10000000 with name EntityName1000000.  Current Memory: 3908kb GC's Gen0:7436 Gen1:9 Gen2:1
Done.  Current Memory: 3916kb

Note: another common cause of excessive memory consumption in EF Core is "mixed client/server evaluation" of queries. See the docs for more information and for how to disable automatic client-side query evaluation.
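
For instance, you can make EF Core throw instead of silently evaluating parts of a query on the client. This is a sketch against the EF Core 2.x API, where RelationalEventId lives in Microsoft.EntityFrameworkCore.Diagnostics (the 1.x namespace differs):

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder
        .UseSqlServer("Server=.;Database=efCoreTest;Integrated Security=true")
        // Fail fast instead of pulling rows to the client for evaluation
        .ConfigureWarnings(w => w.Throw(RelationalEventId.QueryClientEvaluationWarning));
}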


Comments

I attempted this method, but even without change tracking EF executes the query and pulls all of the data back into memory: Read Entity 250000 with name 575430. Current Memory: 3678275kb GC's Gen0:16 Gen1:9 Gen2:6
It appears that memory usage only rises when I add Includes. Any idea why, and how I can get past this?
Can you produce a repro, or modify the one I posted to show the increased memory usage? As a workaround, you can always flatten the object graph in the query (see the projection sketch after these comments).
I took your example and replaced the code-first model with a scaffold of my database. I have an author table with a one-to-many relation to a book table. I added 550,000 authors and 2,750,000 books, giving each author 5 books (no books shared between authors). I then selected all authors and joined the book table. I've attempted to create a repro, but I have no nice way to get a database that size to you that isn't code-first.
The repro is mostly for you. Did you see increasing memory utilization as you enumerated the authors?
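
For reference, a minimal sketch of the flattening workaround mentioned above, projecting the joined columns into a flat result instead of using Include (the Author/Book names come from the comment and are illustrative):

// Project author + book columns into a flat shape; no Include, no graph fix-up,
// and with AsNoTracking the rows stream through one at a time.
var rows = db.Authors.AsNoTracking()
    .SelectMany(a => a.Books, (a, b) => new { a.Id, a.Name, b.Title });

foreach (var row in rows)
{
    // write row.Id, row.Name, row.Title to the CSV here
}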

This was due to MARS (Multiple Active Result Sets) being disabled.

https://github.com/aspnet/EntityFrameworkCore/issues/9367
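
If that is the cause, MARS can be enabled through the SQL Server connection string keyword MultipleActiveResultSets (server and database names here are placeholders):

Server=.;Database=efCoreTest;Integrated Security=true;MultipleActiveResultSets=True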
