
I need to write huge arrays of longs (up to 5 GB) to disk. I tried using BinaryFormatter, but it seems to handle only arrays smaller than 2 GB:

long[] array = data.ToArray();
FileStream fs = new FileStream(dst, FileMode.Create);
BinaryFormatter formatter = new BinaryFormatter();
try
{
    formatter.Serialize(fs, array);
}
catch (SerializationException e)
{
    Console.WriteLine("Failed to serialize. Reason: " + e.Message);
    throw;
}
finally
{
    fs.Close();
}

This code throws IndexOutOfRangeException for larger arrays.

I don't want to save it element by element, because that takes too much time. Is there a proper way to save such a large array?

Writing element by element:

using (BinaryWriter writer = new BinaryWriter(File.Open(dst, FileMode.Create)))
{
    foreach(long v in array)
    {
        writer.Write(v);
    }
} 

This is very slow.

  • You can now use very, very, very large arrays in .NET 4.5; see msdn.microsoft.com/en-us/library/hh285054(v=vs.110).aspx (a config sketch follows these comments). Commented Aug 13, 2014 at 21:13
  • @MatthewMartin Yes, I know, and I'm using them. My problem is with writing them to disk. Commented Aug 13, 2014 at 21:16
  • @Ari: How did you allocate it in the first place? Or do you have a lot of memory? I do have a solution and sample, but I cannot allocate 5 GB at once (only 8 GB of RAM). How many elements are in that array, usually? Commented Aug 13, 2014 at 21:42
  • @MarcelN. I generated this data. The variable data is a List<long>. There are up to 600,000,000 elements in these arrays. Commented Aug 13, 2014 at 22:00
  • @Ari: Yes, I managed to allocate it. I'm testing now. How much time do you get for individual writes? I have ~2.5 minutes for 3 GB (on a 5400 RPM external drive). Commented Aug 13, 2014 at 22:01
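
For reference, a minimal app.config sketch for the gcAllowVeryLargeObjects setting documented by the MSDN link in the first comment; on .NET 4.5+, this opt-in is required before any array over 2 GB (such as the 5 GB long[] here) can be allocated at all:

<!-- app.config: opt in to objects larger than 2 GB (64-bit processes, .NET 4.5+) -->
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>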

1 Answer


OK, so maybe I got a little carried away with the MMF. Here's a simpler version, using only a file stream (I think this is what Scott Chamberlain suggested in the comments).

Timings (on a new system) for a 3 GB array:

  1. MMF: ~50 seconds.
  2. FileStream: ~30 seconds.

Code:

long dataLen = 402653184;        // 3 GB represented as 8-byte elements
long[] data = new long[dataLen]; // needs <gcAllowVeryLargeObjects> on .NET 4.5+
int elementSize = sizeof(long);

Stopwatch sw = Stopwatch.StartNew();
using (FileStream f = new FileStream(@"D:\Test.bin", FileMode.Create, FileAccess.Write, FileShare.Read, 32768))
{
    int workBufferSize = 32768;
    int elementsPerChunk = workBufferSize / elementSize;
    byte[] workBuffer = new byte[workBufferSize];
    long[] chunk = new long[elementsPerChunk];
    long offset = 0;
    while (offset < dataLen)
    {
        // Array.Copy takes long offsets, so it can address elements past the
        // 2 GB byte boundary. Buffer.BlockCopy's offsets are byte counts in
        // int parameters, so it is only safe on the small chunk array.
        long count = Math.Min(elementsPerChunk, dataLen - offset);
        Array.Copy(data, offset, chunk, 0, count);
        Buffer.BlockCopy(chunk, 0, workBuffer, 0, (int)(count * elementSize));
        f.Write(workBuffer, 0, (int)(count * elementSize));

        //advance in the source array
        offset += count;
    }
}

Console.WriteLine(sw.Elapsed);

Old solution, MMF

I think you can try a MemoryMappedFile. I got ~2 to ~2.5 minutes for a 3 GB array on a relatively slow external drive.

What this solution implies:

  1. First, create an empty file.
  2. Create a memory mapped file over it, with a default capacity of X bytes, where X is the array length in bytes. This automatically sets the physical length of the file, on disk, to that value.
  3. Dump the array to the file via a view accessor 32k elements (32k × 8 bytes) wide (you can change this; it's just something I tested with). So I'm writing the array in chunks of 32k elements.

Note that you will need to account for the case where the array length is not a multiple of chunkLength. For testing purposes, in my sample it is :).

See below:

//Just create an empty file
FileStream f = File.Create(@"D:\Test.bin");
f.Close();

long dataLen = 402653184;        // 3 GB represented as 8-byte elements
long[] data = new long[dataLen]; // needs <gcAllowVeryLargeObjects> on .NET 4.5+
int elementSize = sizeof(long);

Stopwatch sw = Stopwatch.StartNew();

//Open the file, with a default capacity. This allows you to write over the initial capacity of the file
using (var mmf = MemoryMappedFile.CreateFromFile(@"D:\Test.bin", FileMode.Open, "longarray", data.LongLength * elementSize))
{
    long offset = 0;
    int chunkLength = 32768; 

    while (offset < dataLen)
    {
        using (var accessor = mmf.CreateViewAccessor(offset * elementSize, chunkLength * elementSize))
        {
            for (long i = offset; i != offset + chunkLength; ++i)
            {
                // The accessor's Write takes a byte position within the view,
                // so scale the element index by the element size.
                accessor.Write((i - offset) * elementSize, data[i]);
            }
        }

        offset += chunkLength;
    }
}

Console.WriteLine(sw.Elapsed);

2 Comments

It's sad that there is a File.WriteAllBytes method and we can't use it, because we have a long[], not a byte[]. I expected something better from Microsoft. I think Buffer.BlockCopy will be enough.
@Ari: But that would take away all the pleasure of writing lower-level code.
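
As a side note, here is a minimal sketch of the Buffer.BlockCopy conversion suggested in the comment above (the helper name ToBytes is made up for illustration). Buffer.BlockCopy takes its byte offsets and counts as ints, so this route only works while the whole array stays under 2 GB, and it temporarily doubles memory use:

// Hypothetical helper: convert a long[] to a byte[] for File.WriteAllBytes.
// Only valid for arrays under 2 GB total, since BlockCopy counts bytes in ints.
static byte[] ToBytes(long[] source)
{
    byte[] result = new byte[source.Length * sizeof(long)];
    Buffer.BlockCopy(source, 0, result, 0, result.Length);
    return result;
}

// Usage: File.WriteAllBytes(@"D:\Small.bin", ToBytes(smallArray));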
