OK, so maybe I got a little carried away with the MMF. Here's a simpler version that uses only a file stream (I think this is what Scott Chamberlain suggested in the comments).
Timings (on a new system) for a 3 GB array:
- MMF: ~50 seconds.
- FileStream: ~30 seconds.
Code:
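using System;
using System.Diagnostics;
using System.IO;

long dataLen = 402653184; // 3 GB expressed as 8-byte elements
long[] data = new long[dataLen]; // on .NET Framework, arrays over 2 GB need gcAllowVeryLargeObjects
int elementSize = sizeof(long);
Stopwatch sw = Stopwatch.StartNew();
using (FileStream f = new FileStream(@"D:\Test.bin", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read, 32768))
{
    long offset = 0; // position in the source array, in elements
    int chunkLength = 32768 / elementSize; // elements per 32 KB chunk
    long[] chunk = new long[chunkLength];
    byte[] workBuffer = new byte[chunkLength * elementSize];
    while (offset < dataLen)
    {
        // the last chunk may be shorter than the others
        int count = (int)Math.Min(chunkLength, dataLen - offset);
        // Buffer.BlockCopy takes int byte offsets, so it cannot address past 2 GB
        // in the source array; stage each chunk with Array.Copy (long indices) first
        Array.Copy(data, offset, chunk, 0, count);
        Buffer.BlockCopy(chunk, 0, workBuffer, 0, count * elementSize);
        f.Write(workBuffer, 0, count * elementSize);
        // advance in the source array
        offset += count;
    }
}
Console.WriteLine(sw.Elapsed);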
Old solution, MMF
I think you can try a MemoryMappedFile. I got ~2 to ~2.5 minutes for a 3 GB array on a relatively slow external drive.
What this solution involves:
- First, create an empty file.
- Create a memory-mapped file over it, with a capacity of X bytes, where X is the array length in bytes. This automatically sets the physical length of the file, on disk, to that value.
- Dump the array to the file through an accessor that is 32k x 8 bytes wide (you can change this, it's just something I tested with). So, I'm writing the array in chunks of 32k elements.
Note that you will need to account for the case when the array length is not a multiple of chunkLength. For testing purposes, in my sample it is :).
See below:
using System;
using System.Diagnostics;
using System.IO;
using System.IO.MemoryMappedFiles;

// Just create an empty file
FileStream f = File.Create(@"D:\Test.bin");
f.Close();
long dataLen = 402653184; // 3 GB expressed as 8-byte elements
long[] data = new long[dataLen];
int elementSize = sizeof(long);
Stopwatch sw = Stopwatch.StartNew();
// Open the file with an explicit capacity. This extends the empty file to the full array size
using (var mmf = MemoryMappedFile.CreateFromFile(@"D:\Test.bin", FileMode.Open, "longarray", data.LongLength * elementSize))
{
    long offset = 0; // position in the source array, in elements
    int chunkLength = 32768; // elements per chunk
    while (offset < dataLen)
    {
        // the view offset and size are given in bytes
        using (var accessor = mmf.CreateViewAccessor(offset * elementSize, chunkLength * elementSize))
        {
            for (long i = offset; i != offset + chunkLength; ++i)
            {
                // the write position is a byte offset within the view, so scale the index by the element size
                accessor.Write((i - offset) * elementSize, data[i]);
            }
        }
        offset += chunkLength;
    }
}
Console.WriteLine(sw.Elapsed);
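As noted above, the sample only works because the array length is a multiple of chunkLength. Below is a minimal sketch of one way to handle a shorter final chunk, by clamping the element count of each view. The class name ChunkedMmfWriter and its Write method are hypothetical names made up for this illustration, and passing null as the map name (an unnamed map) is my choice here rather than something from the sample above.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper for illustration only; not part of the original sample.
static class ChunkedMmfWriter
{
    public static void Write(string path, long[] data, int chunkLength)
    {
        const int elementSize = sizeof(long);
        if (data.LongLength == 0)
            throw new ArgumentException("data must not be empty", nameof(data));
        if (chunkLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkLength));

        // Start from an empty file, as in the sample above.
        File.Create(path).Dispose();

        // The capacity (in bytes) also becomes the length of the file on disk.
        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, data.LongLength * elementSize))
        {
            long offset = 0; // position in the source array, in elements
            while (offset < data.LongLength)
            {
                // Clamp the last chunk so the view never extends past the mapped capacity.
                int count = (int)Math.Min(chunkLength, data.LongLength - offset);
                using (var accessor = mmf.CreateViewAccessor(
                    offset * elementSize, (long)count * elementSize))
                {
                    for (int i = 0; i < count; i++)
                    {
                        // Positions inside the view are byte offsets.
                        accessor.Write((long)i * elementSize, data[offset + i]);
                    }
                }
                offset += count;
            }
        }
    }
}
```

Calling ChunkedMmfWriter.Write(@"D:\Test.bin", data, 32768) should then behave like the sample for lengths that are a multiple of 32768, and write a shorter last view otherwise. I have not timed this variant, so the numbers above do not necessarily apply to it.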
data is List<long>. There are up to 600 000 000 elements in these arrays.