Please note that this is not a duplicate question: it is about parsing (not deserializing) a large array object by object and retrieving the raw JSON of each object.
I am dealing with very large JSON array payloads (tens of GB).
The objects in the array do not share a common structure, i.e. they are heterogeneous:
```json
[
  {"id": "foo", "value": "bar"},
  {"key": "foo", "name": "bar", "age": 10},
  ...
]
```
How can I go through the stream, processing one object at a time, and retrieve the raw JSON string of each object?
My current solution uses a StreamReader together with a JsonTextReader, loads each object into a JObject, and then gets the JSON back with .ToString(). However, I would prefer to avoid the performance cost and GC allocation pressure of deserializing the JSON only to serialize it back again.
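```csharp
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

var file = new FileInfo(@"C:\Very.Large.Array.json");
using (var sr = new StreamReader(file.OpenRead()))
using (var reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Builds a full JObject tree for the current element...
            var obj = JObject.Load(reader);
            // ...only to serialize it straight back to a JSON string.
            var rawJSON = obj.ToString();
        }
    }
}
```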
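One possible way to skip the intermediate `JObject` is Json.NET's `JsonWriter.WriteToken(JsonReader)`, which copies the reader's current token and all of its children directly into a writer. The sketch below assumes Newtonsoft.Json is referenced; `RawJsonSplitter` and `ReadRawObjects` are hypothetical names used only for illustration. Note that the result is re-serialized rather than byte-for-byte raw: whitespace is dropped and values pass through the reader's parsing (dates are kept as strings here via `DateParseHandling.None`).

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical helper (not part of Json.NET): lazily yields each object
// of a top-level JSON array as a compact JSON string, without building a JObject.
public static class RawJsonSplitter
{
    public static IEnumerable<string> ReadRawObjects(TextReader textReader)
    {
        using (var reader = new JsonTextReader(textReader))
        {
            // Keep date-like strings as strings so they are written back unchanged.
            reader.DateParseHandling = DateParseHandling.None;

            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    using (var sw = new StringWriter())
                    using (var writer = new JsonTextWriter(sw))
                    {
                        // Copies StartObject and all of its children; afterwards the
                        // reader sits on the matching EndObject, so nested objects
                        // are never visited separately by this loop.
                        writer.WriteToken(reader);
                        writer.Flush();
                        yield return sw.ToString();
                    }
                }
            }
        }
    }
}
```

Usage would look like `foreach (var rawJson in RawJsonSplitter.ReadRawObjects(new StreamReader(file.OpenRead()))) { ... }`. Each element still produces one string (that is inherent in wanting the raw JSON), but no token tree is allocated per object; whether this helps enough at tens of GB would need to be measured.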