Please note that this is not a duplicate question: it is about parsing (not deserializing) a large array object by object and retrieving the raw JSON of each object.

I am dealing with very large JSON arrays (tens of GB).

The structure of each object may differ, i.e. the array is heterogeneous:

[
    {"id": "foo", "value": "bar"},
    {"key": "foo", "name": "bar", "age": 10},
    ...
]

How can I go through the stream, processing one object at a time, and retrieve the raw JSON string of each object?

My current solution uses StreamReader together with JsonTextReader to load each object into a JObject and then get the JSON back with .ToString(). I would prefer to avoid the performance cost and GC allocation pressure of deserializing the JSON only to serialize it back again.

using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

var file = new FileInfo(@"C:\Very.Large.Array.json");

using (var sr = new StreamReader(file.OpenRead()))
using (var reader = new JsonTextReader(sr))
{
  while (reader.Read())
  {
    if (reader.TokenType == JsonToken.StartObject)
    {
      var obj = JObject.Load(reader);      
      var rawJSON = obj.ToString();
    }
  }
}
  • Change your return type to IEnumerable<JObject> and use yield return JObject.Load(...). The performance will still be crap as the file is huge (JSON is not really a good streaming data format). Commented Sep 3, 2019 at 17:21
  • This does not answer my question, I am looking for a method of not having to deserialize the JSON to an object. My question is not about yield returning. Commented Sep 3, 2019 at 17:23
  • That’s why it’s a comment. The answer is to change your file. Commented Sep 3, 2019 at 17:32
  • Is it safe to assume the file is formatted in any way? Like an object per line? Otherwise you'll have to consume the JSON character per character, to match braces to find objects. Commented Sep 3, 2019 at 17:41
  • It can be indented or not so I guess the JSON must be parsed. Commented Sep 3, 2019 at 18:18
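
The character-by-character approach raised in the comments above can be sketched roughly as follows. This is only an illustration, not a validating parser: the class and method names (RawJsonArrayScanner, ReadRawObjects) are hypothetical, and it assumes the input is well-formed JSON whose top-level array contains only objects, as in the question. It tracks nesting depth and string/escape state so that braces inside string values are not counted.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
public static class RawJsonArrayScanner
{
    // Yields the raw text of each object found directly inside the outer array.
    // Assumption: well-formed JSON, and every array element is an object.
    public static IEnumerable<string> ReadRawObjects(TextReader input)
    {
        var sb = new StringBuilder();
        int depth = 0;          // nesting depth of {} and [] within the current object
        bool inString = false;  // true while inside a JSON string literal
        bool escaped = false;   // true right after a backslash inside a string

        int next;
        while ((next = input.Read()) != -1)
        {
            char c = (char)next;

            if (depth == 0)
            {
                // Between objects: skip '[', ',', ']' and whitespace.
                if (c != '{') continue;
                depth = 1;
                sb.Append(c);
                continue;
            }

            sb.Append(c);

            if (inString)
            {
                if (escaped) escaped = false;          // character after '\' is literal
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;   // closing quote
                continue;
            }

            if (c == '"') inString = true;
            else if (c == '{' || c == '[') depth++;
            else if (c == '}' || c == ']')
            {
                depth--;
                if (depth == 0)
                {
                    // The object is complete: emit its text exactly as it appeared.
                    yield return sb.ToString();
                    sb.Clear();
                }
            }
        }
    }
}
```

It could be driven with something like foreach (var raw in RawJsonArrayScanner.ReadRawObjects(new StreamReader(path))), where path is the file to read. Because nothing is tokenized or re-serialized, each string keeps the original whitespace and number formatting; the trade-off is that malformed input is not detected.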

1 Answer

I believe you should use JsonTextWriter together with JsonTextReader. The simple proof-of-concept class below demonstrates the idea.

Some polishing is still needed to bring this code to production quality. For example, you could promote the StringBuilder sb from a local variable to an instance field and clear it on each iteration instead of creating a new object.

My goal was only to show the basic idea.

using System.Collections.Generic;
using System.IO;
using System.Text;
using Newtonsoft.Json;

public class JsonBigFileReader
{
    static string ReadSingleObject(JsonTextReader reader)
    {
        StringBuilder sb = new StringBuilder();

        using (var sw = new StringWriter(sb))
        {
            using (var writer = new JsonTextWriter(sw))
            {
                writer.WriteToken(reader, true);    //  writes current token including its children (meaning the whole object)
            }
        }
        return sb.ToString();
    }

    public IEnumerable<string> ReadArray(string fileName)
    {
        var file = new FileInfo(fileName);
        using (var sr = new StreamReader(file.OpenRead()))
        using (var reader = new JsonTextReader(sr))
        {
            reader.Read();
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return ReadSingleObject(reader);
                }
            }
        }
    }
}
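
The StringBuilder reuse mentioned above might look like the sketch below. The class name JsonBigFileReaderReusingBuffer and the _sb field are made up for illustration. The sketch also sets DateParseHandling.None on the reader, which is my own addition: as far as I know, with the default setting Json.NET parses date-like strings into DateTime values, so they may be written back in a different format. Even with that change the output is re-serialized from tokens, so whitespace and possibly number formatting can differ from the source text.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Newtonsoft.Json;

// Hypothetical variant of the POC class that reuses one buffer.
public class JsonBigFileReaderReusingBuffer
{
    // One buffer per instance, cleared before each object.
    private readonly StringBuilder _sb = new StringBuilder();

    private string ReadSingleObject(JsonTextReader reader)
    {
        _sb.Clear();
        using (var sw = new StringWriter(_sb))
        using (var writer = new JsonTextWriter(sw))
        {
            // Copies the current token and all of its children (the whole object).
            writer.WriteToken(reader, true);
        }
        return _sb.ToString();
    }

    public IEnumerable<string> ReadArray(string fileName)
    {
        using (var sr = new StreamReader(File.OpenRead(fileName)))
        using (var reader = new JsonTextReader(sr))
        {
            // Assumption: keep date-like strings as plain strings.
            reader.DateParseHandling = DateParseHandling.None;

            reader.Read(); // consume the StartArray token
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return ReadSingleObject(reader);
                }
            }
        }
    }
}
```

Because the buffer is shared, two enumerations of the same instance should not run at the same time; creating one instance per file avoids that.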