0

I am new to mongodb and got stuck on this task for some time now. I need to perform a query on the collection with the following schema:

{
  ticker: String,
  asks: [[Number]],
  bids: [[Number]],
  timestamp: String,
  datetime: String
}

Here is the document example:

  { 
    "_id" : ObjectId("5fc17630cf8dff0cd5506dc4"), 
    "asks" : [ 
      [ "23685.0", "0.008" ],
      [ "23688.8", "0.000" ],
      [ "23696.7", "0.000" ]
    ], 
    "bids" : [ 
      [ "23553.7", "0.200" ],
      [ "23557.8", "0.207" ],
      [ "23558.4", "0.045" ],
      [ "23563.4", "0.020" ] 
    ],
    "timestamp" : 1606514176211,
    "datetime" : "2020-11-27T21:56:16.211Z",
    "ticker" : "YFI/USDT"
  }

Inner array of asks and bids has two elements: first element is the price and the second element is the quantity.

I have two questions:

  1. I need to query the collection to get all entries where price is greater than a value in both asks and bids ( only asks or only bids) for the time interval based on the datetime field. I tried to mess with $elemMatch and aggregate framework but had no luck so far.

  2. Another question I have is whether array or arrays field type is a good choice when it comes to querying those fields. I expect to have millions of records of this type. Please suggest whether adding indexes and/or use some other data structure to hold this data in the collection.

Any help is greatly appreciated. Thanks.

0

1 Answer 1

1

Question 1 :

You have to first $match your date to return only documents in the time interval you need, then $filter your array (asks or bids) to return only what you need.

Here's the query (of course you can do exactly the same for bids):

db.collection.aggregate([
  {
    "$match": {
      $expr: {
        "$and": [
          {
            "$gte": [
              "$datetime",
              "2020-11-01"
            ]
          },
          {
            "$lt": [
              "$datetime",
              "2020-12-01"
            ]
          }
        ]
      }
    }
  },
  {
    "$addFields": {
      "asks": {
        "$filter": {
          "input": "$asks",
          "as": "ask",
          "cond": {
            "$gte": [
              {
                "$toDouble": {
                  "$arrayElemAt": [
                    "$$ask",
                    0
                  ]
                }
              },
              23686
            ]
          }
        }
      }
    }
  }
])

You can test it here

Question 2 :

However, as you noticed, it's probably not the best way to store data and query it. I see some simple things to improve :

1- Replace your nested array by an object. It will be easier to query directly the object field instead of the position in array.

{
  "price" : yourPrice, 
  "quantity":yourQuantity
}

2- Use numbers (int/double) instead of strings, you'll avoid the conversion step, and so improve the performances of your aggreggation.

{
  "price" : 21563, 
  "quantity":0.154
}

3-Do you really need date and time? If not, just store the date, so you can put an index on it and take the benefit of it during the $match stage.

Here's a sample of such a dataset/query

Sign up to request clarification or add additional context in comments.

5 Comments

Thanks matthPen! I wanted to avoid objects and went with array to save on record size - some records may have hundreds of inner arrays populated so I thought it will be redundant to have keys populated in there. I will try to add an index on date field and see if there is any difference.
I will mark your answer as solved once I test the solution. Thanks!
mongoplayground.net/p/Cg_QBzAa2qg this code still returns the records which don't qualify to be in the result set.
Not exactly what I needed but it is a good start. Thanks for help. I will play with it

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.