
I am working on a small search server, something like Elasticsearch, as a personal project. I have completed most of it, but I am stuck on how the user should interact with the system.

I initially decided that the user would send a JSON query with the required fields and their values. The problem I am facing is that even though I can evaluate queries this way, I would not be able to implement Boolean queries and compound statements.

I was trying something like

index: name
schema: name
field1: value
field2: value

but it could also be something like this if Boolean expressions are implemented:

index: name
schema: name
field1: name1 or name2
field2: <9.22 and >=2.32
field3: (<9.22 and >=2.32) or (<100 and >90) // compound statement

Is there a somewhat straightforward way to implement this without actually creating a query language grammar? If yes, how might I achieve it? If not, what should I do instead?

I was thinking of splitting each field's value on and/or, but that wouldn't work if there are compound statements.

I was also checking out pyparsing, but I couldn't figure out a working way to use it.

  • Check out whoosh or plyse. pyparsing also includes several sample query parsers on its Examples page. Commented Aug 14, 2016 at 9:02

2 Answers


The real question is how complex your compound statements are going to get, and whether they will only involve the AND and OR keywords. From what I can tell, you are better off defining a proper grammar for this than using a concoction of regular expressions to get the job done (although that is essentially what a grammar is).

I would suggest using Parsley, which lets you define a grammar concisely and have a parser generated for you. This way you can tokenize things properly and have a better understanding of what is happening when you're debugging.
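If you would rather avoid an external dependency, the same idea can be sketched as a hand-written recursive-descent parser in plain Python. Everything below (the tokenizer, the `Parser` class, and the nested-tuple AST) is my own illustration of the technique, not Parsley's API:

```python
import re

# Tokens: parentheses, and/or keywords, comparison operators, and values.
TOKEN = re.compile(r"\s*(\(|\)|and\b|or\b|<=|>=|==|!=|<|>|=|[\w.]+)")

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"bad input at {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    """Grammar: expr := term ('or' term)*
                term := factor ('and' factor)*
                factor := '(' expr ')' | op value"""

    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def eat(self, tok=None):
        cur = self.peek()
        if cur is None or (tok is not None and cur != tok):
            raise ValueError(f"expected {tok!r}, got {cur!r}")
        self.i += 1
        return cur

    def expr(self):
        node = self.term()
        while self.peek() == "or":
            self.eat("or")
            node = ("or", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "and":
            self.eat("and")
            node = ("and", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        op = self.eat()
        if op not in ("<", "<=", ">", ">=", "=", "==", "!="):
            raise ValueError(f"unknown operator {op!r}")
        raw = self.eat()
        try:
            value = float(raw)
        except ValueError:
            value = raw  # non-numeric values stay as strings
        return (op, value)

def parse(text):
    return Parser(tokenize(text)).expr()

print(parse("(<9.22 and >=2.32) or (<100 and >90)"))
# -> ('or', ('and', ('<', 9.22), ('>=', 2.32)), ('and', ('<', 100.0), ('>', 90.0)))
```

The resulting tuple tree can then be walked to evaluate each comparison against a record; a library like Parsley mainly saves you from writing the tokenizer and parser classes by hand.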


3 Comments

Well, I want to support even big queries with many conditions per field. Of course there will be constraints, like not mixing data types for a particular field (e.g. <9 and =="something"), but that type checking would come after the actual parsing. Most queries would use and, or, not, and not-equal. I will check out Parsley and get back to you.
Parsley did most of the job. Thanks!
Hey, if you could, would you take a look at this: stackoverflow.com/questions/37187918/…

Sure. Here's an example that just uses JSON.

For a basic single-field query, use a mapping:

{"fieldname": {"op": "=", "value": "somevalue"}}

For a compound query, do something like:

{"and": [
  {"field": {"op": "=", "value": "somevalue"}},
  {"field2": {"op": ">", "value": 9.22}},
  ]}

For a complex query, as in your example:

{
  "and": [
    {
      "index": {
        "op": "=",
        "value": "name"
      }
    },
    {
      "schema": {
        "op": "=",
        "value": "name"
      }
    },
    {
      "or": [
        {
          "field1": {
            "op": "=",
            "value": "name1"
          }
        },
        {
          "field1": {
            "op": "=",
            "value": "name2"
          }
        }
      ]
    },
    {
      "or": [
        {
          "field2": {
            "op": "<",
            "value": 9.22
          }
        },
        {
          "field2": {
            "op": ">=",
            "value": 2.32
          }
        }
      ]
    },
    {
      "or": [
        {
          "or": [
            {
              "field3": {
                "op": "<",
                "value": 9.22
              }
            },
            {
              "field3": {
                "op": ">=",
                "value": 2.32
              }
            }
          ]
        },
        {
          "or": [
            {
              "field3": {
                "op": "<",
                "value": 100
              }
            },
            {
              "field3": {
                "op": ">",
                "value": 90
              }
            }
          ]
        }
      ]
    }
  ]
}
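Evaluating such a JSON query tree is then a small recursive function. Here is a minimal sketch of the idea; the `match` helper and the operator table are my own additions, assuming each node is a single-key dict as in the examples above:

```python
import operator

# Map the query's comparison strings to Python's comparison functions.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def match(query, record):
    """Return True if `record` (a plain dict) satisfies the query tree."""
    (key, body), = query.items()  # each node has exactly one key
    if key == "and":
        return all(match(sub, record) for sub in body)
    if key == "or":
        return any(match(sub, record) for sub in body)
    # Otherwise `key` is a field name and `body` is {"op": ..., "value": ...}.
    return OPS[body["op"]](record.get(key), body["value"])

query = {"or": [
    {"field1": {"op": "=", "value": "name1"}},
    {"and": [
        {"field2": {"op": "<", "value": 9.22}},
        {"field2": {"op": ">=", "value": 2.32}},
    ]},
]}

print(match(query, {"field1": "other", "field2": 5.0}))  # True
```

Because `all`/`any` consume generators, the evaluation short-circuits the same way a Boolean expression would, and nesting "and"/"or" nodes to any depth costs nothing extra.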

4 Comments

Although this makes sense, when he is actually writing the queries it is going to be a nightmare. Better to just have a query language parser on the backend, to make the interface easier.
I don't actually think this would be a nightmare to handle unless you are regularly making large compound queries by hand, and it isn't substantially different in spirit from many existing solutions (the structure is almost identical to how LDAP filter strings are structured, for example).
This approach is the easiest for me to implement on the developer end. But like brainiac said, it would be really hard for people to write such complex queries. (<9.22 and >=2.32) or (<100 and >90) is not even a really big compound statement yet, and it already takes up so many characters. I think it would be better if the end user used some kind of query language for this.
For the users you could provide an easy-to-use tool to generate the JSON string. But again, I'd say a tailored query language is the best option, and also the most difficult one.
