
I'm looking for a way to expose filtering functionality at my workplace to other developers and optionally customers.

Problem

I want to implement a simple query language over my data (Python dicts), based on user-defined filters, exposed first to my fellow developers and later on to our customers.

  • the language should be simple enough to be used by non-developers
  • safe enough to avoid remote code execution on my servers
  • expressive enough to query data like the below example

Exposing a SQL interface over my dict/JSON data would be great (I prefer not to set up a server).

Example Scenario

db = [
  {'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
  {'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
  {'first': 'danny', 'last': 'foo', 'likes': ['http', 'donuts']},
]

query = '(first == "john" or last == "doe") and likes contains "cookies"'
results = run_query(db, query)

This should return (in results):

[
  {'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
  {'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
]

Note: I do not mind changing the operator names, e.g. or -> OR, contains -> inside, or anything else, as long as the language stays human-readable and keeps the same expressiveness.

Solutions I Tried

DSL

I looked at some DSL libraries like PLY, but they seem too complex to me and involve some magic to get things done (I'm not really sure where to start, or whether it's worth it).

Plugins

I didn't find any plugin system for exposing sandboxed functionality to my users (i.e. a safer eval).
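One "safer eval" approach worth noting: if the query is written in Python syntax (`"cookies" in likes` instead of `likes contains "cookies"`), the stdlib `ast` module can validate the expression against a whitelist of node types before evaluating it. This is only a sketch of the idea, not a hardened sandbox, and the Python-style syntax is my assumption:

```python
import ast

# Whitelist of AST node types a filter expression may contain.
# Anything else (calls, attributes, subscripts, ...) is rejected.
ALLOWED = (ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.Compare,
           ast.Eq, ast.In, ast.Name, ast.Constant, ast.Load)

def safe_predicate(query):
    tree = ast.parse(query, mode='eval')
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f'disallowed syntax: {type(node).__name__}')
    code = compile(tree, '<query>', 'eval')
    # Names in the query resolve only to the record's keys;
    # builtins are stripped from the evaluation environment.
    return lambda record: eval(code, {'__builtins__': {}}, dict(record))

db = [
    {'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
    {'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
    {'first': 'danny', 'last': 'foo', 'likes': ['http', 'donuts']},
]
pred = safe_predicate('(first == "john" or last == "doe") and "cookies" in likes')
results = [record for record in db if pred(record)]
```

A query referencing a key a record lacks would raise `NameError` here; a real implementation would need to decide how to handle that.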

JSON Query Packages

I looked at TinyDB and others that implement some sort of SQL over JSON, but couldn't find something that works without a lot of customization. I also looked at pandasql, which seems good overall but is an unmaintained library :(

There is a Lucene parser package, luqum, based on PLY, but its syntax tree differs from mine (they have more methods) and the lib is not really maintained. (I do consider tweaking this lib a bit to get what I want.)

SQLite

Use SQLite to load all my data (in memory or not) and then run SQL queries over it. I didn't test this, but it should be pretty straightforward, with the downside of loading my whole dataset into SQL just to run queries on it, which I'd prefer not to do.
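A minimal sketch of that SQLite idea with the stdlib `sqlite3` module, using an in-memory database. The table/column names and the LIKE-based membership check over JSON-encoded lists are my assumptions, not anything from the question:

```python
import json
import sqlite3

db = [
    {'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
    {'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
    {'first': 'danny', 'last': 'foo', 'likes': ['http', 'donuts']},
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (first TEXT, last TEXT, likes TEXT)')
conn.executemany(
    'INSERT INTO people VALUES (?, ?, ?)',
    [(r['first'], r['last'], json.dumps(r['likes'])) for r in db],
)

# Lists are stored as JSON text, so membership becomes a LIKE pattern --
# crude, but it needs no SQLite extensions.
rows = conn.execute(
    """SELECT first, last, likes FROM people
       WHERE (first = 'john' OR last = 'doe')
         AND likes LIKE '%"cookies"%'"""
).fetchall()

results = [
    {'first': f, 'last': l, 'likes': json.loads(likes)} for f, l, likes in rows
]
```

If the SQLite build has the JSON1 extension, `json_each` would be a more robust membership check than LIKE.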

I am open to suggestions, or even to improvements to the above solutions that would make this work.

  • Probably using pandas already can give you a lot of functionality, especially the DataFrame.query(..) part. Commented Jan 28, 2018 at 21:35
  • thanks @WillemVanOnsem but can I pass a query string to it? If not, and I should parse the query string before passing it to df.query(..), do you have any idea how to do that? Commented Jan 28, 2018 at 21:42
  • Yes; it does not work, however, for membership checks (contains, etc.) Commented Jan 28, 2018 at 21:43
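The pandas route mentioned in the comments could be sketched like this (hypothetical example, assuming pandas is available): `DataFrame.query` covers the boolean part, while membership in a list column needs a plain boolean mask, as the comment points out.

```python
import pandas as pd

db = [
    {'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
    {'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
    {'first': 'danny', 'last': 'foo', 'likes': ['http', 'donuts']},
]
df = pd.DataFrame(db)

# DataFrame.query handles the equality/boolean part of the filter...
subset = df.query('first == "john" or last == "doe"')

# ...but membership in a list-valued column needs an explicit mask.
mask = subset['likes'].apply(lambda likes: 'cookies' in likes)
results = subset[mask].to_dict('records')
```

`to_dict('records')` round-trips the matching rows back into plain dicts.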

2 Answers


Before using PLY for text-based queries, I would build up the core language from regular Python classes like this:

class Match:
    def __init__(self, **target):
        [[self.key, self.value]] = target.items()
    def __call__(self, obj):
        return self.key in obj and self.value == obj[self.key]

class Contains:        
    def __init__(self, **target):
        [[self.key, self.value]] = target.items()
    def __call__(self, obj):
        return self.key in obj and self.value in obj[self.key]        

class Or:
    def __init__(self, *predicates):
        self.predicates = predicates
    def __call__(self, record):
        return any(predicate(record) for predicate in self.predicates)

class And:
    def __init__(self, *predicates):
        self.predicates = predicates
    def __call__(self, record):
        return all(predicate(record) for predicate in self.predicates)

def run_query(db, query):
    return filter(query, db)

if __name__ == '__main__':

    db = [
      {'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
      {'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
      {'first': 'danny', 'last': 'foo', 'likes': ['http', 'donuts']},
    ]
    query = And(Or(Match(first='john'), Match(last='doe')), Contains(likes='cookies'))
    for result in run_query(db, query):
        print(result)

This outputs:

{'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']}
{'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']}
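To later support the text form of the query on top of combinators like these, a small hand-written recursive-descent parser is often enough, without PLY. This is only a sketch under my own assumptions (the helper names and the two-level `or`/`and` grammar are mine, not from the answer):

```python
import re

# Tiny predicate helpers mirroring the Match/Contains classes above
def match(key, value):
    return lambda obj: key in obj and obj[key] == value

def contains(key, value):
    return lambda obj: key in obj and value in obj[key]

TOKEN = re.compile(r'\s*(\(|\)|==|"[^"]*"|\w+)')

def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f'bad token at position {pos}')
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    # Grammar: expr := term ("or" term)* ; term := atom ("and" atom)*
    def expr():
        preds = [term()]
        while tokens and tokens[0] == 'or':
            tokens.pop(0)
            preds.append(term())
        return lambda obj: any(p(obj) for p in preds)

    def term():
        preds = [atom()]
        while tokens and tokens[0] == 'and':
            tokens.pop(0)
            preds.append(atom())
        return lambda obj: all(p(obj) for p in preds)

    def atom():
        if tokens[0] == '(':
            tokens.pop(0)
            pred = expr()
            tokens.pop(0)  # consume ')'
            return pred
        key, op, value = tokens.pop(0), tokens.pop(0), tokens.pop(0).strip('"')
        return match(key, value) if op == '==' else contains(key, value)

    return expr()

def run_query(db, text):
    predicate = parse(tokenize(text))
    return [record for record in db if predicate(record)]

db = [
    {'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
    {'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
    {'first': 'danny', 'last': 'foo', 'likes': ['http', 'donuts']},
]
results = run_query(db, '(first == "john" or last == "doe") and likes contains "cookies"')
```

Renaming operators (e.g. `contains` -> `inside`) then only means changing the keyword checks in `atom`.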

1 Comment

Thanks @Raymond Hettinger for the elegant & Pythonic solution. Actually, I ended up using PLY, which worked great (required some magic in some cases). Simple example: hand.name = 'full-house'

SQL is widely known and commonly requested by end users. I know of a couple of options which make this possible in Python. They rely on external libraries, but are well supported.

Small data solution

Load your dictionaries into a pd.DataFrame (see e.g. pd.DataFrame.from_dict), then query via the pandasql library. It seems you've tried this already, but I mention it because (in my experience) it does what it says.

Big data solution

Save your data in HDF5 format. pandas DataFrames and NumPy arrays can easily be stored in HDF5 via the h5py library. Then use the HDFql library to query the HDF5 file.

1 Comment

Thanks, but I prefer not to use pandasql as it is unmaintained. I might implement a syntax using PLY.
