I'm looking for a way to expose filtering functionality at my workplace to other developers and optionally customers.
Problem
I want to implement a simple query language over my data (Python dicts) based on user-defined filters, exposed first to my other developers and later on to our customers.
- the language should be simple enough to be used by non-developers
- safe enough to avoid remote code execution on my servers
- expressive enough to query data like the below example
Exposing an SQL-like interface over my dict/JSON data would be great (I'd prefer not to set up a server).
Example Scenario
db = [
{'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
{'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
{'first': 'danny', 'last': 'foo', 'likes': ['http', 'donuts']},
]
query = '(first == "john" or last == "doe") and likes contains "cookies"'
results = run_query(db, query)
This should return (in results):
[
{'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
{'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
]
Note: I do not mind changing the operator names (e.g. or -> OR, contains -> inside, or anything else) as long as the language stays human-readable and keeps the same expressiveness.
Solutions I Tried
DSL
I looked at some DSL libraries like PLY, but they seem too complex to me and involve some magic to get things done (I'm not really sure where to start, or whether it's worth it).
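For a grammar this small, a parser generator may indeed be overkill. A sketch of what a dependency-free alternative could look like: a regex tokenizer plus a short recursive-descent parser that compiles the query straight into a predicate function (all names here are made up for illustration, not from any library):

```python
import re

# Grammar, lowest precedence first:
#   or_expr  -> and_expr ('or' and_expr)*
#   and_expr -> term ('and' term)*
#   term     -> '(' or_expr ')' | field ('==' | 'contains') '"' string '"'
TOKEN = re.compile(r'\s*(?:(\()|(\))|(==)|"([^"]*)"|(\w+))')

def tokenize(query):
    query = query.rstrip()
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise SyntaxError(f'bad character at position {pos}')
        pos = m.end()
        if m.group(4) is not None:              # quoted string literal
            tokens.append(('STR', m.group(4)))
        else:                                   # paren, operator, or word
            tokens.append(('SYM', m.group(m.lastindex)))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def eat(self, expected=None):
        kind, text = self.peek()
        if kind is None or (expected is not None and text != expected):
            raise SyntaxError(f'expected {expected!r}, got {text!r}')
        self.pos += 1
        return text

    def parse(self):
        node = self.or_expr()
        if self.pos != len(self.tokens):
            raise SyntaxError('trailing input')
        return node

    def or_expr(self):
        node = self.and_expr()
        while self.peek() == ('SYM', 'or'):
            self.eat()
            # bind current operands immediately to avoid late-binding bugs
            node = (lambda l, r: lambda rec: l(rec) or r(rec))(node, self.and_expr())
        return node

    def and_expr(self):
        node = self.term()
        while self.peek() == ('SYM', 'and'):
            self.eat()
            node = (lambda l, r: lambda rec: l(rec) and r(rec))(node, self.term())
        return node

    def term(self):
        if self.peek() == ('SYM', '('):
            self.eat()
            node = self.or_expr()
            self.eat(')')
            return node
        field = self.eat()
        op = self.eat()
        kind, value = self.peek()
        if kind != 'STR':
            raise SyntaxError('expected a quoted string')
        self.pos += 1
        if op == '==':
            return lambda rec: rec.get(field) == value
        if op == 'contains':
            return lambda rec: value in rec.get(field, [])
        raise SyntaxError(f'unknown operator {op!r}')

def run_query(db, query):
    predicate = Parser(tokenize(query)).parse()
    return [rec for rec in db if predicate(rec)]
```

This handles exactly the example query from above (parentheses, `or`/`and`, `==` and `contains`), and adding an operator means adding one branch in `term`.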
Plugins
I didn't find any plugin system that exposes sandboxed functionality to my users (i.e. a safer eval).
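One commonly suggested "safer eval" approach is to reuse Python's own expression grammar: parse the query with the stdlib ast module, reject any node outside a small whitelist, then evaluate the compiled expression against each record. A sketch (note it uses Python's `in` operator in place of `contains`, and the usual caveat applies: whitelist sandboxes need careful review before being exposed to customers):

```python
import ast

# Only boolean logic, comparisons, field names, and literals are allowed.
ALLOWED = (ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
           ast.Compare, ast.Name, ast.Load, ast.Constant,
           ast.Eq, ast.NotEq, ast.In, ast.NotIn, ast.Lt, ast.LtE, ast.Gt, ast.GtE)

def compile_query(query):
    tree = ast.parse(query, mode='eval')
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f'disallowed syntax: {type(node).__name__}')
    code = compile(tree, '<query>', 'eval')
    # No builtins; names resolve only to the record's fields.
    return lambda record: eval(code, {'__builtins__': {}}, record)

def run_query(db, query):
    predicate = compile_query(query)
    return [rec for rec in db if predicate(rec)]

db = [
    {'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
    {'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
    {'first': 'danny', 'last': 'foo', 'likes': ['http', 'donuts']},
]
query = '(first == "john" or last == "doe") and "cookies" in likes'
results = run_query(db, query)
```

Anything with a function call, attribute access, or subscript (e.g. `__import__("os")`) fails the whitelist check, so you get expressiveness for free from Python's parser without its execution machinery.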
JSON Query Packages
I looked at TinyDB and others that implement some sort of SQL over JSON, but couldn't find anything that works without a lot of customization. I also looked at pandasql, which seems good overall but appears to be an unmaintained library :(
There is also a Lucene query parser, luqum, built on PLY, but its syntax tree differs from mine (it has more methods) and the library is not really maintained. (I am considering adapting it a bit to get what I want.)
SQLite
Use SQLite to load all my data (in memory or not) and then run SQL queries over it. I didn't test this, but it should be pretty straightforward, with the downside of loading my whole dataset into SQL just to run queries over it, which I'd prefer to avoid.
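For completeness, the SQLite route can be sketched with the stdlib sqlite3 module and an in-memory database; the list-valued `likes` field can be stored as JSON text and searched with the `json_each` table-valued function (part of SQLite's JSON1 functions, built in by default in modern SQLite builds). The table and column names below are made up for the example:

```python
import json
import sqlite3

db = [
    {'first': 'john', 'last': 'doe', 'likes': ['cookies', 'http']},
    {'first': 'jane', 'last': 'doe', 'likes': ['cookies', 'donuts']},
    {'first': 'danny', 'last': 'foo', 'likes': ['http', 'donuts']},
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (first TEXT, last TEXT, likes TEXT)')
conn.executemany(
    'INSERT INTO people VALUES (?, ?, ?)',
    [(r['first'], r['last'], json.dumps(r['likes'])) for r in db],
)

# The example query, expressed as SQL; the correlated json_each subquery
# plays the role of `contains`.
rows = conn.execute(
    """SELECT first, last FROM people
       WHERE (first = 'john' OR last = 'doe')
         AND EXISTS (SELECT 1 FROM json_each(people.likes)
                     WHERE json_each.value = 'cookies')"""
).fetchall()
```

The upside is that users get real, well-documented SQL; the downside remains copying the whole dataset into SQLite (plus escaping/validating user-supplied SQL before running it).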
I am open to suggestions, or even to ideas on how to improve the above solutions to make this work.