
I have a table:

table

id | value
---+--------------
 1 | invalid_json
 2 | valid_json
 3 | invalid_json

First of all, the value column is of type varchar, not actually declared as json, and there are reasons why it is set up that way. Anyway, my question is about the possibility, and if possible how: is it possible to write an SQL query that finds only the rows containing VALID JSON-formatted data, even though the column's data type is varchar?

Something like:

"select * from table where (condition that data is a valid json)";
  • The only thing I can think of is to write a function that tries to cast the value to JSON and catches any error that occurs. It will work, but I guess it will be very slow. Commented Oct 21, 2014 at 15:37

2 Answers


As a_horse_with_no_name stated, you can write a function that tries to cast the value to json and returns a result based on whether that cast succeeds.

CREATE FUNCTION is_json(varchar) RETURNS boolean AS $$
  DECLARE
    x json;
  BEGIN
    BEGIN
      -- Assigning the varchar argument to a json variable forces a cast;
      -- invalid JSON raises an exception, which the inner block catches.
      x := $1;
    EXCEPTION WHEN others THEN
      RETURN FALSE;
    END;

    RETURN TRUE;
  END;
$$ LANGUAGE plpgsql IMMUTABLE;

Making it IMMUTABLE will let it operate quickly on repeated strings (an empty string, for example), but how much that helps depends heavily on the size of your table.

Then you can query your data.

SELECT * FROM table WHERE is_json(value);
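
For illustration, here is how the function might behave on some hypothetical sample data (the table name t and the values below are assumptions, not from the question):

-- Hypothetical sample data to exercise the function:
CREATE TABLE t (id int, value varchar);
INSERT INTO t VALUES
  (1, 'not json at all'),
  (2, '{"a": 1}'),
  (3, '{broken');

SELECT * FROM t WHERE is_json(value);
-- Returns only the row with id = 2.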

If the table is big and you are going to run that query many times, I would add an additional is_json boolean field to the table, then create a trigger/rule that checks validity upon INSERT/UPDATE on that table, as sketched below.
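
A minimal sketch of that idea, reusing the hypothetical table t from the example above (the trigger and helper function names are illustrative, not part of the original answer):

ALTER TABLE t ADD COLUMN is_json boolean;

-- Recompute the flag whenever a row is inserted or updated.
CREATE FUNCTION set_is_json() RETURNS trigger AS $$
BEGIN
  NEW.is_json := is_json(NEW.value);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_set_is_json
  BEFORE INSERT OR UPDATE ON t
  FOR EACH ROW EXECUTE PROCEDURE set_is_json();

-- Backfill the flag for existing rows once:
UPDATE t SET is_json = is_json(value);

After that, a query like SELECT * FROM t WHERE is_json; reads the precomputed flag instead of re-validating every row.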

However, having mixed data types in the same column is not a good idea; consider changing your data structure if you find yourself in such a scenario.


6 Comments

Wow, this is really good. Anyway, to further state my case: I'm actually doing data curation. Our old data model only accepted strings as values, but later on we decided to use JSON to support a feature, so I'm trying to find a systematic way to separate valid and invalid JSON data in the table to begin my data curation. Further down the line we'll decide on changing the data type to json. Thanks, nice answer!
Beware that this can be quite slow, because it's making a subtransaction for each test. Do not use this on huge tables.
In fact the PERFORM statement was a bit redundant. Simply assigning the value to a variable should be better in terms of performance. I edited the answer.
@Craig-Ringer This is PostgreSQL though, right? Last I checked PostgreSQL didn't have subtransactions. Has that changed?
@CraigRinger Thanks for the response! I'm having a hard time finding mention of subtransactions in the PostgreSQL docs, though (aside from the experimental "autonomous subtransactions" mentioned in the wiki). Are you referring to some internal structure generated by the EXCEPTION clause (perhaps what is alluded to by the tip regarding the cost of EXCEPTION clauses on this page: postgresql.org/docs/9.4/static/plpgsql-control-structures.html)? Not trying to argue, just trying to pin down what exactly is causing the "subtransaction" and what that means.

I recently solved a similar problem by doing a simple check on the string for curly braces:

WHERE value LIKE '{%}'

This of course depends on the data you expect, and will not match all valid JSON nor exclude all non-JSON. In my case, I had a field that used to take a simple character string (still present in old records) but now takes a JSON object wrapped in curly braces. If your case is like mine, where you know some specifics about what the valid and invalid data look like, you might do it this way.
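
If your data may also contain top-level JSON arrays or surrounding whitespace, a slightly broader heuristic (still only a heuristic; the column name is assumed) could be:

WHERE btrim(value) LIKE '{%}' OR btrim(value) LIKE '[%]'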

