
I need to be able to extract elements from a text column that is sometimes XML and sometimes JSON; along the lines of

SELECT
    CASE
        WHEN XML_IS_WELL_FORMED(text_column)
        THEN ARRAY_TO_STRING( XPATH( 'A/B/C/text()', text_column::XML ), ',' )
        WHEN JSON_IS_WELL_FORMED(text_column)  -- hypothetical: no such built-in exists
        THEN text_column::JSON->'A'->'B'->>'C'
    END AS extracted_element
FROM some_table

And I have to do it in PostgreSQL 15.7 which is too old to support text_column IS JSON, in a database where I don't have access to create a function.

How can I check whether text is valid JSON in a WHEN expression without using IS JSON or CREATE FUNCTION?


Editing because one of you closed this as a duplicate of a question which was answered using features I already said I can't use. This is not a duplicate of "how can I CREATE FUNCTION something that checks for valid JSON"; this is "how can I check for valid JSON when I don't have access to CREATE FUNCTION".

8 Comments

  • Are we going to have to guess what version of Postgres you do have?
  • @Adrian Klaver I only have access to run queries; is there a query that will tell me?
  • select version();
  • According to SELECT VERSION(); it's PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
  • Are you even prevented from running CREATE FUNCTION pg_temp.JSON_IS_WELL_FORMED(…)?

2 Answers


You should upgrade to PostgreSQL v16 or later, where you can use pg_input_is_valid():

SELECT pg_input_is_valid('{"a":42}', 'json'),
       pg_input_is_valid('{a:42}', 'json');
 pg_input_is_valid │ pg_input_is_valid 
═══════════════════╪═══════════════════
 t                 │ f
(1 row)
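
On v16 or later this slots directly into the question's CASE expression; a minimal sketch against the hypothetical some_table from the question (the JSON test comes first, so that bare text is not mistaken for XML content):

-- PostgreSQL 16+ sketch: no CREATE FUNCTION needed
SELECT
    CASE
        WHEN pg_input_is_valid(text_column, 'json')
        THEN text_column::json->'A'->'B'->>'C'
        WHEN xml_is_well_formed_document(text_column)
        THEN array_to_string(xpath('A/B/C/text()', text_column::xml), ',')
    END AS extracted_element
FROM some_table;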

For older versions, the best you can do is write a function with an exception handler:

CREATE FUNCTION is_valid_json(text) RETURNS boolean
   LANGUAGE plpgsql AS
$$BEGIN
   PERFORM CAST ($1 AS json);   -- attempt the cast, discard the result
   RETURN TRUE;
EXCEPTION WHEN invalid_text_representation THEN
   RETURN FALSE;                -- the cast raised "invalid input syntax for type json"
END;$$;

The exception handler will create a subtransaction, so don't call that function too often in a single transaction if you want decent performance.
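
With that function in place, the question's query works as intended; a sketch using the hypothetical table and path from the question (again testing JSON first, since xml_is_well_formed() also accepts bare text as XML content):

-- Sketch: assumes the is_valid_json() function defined above
SELECT
    CASE
        WHEN is_valid_json(text_column)
        THEN text_column::json->'A'->'B'->>'C'
        WHEN xml_is_well_formed(text_column)
        THEN array_to_string(xpath('A/B/C/text()', text_column::xml), ',')
    END AS extracted_element
FROM some_table;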


3 Comments

This is a database I'm hired to work in, which the company is paying a 3rd party to host; I meant it when I said I have to do it in the provided version
Sure, but at some point you will still have to upgrade. Perhaps the added benefit is enough to warrant doing it now.
Upvoting the DIY is_valid_json() backport, but I'd prefer a generic my_input_is_valid(text, regtype), then create function is_valid_json(text) returns boolean return my_input_is_valid($1, 'json');, if that's convenient.

(expanding, after the fact, on my comment above that solved this)

CREATE FUNCTION pg_temp.…

Regarding the "in a database where I don't have access to create a function":

Unlike some other RDBMSs, PostgreSQL's clean architecture makes almost no distinction between temporary objects and "lasting" ones.

In fact, temporary objects are nothing special at all: they are just normal objects belonging to a special schema, pg_temp, that gets discarded at the end of your session.
(More exactly: they belong to a pg_temp_xxx schema, created on the fly when the session opens, and always aliased as pg_temp for practical purposes, as explained in the search_path doc.)

Thus you can create temporary tables, index them, and even create temporary functions or temporary extensions (though in that case we'd more likely say "loading temp extensions" than "creating" them).

pg_temp is your inalienable read-write playground; its only downside is that it lacks persistence across sessions.
But it is perfect for massive extraction-only scripted tasks with intermediate steps requiring more complexity than CTEs allow, for example when you want to index those intermediate views, or need to reuse them for two different extractions:

create temp table reworked as select /* complex data extraction */;
select /* aggregate query */ from reworked group by …; -- Dump that to summary.csv
select * from reworked; -- Dump that to details.csv

Some SQL CREATE statements accept a temp[orary] modifier that acts as syntactic sugar: create temp table xxx is equivalent to create table pg_temp.xxx.
But you're free to directly target pg_temp:

create function pg_temp.json_is_well_formed(j text) …;

Or even:

set search_path to pg_temp, <your provider's schema if applicable>, public;
create function json_is_well_formed(j text) …;

thanks to the special rule for explicitly mentioning pg_temp within search_path.
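
Putting the two approaches together, a minimal sketch of a session-local validator; the body borrows the exception handler from the first answer, and the function name matches the question's hypothetical one:

-- Session-local function: vanishes when the connection closes,
-- so no lasting DDL rights are needed
CREATE FUNCTION pg_temp.json_is_well_formed(j text) RETURNS boolean
   LANGUAGE plpgsql AS
$$BEGIN
   PERFORM j::json;   -- attempt the cast, discard the result
   RETURN TRUE;
EXCEPTION WHEN invalid_text_representation THEN
   RETURN FALSE;      -- the text failed to parse as JSON
END;$$;

SELECT pg_temp.json_is_well_formed('{"a":42}');  -- t
SELECT pg_temp.json_is_well_formed('{a:42}');    -- f

Since it disappears when your session ends, a script should (re)create it at the start of every connection.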

Be smart

… But, as I would have told you from an artisanal / intuitive point of view, and as @Laurenz Albe's more cartesian answer expresses better: if you plan to handle millions of entries, try to optimize your attempts, like branch prediction does within a processor.

For example, if you're confident that entries are always either well-formed JSON or XML, then simply detect < or { at the start and only attempt the appropriate format.
Conversely, if you can have garbage, then the exception handling is mandatory, but you can perhaps still do several passes: first handle all the sure rows in one pass, then finish with one-by-one robust handling of the remainder.
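
A sketch of that first-character dispatch, against the question's hypothetical table (only safe if every row really is valid JSON or XML):

-- Cheap dispatch on the first non-blank character; no validation,
-- so garbage rows will raise a cast error
SELECT
    CASE left(ltrim(text_column), 1)
        WHEN '{' THEN text_column::json->'A'->'B'->>'C'
        WHEN '<' THEN array_to_string(xpath('A/B/C/text()', text_column::xml), ',')
    END AS extracted_element
FROM some_table;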

/!\ Even xml_is_well_formed needs protection, as you can see in this fiddle:
both {"json":1} and not at all return true through xml_is_well_formed(), because with the default XMLOPTION = content, bare character data counts as well-formed XML content; xml_is_well_formed_document() requires an actual document.
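
A quick check (the t/f results assume the default XMLOPTION = content):

SELECT xml_is_well_formed('{"json":1}'),           -- t (!) bare text is valid "content"
       xml_is_well_formed_document('{"json":1}');  -- f: no root element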

3 Comments

Detecting < or { is what I started with; I asked this question when that started raising exceptions. It turns out our data is only almost always valid XML or JSON. I also discovered that the web interface to the deployed database resets the connection for every query, so I can include CREATE FUNCTION; but pgAdmin on my local machine reuses the connection, so there I use CREATE OR REPLACE FUNCTION
Your case looks like mine: 3-month rolling logs of SOAP and REST calls, some truncated to 4000 chars (= an unclosed < or {), from which I needed to extract some key fields (to then have a searchable table for diagnosing incidents). Every 20 min I ran a script that listed new frames and, in batches of 1000 rows, attempted the extraction. If a batch failed, its rows were re-attempted in a degraded row-by-row pass. Any row still failing was then marked "unusable, never to retry". The big plus was that my hosting provided another schema I could write to. Do you have a way to persist anything?
Yeah, it's a similar extraction from log-like records. I don't think I have a way to persist anything, but this is for things like finding how often a condition that requires manual intervention occurs, or confirming that a fix has reduced it; not things we need ongoing records of.
