(expanding on the comment in which I originally suggested this solution)
`CREATE FUNCTION pg_temp.…`
Regarding the "in a database where I don't have access to create a function":
Unlike some other RDBMSs, PostgreSQL makes almost no distinction between temporary objects and "lasting" ones.
In fact there is nothing special about temporary objects at all: they are just normal objects belonging to a special schema, pg_temp, and they get discarded at the end of your session.
(More exactly: belonging to a pg_temp_xxx schema, created on the fly when the session opens and always aliased as pg_temp for practical purposes, as explained in the search_path documentation.)
Thus you can create temporary tables, index them, and even create temporary functions or temporary extensions (though in that last case we'd more likely say "loading" temp extensions than "creating" them).
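You can watch the aliasing at work with the built-in pg_my_temp_schema(), which returns your session's real temporary schema:

```sql
-- Creating any temporary object materializes the session's temp schema.
create temp table probe(i int);

-- Its real name is pg_temp_xxx (the number depends on your backend):
select pg_my_temp_schema()::regnamespace;

-- ...but the pg_temp alias always resolves to it:
select 'pg_temp.probe'::regclass;
```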
pg_temp is your inalienable read-write playground; its only downside is that it does not persist across sessions.
That makes it perfect for massive extraction-only scripted tasks with intermediate steps requiring more complexity than CTEs allow, for example when you want to index those intermediate results, or need to reuse them for two different extractions:
```sql
create temp table reworked as select /* complex data extraction */;
select /* aggregate query */ from reworked group by …; -- Dump that to summary.csv
select * from reworked;                                -- Dump that to details.csv
```
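With psql, the two dumps can be done client-side with \copy (the column names and file names here are just examples):

```sql
-- \copy runs on the client, so the CSV files land on your machine.
\copy (select grp, sum(val) from reworked group by grp) to 'summary.csv' csv header
\copy (select id, grp, val from reworked) to 'details.csv' csv header
```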
Some SQL CREATE statements accept a TEMP[ORARY] modifier that acts as syntactic sugar: `create temp table xxx` = `create table pg_temp.xxx`.
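You can check that equivalence by looking at where the two tables actually land (the pg_temp_xxx number depends on your session):

```sql
create temp table t1(i int);      -- with the TEMP sugar
create table pg_temp.t2(i int);   -- targeting pg_temp directly

-- Both live in the same pg_temp_xxx schema:
select relname, relnamespace::regnamespace as schema
from pg_class
where relname in ('t1', 't2');
```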
But you're free to directly target pg_temp:
```sql
create function pg_temp.json_is_well_formed(j text) …;
```
Or even:
```sql
set search_path to pg_temp, <your provider's schema if applicable>, public;
create function json_is_well_formed(j text) …;
```
thanks to the special rule for explicitly mentioning pg_temp within search_path: pg_temp is searched implicitly for tables, but functions in it are only found when it is listed explicitly.
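The bodies are elided above; one common way to write such a validator (a cast-and-catch sketch, assuming PL/pgSQL is available, not necessarily the asker's exact version) is:

```sql
create function pg_temp.json_is_well_formed(j text)
returns boolean
language plpgsql
immutable
as $$
begin
  perform j::json;   -- attempt the cast; cheap when it succeeds
  return true;
exception when others then
  return false;      -- any cast error means "not well formed"
end
$$;

select pg_temp.json_is_well_formed('{"a": 1}');  -- true
select pg_temp.json_is_well_formed('nope');      -- false
```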
Be smart
… But, as I would have put it from an artisanal / intuitive point of view (better expressed in @Laurenz Albe's more Cartesian answer), if you plan to handle millions of entries, try to optimize your attempts, much as branch prediction does inside a processor.
For example, if you're confident that entries are always either well-formed JSON or XML, then simply detect a leading `<` or `{` and only attempt the corresponding format.
Conversely, if you can have garbage, then exception handling is mandatory, but you can perhaps still do several passes: first handle all the sure rows in one set-based pass, then finish with one-by-one robust handling of the remainder.
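One way to sketch that multi-pass idea (the table raw_data(id, payload) is hypothetical, and the validator stands for any exception-guarded helper like the one above named in this answer):

```sql
-- Hypothetical source: raw_data(id int, payload text), possibly containing garbage.

-- Pass 1: set-based handling of the rows whose format looks obvious.
create temp table probably_json as
  select id, payload
  from raw_data
  where left(ltrim(payload), 1) = '{';

-- Pass 2: one-by-one robust handling of the remains, behind an
-- exception-catching validator such as pg_temp.json_is_well_formed().
create temp table salvage as
  select id, payload
  from raw_data r
  where not exists (select 1 from probably_json p where p.id = r.id)
    and pg_temp.json_is_well_formed(r.payload);
```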
/!\ Even xml_is_well_formed() needs protection, as you can see in this fiddle:
both `{"json":1}` and `not at all` return true through xml_is_well_formed().
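The reason is that xml_is_well_formed() honours the XMLOPTION setting, which defaults to CONTENT, and nearly any text is valid XML content; requiring a full document is stricter (assuming a server built with XML support):

```sql
-- With the default XMLOPTION = CONTENT, bare text counts as well-formed content:
select xml_is_well_formed('not at all');           -- true

-- Demanding a single rooted document rejects it:
select xml_is_well_formed_document('not at all');  -- false
select xml_is_well_formed_document('<a>ok</a>');   -- true
```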
(Fiddle environment: PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit.)