I've created a custom type, gp, to model the D&D 5e currency system. I have defined custom input and output functions in gp.c:
#include "postgres.h"
#include <string.h>
#include "fmgr.h"
#include <stdio.h>
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
static const char* inputFormat = " %i %2s "; // %2s reads at most 2 chars plus '\0', which fits unit[3]
static const char* invalidFormat = "invalid input syntax for gp: \"%s\"";
PG_FUNCTION_INFO_V1(gp_input);
Datum gp_input(PG_FUNCTION_ARGS) {
    char* raw = PG_GETARG_CSTRING(0);
    int32 amt;
    char unit[3];
    if (sscanf(raw, inputFormat, &amt, &unit[0]) != 2) {
        ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg(invalidFormat, raw)));
    }
    // second unit character must be 'p' (cp, sp, ep, gp, pp)
    switch(unit[1]) {
        case 'p':
            break;
        default:
            ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg(invalidFormat, raw)));
    }
    // convert everything to copper pieces
    switch(unit[0]) {
        case 'c':
            break;
        case 's':
            amt *= 10;
            break;
        case 'e':
            amt *= 50;
            break;
        case 'g':
            amt *= 100;
            break;
        case 'p':
            amt *= 1000;
            break;
        default:
            ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg(invalidFormat, raw)));
    }
    int32* result = (int32*)palloc(sizeof(int32));
    *result = amt;
    PG_RETURN_POINTER(result);
}
PG_FUNCTION_INFO_V1(gp_output);
Datum gp_output(PG_FUNCTION_ARGS) {
    int32* raw = (int32*)PG_GETARG_POINTER(0);
    int32 val = *raw;
    unsigned int bufsz = sizeof(unsigned char)*9 + 2; // allow up to 999999999[pgsc]p
    char* buf = (char*) palloc(bufsz+1); // +1 b/c '\0'
    if (val >= 10 && val % 10 == 0) {
        val /= 10;
        if (val >= 10 && val % 10 == 0) {
            val /= 10;
            if (val >= 10 && val % 10 == 0) {
                val /= 10;
                if (sprintf(buf, "%dpp", val) <= 0) {
                    ereport(ERROR, (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER), errmsg("Bad value for gp")));
                }
            }
            else {
                if (sprintf(buf, "%dgp", val) <= 0) {
                    ereport(ERROR, (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER), errmsg("Bad value for gp")));
                }
            }
        }
        else {
            if (sprintf(buf, "%dsp", val) <= 0) {
                ereport(ERROR, (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER), errmsg("Bad value for gp")));
            }
        }
    }
    else {
        if (sprintf(buf, "%dcp", val) <= 0) {
            ereport(ERROR, (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER), errmsg("Bad value for gp")));
        }
    }
    PG_RETURN_CSTRING(buf);
}
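To separate the unit arithmetic from the PostgreSQL plumbing, the conversion logic above can be exercised on its own. The following is a minimal standalone sketch, not the extension code: `parse_copper` and `format_copper` are hypothetical names, it parses with `%d` rather than `%i`, and it uses `snprintf` in place of `sprintf`. It only shows what copper value and text form each test input should produce if nothing interferes in between.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse "<amount><unit>" into copper pieces.
 * Returns 1 on success, 0 on invalid input. */
static int parse_copper(const char *raw, int *copper)
{
    int amt;
    char unit[3];

    /* %2s stores at most two characters plus the terminating '\0'. */
    if (sscanf(raw, " %d %2s", &amt, unit) != 2)
        return 0;
    if (strlen(unit) != 2 || unit[1] != 'p')
        return 0;

    switch (unit[0]) {
    case 'c': break;
    case 's': amt *= 10; break;
    case 'e': amt *= 50; break;
    case 'g': amt *= 100; break;
    case 'p': amt *= 1000; break;
    default:  return 0;
    }
    *copper = amt;
    return 1;
}

/* Hypothetical helper: write a copper amount using the largest of
 * cp/sp/gp/pp that divides it evenly, like the nested ifs in gp_output. */
static void format_copper(int val, char *buf, size_t bufsz)
{
    static const char units[] = { 'c', 's', 'g', 'p' };
    int step = 0;

    assert(buf != NULL && bufsz > 0);
    /* Divide by 10 at most three times: cp -> sp -> gp -> pp. */
    while (step < 3 && val >= 10 && val % 10 == 0) {
        val /= 10;
        step++;
    }
    /* snprintf never writes more than bufsz bytes, including the '\0'. */
    snprintf(buf, bufsz, "%d%cp", val, units[step]);
}
```

With these helpers, `101cp` parses to 101 and formats back to `101cp`, and `101sp` parses to 1010 and formats back to `101sp`, so the arithmetic by itself does not account for a result such as 212.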
I know I'm not checking whether the number is out of bounds or whether the formatted value fits in the buffer, but I'm not hitting those issues yet. My problem is that Postgres seems to be altering, and in some cases corrupting, the values that I'm storing. I have this test SQL file:
DROP TYPE IF EXISTS gp CASCADE;
DROP TABLE IF EXISTS test;
CREATE TYPE gp;
CREATE FUNCTION gp_input(cstring) RETURNS gp AS '$libdir/gp.so' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION gp_output(gp) RETURNS cstring AS '$libdir/gp.so' LANGUAGE C IMMUTABLE STRICT;
CREATE TYPE gp (input=gp_input, output=gp_output);
CREATE TABLE test (val gp);
INSERT INTO test VALUES ('12sp'), ('100gp'), ('1000cp'), ('101cp');
SELECT * FROM test;
INSERT INTO test VALUES ('101sp');
The output of that SELECT is:
val
-------
12sp
10pp
1pp
212cp
(4 rows)
So all values were stored and displayed correctly (normalized to the largest unit that divides them evenly) except for the last one: 101cp comes back as 212cp. Using ereport warnings, I determined that right before the return in the input function, result points to the correct value, 101. However, the pointer passed as an argument to my output function points to a value I didn't store, 212. Somewhere between the end of my input code and the beginning of my output code, Postgres corrupted that value. This always happens with the input string 101cp, independent of the table's state or any other values being inserted at the same time.
But now the really weird part: that last INSERT fails outright. While parsing that gp value, psql prints the error:
psql:./gptest.sql:15: ERROR: compressed data is corrupted
LINE 1: INSERT INTO test VALUES ('101sp');
^
This always happens with the value 101sp, regardless of table state or any other values being inserted alongside it. Using ereport warnings, I could see that right before the return statement, result points to the correct value, 1010. That means the failure is happening either in the return macro expansion or in some under-the-hood code.
So I truly have no idea what's going on. I'm using palloc, so nothing else should be overwriting that memory, and I can't think of any reason for values containing 101 to always have problems, let alone different problems depending on the units. An int32 can easily hold the small values I'm testing, so it's not that. I don't know if this is how it's supposed to work, but I have checked, and the pointer passed to the output function is NOT the same as the address of the result pointer for any of these values. I assume Postgres is doing some kind of memcpy incorrectly under the hood, but then I don't know how anyone can be expected to define a custom base data type.
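One hypothesis that would fit both symptoms, offered as a sketch and not a confirmed diagnosis: CREATE TYPE without INTERNALLENGTH defines a variable-length type, so PostgreSQL would treat the first bytes of the palloc'd int32 as a varlena length header instead of as data. The plain C below mirrors the header bit tests PostgreSQL applies on little-endian machines (an assumption about the platform); `classify_header` and `claimed_size` are hypothetical names, and nothing here calls into PostgreSQL.

```c
#include <stdint.h>

/* How a variable-length datum's first byte would be read (little-endian). */
enum header_kind {
    HDR_4B_PLAIN,       /* low bits 00: 4-byte header, uncompressed */
    HDR_4B_COMPRESSED,  /* low bits 10: 4-byte header, compressed data */
    HDR_1B_SHORT,       /* low bit 1: 1-byte header, short datum */
    HDR_1B_EXTERNAL     /* exactly 0x01: pointer to out-of-line data */
};

/* Classify an int32 as if its bytes were the start of a varlena datum. */
static enum header_kind classify_header(int32_t stored)
{
    /* On little-endian, the least significant byte comes first in memory. */
    uint8_t first = (uint8_t)((uint32_t)stored & 0xFF);

    if (first == 0x01)
        return HDR_1B_EXTERNAL;
    if (first & 0x01)
        return HDR_1B_SHORT;
    if ((first & 0x03) == 0x02)
        return HDR_4B_COMPRESSED;
    return HDR_4B_PLAIN;
}

/* Total datum size in bytes that such a header would claim
 * (0 for external pointers, which do not encode a size this way). */
static uint32_t claimed_size(int32_t stored)
{
    uint32_t bits = (uint32_t)stored;

    switch (classify_header(stored)) {
    case HDR_1B_SHORT:
        return (bits >> 1) & 0x7F;
    case HDR_1B_EXTERNAL:
        return 0;
    default:
        return (bits >> 2) & 0x3FFFFFFF;
    }
}
```

Under these assumptions, 101 (0x65) reads as a short 1-byte header claiming 50 bytes, which could explain a mangled value coming back, while 1010 (0x3F2) reads as a compressed datum, which would match "compressed data is corrupted". The values that appear to work (120, 1000, 10000) all have low bits 00 and read as plain 4-byte headers. If this is the cause, declaring the type with INTERNALLENGTH = 4 in CREATE TYPE would be the first thing to try.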