path: root/src/include
Age  Commit message  Author
2019-06-06Fix confusion on different kinds of slots in IndexOnlyScans.Heikki Linnakangas
We used the same slot to store a tuple from the index, and to store a tuple from the table. That's not OK. It worked with the heap, because heapam_getnextslot() stores a HeapTuple to the slot, and doesn't care how large the tts_values/nulls arrays are. But when I played with a toy table AM implementation that used a virtual tuple, it caused memory overruns. In passing, tidy up comments on the ioss_PscanLen fields.
2019-06-04Add command column to pg_stat_progress_create_indexPeter Eisentraut
This allows determining which command is running, similar to pg_stat_progress_cluster. Discussion: https://www.postgresql.org/message-id/flat/f0e56b3b-74b7-6cbc-e207-a5ed6bee18dc%402ndquadrant.com
2019-06-04Fix some typos and inconsistencies in tableam.hMichael Paquier
The defined callback definitions have been using references to heap for a couple of variables and comments. This makes the whole interface more consistent by using "table" which is more generic. A variable storing index information was misspelled as well. Author: Michael Paquier Discussion: https://postgr.es/m/20190601190946.GB1905@paquier.xyz
2019-06-03Update SQL conformance information about JSON pathPeter Eisentraut
Reviewed-by: Oleg Bartunov <obartunov@postgrespro.ru>
2019-06-03Fix typos in various placesMichael Paquier
Author: Andrea Gelmini Reviewed-by: Michael Paquier, Justin Pryzby Discussion: https://postgr.es/m/20190528181718.GA39034@glet
2019-05-31Fix incorrect parameter name in commentDavid Rowley
Author: Antonin Houska Discussion: https://postgr.es/m/22370.1559293357@localhost
2019-05-30Remove unnecessary (and wrong) forward declaration.Andres Freund
Interestingly only C++ compilers have, so far, complained about this odd forward declaration. This originated when IndexBuildCallback was defined in another file, but now is completely unnecessary (but was wrong before too, cpluspluscheck just wouldn't have noticed). Reported-By: Tom Lane Discussion: https://postgr.es/m/53941.1559239260@sss.pgh.pa.us
2019-05-26Fix typos.Amit Kapila
Reported-by: Alexander Lakhin Author: Alexander Lakhin Reviewed-by: Amit Kapila and Tom Lane Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
2019-05-23tableam: Rename wrapper functions to match callback names.Andres Freund
Some of the wrapper functions didn't match the callback names, many of them because they stayed "consistent" with the historic naming of the wrapped functionality. We decided that for most cases it's more important for tableam to be consistent going forward than with the past. The one exception is beginscan/endscan/... because it'd have looked odd to have systable_beginscan/endscan/... with a different naming scheme, and changing the systable_* APIs would have caused way too much churn (including breaking a lot of external users). Author: Ashwin Agrawal, with some small additions by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CALfoeiugyrXZfX7n0ORCa4L-m834dzmaE8eFdbNR6PMpetU4Ww@mail.gmail.com
2019-05-22Initial pgperltidy run for v12.Tom Lane
Make all the perl code look nice, too (for some value of "nice").
2019-05-22Phase 2 pgindent run for v12.Tom Lane
Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22Initial pgindent run for v12.Tom Lane
This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22Fix O(N^2) performance issue in pg_publication_tables view.Tom Lane
The original coding of this view relied on a correlated IN sub-query. Our planner is not very bright about correlated sub-queries, and even if it were, there's no way for it to know that the output of pg_get_publication_tables() is duplicate-free, making the de-duplicating semantics of IN unnecessary. Hence, rewrite as a LATERAL sub-query. This provides circa 100X speedup for me with a few hundred published tables (the whole regression database), and things would degrade as roughly O(published_relations * all_relations) beyond that. Because the rules.out expected output changes, force a catversion bump. Ordinarily we might not want to do that post-beta1; but we already know we'll be doing a catversion bump before beta2 to fix pg_statistic_ext issues, so it's pretty much free to fix it now instead of waiting for v13. Per report and fix suggestion from PegoraroF10. Discussion: https://postgr.es/m/1551385426763-0.post@n3.nabble.com
2019-05-22In transam.h, don't expose static inline functions to frontend code.Tom Lane
That leads to unsatisfied external references if the C compiler fails to elide unused static functions. Apparently, we have no buildfarm members building HEAD that have that issue ... but such compilers still exist in the wild. Need to do something about that. In passing, fix Berkeley-era typo in comment. Discussion: https://postgr.es/m/27054.1558533367@sss.pgh.pa.us
2019-05-21tableam: Move heap-specific logic from needs_toast_table below tableam.Robert Haas
This allows table AMs to completely suppress TOAST table creation, or to modify the conditions under which they are created. Patch by me. Reviewed by Andres Freund. Discussion: http://postgr.es/m/CA+Tgmoa4O2n=yphqD2pERUnYmUO84bH1SqMsA-nSxBGsZ7gWfA@mail.gmail.com
2019-05-20Stamp 12beta1.Tom Lane
2019-05-19Fix and improve SnapshotType comments.Andres Freund
The comment for SNAPSHOT_SELF was unfortunately explaining SNAPSHOT_DIRTY, as reported by Sergei. Also expand a few comments, and include a few more comments from heapam_visibility.c, so they're in an AM independent place. Reported-By: Sergei Kornilov Author: Andres Freund Discussion: https://postgr.es/m/9152241558192351@sas1-d856b3d759c7.qloud-c.yandex.net
2019-05-19Don't predicate lock for analyze scans, refactor scan option passing.Andres Freund
Before this commit, when ANALYZE was run on a table and serializable was used (either by virtue of an explicit BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE, or default_transaction_isolation being set to serializable) a null pointer dereference led to a crash. The analyze scan doesn't need a snapshot (nor predicate locking), but before this commit a scan only contained information about being a bitmap or sample scan. Refactor the option passing to the scan_begin callback to use a bitmask instead. Alternatively we could have added a new boolean parameter, but that seems harder to read. Even before this issue various people (Heikki, Tom, Robert) suggested doing so. These changes don't change the scan APIs outside of tableam. The flags argument could be exposed, but that's not necessary to fix this problem. Also the wrapper table_beginscan* functions encapsulate most of that complexity. After these changes fixing the bug is trivial, just don't acquire predicate lock for analyze style scans. That was already done for bitmap heap scans. Add an assert that a snapshot is passed when acquiring the predicate lock, so that detecting this kind of bug doesn't require running with serializable. Also add a comment about sample scans currently requiring predicate locking the entire relation, that previously wasn't remarked upon. Reported-By: Joe Wildish Author: Andres Freund Discussion: https://postgr.es/m/4EA80A20-E9BF-49F1-9F01-5B66CAB21453@elusive.cx https://postgr.es/m/20190411164947.nkii4gaeilt4bui7@alap3.anarazel.de https://postgr.es/m/20190518203102.g7peu2fianukjuxm@alap3.anarazel.de
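The bitmask approach described above can be illustrated with a minimal sketch. The flag names and the helper below are hypothetical and chosen for this example only; they are not the identifiers used in tableam.h. The sketch only shows how a single flags word lets the scan code skip predicate locking for bitmap and analyze style scans while still locking for sample scans.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names for illustration; not the real tableam.h enum. */
typedef enum SketchScanFlags
{
    SKETCH_SCAN_SEQ = 1 << 0,
    SKETCH_SCAN_BITMAP = 1 << 1,
    SKETCH_SCAN_SAMPLE = 1 << 2,
    SKETCH_SCAN_ANALYZE = 1 << 3
} SketchScanFlags;

/*
 * Decide whether a scan should take a predicate lock.  Per the commit
 * message, bitmap scans and analyze style scans skip it; other scan
 * types (including sample scans) do not.
 */
static bool
sketch_needs_predicate_lock(uint32_t flags)
{
    return (flags & (SKETCH_SCAN_BITMAP | SKETCH_SCAN_ANALYZE)) == 0;
}
```

Compared with adding one boolean parameter per scan kind, a flags word keeps the callback signature stable when a new kind of scan is introduced.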
2019-05-18tableam: Avoid relying on relation size to determine validity of tids.Andres Freund
Instead add a tableam callback to do so. To avoid adding per-validation overhead, pass a scan to tuple_tid_valid. In heap's case we'd otherwise have incurred a RelationGetNumberOfBlocks() call for each tid - which'd have added noticeable overhead to nodeTidscan.c. Author: Andres Freund Reviewed-By: Ashwin Agrawal Discussion: https://postgr.es/m/20190515185447.gno2jtqxyktylyvs@alap3.anarazel.de
2019-05-18tableam: Don't assume that every AM uses md.c style storage.Andres Freund
Previously various parts of the code routed size requests through RelationGetNumberOfBlocks[InFork]. That works if md.c is used by the AM, but not otherwise. Add a tableam callback to return the size of the table. As not every AM will use postgres' BLCKSZ, have it return bytes, and have RelationGetNumberOfBlocksInFork() round the byte size up into blocks. To allow code outside of the AM to determine the actual relation size, map InvalidForkNumber to the total size of a relation, as not every AM might just need the postgres defined forks. A few users of RelationGetNumberOfBlocks() ought to be converted away from that. One case, the use of it to determine whether a tid is valid, will be fixed in a follow-up commit. Others will have to wait for v13. Author: Andres Freund Discussion: https://postgr.es/m/20190423225201.3bbv6tbqzkb5w7cw@alap3.anarazel.de
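The rounding of a byte size up into blocks can be sketched as follows. This is an illustrative helper, not the code from the commit: the function name is hypothetical, and the block size of 8192 is an assumption for the example (BLCKSZ is a build-time setting).

```c
#include <stdint.h>

/* Assumed block size for this sketch only; BLCKSZ is set at build time. */
#define SKETCH_BLCKSZ 8192

/* Round a size in bytes up to a whole number of blocks. */
static uint64_t
sketch_bytes_to_blocks(uint64_t nbytes)
{
    return (nbytes + (SKETCH_BLCKSZ - 1)) / SKETCH_BLCKSZ;
}
```

Rounding up rather than down means a partially filled trailing block still counts, so an AM reporting 8193 bytes is treated as occupying two blocks.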
2019-05-17Restructure creation of run-time pruning steps.Tom Lane
Previously, gen_partprune_steps() always built executor pruning steps using all suitable clauses, including those containing PARAM_EXEC Params. This meant that the pruning steps were only completely safe for executor run-time (scan start) pruning. To prune at executor startup, we had to ignore the steps involving exec Params. But this doesn't really work in general, since there may be logic changes needed as well --- for example, pruning according to the last operator's btree strategy is the wrong thing if we're not applying that operator. The rules embodied in gen_partprune_steps() and its minions are sufficiently complicated that tracking their incremental effects in other logic seems quite impractical. Short of a complete redesign, the only safe fix seems to be to run gen_partprune_steps() twice, once to create executor startup pruning steps and then again for run-time pruning steps. We can save a few cycles however by noting during the first scan whether we rejected any clauses because they involved exec Params --- if not, we don't need to do the second scan. In support of this, refactor the internal APIs in partprune.c to make more use of passing information in the GeneratePruningStepsContext struct, rather than as separate arguments. This is, I hope, the last piece of our response to a bug report from Alan Jackson. Back-patch to v11 where this code came in. Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
2019-05-15Remove no-longer-used typedef.Tom Lane
struct ClonedConstraint is no longer needed, so delete it. Discussion: https://postgr.es/m/18102.1557947143@sss.pgh.pa.us
2019-05-14Move logging.h and logging.c from src/fe_utils/ to src/common/.Tom Lane
The original placement of this module in src/fe_utils/ is ill-considered, because several src/common/ modules have dependencies on it, meaning that libpgcommon and libpgfeutils now have mutual dependencies. That makes it pointless to have distinct libraries at all. The intended design is that libpgcommon is lower-level than libpgfeutils, so only dependencies from the latter to the former are acceptable. We already have the precedent that fe_memutils and a couple of other modules in src/common/ are frontend-only, so it's not stretching anything out of whack to treat logging.c as a frontend-only module in src/common/. To the extent that such modules help provide a common frontend/backend environment for the rest of common/ to use, it's a reasonable design. (logging.c does not yet provide an ereport() emulation, but one can dream.) Hence, move these files over, and revert basically all of the build-system changes made by commit cc8d41511. There are no places that need to grow new dependencies on libpgcommon, further reinforcing the idea that this is the right solution. Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com
2019-05-14Update SQL features/conformance information to SQL:2016Peter Eisentraut
2019-05-14Detect internal GiST page splits correctly during index build.Heikki Linnakangas
As we descend the GiST tree during insertion, we modify any downlinks on the way down to include the new tuple we're about to insert (if they don't cover it already). Modifying an existing downlink might cause an internal page to split, if the new downlink tuple is larger than the old one. If that happens, we need to back up to the parent and re-choose a page to insert to. We used to detect that situation, thanks to the NSN-LSN interlock normally used to detect concurrent page splits, but that got broken by commit 9155580fd5. With that commit, we now use a dummy constant LSN value for every page during index build, so the LSN-NSN interlock no longer works. I thought that was OK because there can't be any other backends modifying the index during index build, but missed that the insertion itself can modify the page we're inserting to. The consequence was that we would sometimes insert the new tuple to an incorrect page, one whose downlink doesn't cover the new tuple. To fix, add a flag to the stack that keeps track of the state while descending the tree, to indicate that a page was split, and that we need to retry the descent from the parent. Thomas Munro first reported that the contrib/intarray regression test was failing occasionally on the buildfarm after commit 9155580fd5. The failure was intermittent, because the gistchoose() function is not deterministic, and would only occasionally create the right circumstances for this bug to cause the failure. Patch by Anastasia Lubennikova, with some changes by me to make it work correctly also when the internal page split also causes the "grandparent" to be split. Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJRzLo7tZExWfSbwM3XuK7aAK7FhdBV0FLkbUG%2BW0v0zg%40mail.gmail.com
2019-05-14Fix duplicated words in commentsMichael Paquier
Author: Stephen Amell Discussion: https://postgr.es/m/539fa271-21b3-777e-a468-d96cffe9c768@gmail.com
2019-05-13Standardize ItemIdData terminology.Peter Geoghegan
The term "item pointer" should not be used to refer to ItemIdData variables, since that is needlessly ambiguous. Only ItemPointerData/ItemPointer variables should be called item pointers. To fix, establish the convention that ItemIdData variables should always be referred to either as "item identifiers" or "line pointers". The term "item identifier" already predominates in docs and translatable messages, and so should be the preferred alternative there. Discussion: https://postgr.es/m/CAH2-Wz=c=MZQjUzde3o9+2PLAPuHTpVZPPdYxN=E4ndQ2--8ew@mail.gmail.com
2019-05-13Improve comment for att_isnull.Robert Haas
The comment implies that a 1 in the null bitmap indicates a null value, but actually a 0 in the null bitmap indicates a null value. Try to be more clear. Patch by me; proposed wording reviewed by Alvaro Herrera and Tom Lane. Discussion: http://postgr.es/m/CA+TgmobHOP8r6cG+UnsDFMrS30-m=jRrCBhgw-nFkn0k9QnFsg@mail.gmail.com
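The bitmap convention described above (a 0 bit means the attribute is null) can be shown with a small sketch. The function name is hypothetical and this is not a copy of the att_isnull macro; it only illustrates the bit test on a byte array where attribute numbers start at zero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if attribute number attno (zero-based) is null according
 * to the bitmap: a CLEAR bit marks a null, a SET bit marks a value.
 */
static bool
sketch_att_isnull(int attno, const uint8_t *bits)
{
    return (bits[attno >> 3] & (1 << (attno & 0x07))) == 0;
}
```

With a bitmap byte of 0x05 (binary 00000101), attributes 0 and 2 have values and attribute 1 is null.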
2019-05-12Rearrange pgstat_bestart() to avoid failures within its critical section.Tom Lane
We long ago decided to design the shared PgBackendStatus data structure to minimize the cost of writing status updates, which means that writers just have to increment the st_changecount field twice. That isn't hooked into any sort of resource management mechanism, which means that if something were to throw error between the two increments, the st_changecount field would be left odd indefinitely. That would cause readers to lock up. Now, since it's also a bad idea to leave the field odd for longer than absolutely necessary (because readers will spin while we have it set), the expectation was that we'd treat these segments like spinlock critical sections, with only short, more or less straight-line, code in them. That was fine as originally designed, but commit 9029f4b37 broke it by inserting a significant amount of non-straight-line code into pgstat_bestart(), code that is very capable of throwing errors, not to mention taking a significant amount of time during which readers will spin. We have a report from Neeraj Kumar of readers actually locking up, which I suspect was due to an encoding conversion error in X509_NAME_to_cstring, though conceivably it was just a garden-variety OOM failure. Subsequent commits have loaded even more dubious code into pgstat_bestart's critical section (and commit fc70a4b0d deserves some kind of booby prize for managing to miss the critical section entirely, although the negative consequences seem minimal given that the PgBackendStatus entry should be seen by readers as inactive at that point). The right way to fix this mess seems to be to compute all these values into a local copy of the process' PgBackendStatus struct, and then just copy the data back within the critical section proper. This plan can't be implemented completely cleanly because of the struct's heavy reliance on out-of-line strings, which we must initialize separately within the critical section. 
But still, the critical section is far smaller and safer than it was before. In hopes of forestalling future errors of the same ilk, rename the macros for st_changecount management to make it more apparent that the writer-side macros create a critical section. And to prevent the worst consequences if we nonetheless manage to mess it up anyway, adjust those macros so that they really are a critical section, ie they now bump CritSectionCount. That doesn't add much overhead, and it guarantees that if we do somehow throw an error while the counter is odd, it will lead to PANIC and a database restart to reset shared memory. Back-patch to 9.5 where the problem was introduced. In HEAD, also fix an oversight in commit b0b39f72b: it failed to teach pgstat_read_current_status to copy st_gssstatus data from shared memory to local memory. Hence, subsequent use of that data within the transaction would potentially see changing data that it shouldn't see. Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com
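The st_changecount protocol described above can be sketched in a simplified, single-threaded form. All names here are hypothetical, and the sketch deliberately omits the memory barriers and the CritSectionCount handling that the real macros need; it only shows why an odd counter, or a counter that changed during the copy, tells a reader its copy is unusable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a shared status entry. */
typedef struct SketchStatus
{
    int changecount;    /* odd while a write is in progress */
    int value;          /* the payload protected by the counter */
} SketchStatus;

/* Writer side: bump the counter before and after modifying the payload. */
static void
sketch_begin_write(SketchStatus *s)
{
    s->changecount++;
}

static void
sketch_end_write(SketchStatus *s)
{
    s->changecount++;
}

/*
 * Reader side: copy the payload, then accept the copy only if the
 * counter was even and unchanged across the copy.  A real reader would
 * loop until this succeeds, which is why a counter left odd forever
 * makes readers spin.
 */
static bool
sketch_try_read(const SketchStatus *s, int *out)
{
    int before = s->changecount;
    int copy = s->value;
    int after = s->changecount;

    if (before != after || (before & 1))
        return false;
    *out = copy;
    return true;
}
```

Because nothing resets the counter if an error is thrown between the two increments, keeping the code between them short and error-free is what makes the scheme safe.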
2019-05-10Fix and improve description of locktag types in lock.hMichael Paquier
The description of the lock type for speculative insertions was incorrect, being copy-pasted from another one. As discussed, also move the description for all the fields of lock tag types from the structure listing lock tag types to the set of macros setting each LOCKTAG. Author: John Naylor Discussion: https://postgr.es/m/CACPNZCtA0-ybaC4fFfaDq_8p_TUOLvGxZH9Dm-=TMHZJarBa7Q@mail.gmail.com
2019-05-09Clean up the behavior and API of catalog.c's is-catalog-relation tests.Tom Lane
The right way for IsCatalogRelation/Class to behave is to return true for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId), without any of the ad-hoc fooling around with schema membership. The previous code was wrong because (1) it claimed that information_schema tables were not catalog relations but their toast tables were, which is silly; and (2) if you dropped and recreated information_schema, which is a supported operation, the behavior changed. That's even sillier. With this definition, "catalog relations" are exactly the ones traceable to the postgres.bki data, which seems like what we want. With this simplification, we don't actually need access to the pg_class tuple to identify a catalog relation; we only need its OID. Hence, replace IsCatalogClass with "IsCatalogRelationOid(oid)". But keep IsCatalogRelation as a convenience function. This allows fixing some arguably-wrong semantics in contrib/sepgsql and ReindexRelationConcurrently, which were using an IsSystemNamespace test where what they really should be using is IsCatalogRelationOid. The previous coding failed to protect toast tables of system catalogs, and also was not on board with the general principle that user-created tables do not become catalogs just by virtue of being renamed into pg_catalog. We can also get rid of a messy hack in ReindexMultipleTables. While we're at it, also rename IsSystemNamespace to IsCatalogNamespace, because the previous name invited confusion with the more expansive semantics used by IsSystemRelation/Class. Also improve the comments in catalog.c. There are a few remaining places in replication-related code that are special-casing OIDs below FirstNormalObjectId. I'm inclined to think those are wrong too, and if there should be any special case it should just extend to FirstBootstrapObjectId. But first we need to debate whether a FOR ALL TABLES publication should include information_schema. 
Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
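The OID-only test described above can be sketched as follows. The function and macro names are hypothetical, and the two threshold values are assumptions made for this example rather than values taken from the commit; the point is only that the check needs nothing but the relation's OID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold values for illustration; check transam.h for the real ones. */
#define SKETCH_FIRST_BOOTSTRAP_OID 12000u
#define SKETCH_FIRST_NORMAL_OID    16384u

/* A relation is a catalog relation iff its OID is below the bootstrap threshold. */
static bool
sketch_is_catalog_relation_oid(uint32_t relid)
{
    return relid < SKETCH_FIRST_BOOTSTRAP_OID;
}
```

Under these assumed thresholds, an OID between the two limits (such as one assigned to an information_schema table during initdb) is not a catalog relation, and the answer does not change if the schema is dropped and recreated.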
2019-05-08Add missing periods to comments.Etsuro Fujita
2019-05-07Add TRUNCATE parameter to VACUUM.Fujii Masao
This commit adds a new parameter to the VACUUM command, TRUNCATE, which specifies that VACUUM should attempt to truncate off any empty pages at the end of the table and allow the disk space for the truncated pages to be returned to the operating system. This parameter, if specified, overrides the vacuum_truncate reloption. If neither the reloption nor the VACUUM option is used, the default is true, as before. Author: Fujii Masao Reviewed-by: Julien Rouhaud, Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
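The precedence rule above (VACUUM option first, then the reloption, then a default of true) can be sketched with a three-valued setting. The type and function names are hypothetical and this is not the code from vacuum.c; it only encodes the rule as stated in the commit message.

```c
#include <stdbool.h>

/* Hypothetical three-valued setting: not given, explicitly off, explicitly on. */
typedef enum SketchTernary
{
    SKETCH_OPT_UNSET,
    SKETCH_OPT_FALSE,
    SKETCH_OPT_TRUE
} SketchTernary;

/* Resolve whether VACUUM should try to truncate empty trailing pages. */
static bool
sketch_should_truncate(SketchTernary vacuum_opt, SketchTernary reloption)
{
    if (vacuum_opt != SKETCH_OPT_UNSET)
        return vacuum_opt == SKETCH_OPT_TRUE;   /* command option wins */
    if (reloption != SKETCH_OPT_UNSET)
        return reloption == SKETCH_OPT_TRUE;    /* then the reloption */
    return true;                                /* default, as before */
}
```

A plain boolean could not distinguish "not specified" from "false", which is why the sketch uses three states for each input.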
2019-05-07Revert "Avoid the creation of the free space map for small heap relations".Amit Kapila
This feature was using a process local map to track the first few blocks in the relation. The map was reset each time we get the block with enough freespace. It was discussed that it would be better to track this map on a per-relation basis in relcache and then invalidate the same whenever vacuum frees up some space in the page or when FSM is created. The new design would be better both in terms of API design and performance. List of commits reverted, in reverse chronological order: 06c8a5090e Improve code comments in b0eaa4c51b. 13e8643bfc During pg_upgrade, conditionally skip transfer of FSMs. 6f918159a9 Add more tests for FSM. 9c32e4c350 Clear the local map when not used. 29d108cdec Update the documentation for FSM behavior.. 08ecdfe7e5 Make FSM test portable. b0eaa4c51b Avoid creation of the free space map for small heap relations. Discussion: https://postgr.es/m/20190416180452.3pm6uegx54iitbt5@alap3.anarazel.de
2019-05-03Remove RelationSetIndexList().Tom Lane
In the wake of commit f912d7dec, RelationSetIndexList isn't used any more. It was always a horrid wart, so getting rid of it is very nice. We can also convert rd_indexvalid back to a plain boolean. Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
2019-05-01Fix union for pgstat message typesMagnus Hagander
The message types for temp files and for checksum failures were missing from the union. Due to the coding style used there, there was no compiler error when this happened. So change the code to actively use the union, thereby producing a compiler error if the same mistake happens again, as suggested by Tom Lane. Author: Julien Rouhaud Reported-By: Tomas Vondra Discussion: https://postgr.es/m/20190430163328.zd4rrlnbvgaqlcdz@development
2019-04-30Fix several recently introduced issues around handling new relation forks.Andres Freund
Most of these stem from d25f519107 "tableam: relation creation, VACUUM FULL/CLUSTER, SET TABLESPACE.". 1) To pass data to the relation_set_new_filenode() callback, RelationSetNewRelfilenode() was made to update RelationData.rd_rel directly. That's not OK however, as it makes the relcache entries temporarily inconsistent. Which among other scenarios is a problem if a REINDEX targets an index on pg_class - the CatalogTupleUpdate() in RelationSetNewRelfilenode(). Presumably that was introduced because other places in the code do so - while those aren't "good practice" they don't appear to be actively buggy (e.g. because system tables may not be targeted). I (Andres) should have caught this while reviewing and significantly evolving the code in that commit, mea culpa. Fix that by instead passing in the new RelFileNode as separate argument to relation_set_new_filenode() and rely on the relcache to update the catalog entry. Also revert that the RelationMapUpdateMap() call was changed to immediate, and undo some other more unnecessary changes. 2) Document that the relation_set_new_filenode cannot rely on the whole relcache entry to be valid. It might be worthwhile to refactor the code to never have to rely on that, but given the way heap_create() is currently coded, that'd be a large change. 3) ATExecSetTableSpace() shouldn't do FlushRelationBuffers() itself. A table AM might not use shared buffers at all. Move to index_copy_data() and heapam_relation_copy_data(). 4) heapam_relation_set_new_filenode() previously sometimes accessed rel->rd_rel->relpersistence rather than the `persistence` argument. Code movement mistake. 5) Previously heapam_relation_set_new_filenode() re-opened the smgr relation to create the init fork, if necessary. Instead have RelationCreateStorage() return the SMgrRelation and use it to create the init fork. 
6) Add a note about the danger of modifying the relcache directly to ATExecSetTableSpace() - it's currently not a bug because there's a check ERRORing for catalog tables. Regression tests and assertion improvements that together trigger the bug described in 1) will be added in a later commit, as there is a related bug on all branches. Reported-By: Michael Paquier Diagnosed-By: Tom Lane and Andres Freund Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
2019-04-29In walreceiver, don't try to do ereport() in a signal handler.Tom Lane
This is quite unsafe, even for the case of ereport(FATAL) where we won't return control to the interrupted code, and despite this code's use of a flag to restrict the areas where we'd try to do it. It's possible for example that we interrupt malloc or free while that's holding a lock that's meant to protect against cross-thread interference. Then, any attempt to do malloc or free within ereport() will result in a deadlock, preventing the walreceiver process from exiting in response to SIGTERM. We hypothesize that this explains some hard-to-reproduce failures seen in the buildfarm. Hence, get rid of the immediate-exit code in WalRcvShutdownHandler, as well as the logic associated with WalRcvImmediateInterruptOK. Instead, we need to take care that potentially-blocking operations in the walreceiver's data transmission logic (libpqwalreceiver.c) will respond reasonably promptly to the process's latch becoming set and then call ProcessWalRcvInterrupts. Much of the needed code for that was already present in libpqwalreceiver.c. I refactored things a bit so that all the uses of PQgetResult use latch-aware waiting, but didn't need to do much more. These changes should be enough to ensure that libpqwalreceiver.c will respond promptly to SIGTERM whenever it's waiting to receive data. In principle, it could block for a long time while waiting to send data too, and this patch does nothing to guard against that. I think that that hazard is mostly theoretical though: such blocking should occur only if we fill the kernel's data transmission buffers, and we don't generally send enough data to make that happen without waiting for input. If we find out that the hazard isn't just theoretical, we could fix it by using PQsetnonblocking, but that would require more ticklish changes than I care to make now. This is a bug fix, but it seems like too big a change to push into the back branches without much more testing than there's time for right now. 
Perhaps we'll back-patch once we have more confidence in the change. Patch by me; thanks to Thomas Munro for review. Discussion: https://postgr.es/m/20190416070119.GK2673@paquier.xyz
2019-04-28Do pre-release housekeeping on catalog data, and fix jsonpath send/recv.Tom Lane
Run renumber_oids.pl to move high-numbered OIDs down, as per pre-beta tasks specified by RELEASE_CHANGES. (The only change is 8394 -> 3428.) Also run reformat_dat_file.pl while I'm here. While looking at the reformat diffs, I chanced to notice that type jsonpath had typsend and typreceive = '-', which surely is not the intention given that jsonpath_send and jsonpath_recv exist. Fix that. It's safe to assume that these functions have never been tested :-(. I didn't try, but somebody should.
2019-04-25Fix tablespace inheritance for partitioned relsAlvaro Herrera
Commit ca4103025dfe left a few loose ends. The most important one (broken pg_dump output) is already fixed by virtue of commit 3b23552ad8bb, but some things remained: * When ALTER TABLE rewrites tables, the indexes must remain in the tablespace they were originally in. This didn't work because index recreation during ALTER TABLE runs manufactured SQL (yuck), which runs afoul of default_tablespace in competition with the parent relation tablespace. To fix, reset default_tablespace to the empty string temporarily, and add the TABLESPACE clause as appropriate. * Setting a partitioned rel's tablespace to the database default is confusing; if it worked, it would direct the partitions to that tablespace regardless of default_tablespace. But in reality it does not work, and making it work is a larger project. Therefore, throw an error when this condition is detected, to alert the unwary. Add some docs and tests, too. Author: Álvaro Herrera Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-24Allow pg_class xid & multixid horizons to not be set.Andres Freund
This allows for table AMs that don't need these horizons. This was already documented in the tableam relation_set_new_filenode callback, but an assert prevented it from actually working (the test AM code contained the change itself). Defang the asserts in the general code, and move the stronger ones into heap AM. Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid / relminmxid. Change the table_relation_copy_for_cluster() interface to allow the AM to overwrite the horizons that get set on the pg_class entry. This'd also in the future allow AMs like heap to compute a relfrozenxid during rewrite that's the table's actual minimum rather than a pre-determined value. Arguably it'd have been better to move the whole computation / setting of those values into the callback, but it seems likely that for other reasons it'd be better to be able to use one value to vacuum/cluster multiple tables (e.g. a toast's horizon shouldn't be different than the table's). Reported-By: Heikki Linnakangas Author: Andres Freund Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
2019-04-23Remove useless comment.Tom Lane
Commit e439c6f0c removed IndexStmt.relationId, but not the comment that had been added to explain it. Said comment was therefore very confusing.
2019-04-23Prevent O(N^2) unique index insertion edge case.Peter Geoghegan
Commit dd299df8 made nbtree treat heap TID as a tiebreaker column, establishing the principle that there is only one correct location (page and page offset number) for every index tuple, no matter what. Insertions of tuples into non-unique indexes proceed as if heap TID (scan key's scantid) is just another user-attribute value, but insertions into unique indexes are more delicate. The TID value in scantid must initially be omitted to ensure that the unique index insertion visits every leaf page that duplicates could be on. The scantid is set once again after unique checking finishes successfully, which can force _bt_findinsertloc() to step right one or more times, to locate the leaf page that the new tuple must be inserted on. Stepping right within _bt_findinsertloc() was assumed to occur no more frequently than stepping right within _bt_check_unique(), but there was one important case where that assumption was incorrect: inserting a "duplicate" with NULL values. Since _bt_check_unique() didn't do any real work in this case, it wasn't appropriate for _bt_findinsertloc() to behave as if it was finishing off a conventional unique insertion, where any existing physical duplicate must be dead or recently dead. _bt_findinsertloc() might have to grovel through a substantial portion of all of the leaf pages in the index to insert a single tuple, even when there were no dead tuples. To fix, treat insertions of tuples with NULLs into a unique index as if they were insertions into a non-unique index: never unset scantid before calling _bt_search() to descend the tree, and bypass _bt_check_unique() entirely. _bt_check_unique() is no longer responsible for incoming tuples with NULL values. Discussion: https://postgr.es/m/CAH2-Wzm08nr+JPx4jMOa9CGqxWYDQ-_D4wtPBiKghXAUiUy-nQ@mail.gmail.com
2019-04-23Avoid order-of-execution problems with ALTER TABLE ADD PRIMARY KEY.Tom Lane
Up to now, DefineIndex() was responsible for adding attnotnull constraints to the columns of a primary key, in any case where it hadn't been convenient for transformIndexConstraint() to mark those columns as is_not_null. It (or rather its minion index_check_primary_key) did this by executing an ALTER TABLE SET NOT NULL command for the target table. The trouble with this solution is that if we're creating the index due to ALTER TABLE ADD PRIMARY KEY, and the outer ALTER TABLE has additional sub-commands, the inner ALTER TABLE's operations executed at the wrong time with respect to the outer ALTER TABLE's operations. In particular, the inner ALTER would perform a validation scan at a point where the table's storage might be inconsistent with its catalog entries. (This is on the hairy edge of being a security problem, but AFAICS it isn't one because the inner scan would only be interested in the tuples' null bitmaps.) This can result in unexpected failures, such as the one seen in bug #15580 from Allison Kaptur. To fix, let's remove the attempt to do SET NOT NULL from DefineIndex(), reducing index_check_primary_key's role to verifying that the columns are already not null. (It shouldn't ever see such a case, but it seems wise to keep the check for safety.) Instead, make transformIndexConstraint() generate ALTER TABLE SET NOT NULL subcommands to be executed ahead of the ADD PRIMARY KEY operation in every case where it can't force the column to be created already-not-null. This requires only minor surgery in parse_utilcmd.c, and it makes for a much more satisfying spec for transformIndexConstraint(): it's no longer having to take it on faith that someone else will handle addition of NOT NULL constraints. To make that work, we have to move the execution of AT_SetNotNull into an ALTER pass that executes ahead of AT_PASS_ADD_INDEX. I moved it to AT_PASS_COL_ATTRS, and put that after AT_PASS_ADD_COL to avoid failure when the column is being added in the same command. 
This incidentally fixes a bug in the only previous usage of AT_PASS_COL_ATTRS, for AT_SetIdentity: it didn't work either for a newly-added column. Playing around with this exposed a separate bug in ALTER TABLE ONLY ... ADD PRIMARY KEY for partitioned tables. The intent of the ONLY modifier in that context is to prevent doing anything that would require holding lock for a long time --- but the implied SET NOT NULL would recurse to the child partitions, and do an expensive validation scan for any child where the column(s) were not already NOT NULL. To fix that, invent a new ALTER subcommand AT_CheckNotNull that just insists that a child column be already NOT NULL, and apply that, not AT_SetNotNull, when recursing to children in this scenario. This results in a slightly laxer definition of ALTER TABLE ONLY ... SET NOT NULL for partitioned tables, too: that command will now work as long as all children are already NOT NULL, whereas before it just threw up its hands if there were any partitions. In passing, clean up the API of generateClonedIndexStmt(): remove a useless argument, ensure that the output argument is not left undefined, update the header comment. A small side effect of this change is that no-such-column errors in ALTER TABLE ADD PRIMARY KEY now produce a different message that includes the table name, because they are now detected by the SET NOT NULL step which has historically worded its error that way. That seems fine to me, so I didn't make any effort to avoid the wording change. The basic bug #15580 is of very long standing, and these other bugs aren't new in v12 either. However, this is a pretty significant change in the way ALTER TABLE ADD PRIMARY KEY works. On balance it seems best not to back-patch, at least not till we get some more confidence that this patch has no new bugs. Patch by me, but thanks to Jie Zhang for a preliminary version. 
Discussion: https://postgr.es/m/15580-d1a6de5a3d65da51@postgresql.org Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2E6E63@G08CNEXMBPEKD02.g08.fujitsu.local
2019-04-23Fix detection of passwords hashed with MD5 or SCRAM-SHA-256Michael Paquier
This commit fixes a couple of issues related to the way password verifiers hashed with MD5 or SCRAM-SHA-256 are detected, which made it possible to store passwords in the catalogs that do not follow the supported hash formats: - An MD5-hashed entry was checked based on whether its header uses "md5" and whether the string length matches what is expected. Unfortunately the code never checked if the hash only used hexadecimal characters, as reported by Tom Lane. - A SCRAM-hashed entry was checked based on only its header, which should be "SCRAM-SHA-256$", but it never checked for any fields afterwards, as reported by Jonathan Katz. Backpatch down to v10, which is where SCRAM was introduced, and where password verifiers in plain format were removed. Author: Jonathan Katz Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/016deb6b-1f0a-8e9f-1833-a8675b170aa9@postgresql.org Backpatch-through: 10
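The MD5 half of the check described in the entry above can be sketched roughly as follows. This is an illustration only, not the PostgreSQL code: the function name, the macro names, and the choice of lowercase-only hex digits are assumptions made for the sketch, and the SCRAM field parsing is not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical constants for this sketch: "md5" followed by 32 hex digits. */
#define SKETCH_MD5_PREFIX      "md5"
#define SKETCH_MD5_PREFIX_LEN  3
#define SKETCH_MD5_HEX_LEN     32
/* Assumption: only lowercase hex digits are accepted. */
#define SKETCH_MD5_HEX_CHARS   "0123456789abcdef"

/*
 * Return true only if all three conditions hold: the "md5" header is
 * present, the total length matches, and every character after the
 * header is a hex digit.  The third condition is the one the entry
 * above says was missing.
 */
bool
sketch_is_md5_verifier(const char *s)
{
    assert(s != NULL);

    if (strncmp(s, SKETCH_MD5_PREFIX, SKETCH_MD5_PREFIX_LEN) != 0)
        return false;
    if (strlen(s) != SKETCH_MD5_PREFIX_LEN + SKETCH_MD5_HEX_LEN)
        return false;

    /* strspn() yields the length of the leading run of allowed characters. */
    return strspn(s + SKETCH_MD5_PREFIX_LEN, SKETCH_MD5_HEX_CHARS)
        == SKETCH_MD5_HEX_LEN;
}
```

With only the header and length tests, a 35-character string such as "md5" followed by 32 arbitrary characters would be accepted; the strspn() comparison rejects it.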
2019-04-22Convert gist to compute page level xid horizon on primary.Andres Freund
Due to parallel development, gist added the missing conflict information in c952eae52a3, while 558a9165e08 moved that computation to the primary for the index types that already had it. Thus adapt gist to also compute on the primary, using index_compute_xid_horizon_for_tuples() instead of its own copy of the logic. This also adds pg_waldump support for XLOG_GIST_DELETE records, which previously was not properly present. Bumps WAL version. Author: Andres Freund Discussion: https://postgr.es/m/20190406050243.bszosdg4buvabfrt@alap3.anarazel.de
2019-04-21Fix mvdistinct and dependencies size calculationsTomas Vondra
The formulas used to calculate size while (de)serializing mvndistinct and functional dependencies were based on offsetof() of the structs. But that is incorrect, because the structures are not copied directly; we copy the individual fields instead. At the moment this works fine, because there is no alignment padding on any platform we support. But it might break if we ever added some fields into any of the structs, for example. It's also confusing. Fixed by reworking the macros to directly sum sizes of serialized fields. The macros are now useful only for serialization, so there is no point in keeping them in the public header file. So make them private by moving them to the .c files. Also adds a couple more asserts to check the serialization, and fixes an incorrect allocation of MVDependency instead of (MVDependency *). Reported-By: Tom Lane Discussion: https://postgr.es/m/29785.1555365602@sss.pgh.pa.us
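The idea in the entry above, sizing a serialized item by summing the fields that are actually written rather than relying on struct layout, can be illustrated with a small sketch. The struct, macro, and function below are hypothetical and are not the PostgreSQL definitions; they only show why the two approaches can diverge once the struct contains padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory item; not the actual PostgreSQL struct. */
typedef struct SketchItem
{
    double   degree;       /* serialized */
    int16_t  nattributes;  /* serialized */
    int16_t *attributes;   /* the pointed-to values are serialized, not the pointer */
} SketchItem;

/*
 * Serialized size as the sum of the fields written by the serializer.
 * A layout-based formula such as offsetof(SketchItem, attributes) could
 * also count padding bytes that are never written.
 */
#define SketchItemSerializedSize(natts) \
    (sizeof(double) + sizeof(int16_t) + (size_t) (natts) * sizeof(int16_t))

/* Copy the fields one by one into buf and return the number of bytes written. */
size_t
sketch_item_serialize(const SketchItem *item, char *buf, size_t buflen)
{
    char   *p = buf;
    size_t  need = SketchItemSerializedSize(item->nattributes);

    assert(item->nattributes > 0 && item->attributes != NULL);
    assert(buflen >= need);

    memcpy(p, &item->degree, sizeof(double));
    p += sizeof(double);
    memcpy(p, &item->nattributes, sizeof(int16_t));
    p += sizeof(int16_t);
    memcpy(p, item->attributes, (size_t) item->nattributes * sizeof(int16_t));
    p += (size_t) item->nattributes * sizeof(int16_t);

    /* The bytes written must match the size formula exactly. */
    assert((size_t) (p - buf) == need);
    return need;
}
```

The final assert plays the same role as the extra serialization asserts mentioned in the entry: it ties the size formula to what the copy loop really does, so the two cannot silently drift apart.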
2019-04-19Fix slot type issue for fuzzy distance index scan over out-of-core table AM.Andres Freund
For amcanreorderby scans, nodeIndexscan.c's reorder queue holds heap tuples, but the underlying table likely does not. Before this fix we'd return different types of slots, depending on whether the tuple came from the reorder queue, or from the index + table. While that could be fixed by signalling that the node doesn't return a fixed type of slot, it seems better to instead remove the separate slot for the reorder queue, and use ExecForceStoreHeapTuple() to store tuples from the queue. It's not particularly common to need reordering, after all. This reverts most of the iss_ReorderQueueSlot related changes to nodeIndexscan.c made in 1a0586de3657cd3, except that now ExecForceStoreHeapTuple() is used instead of ExecStoreHeapTuple(). Noticed when testing zheap against the in-core version of tableam. Author: Andres Freund
2019-04-19Fix two memory leaks around force-storing tuples in slots.Andres Freund
As reported by Tom, when ExecStoreMinimalTuple() had to perform a conversion to store the minimal tuple in the slot, it forgot to respect the shouldFree flag, and leaked the tuple into the current memory context if true. Fix that by freeing the tuple in that case. Looking at the relevant code made me (Andres) realize that not having the shouldFree parameter to ExecForceStoreHeapTuple() was a bad idea. Some callers had to locally implement the necessary logic, and in one case it was missing, creating a potential per-group leak in non-hashed aggregation. The choice to not free the tuple in ExecComputeStoredGenerated() is not pretty, but not introduced by this commit - I'll start a separate discussion about it. Reported-By: Tom Lane Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
2019-04-15Use [FLEXIBLE_ARRAY_MEMBER] not [1] in MultiSortSupportData.Tom Lane
This struct seems to have not gotten the word about preferred coding style for variable-length arrays.
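For readers unfamiliar with the coding style referred to above, here is a minimal standalone sketch of a variable-length trailing array declared as a C99 flexible array member. The struct and function names are hypothetical, and plain `[]` is written where PostgreSQL code would use its FLEXIBLE_ARRAY_MEMBER macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical struct with a variable-length trailing array. */
typedef struct SketchSortKeys
{
    int ndims;    /* number of valid entries in keys[] */
    int keys[];   /* C99 flexible array member, instead of keys[1] */
} SketchSortKeys;

/*
 * Allocate room for the fixed header plus ndims trailing elements.
 * Returns NULL if the allocation fails; the caller frees with free().
 */
SketchSortKeys *
sketch_sort_keys_alloc(int ndims)
{
    SketchSortKeys *sk;
    int             i;

    assert(ndims >= 0);

    sk = malloc(offsetof(SketchSortKeys, keys) + (size_t) ndims * sizeof(int));
    if (sk == NULL)
        return NULL;

    sk->ndims = ndims;
    for (i = 0; i < ndims; i++)
        sk->keys[i] = 0;
    return sk;
}
```

Declaring the member as `keys[1]` and then indexing past the first element relies on behavior the C standard does not define, and it can mislead compilers and static analyzers about the array bounds; the flexible array member states the intent directly.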