users/kgrittn/postgres.git - Kevin Grittner's development, especially fully serializable transactions (See http://wiki.postgresql.org/wiki/Serializable for more info)

Age	Commit message (Collapse)	Author
2011-12-25	Rethink representation of index clauses' mapping to index columns.HEAD master	Tom Lane
	In commit e2c2c2e8b1df7dfdb01e7e6f6191a569ce3c3195 I made use of nested list structures to show which clauses went with which index columns, but on reflection that's a data structure that only an old-line Lisp hacker could love. Worse, it adds unnecessary complication to the many places that don't much care which clauses go with which index columns. Revert to the previous arrangement of flat lists of clauses, and instead add a parallel integer list of column numbers. The places that care about the pairing can chase both lists with forboth(), while the places that don't care just examine one list the same as before. The only real downside to this is that there are now two more lists that need to be passed to amcostestimate functions in case they care about column matching (which btcostestimate does, so not passing the info is not an option). Rather than deal with 11-argument amcostestimate functions, pass just the IndexPath and expect the functions to extract fields from it. That gets us down to 7 arguments which is better than 11, and it seems more future-proof against likely additions to the information we keep about an index path.
2011-12-23	Improve planner's handling of duplicated index column expressions.	Tom Lane
	It's potentially useful for an index to repeat the same indexable column or expression in multiple index columns, if the columns have different opclasses. (If they share opclasses too, the duplicate column is pretty useless, but nonetheless we've allowed such cases since 9.0.) However, the planner failed to cope with this, because createplan.c was relying on simple equal() matching to figure out which index column each index qual is intended for. We do have that information available upstream in indxpath.c, though, so the fix is to not flatten the multi-level indexquals list when putting it into an IndexPath. Then we can rely on the sublist structure to identify target index columns in createplan.c. There's a similar issue for index ORDER BYs (the KNNGIST feature), so introduce a multi-level-list representation for that too. This adds a bit more representational overhead, but we might more or less buy that back by not having to search for matching index columns anymore in createplan.c; likewise btcostestimate saves some cycles. Per bug #6351 from Christian Rudolph. Likely symptoms include the "btree index keys must be ordered by attribute" failure shown there, as well as "operator MMMM is not a member of opfamily NNNN". Although this is a pre-existing problem that can be demonstrated in 9.0 and 9.1, I'm not going to back-patch it, because the API changes in the planner seem likely to break things such as index plugins. The corner cases where this matters seem too narrow to justify possibly breaking things in a minor release.
2011-12-23	Add bytea_agg, parallel to string_agg.	Robert Haas
	Pavel Stehule
2011-12-22	Add a security_barrier option for views.	Robert Haas
	When a view is marked as a security barrier, it will not be pulled up into the containing query, and no quals will be pushed down into it, so that no function or operator chosen by the user can be applied to rows not exposed by the view. Views not configured with this option cannot provide robust row-level security, but will perform far better. Patch by KaiGai Kohei; original problem report by Heikki Linnakangas (in October 2009!). Review (in earlier versions) by Noah Misch and others. Design advice by Tom Lane and myself. Further review and cleanup by me.
2011-12-21	Shave a few cycles in string_agg().	Robert Haas
	Pavel Stehule
2011-12-21	Fix gincostestimate to handle ScalarArrayOpExpr reasonably.	Tom Lane
	The original coding of this function overlooked the possibility that it could be passed anything except simple OpExpr indexquals. But ScalarArrayOpExpr is possible too, and the code would probably crash (and surely give ridiculous answers) in such a case. Add logic to try to estimate sanely for such cases. In passing, fix the treatment of inner-indexscan cost estimation: it was failing to scale up properly for multiple iterations of a nestloop. (I think somebody might've thought that index_pages_fetched() is linear, but of course it's not.) Report, diagnosis, and preliminary patch by Marti Raudsepp; I refactored it a bit and fixed the cost estimation. Back-patch into 9.1 where the bogus code was introduced.
2011-12-19	Add support for privileges on types	Peter Eisentraut
	This adds support for the more or less SQL-conforming USAGE privilege on types and domains. The intent is to be able restrict which users can create dependencies on types, which restricts the way in which owners can alter types. reviewed by Yeb Havinga
2011-12-19	Allow CHECK constraints to be declared ONLY	Alvaro Herrera
	This makes them enforceable only on the parent table, not on children tables. This is useful in various situations, per discussion involving people bitten by the restrictive behavior introduced in 8.4. Message-Id: 8762mp93iw.fsf@comcast.net CAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com Authors: Nikhil Sontakke, Alex Hunsaker Reviewed by Robert Haas and myself
2011-12-17	Add SP-GiST (space-partitioned GiST) index access method.	Tom Lane
	SP-GiST is comparable to GiST in flexibility, but supports non-balanced partitioned search structures rather than balanced trees. As described at PGCon 2011, this new indexing structure can beat GiST in both index build time and query speed for search problems that it is well matched to. There are a number of areas that could still use improvement, but at this point the code seems committable. Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane
2011-12-16	include_if_exists facility for config file.	Andrew Dunstan
	This works the same as include, except that an error is not thrown if the file is missing. Instead the fact that it's missing is logged. Greg Smith, reviewed by Euler Taveira de Oliveira.
2011-12-12	Revert the behavior of inet/cidr functions to not unpack the arguments.	Heikki Linnakangas
	I forgot to change the functions to use the PG_GETARG_INET_PP() macro, when I changed DatumGetInetP() to unpack the datum, like Datum*P macros usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP() macro, and didn't notice because it wasn't used. This fixes the memory leak when sorting inet values, as reported by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like the previous patch that broke it.
2011-12-10	Miscellaneous cleanup to silence compiler warnings seen on Mingw.	Andrew Dunstan
	Remove some dead code, conditionally declare some items or call some code, and fix one or two declarations.
2011-12-09	Cancel running query if it is detected that the connection to the client is	Heikki Linnakangas
	lost. The only way we detect that at the moment is when write() fails when we try to write to the socket. Florian Pflug with small changes by me, reviewed by Greg Jaskiewicz.
2011-12-07	Fix corner cases in readlink() usage.	Tom Lane
	Make sure all calls are protected by HAVE_READLINK, and get the buffer overflow tests right. Be a bit more paranoid about string length in _tarWriteHeader(), too.
2011-12-07	Better error reporting if the link target is too long	Magnus Hagander
	This situation won't set errno, so using %m will give an incorrect error message.
2011-12-07	Remove spclocation field from pg_tablespace	Magnus Hagander
	Instead, add a function pg_tablespace_location(oid) used to return the same information, and do this by reading the symbolic link. Doing it this way makes it possible to relocate a tablespace when the database is down by simply changing the symbolic link.
2011-12-07	Create a "sort support" interface API for faster sorting.	Tom Lane
	This patch creates an API whereby a btree index opclass can optionally provide non-SQL-callable support functions for sorting. In the initial patch, we only use this to provide a directly-callable comparator function, which can be invoked with a bit less overhead than the traditional SQL-callable comparator. While that should be of value in itself, the real reason for doing this is to provide a datatype-extensible framework for more aggressive optimizations, as in Peter Geoghegan's recent work. Robert Haas and Tom Lane
2011-12-01	Fix getTypeIOParam to support type record[].	Tom Lane
	Since record[] uses array_in, it needs to have its element type passed as typioparam. In HEAD and 9.1, this fix essentially reverts commit 9bc933b2125a5358722490acbc50889887bf7680, which was a hack that is no longer needed since domains don't set their typelem anymore. Before that, adjust the logic so that only domains are excluded from being treated like arrays, rather than assuming that only base types should be included. Add a regression test to demonstrate the need for this. Per report from Maxim Boguk. Back-patch to 8.4, where type record[] was added.
2011-11-30	Improve table locking behavior in the face of current DDL.	Robert Haas
	In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30	Tweak previous patch to ensure edata->filename always gets initialized.	Tom Lane
	On a platform that isn't supplying __FILE__, previous coding would either crash or give a stale result for the filename string. Not sure how likely that is, but the original code catered for it, so let's keep doing so.
2011-11-30	Strip file names reported in error messages in vpath builds	Peter Eisentraut
	In vpath builds, the __FILE__ macro that is used in verbose error reports contains the full absolute file name, which makes the error messages excessively verbose. So keep only the base name, thus matching the behavior of non-vpath builds.
2011-11-27	Improve GiST range-contained-by searches by adding a flag for empty ranges.	Tom Lane
	In the original implementation, a range-contained-by search had to scan the entire index because an empty range could be lurking anywhere. Improve that by adding a flag to upper GiST entries that says whether the represented subtree contains any empty ranges. Also, make a simple mod to the penalty function to discourage empty ranges from getting pushed into subtrees without any. This needs more work, and the picksplit function should be taught about it too, but that code can be improved without causing an on-disk compatibility break; so we'll leave it for another day. Since we're breaking on-disk compatibility of range values anyway, I took the opportunity to reorganize the range flags bits; the unused RANGE_xB_NULL bits are now adjacent, which might open the door for using them in some other way later. In passing, remove the GiST range opclass entry for <>, which doesn't seem like it can really be indexed usefully. Alexander Korotkov, with some editorializing by Tom
2011-11-26	Make GiST index searches smarter about queries against empty ranges.	Tom Lane
	In the cases where the result of the called proc is negated, we should explicitly test both inputs for empty, to ensure we'll never return "true" for an unsatisfiable query. In other cases we can rely on the called proc to say the right thing.
2011-11-25	Improve logging of autovacuum I/O activity	Alvaro Herrera
	This adds some I/O stats to the logging of autovacuum (when the operation takes long enough that log_autovacuum_min_duration causes it to be logged), so that it is easier to tune. Notably, it adds buffer I/O counts (hits, misses, dirtied) and read and write rate. Authors: Greg Smith and Noah Misch
2011-11-25	Move "hot" members of PGPROC into a separate PGXACT array.	Robert Haas
	This speeds up snapshot-taking and reduces ProcArrayLock contention. Also, the PGPROC (and PGXACT) structures used by two-phase commit are now allocated as part of the main array, rather than in a separate array, and we keep ProcArray sorted in pointer order. These changes are intended to minimize the number of cache lines that must be pulled in to take a snapshot, and testing shows a substantial increase in performance on both read and write workloads at high concurrencies. Pavan Deolasee, Heikki Linnakangas, Robert Haas
2011-11-23	Adjust range_adjacent to support different canonicalization rules.	Tom Lane
	The original coding would not work for discrete ranges in which the canonicalization rule is to produce symmetric boundaries (either [] or () style), as noted by Jeff Davis. Florian Pflug pointed out that we could fix that by invoking the canonicalization function to see if the range "between" the two given ranges normalizes to empty. This implementation of Florian's idea is a tad slower than the original code, but only in the case where there actually is a canonicalization function --- if not, it's essentially the same logic as before.
2011-11-23	Remove user-selectable ANALYZE option for range types.	Tom Lane
	It's not clear that a per-datatype typanalyze function would be any more useful than a generic typanalyze for ranges. What is clear is that letting unprivileged users select typanalyze functions is a crash risk or worse. So remove the option from CREATE TYPE AS RANGE, and instead put in a generic typanalyze function for ranges. The generic function does nothing as yet, but hopefully we'll improve that before 9.2 release.
2011-11-23	Remove zero- and one-argument range constructor functions.	Tom Lane
	Per discussion, the zero-argument forms aren't really worth the catalog space (just write 'empty' instead). The one-argument forms have some use, but they also have a serious problem with looking too much like functional cast notation; to the point where in many real use-cases, the parser would misinterpret what was wanted. Committing this as a separate patch, with the thought that we might want to revert part or all of it if we can think of some way around the cast ambiguity.
2011-11-22	Improve implementation of range-contains-element tests.	Tom Lane
	Implement these tests directly instead of constructing a singleton range and then applying range-contains. This saves a range serialize/deserialize cycle as well as a couple of redundant bound-comparison steps, and adds very little code on net. Remove elem_contained_by_range from the GiST opclass: it doesn't belong there because there is no way to use it in an index clause (where the indexed column would have to be on the left). Its commutator is in the opclass, and that's what counts.
2011-11-22	Still more review for range-types patch.	Tom Lane
	Per discussion, relax the range input/construction rules so that the only hard error is lower bound > upper bound. Cases where the lower bound is <= upper bound, but the range nonetheless normalizes to empty, are now permitted. Fix core dump in range_adjacent when bounds are infinite. Marginal cleanup of regression test cases, some more code commenting.
2011-11-21	More code review for rangetypes patch.	Tom Lane
	Fix up some infelicitous coding in DefineRange, and add some missing error checks. Rearrange operator strategy number assignments for GiST anyrange opclass so that they don't make such a mess of opr_sanity's table of operator names associated with different strategy numbers. Assign hopefully-temporary selectivity estimators to range operators that didn't have one --- poor as the estimates are, they're still a lot better than the default 0.5 estimate, and they'll shut up the opr_sanity test that wants to see selectivity estimators on all built-in operators.
2011-11-21	Further code review for range types patch.	Tom Lane
	Fix some bugs in coercion logic and pg_dump; more comment cleanup; minor cosmetic improvements.
2011-11-17	Fix range_cmp_bounds for the case of equal-valued exclusive bounds.	Tom Lane
	Also improve its comments and related regression tests. Jeff Davis, with some further adjustments by Tom
2011-11-15	Improve caching in range type I/O functions.	Tom Lane
	Cache the the element type's I/O info across calls, not only the range type's info. In passing, also clean up hash_range a bit more.
2011-11-15	Restructure function-internal caching in the range type code.	Tom Lane
	Move the responsibility for caching specialized information about range types into the type cache, so that the catalog lookups only have to occur once per session. Rearrange APIs a bit so that fn_extra caching is actually effective in the GiST support code. (Use of OidFunctionCallN is bad enough for performance in itself, but it also prevents the function from exploiting fn_extra caching.) The range I/O functions are still not very bright about caching repeated lookups, but that seems like material for a separate patch. Also, avoid unnecessary use of memcpy to fetch/store the range type OID and flags, and don't use the full range_deserialize machinery when all we need to see is the flags value. Also fix API error in range_gist_penalty --- it was failing to set *penalty for any case involving an empty range.
2011-11-15	Fix alignment and toasting bugs in range types.	Tom Lane
	A range type whose element type has 'd' alignment must have 'd' alignment itself, else there is no guarantee that the element value can be used in-place. (Because range_deserialize uses att_align_pointer which forcibly aligns the given pointer, violations of this rule did not lead to SIGBUS but rather to garbage data being extracted, as in one of the added regression test cases.) Also, you can't put a toast pointer inside a range datum, since the referenced value could disappear with the range datum still present. For consistency with the handling of arrays and records, I also forced decompression of in-line-compressed bound values. It would work to store them as-is, but our policy is to avoid situations that might result in double compression. Add assorted regression tests for this, and bump catversion because of fixes to built-in pg_type entries. Also some marginal cleanup of inconsistent/unnecessary error checks.
2011-11-14	Return NULL instead of throwing error when desired bound is not available.	Tom Lane
	Change range_lower and range_upper to return NULL rather than throwing an error when the input range is empty or the relevant bound is infinite. Per discussion, throwing an error seems likely to be unduly hard to work with. Also, this is more consistent with the behavior of the constructors, which treat NULL as meaning an infinite bound.
2011-11-14	Return FALSE instead of throwing error for comparisons with empty ranges.	Tom Lane
	Change range_before, range_after, range_adjacent to return false rather than throwing an error when one or both input ranges are empty. The original definition is unnecessarily difficult to use, and also can result in undesirable planner failures since the planner could try to compare an empty range to something else while deriving statistical estimates. (This was, in fact, the cause of repeatable regression test failures on buildfarm member jaguar, as well as intermittent failures elsewhere.) Also tweak rangetypes regression test to not drop all the objects it creates, so that the final state of the regression database contains some rangetype objects for pg_dump testing.
2011-11-14	Fix copyright notices, other minor editing in new range-types code.	Tom Lane
	No functional changes in this commit (except I could not resist the temptation to re-word a couple of error messages). This is just manual cleanup after pgindent to make the code look reasonably like other PG code, in preparation for more detailed code review to come.
2011-11-14	Rerun pgindent with updated typedef list.	Bruce Momjian

2011-11-14	Run pgindent on range type files, per request from Tom.	Bruce Momjian

2011-11-10	Revert removal of trace_userlocks, because userlocks aren't gone.	Robert Haas
	This reverts commit 0180bd6180511875db046bf8ddcaa633a2952dfd. contrib/userlock is gone, but user-level locking still exists, and is exposed via the pg_advisory* family of functions.
2011-11-08	Make DatumGetInetP() unpack inet datums with a 1-byte header, and add	Heikki Linnakangas
	a new macro, DatumGetInetPP(), that does not. This brings these macros in line with other DatumGet*P() macros. Backpatch to 8.3, where 1-byte header varlenas were introduced.
2011-11-07	Fix timestamp range subdiff functions, when using float datetimes.	Heikki Linnakangas

2011-11-03	Support range data types.	Heikki Linnakangas
	Selectivity estimation functions are missing for some range type operators, which is a TODO. Jeff Davis
2011-11-02	Remove spurious entry from missed catch while patch juggling	Simon Riggs

2011-11-02	Fix timing of Startup CLOG and MultiXact during Hot Standby	Simon Riggs
	Patch by me, bug report by Chris Redekop, analysis by Florian Pflug
2011-11-01	Fix race condition with toast table access from a stale syscache entry.	Tom Lane
	If a tuple in a syscache contains an out-of-line toasted field, and we try to fetch that field shortly after some other transaction has committed an update or deletion of the tuple, there is a race condition: vacuum could come along and remove the toast tuples before we can fetch them. This leads to transient failures like "missing chunk number 0 for toast value NNNNN in pg_toast_2619", as seen in recent reports from Andrew Hammond and Tim Uckun. The design idea of syscache is that access to stale syscache entries should be prevented by relation-level locks, but that fails for at least two cases where toasted fields are possible: ANALYZE updates pg_statistic rows without locking out sessions that might want to plan queries on the same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without any meaningful lock at all. The least risky fix seems to be an idea that Heikki suggested when we were dealing with a related problem back in August: forcibly detoast any out-of-line fields before putting a tuple into syscache in the first place. This avoids the problem because at the time we fetch the parent tuple from the catalog, we should be holding an MVCC snapshot that will prevent removal of the toast tuples, even if the parent tuple is outdated immediately after we fetch it. (Note: I'm not convinced that this statement holds true at every instant where we could be fetching a syscache entry at all, but it does appear to hold true at the times where we could fetch an entry that could have a toasted field. We will need to be a bit wary of adding toast tables to low-level catalogs that don't have them already.) An additional benefit is that subsequent uses of the syscache entry should be faster, since they won't have to detoast the field. Back-patch to all supported versions. The problem is significantly harder to reproduce in pre-9.0 releases, because of their willingness to flush every entry in a syscache whenever the underlying catalog is vacuumed (cf CatalogCacheFlushRelation); but there is still a window for trouble.
2011-11-01	Clean up whitespace and indentation in parser and scanner files	Peter Eisentraut
	These are not touched by pgindent, so clean them up a bit manually.
2011-10-30	Support more locale-specific formatting options in cash_out().	Tom Lane
	The POSIX spec defines locale fields for controlling the ordering of the value, sign, and currency symbol in monetary output, but cash_out only supported a small subset of these options. Fully implement p/n_sign_posn, p/n_cs_precedes, and p/n_sep_by_space per spec. Fix up cash_in so that it will accept all these format variants. Also, make sure that thousands_sep is only inserted to the left of the decimal point, as required by spec. Per bug #6144 from Eduard Kracmar and discussion of bug #6277. This patch includes some ideas from Alexander Lakhin's proposed patch, though it is very different in detail.