Previously, an error while processing any single table would cause
pg_reorg to call exit() and bail out. Quick summary of fixes:
* get rid of pgut_atexit_push() and pgut_atexit_pop() use, since
we are no longer relying on calling exit() to handle mundane errors
* remove lock_conn_pid variable; we can just use buffer instead
* lock_exclusive() and lock_access_share() now return bool instead of
bailing out on any error
* ERROR-level ereport() or elog() calls now report at WARNING level
instead, to avoid bailing out unnecessarily
* signature of reorg_cleanup() changed; it no longer needs to take a
void pointer
* check return of strdup() for vxid
* use pgut_rollback() instead of sending a ROLLBACK; command directly
There are still one or two FIXMEs left, including fixing table name
escaping, but I'm committing this much.
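As a rough illustration of the pattern described in the list above, here is
a minimal sketch in which a lock helper returns bool and the caller skips the
failing table instead of exiting. The names try_lock_table() and
process_tables() are hypothetical stand-ins, not the actual pg_reorg
functions, and the "broken" table name only simulates a lock failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a lock helper such as lock_exclusive():
 * it reports a warning and returns false on failure instead of
 * calling exit(). For the demo, a table named "broken" always fails.
 */
static bool
try_lock_table(const char *table)
{
    assert(table != NULL);

    if (strcmp(table, "broken") == 0)
    {
        fprintf(stderr, "WARNING: could not lock table \"%s\"\n", table);
        return false;
    }
    return true;
}

/*
 * Walk the list of tables. A failure on one table is skipped so that
 * the remaining tables are still processed. Returns the number of
 * tables that were handled successfully.
 */
static int
process_tables(const char *const *tables, int ntables)
{
    int ok = 0;

    assert(tables != NULL);

    for (int i = 0; i < ntables; i++)
    {
        if (!try_lock_table(tables[i]))
            continue;           /* skip this table, keep going */
        ok++;
    }
    return ok;
}
```

With the tables "a", "broken", and "b", the sketch warns once about "broken"
and still handles the other two.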
Per Issue #18. SimpleStringList code borrowed from pg_dump and a
pending patch to add similar functionality to pg_restore,
clusterdb, vacuumdb, and reindexdb.
The error handling in reorg_one_table() could still be much improved,
so that an error processing a single table doesn't necessarily cause
pg_reorg to bail out and skip further tables, but I'll leave that for
another day.
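For readers unfamiliar with the SimpleStringList idea mentioned above, the
following is a minimal sketch of an append-only singly linked list of
strings. It is written from scratch for illustration; the type and function
names (StringList, string_list_append(), and so on) are assumptions and do
not reproduce the pg_dump code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* One list cell holding a private copy of the string. */
typedef struct StringCell
{
    struct StringCell *next;
    char       *val;
} StringCell;

/* The list keeps head and tail pointers so appends are O(1). */
typedef struct StringList
{
    StringCell *head;
    StringCell *tail;
} StringList;

/* Append a copy of val; returns false if memory allocation fails. */
static bool
string_list_append(StringList *list, const char *val)
{
    StringCell *cell;

    assert(list != NULL && val != NULL);

    cell = malloc(sizeof(*cell));
    if (cell == NULL)
        return false;

    cell->val = malloc(strlen(val) + 1);
    if (cell->val == NULL)
    {
        free(cell);
        return false;
    }
    strcpy(cell->val, val);
    cell->next = NULL;

    if (list->tail != NULL)
        list->tail->next = cell;
    else
        list->head = cell;
    list->tail = cell;
    return true;
}

/* Count the cells currently in the list. */
static size_t
string_list_length(const StringList *list)
{
    size_t      n = 0;

    assert(list != NULL);
    for (const StringCell *c = list->head; c != NULL; c = c->next)
        n++;
    return n;
}

/* Free every cell and reset the list to empty. */
static void
string_list_free(StringList *list)
{
    StringCell *c;

    assert(list != NULL);
    c = list->head;
    while (c != NULL)
    {
        StringCell *next = c->next;

        free(c->val);
        free(c);
        c = next;
    }
    list->head = list->tail = NULL;
}
```

A caller would start from a zero-initialized StringList, append each name as
it is parsed, and iterate from head when processing.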
* KILL_COMPETING_LOCKS was using pg_cancel_backend() instead of
pg_terminate_backend()
* create kill_ddl() function for canceling+terminating any pending
unsafe concurrent DDL, i.e. anyone hanging out waiting for
an ACCESS EXCLUSIVE lock on our table.
* create lock_access_share() function for reliably obtaining an
ACCESS SHARE lock on the target table, killing off any queued
ACCESS EXCLUSIVE lockers in the process via kill_ddl()
* Avoid a possible deadlock before we run:
CREATE TABLE reorg.table_xxx AS SELECT ... FROM ONLY ...
by using lock_access_share()
* Fix a few calls in lock_exclusive() which were forgetting to
specify the passed-in connection.
These fixes are related to Issue #8. The main thing remaining AFAIK
is to review or fix some of the unlikely-error handling bits;
most of these should be marked with XXX now.
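To make the cancel-versus-terminate distinction above concrete, here is a
hedged sketch of how a query targeting queued ACCESS EXCLUSIVE lockers might
be built. The helper name build_kill_ddl_query() and the exact query text are
assumptions for illustration, not the code in this commit; the sketch relies
only on the pg_locks view and the standard pg_cancel_backend(),
pg_terminate_backend(), and pg_backend_pid() functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a query that signals every backend still
 * waiting (granted = false) for an AccessExclusiveLock on the relation
 * with OID relid, excluding our own backend. If terminate is true the
 * query uses pg_terminate_backend(), otherwise pg_cancel_backend().
 *
 * Returns the query length, or -1 if buf was too small.
 */
static int
build_kill_ddl_query(char *buf, size_t buflen, unsigned int relid,
                     bool terminate)
{
    int         n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen,
                 "SELECT %s(pid) FROM pg_locks"
                 " WHERE locktype = 'relation' AND granted = false"
                 " AND relation = %u AND mode = 'AccessExclusiveLock'"
                 " AND pid <> pg_backend_pid()",
                 terminate ? "pg_terminate_backend" : "pg_cancel_backend",
                 relid);

    /* snprintf reports the untruncated length; treat truncation as error */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return n;
}
```

Passing the relation OID as a number sidesteps the table name escaping
problem in this particular query; a caller would typically try the cancel
variant first and fall back to terminate only if waiters remain.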
Fix table locking so that race conditions don't exist between lock
release in primary conn and lock acquisition in conn2. Also, have
conn2 be in charge of performing the table swap step, to avoid a
similar race.
Part of work for Issue #8.
This is a first pass at Daniele's suggestion in Issue #8, although it is
definitely still buggy -- it is still possible for another transaction
to sneak in with an ACCESS EXCLUSIVE lock and perform DDL either before the
ACCESS SHARE lock is acquired or immediately after it is released.