Prepare 3.2.16

author pengbo <pengbo@sraoss.co.jp>

Fri, 17 Jun 2016 09:04:47 +0000 (18:04 +0900)

committer pengbo <pengbo@sraoss.co.jp>

Fri, 17 Jun 2016 09:04:47 +0000 (18:04 +0900)
diff --git a/NEWS b/NEWS

index a40a8fc6f856f819e8e480df1da404cab6080b3d..58f45d6491342fc30f4d554d80619afc8b333782 100644 (file)
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,141 @@
  
  ===============================================================================
  3.2 Series (2012/08/03 - )
+===============================================================================
+
+                        3.2.16 (namameboshi) 2016/06/17
+
+* Version 3.2.16
+
+    This is a bugfix release against pgpool-II 3.2.15.
+
+    __________________________________________________________________
+
+* New features
+
+    - Allow access to pgpool while health checking is in progress (Tatsuo Ishii)
+
+      Currently, any attempt to connect to pgpool fails while pgpool is
+      health checking a failed node, even if fail_over_on_backend_error is
+      off, because the pgpool child first tries to connect to all backends,
+      including the failed one, and exits if it fails to connect to a backend
+      (of course it fails). This is a temporary situation and will be
+      resolved before pgpool executes failover. However, if the health check
+      is retrying, the temporary situation lasts longer, depending on the
+      settings of health_check_max_retries and health_check_retry_delay, which
+      is undesirable. This patch tries to mitigate the problem:
+
+      - When an attempt to connect to a backend fails, give up connecting to
+      the failed node and skip to the next node, rather than exiting the
+      process, if operating in streaming replication mode and the node is
+      not the primary node.
+
+      - Mark the local status of the failed node as "down".
+
+      - This lets the primary node be selected as the load balance node,
+      so every query is sent to the primary node. If there are other
+      healthy standby nodes, one of them is chosen as the load
+      balance node.
+
+      - After the session is over, the child process exits so that it does
+      not retain the local status.
+
+* Bug fixes
+
+    - Fix is_set_transaction_serializable() handling of
+      SET default_transaction_isolation TO 'serializable'. (Bo Peng)
+
+      SET default_transaction_isolation TO 'serializable' was sent not only
+      to the primary but also to the standby servers in streaming replication
+      mode, which caused an error. With this fix, in streaming replication mode,
+      SET default_transaction_isolation TO 'serializable' is sent only to the
+      primary server.
+
+      See bug 191 for related info.
+
+    - Fix Chinese documentation bug about raw mode (Yugo Nagata, Bo Peng)
+      Connection pool is available in raw mode.
+
+    - Fix confusing comments in pgpool.conf (Tatsuo Ishii)
+
+    - Permit pgpool to support multiple SSL cipher protocols (Muhammad Usama)
+
+      Currently TLSv1_method() is used to initialize the SSL context, which puts
+      an unnecessary limitation of allowing only the TLSv1 protocol for SSL
+      communication, while PostgreSQL supports other cipher protocols as well.
+      This commit changes that and initializes the SSLSession using
+      SSLv23_method() (the same method PostgreSQL uses), because it can negotiate
+      the use of the highest mutually supported protocol version and removes the
+      limitation to one specific protocol version.
+
+    - If statement timeout is enabled on the backend and do_query() sends a
+      query to the primary node, and all following user queries are sent to
+      the standby, the next command, for example END, could cause a statement
+      timeout error on the primary, and a kind mismatch error is raised
+      on pgpool-II. (Tatsuo Ishii)
+
+      This fix tries to mitigate the problem by sending a sync message instead
+      of a flush message in do_query(), expecting that the sync message resets
+      the statement timeout timer if we are in an explicit transaction. We
+      cannot use this technique in the implicit transaction case, because the
+      sync message removes the unnamed portal if there is one.
+
+      In addition, pg_stat_statement will no longer show the query issued by
+      do_query() as "running".
+
+      See bug 194 for related info.
+
+    - Deal with the case when the primary is not node 0 in streaming replication mode. (Tatsuo Ishii)
+
+      http://www.pgpool.net/mantisbt/view.php?id=194#c837 reported that if
+      the primary is not node 0, a statement timeout could occur even after
+      bug194-3.3.diff was applied. After some investigation, it appeared
+      that the MASTER macro could return a node other than the primary or load
+      balance node, which was not supposed to happen, so do_query() sent queries
+      to the wrong node (this is not clear from the report but I confirmed it in
+      my investigation).
+
+      pool_virtual_master_db_node_id(), which is called in the MASTER macro,
+      returns query_context->virtual_master_node_id if a query context
+      exists. This could return the wrong node if the variable has not been set
+      yet. To fix this, the function is modified: if the variable is neither
+      the load balance node nor the primary node, the primary node id is
+      returned.
+
+    - Change the Makefile under the directory src/sql/, as proposed
+      by [pgpool-hackers: 1611] (Bo Peng)
+
+    - Fix a possible hang during health checking (Yugo Nagata)
+
+      Health checking hung when no data was sent
+      from the backend after connect(2) succeeded. To fix this,
+      pool_check_fd() returns 1 when select(2) exits with
+      EINTR due to SIGALRM while health checking is performed.
+
+      Reported and patch provided by harukat, with some modification
+      by Yugo. Per bug #204.
+
+      Backported from 3.4 or later;
+      https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commitdiff;h=ed9f2900f1b611f5cfd52e8f758c3616861e60c0
+
+    - Fix bug with load balance node id info on shmem (Tatsuo Ishii)
+
+      There are a few places where the load balance node was mistakenly put in
+      the wrong place. It should be placed in: ConnectionInfo *con_info[child
+      id, connection pool_id, backend id].load_balancing_node.  In fact it
+      was placed in: *con_info[child id, connection pool_id,
+      0].load_balancing_node.
+
+      As long as the backend id in question is 0, it is ok. However, while
+      testing pgpool-II 3.6's enhancement regarding failover, if the primary
+      node is 1 (which is the load balance node) and the standby is 0, a client
+      connected to node 1 is disconnected when failover happens on node
+      0. This was unexpected, and it revealed the bug.
+
+      It seems the bug has been there for a long time, but it had not been
+      found until now for the reason above.
+
+
  ===============================================================================
  
                          3.2.15 (namameboshi) 2016/04/26
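
The pool_virtual_master_db_node_id() fix described in the NEWS entry can be sketched as follows. This is only a minimal illustration in Python, not pgpool-II's C code: the function name and its parameters are hypothetical, and "variable has not been set yet" is modelled here as None.

```python
from typing import Optional


def virtual_master_node_id(candidate: Optional[int],
                           primary_id: int,
                           load_balance_id: int) -> int:
    """Hypothetical stand-in for the fixed pool_virtual_master_db_node_id()."""
    # Trust the stored value only when it names the primary or the
    # load balance node. Anything else (including "not set yet") is
    # treated as invalid and falls back to the primary node id.
    if candidate is not None and candidate in (primary_id, load_balance_id):
        return candidate
    return primary_id
```

With the primary on node 1 and the load balance node on node 2, a stale value of 0 resolves to node 1, so a query would not be routed to node 0.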
diff --git a/configure b/configure

index 783cca77936bcfa11674aa8deed60a06d57e0874..7291d90dd8358e82bc31283b6fecd4a7e1b8051d 100755 (executable)
--- a/configure
+++ b/configure
@@ -3976,7 +3976,7 @@ fi
  
  # Define the identity of the package.
   PACKAGE=pgpool-II
- VERSION=3.2.15
+ VERSION=3.2.16
  
  
  cat >>confdefs.h <<_ACEOF
diff --git a/configure.in b/configure.in

index d0bd29da002f9b193ee3e562fbbf0f8fc669b418..ede5c57a3b16d10fd03da8bcff06d4ab58b6d56b 100644 (file)
--- a/configure.in
+++ b/configure.in
@@ -4,7 +4,7 @@ AC_INIT
  dnl Checks for programs.
  AC_PROG_CC
  
-AM_INIT_AUTOMAKE(pgpool-II, 3.2.15)
+AM_INIT_AUTOMAKE(pgpool-II, 3.2.16)
  # AC_PROG_RANLIB
  AC_PROG_LIBTOOL
  
diff --git a/doc/pgpool-ja.html b/doc/pgpool-ja.html

index 21629799041d9ad0ae8a65838435859673650aac..4b20b9c40e6799751e2338ccd3914f9012546944 100644 (file)
--- a/doc/pgpool-ja.html
+++ b/doc/pgpool-ja.html
@@ -8,7 +8,7 @@
  <body>
  
  <!-- hhmts start -->
-Last modified: Fri Sep 5 22:40:36 JST 2014
+Last modified: Fri June 17 22:40:36 JST 2016
  <!-- hhmts end -->
  
  <body bgcolor="#ffffff">
@@ -5522,6 +5522,150 @@ SELECTの最終実行ステータスとパフォーマンスのおおよその
  <!-- 3.2                                                                              -->
  <!-- ================================================================================ -->
  
+<!-- -------------------------------------------------------------------------------- -->
+<h2><a name="release3.2.16"></a>3.2.16 (namameboshi) 2016/06/17</h2>
+<!-- -------------------------------------------------------------------------------- -->
+
+<h3>Overview</h3>
+<p>
+This version is a bugfix release against 3.2.15.
+</p>
+
+<h3>Bug fixes</h3>
+<ul>
+
+<li>
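+    It is now possible to connect to pgpool even while a health check is in progress.
+    <p>
+    Until now, while a health check was being performed against a down node, it was not possible to connect to pgpool, even if fail_over_on_backend_error was off.
+    </p>
+    <p>
+    This is because the pgpool child process tries to connect to all backends, including the down node, and exits if even one connection fails (of course the connection fails in this case). This is a temporary state, and the problem no longer occurs once pgpool completes failover.
+    </p>
+    <p>
+    However, while the health check is retrying, this temporary state can last a long time, depending on the settings of health_check_max_retries and health_check_retry_delay.
+    </p>
+    <p>
+    This fix resolves the problem as follows.
+    </p>
+    <p>
+    - In streaming replication mode, when a pgpool child process fails to connect to a backend, it no longer exits. Instead, as long as the failed backend node is not the primary node, it skips that node and tries to connect to the next node.
+    </p>
+    <p>
+    - In this case, the child process locally records the node as "down". Nodes locally marked as down are excluded from load balancing.
+    </p>
+    <p>
+    - When the session ends, the child process exits so that the locally marked "down" status does not carry over to new sessions. This is because a temporarily down node may come back as a result of health check retries.
+    </p>
+    <p>
+    See [pgpool-hackers: 1531] for details.
+    </p>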
+
+    </p>
+</li>
+
+<li>
+    Fixed a bug in the is_set_transaction_serializable() function with SET default_transaction_isolation TO 'serializable'. (Bo Peng)
+    <p>
+    pgpool sent SET default_transaction_isolation TO 'serializable' not only to the primary but also to the standby, which caused an error.
+    </p>
+    <p>
+    With this fix, in streaming replication mode, SET default_transaction_isolation TO 'serializable' is sent only to the primary server.
+    </p>
+    <p>
+    Per bug #191.
+    </p>
+</li>
+
+<li>
+    Fixed an error in the documentation about raw mode. (Yugo Nagata, Bo Peng) Connection pooling is available in raw mode.
+</li>
+
+<li>
+    Fixed inaccurate comments in pgpool.conf. (Tatsuo Ishii)
+</li>
+
+<li>
+    Changed pgpool to support multiple SSL cipher protocols. (Muhammad Usama)
+    <p>
+    Until now, TLSv1_method() was used to initialize the SSL context, so SSL communication was limited to the TLSv1 protocol only. This fix removes that limitation and initializes the SSLSession using SSLv23_method() (the same as PostgreSQL), because this function can use the latest mutually compatible protocol version rather than one specific protocol version.
+    </p>
+</li>
+
+<li>
+    When statement timeout is enabled on the backend, do_query() sends the first query to the primary node, and subsequent user queries are sent to the standby node. In that case a following command, for example END, could cause a statement timeout on the primary node, and a kind mismatch error could occur.
+    <p>
+    To mitigate this problem, do_query() now sends a sync message instead of a flush message. Sending a sync message resets the statement timeout timer when in an explicit transaction. Because a sync message removes the unnamed portal if one exists, this is not applied in the implicit transaction case.
+    </p>
+    <p>
+    In addition, pg_stat_statement no longer shows the query issued by do_query() as "running".
+    </p>
+    <p>
+    Per bug #194.
+    </p>
+</li>
+
+<li>
+    Fixed a bug that occurs in streaming replication mode when the primary node is not node 0. (Tatsuo Ishii)
+    <blockquote>
+        <a href="http://www.pgpool.net/mantisbt/view.php?id=194#c837">
+    http://www.pgpool.net/mantisbt/view.php?id=194#c837
+    </a> reported that,<br />
+even with bug194-3.3.diff applied, a statement timeout could occur<br />
+when the primary node is not node 0.<br />
+Investigation showed that the MASTER macro returned a node other than<br />
+the primary node or the load balance node. As a result, do_query() sent<br />
+queries to the wrong node (this was not confirmed in the report, but<br />
+it was confirmed in the investigation).
+    </blockquote>
+    <blockquote>
+    pool_virtual_master_db_node_id(), which is called in the MASTER macro,<br />
+returns query_context->virtual_master_node_id when a query context<br />
+exists. If the variable has not been set, this function may return the<br />
+wrong node. The pool_virtual_master_db_node_id() function was therefore<br />
+modified as follows: if the return value is neither the primary node nor<br />
+the load balance node, the primary node is returned.
+    </blockquote>
+</li>
+
+<li>
+    Changed the Makefile under src/sql/. (pengbo) See [pgpool-hackers: 1611] for details.
+</li>
+
+<li>
+    Fixed a possible hang during health checking. (Yugo Nagata)
+    <p>
+    Health checking hung when connect(2) succeeded but no data was then sent from the backend. To fix this, pool_check_fd() now returns 1 when select(2) exits with EINTR due to SIGALRM while a health check is being performed.
+    </p>
+    <p>
+    The patch was written by the bug reporter and modified by Yugo.
+    </p>
+    <p>
+    Per bug #204.
+    </p>
+</li>
+
+<li>
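+    Fixed a bug in writing the load balance node to shared memory. (Tatsuo Ishii)
+    <p>
+    In a few places, the load balance node was put in the wrong place.
+    </p>
+    <p>
+    [Correct location] ConnectionInfo *con_info[child id, connection pool_id, backend id].load_balancing_node
+    </p>
+    <p>
+    [Actual location] *con_info[child id, connection pool_id, 0].load_balancing_node
+    </p>
+    <p>
+    When the backend id is 0, the bug does not occur. However, in failover testing of pgpool-II 3.6, when the primary node was 1 (the load balance node) and the standby node was 0, a client connected to node 1 was disconnected when failover happened on node 0. This was unexpected and revealed the bug.
+    </p>
+    <p>
+    The bug had been there for a long time, but for the reason above it had not been found until now.
+    </p>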
+</li>
+
+
+</ul>
  <!-- -------------------------------------------------------------------------------- -->
  <h2><a name="release3.2.15"></a>3.2.15 (namameboshi) 2016/04/26</h2>
  <!-- -------------------------------------------------------------------------------- -->
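
The health-check change documented in this release (skip a failed standby, mark it down locally, then pick a load balance node from what is left) can be sketched roughly as below. This is an illustrative Python model, not pgpool-II code: connect_backends, pick_load_balance_node, and the try_connect callback are hypothetical names, and the node selection ignores backend weights, which is a simplifying assumption.

```python
import random
from typing import Callable, Dict, Iterable, Optional


def connect_backends(node_ids: Iterable[int],
                     primary_id: int,
                     try_connect: Callable[[int], bool],
                     streaming_replication: bool = True) -> Dict[int, str]:
    """Return a per-session (local) status map: node id -> "up" or "down"."""
    local_status: Dict[int, str] = {}
    for node in node_ids:
        if try_connect(node):
            local_status[node] = "up"
        elif streaming_replication and node != primary_id:
            # Skip the failed standby instead of exiting the child process,
            # and remember it as down for this session only.
            local_status[node] = "down"
        else:
            # A failed primary, or any failure outside streaming
            # replication mode, still aborts the session.
            raise ConnectionError(f"cannot connect to backend {node}")
    return local_status


def pick_load_balance_node(local_status: Dict[int, str],
                           primary_id: int,
                           rng: Optional[random.Random] = None) -> int:
    """Choose a healthy standby if there is one, otherwise the primary."""
    rng = rng or random.Random()
    standbys = [n for n, s in local_status.items()
                if s == "up" and n != primary_id]
    return rng.choice(standbys) if standbys else primary_id
```

The status map is returned rather than stored globally to mirror the note that the child process exits after the session, so the local "down" mark does not outlive it.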