Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR: cannot execute SQL commands in WAL sender for physical replication #1979

Closed
MasonXon opened this issue Jun 22, 2021 · 3 comments
Closed

Comments

@MasonXon
Copy link

Describe the bug
In my pgsql logs, I found the following logs, l know about why the error occurs,maybe the health check SQL input the replication sessions.
2021-06-22 07:59:15.003 CST [24211] ERROR: cannot execute SQL commands in WAL sender for physical replication
2021-06-22 07:59:24.767 CST [23871] LOG: statement: SELECT CASE WHEN pg_catalog.pg_is_in_recovery() THEN 0 ELSE ('x' || pg_catalog.substr(pg_catalog.pg_walfile_name(pg_catalog.pg_current_wal_lsn()), 1, 8))::bit(32)::int END, CASE WHEN pg_catalog.pg_is_in_recovery() THEN 0 ELSE pg_catalog.pg_wal_lsn_diff(pg_catalog.pg_current_wal_lsn(), '0/0')::bigint END, pg_catalog.pg_wal_lsn_diff(pg_catalog.pg_last_wal_replay_lsn(), '0/0')::bigint, pg_catalog.pg_wal_lsn_diff(COALESCE(pg_catalog.pg_last_wal_receive_lsn(), '0/0'), '0/0')::bigint, pg_catalog.pg_is_in_recovery() AND pg_catalog.pg_is_wal_replay_paused(), 0, CASE WHEN latest_end_lsn IS NULL THEN NULL ELSE received_tli END, slot_name, conninfo FROM pg_catalog.pg_stat_get_wal_receiver()

To Reproduce
Steps to reproduce the behavior:

Expected behavior
health check SQL should input the session for the normal, but not replication session

Screenshots
image

Environment

  • Patroni version: 2.0.2
  • PostgreSQL version: PostgreSQL 12.7
  • DCS (and its version): Consul v1.9.6

Patroni configuration file

############### General/Global ########################################
scope: pg-cluster
namespace: /pg/
name: pgsql-01

############### RestAPI ###############################################
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.37.129.3:8008
  authentication:
    username: postgres
    password: "123456"
  http_extra_headers:
    'X-Frame-Options': 'SAMEORIGIN'
    'X-XSS-Protection': '1; mode=block'
    'X-Content-Type-Options': 'nosniff'
  https_extra_headers:
    'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'

############### Consul ################################################
consul:
  host: 127.0.0.1:8500
  scheme: http
  verify: false
  dc: my-dc-1
  consistency: default
  check: false
  register_service: true
  service_tags:
    - pg-cluster-consul
  service_check_interval: 15s

############### Ctl ##################################################
ctl:
  insecure: false

############### Log ##################################################
log:
  level: INFO
  traceback_level: INFO
  format: '%(asctime)s %(levelname)s: %(message)s'
  dateformat: '%Y-%m-%d %H:%M:%S %z'
  max_queue_size: 1000
  dir: /tmp/
  file_num: 10
#  file_size: 10485760

############### Bootstrap ############################################
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    maximum_lag_on_syncnode: 1048576
    max_timeline_history: 10
    master_start_timeout: 300
    master_stop_timeout: 300
    synchronous_mode: false
    synchronous_mode_strict: false
    # standby_cluster:
    #   host: 127.0.0.1
    #   port: 1111
    #   primary_slot_name: patroni
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: hot_standby
        hot_standby: "on"
        max_connections: 300
        max_worker_processes: 8
        wal_keep_segments: 8
        max_wal_senders: 10
        max_replication_slots: 10
        max_prepared_transactions: 0
        max_locks_per_transaction: 64
        wal_log_hints: "on"
        track_commit_timestamp: "off"
        archive_mode: "on"
        archive_timeout: 1800s
        archive_command: mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p ../wal_archive/%f
      recovery_conf:
        restore_command: cp ../wal_archive/%f %p
    slots:
      my_slot:
        type: physical

  method: initdb

  initdb:
    - encoding: UTF8
    - locale: C
    - lc-collate: C
    - lc-ctype: en_US.UTF8
    - data-checksums

  pg_hba:
    # "local" is for Unix domain socket connections only
    - local   all             all                                     trust
    # IPv4 local connections:
    - host    all             all             127.0.0.1/32            trust
    # IPv6 local connections:
    - host    all             all             ::1/128                 trust
    # Allow replication connections from localhost
    - local   replication     all                                     trust
    - host    replication     all             127.0.0.1/32            trust
    - host    replication     all             ::1/128                 trust
    # Allow replication connection from subnet
    - host    replication     replicator      10.37.129.0/24          md5
    # Allow other connections
    - host    all             all             10.37.129.0/24          md5

  users:
    gitlab:
      password: "123456"
      options:
        - createdb
        - createrole

  # post_init: /usr/local/bin/setup_cluster.sh

############### Postgresql ###########################################
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.37.129.3:5432
  data_dir: /var/lib/pgsql/12/data
  bin_dir: /usr/pgsql-12/bin
  config_dir: /var/lib/pgsql/12/data
  pgpass: /var/lib/pgsql/12/.pgpass

  custom_conf: /var/lib/my_custom.conf

  # recovery_conf:

  create_replica_methods:
    - basebackup
  basebackup:
    - max-rate: '1000M'
    - checkpoint: fast
    - status-interval: 1s
    - verbose
    - progress

  use_unix_socket: true
  use_unix_socket_repl: true
  use_pg_rewind: true
  pg_ctl_timeout: 60
  remove_data_directory_on_rewind_failure: true
  remove_data_directory_on_diverged_timelines: true

  parameters:
    logging_collector: "on"
    log_filename: "postgresql-%Y-%m-%d.log"
    log_statement: "all"
    log_replication_commands: "on"
    timezone: "Asia/Shanghai"
    log_timezone: "PRC"

  authentication:
    replication:
      username: replicator
      password: "123456"
    superuser:
      username: postgres
      password: "123456"
    rewind:
      username: rewind-user
      password: "123456"

  pg_hba:
    # "local" is for Unix domain socket connections only
    - local   all             all                                     trust
    # IPv4 local connections:
    - host    all             all             127.0.0.1/32            trust
    # IPv6 local connections:
    - host    all             all             ::1/128                 trust
    # Allow replication connections from localhost
    - local   replication     all                                     trust
    - host    replication     all             127.0.0.1/32            trust
    - host    replication     all             ::1/128                 trust
    # Allow replication connection from subnet
    - host    replication     replicator      10.37.129.0/24          scram-sha-256
    # Allow other connections
    - host    all             all             10.37.129.0/24          scram-sha-256

  pg_ident:
    - gitlab git gitlab

watchdog:
  mode: off
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

patronictl show-config

# patronictl show-config
loop_wait: 10
master_start_timeout: 300
master_stop_timeout: 300
max_timeline_history: 10
maximum_lag_on_failover: 1048576
maximum_lag_on_syncnode: 1048576
postgresql:
  parameters:
    archive_command: mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p ../wal_archive/%f
    archive_mode: 'on'
    archive_timeout: 1800s
    hot_standby: 'on'
    log_timezone: PRC
    max_connections: 300
    max_locks_per_transaction: 64
    max_prepared_transactions: 0
    max_replication_slots: 10
    max_wal_senders: 10
    max_worker_processes: 8
    track_commit_timestamp: 'off'
    wal_keep_segments: 8
    wal_level: hot_standby
    wal_log_hints: 'on'
  recovery_conf:
    restore_command: cp ../wal_archive/%f %p
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
slots:
  my_slot:
    type: physical
synchronous_mode: true
synchronous_mode_strict: false
synchronous_node_count: 1
ttl: 30```

**Have you checked Patroni logs?**
2021-06-22 08:05:05 +0800 INFO: Lock owner: pgsql-02; I am pgsql-01
2021-06-22 08:05:05 +0800 INFO: does not have lock
2021-06-22 08:05:05 +0800 INFO: Local timeline=16 lsn=0/19000148
2021-06-22 08:05:05 +0800 ERROR: Exception when working with master via replication connection
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/rewind.py", line 188, in _check_timeline_and_lsn
    cur.execute('IDENTIFY_SYSTEM')
psycopg2.errors.InternalError_: cannot execute SQL commands in WAL sender for physical replication

**Have you checked PostgreSQL logs?**
2021-06-22 08:05:29.685 CST [30674] LOG:  statement: SELECT pg_catalog.to_char(pg_catalog.pg_postmaster_start_time(), 'YYYY-MM-DD HH24:MI:SS.MS TZ'), CASE WHEN pg_catalog.pg_is_in_recovery() THEN 0 ELSE ('x' || pg_catalog.substr(pg_catalog.pg_walfile_name(pg_catalog.pg_current_wal_lsn()), 1, 8))::bit(32)::int END, CASE WHEN pg_catalog.pg_is_in_recovery() THEN 0 ELSE pg_catalog.pg_wal_lsn_diff(pg_catalog.pg_current_wal_lsn(), '0/0')::bigint END, pg_catalog.pg_wal_lsn_diff(pg_catalog.pg_last_wal_replay_lsn(), '0/0')::bigint,  pg_catalog.pg_wal_lsn_diff(COALESCE(pg_catalog.pg_last_wal_receive_lsn(), '0/0'), '0/0')::bigint, pg_catalog.pg_is_in_recovery() AND pg_catalog.pg_is_wal_replay_paused(), pg_catalog.to_char(pg_catalog.pg_last_xact_replay_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS TZ'), pg_catalog.array_to_json(pg_catalog.array_agg(pg_catalog.row_to_json(ri))) FROM (SELECT (SELECT rolname FROM pg_authid WHERE oid = usesysid) AS usename, application_name, client_addr, w.state, sync_state, sync_priority FROM pg_catalog.pg_stat_get_wal_senders() w, pg_catalog.pg_stat_get_activity(pid)) AS ri
2021-06-22 08:05:29.689 CST [31040] ERROR:  cannot execute SQL commands in WAL sender for physical replication

**Have you tried to use GitHub issue search?**
Sorry,I didn't find the answer

**Additional context**
[root@pgsql-03 log]# patronictl list
+ Cluster: pg-cluster (6976360105259810136) ------+----+-----------+
| Member   | Host        | Role         | State   | TL | Lag in MB |
+----------+-------------+--------------+---------+----+-----------+
| pgsql-01 | 10.37.129.3 | Replica      | running | 16 |         0 |
| pgsql-02 | 10.37.129.4 | Leader       | running | 16 |           |
| pgsql-03 | 10.37.129.5 | Sync Standby | running | 16 |         0 |
+----------+-------------+--------------+---------+----+-----------+
@MasonXon
Copy link
Author

MasonXon commented Jun 22, 2021

This problems maybe not affect the replication to sync data,but if I configure the monitor,this will cause the alarm
So please insure that if this is a bug, or it's the misconfig by myself
Thanks

@CyberDem0n
Copy link
Member

Duplicate of #1977 and #1969

@honglei
Copy link

honglei commented Aug 11, 2021

psycopg/psycopg2#941

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants