High Availability and Clustering

The Complete Guide to Linux High Availability and Clustering

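1. Introduction
2. High Availability Fundamentals
3. Pacemaker/Corosync Architecture
- 3.1 Component Overview
- 3.2 CRM, CIB, PE, LRM
4. Cluster Installation and Configuration
- 4.1 Prerequisites
- 4.2 Package Installation
- 4.3 Initial Cluster Setup
5. pcs / crm Commands
- 5.1 pcs Commands
- 5.2 crm Commands
6. Resource Agents
7. Resource Constraints
- 7.1 Location Constraints
- 7.2 Colocation Constraints
- 7.3 Ordering Constraints
8. DRBD (Distributed Replicated Block Device)
9. Floating/Virtual IP Addresses
10. Load Balancing with HAProxy
11. Keepalived (VRRP)
12. Active-Passive vs Active-Active Clusters
13. Shared File Systems (GFS2/OCFS2)
14. Fencing Configuration
15. Cluster Monitoring
16. Practical HA Scenarios
- 16.1 Web Server Cluster
- 16.2 Database Cluster
17. Troubleshooting
18. Best Practices
19. References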

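1. Introduction

High availability (HA) is the collective term for the design practices and technologies used to maximize a system's uptime. Its goal is to eliminate single points of failure (SPOFs) so that a service keeps running even when an individual component fails.

This guide covers the concepts, architecture, and build procedures for high-availability clusters on Linux, along with practical configuration examples.

2. High Availability Fundamentals

2.1 Failover and Failback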

┌────────────────────────────────────────────────────────┐
│                  Failover / Failback                   │
│                                                        │
│  Normal operation:                                     │
│  [Node1: Active] ←── clients                           │
│  [Node2: Standby]                                      │
│                                                        │
│  After failover:                                       │
│  [Node1: Failed/Down]                                  │
│  [Node2: Active] ←── clients                           │
│                                                        │
│  After failback:                                       │
│  [Node1: Active] ←── clients (returned after recovery) │
│  [Node2: Standby]                                      │
└────────────────────────────────────────────────────────┘

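Term	Description
Failover	A secondary node takes over the service when the primary node fails
Failback	The service is returned to the original node after the primary recovers
Automatic failover	Failover is triggered automatically once a failure is detected
Manual failover	Failover is initiated at an administrator's discretion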

2.2 Quorum and Split-Brain

┌────────────────────────────────────────────────────────┐
│                 Quorum (majority vote)                 │
│                                                        │
│  In a 3-node cluster:                                  │
│  - Majority = 2 or more nodes                          │
│  - 2 nodes connected → quorum → service continues      │
│  - 1 node alone → no quorum → service stops            │
│                                                        │
│  Split-brain:                                          │
│  ┌───────┐     ✕ network down ✕     ┌───────┐          │
│  │Node1  │◄────────────────────────►│Node2  │          │
│  │Active │     no communication     │Active │          │
│  └───────┘                          └───────┘          │
│  Both nodes act as Active → risk of inconsistent data  │
└────────────────────────────────────────────────────────┘

Quorum calculation:

Nodes	Majority	Tolerated node failures
2	2 (requires special handling)	0
3	2	1
4	3	1
5	3	2
7	4	3
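
The figures in the table follow from simple integer arithmetic: a majority is floor(N/2) + 1 votes, and the tolerated failures are whatever is left over. The sketch below assumes one vote per node; the function names `quorum` and `tolerated_failures` are made up for illustration and are not part of Corosync or pcs.

```bash
#!/usr/bin/env bash
# quorum N: votes needed for a majority in an N-node cluster
# (assumption: every node carries exactly one vote).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: nodes that may fail while the survivors keep quorum.
tolerated_failures() {
    echo $(( $1 - $(quorum "$1") ))
}

for n in 2 3 4 5 7; do
    printf '%s nodes: majority=%s, tolerated failures=%s\n' \
        "$n" "$(quorum "$n")" "$(tolerated_failures "$n")"
done
```

Running it reproduces the rows above, including the two-node case where losing either node already costs the majority.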

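2.3 Fencing and STONITH

# Fencing = a mechanism that reliably isolates a failed node from the cluster
# STONITH = Shoot The Other Node In The Head
# - Forcibly powers off or reboots the failed node to prevent data corruption

# Fencing methods:
# 1. Power fencing: cut power via IPMI/iLO/iDRAC
# 2. Storage fencing: block access to shared storage
# 3. Network fencing: disconnect the node from the network

# Why fencing is needed:
# - Prevents split-brain
# - Guarantees consistency of shared data
# - Prevents a resource from running on two nodes at once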

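2.4 Availability Levels

Availability	Level	Annual downtime	Example
99%	"Two Nines"	3.65 days	Batch systems
99.9%	"Three Nines"	8.76 hours	Internal business systems
99.99%	"Four Nines"	52.6 minutes	E-commerce sites
99.999%	"Five Nines"	5.26 minutes	Financial systems
99.9999%	"Six Nines"	31.5 seconds	Telecom infrastructure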
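
The downtime column can be derived from the availability percentage. The sketch below assumes a 365-day year and uses `awk` for the floating-point arithmetic; `annual_downtime_minutes` is a hypothetical helper name chosen for this example.

```bash
#!/usr/bin/env bash
# annual_downtime_minutes PCT: minutes of downtime per 365-day year that an
# availability target of PCT percent still allows.
annual_downtime_minutes() {
    awk -v pct="$1" 'BEGIN { printf "%.2f\n", (100 - pct) / 100 * 365 * 24 * 60 }'
}

for pct in 99 99.9 99.99 99.999 99.9999; do
    printf '%s%% -> %s minutes/year\n' "$pct" "$(annual_downtime_minutes "$pct")"
done
```

For example, 99% leaves 5256 minutes per year, which is the 3.65 days shown in the table.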

3. Pacemaker/Corosync Architecture

3.1 Component Overview

┌────────────────────────────────────────────────────────┐
│            Pacemaker/Corosync Architecture             │
│                                                        │
│  ┌──────────────────────────────────────┐              │
│  │           Pacemaker (CRM)            │              │
│  │  ┌─────┐  ┌─────┐  ┌─────┐           │              │
│  │  │ CIB │  │ PE  │  │CRMd │           │              │
│  │  └──┬──┘  └──┬──┘  └──┬──┘           │              │
│  │     │        │        │              │              │
│  │     └────────┼────────┘              │              │
│  │              │                       │              │
│  │  ┌───────────┴───────────┐           │              │
│  │  │      LRM (LRMd)       │           │              │
│  │  │    Resource Agents    │           │              │
│  │  │  (OCF, LSB, Systemd)  │           │              │
│  │  └───────────────────────┘           │              │
│  └──────────────────────────────────────┘              │
│                 │                                      │
│  ┌──────────────────────────────────────┐              │
│  │    Corosync (communication layer)    │              │
│  │  - Cluster membership                │              │
│  │  - Messaging                         │              │
│  │  - Quorum management                 │              │
│  └──────────────────────────────────────┘              │
└────────────────────────────────────────────────────────┘

3.2 CRM, CIB, PE, LRM

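Component	Full name	Role
CRM	Cluster Resource Manager	Overall management of cluster resources
CIB	Cluster Information Base	XML database of the cluster configuration
PE	Policy Engine	Computes the optimal resource placement from the CIB
CRMd	CRM Daemon	Daemon that carries out the PE's decisions
LRM	Local Resource Manager	Runs resource agents on each node
STONITHd	STONITH Daemon	Executes fencing operations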

4. Cluster Installation and Configuration

4.1 Prerequisites

# Environment:
# node1: 192.168.1.101 (ha-node1)
# node2: 192.168.1.102 (ha-node2)
# VIP:   192.168.1.100

# 1. Set the hostnames
$ sudo hostnamectl set-hostname ha-node1    # on node1
$ sudo hostnamectl set-hostname ha-node2    # on node2

# 2. Configure /etc/hosts (both nodes)
$ cat /etc/hosts
192.168.1.101  ha-node1
192.168.1.102  ha-node2

# 3. Verify time synchronization
$ chronyc tracking
Reference ID    : A1B2C3D4 (ntp.example.com)
Stratum         : 3

# 4. Configure the firewall
$ sudo firewall-cmd --permanent --add-service=high-availability
$ sudo firewall-cmd --reload
# Or open the individual ports
$ sudo firewall-cmd --permanent --add-port=2224/tcp    # pcsd
$ sudo firewall-cmd --permanent --add-port=3121/tcp    # pacemaker
$ sudo firewall-cmd --permanent --add-port=5403/tcp    # corosync-qnetd
$ sudo firewall-cmd --permanent --add-port=5404-5405/udp  # corosync
$ sudo firewall-cmd --reload

4.2 Package Installation

# RHEL/CentOS/Rocky (run on both nodes)
$ sudo dnf install pcs pacemaker corosync fence-agents-all

# Ubuntu/Debian (run on both nodes)
$ sudo apt install pacemaker corosync pcs crmsh fence-agents

# Start and enable pcsd
$ sudo systemctl enable --now pcsd

# Set the password for the hacluster user (both nodes)
$ sudo passwd hacluster
New password: <enter password>

4.3 Initial Cluster Setup

# Run from node1

# 1. Authenticate the nodes
$ sudo pcs host auth ha-node1 ha-node2
Username: hacluster
Password:
ha-node1: Authorized
ha-node2: Authorized

# 2. Create the cluster
$ sudo pcs cluster setup ha-cluster ha-node1 ha-node2
Destroying cluster on nodes: ha-node1, ha-node2...
Sending 'pacemaker_remote authkey' to 'ha-node1', 'ha-node2'
Sending cluster config files to the nodes...
Synchronizing pcsd certificates on nodes ha-node1, ha-node2...

# 3. Start the cluster
$ sudo pcs cluster start --all
ha-node1: Starting Cluster...
ha-node2: Starting Cluster...

# 4. Enable the cluster at boot
$ sudo pcs cluster enable --all

# 5. Check the cluster status
$ sudo pcs cluster status
Cluster Status:
 Cluster Summary:
   * Stack: corosync
   * Current DC: ha-node1 (version 2.1.5-1) - partition with quorum
   * Last updated: Mon Jan 15 10:00:00 2024
   * 2 nodes configured
   * 0 resource instances configured

 Node List:
   * Online: [ ha-node1 ha-node2 ]

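# 6. Special settings for a two-node cluster
# Temporarily disable STONITH (test environments only; STONITH is mandatory in production)
$ sudo pcs property set stonith-enabled=false

# Ignore loss of quorum in a two-node cluster
# (on Corosync 2.x and later, pcs normally enables votequorum's two_node mode
# for two-node clusters, so this setting is usually unnecessary; ignoring
# quorum without working fencing risks split-brain)
$ sudo pcs property set no-quorum-policy=ignore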

5. pcs / crm Commands

5.1 pcs Commands

# Cluster status
$ sudo pcs status
Cluster name: ha-cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha-node1 - partition with quorum
  * 2 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ ha-node1 ha-node2 ]

Full List of Resources:
  * VirtualIP  (ocf:heartbeat:IPaddr2): Started ha-node1
  * WebServer  (systemd:httpd):         Started ha-node1
  * WebData    (ocf:heartbeat:Filesystem): Started ha-node1

# Create resources
$ sudo pcs resource create VirtualIP ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

$ sudo pcs resource create WebServer systemd:httpd \
    op monitor interval=30s timeout=60s

# Group resources
$ sudo pcs resource group add WebGroup VirtualIP WebServer

# Move a resource
$ sudo pcs resource move VirtualIP ha-node2

# Disable/enable a resource
$ sudo pcs resource disable WebServer
$ sudo pcs resource enable WebServer

# Clean up a resource (reset its failed state)
$ sudo pcs resource cleanup WebServer

# Set properties
$ sudo pcs property set maintenance-mode=true    # enter maintenance mode
$ sudo pcs property set maintenance-mode=false   # leave maintenance mode

# Put a node into standby and bring it back
$ sudo pcs node standby ha-node1
$ sudo pcs node unstandby ha-node1

# Show the cluster configuration
$ sudo pcs config show

# Back up / restore the cluster configuration
$ sudo pcs config backup cluster-backup
$ sudo pcs config restore cluster-backup.tar.bz2

5.2 crm Commands

# crm shell (for SUSE/Ubuntu, crmsh package)
$ sudo crm status
$ sudo crm configure show

# Create a resource
$ sudo crm configure primitive VirtualIP ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.100 cidr_netmask=24 \
    op monitor interval=30s

# Create a group
$ sudo crm configure group WebGroup VirtualIP WebServer

# Configure constraints
$ sudo crm configure colocation web-with-ip inf: WebServer VirtualIP
$ sudo crm configure order ip-before-web mandatory: VirtualIP WebServer

# Put a node into standby and bring it back online
$ sudo crm node standby ha-node1
$ sudo crm node online ha-node1

# Move a resource
$ sudo crm resource move VirtualIP ha-node2
$ sudo crm resource unmove VirtualIP

# Edit the configuration (in a text editor)
$ sudo crm configure edit

6. Resource Agents

# List the available resource agents
$ sudo pcs resource agents
ocf:heartbeat:IPaddr2
ocf:heartbeat:Filesystem
ocf:heartbeat:apache
ocf:heartbeat:mysql
ocf:heartbeat:pgsql
ocf:heartbeat:nginx
systemd:httpd
systemd:mariadb
...

# Show details for a specific resource agent
$ sudo pcs resource describe ocf:heartbeat:IPaddr2
ocf:heartbeat:IPaddr2 - Manages virtual IPv4 and IPv6 addresses (Linux specific version)

Resource options:
  ip (required): The IPv4 (or IPv6) address to be configured in dotted quad
  cidr_netmask: The netmask for the interface in CIDR format
  nic: The base network interface on which the IP address will be brought online
  ...

# Types of resource agents
# OCF (Open Cluster Framework): the most capable (ocf:heartbeat:*, ocf:linbit:*)
# LSB (Linux Standard Base): scripts under /etc/init.d/
# Systemd: systemd service units
# STONITH: fencing agents
# Nagios: Nagios plugins

Resource agent type comparison

Type	Path	Monitoring	Parameters	Recommendation
OCF	/usr/lib/ocf/resource.d/	Advanced	Flexible	Highest
Systemd	systemd unit	Basic	Limited	High
LSB	/etc/init.d/	Basic	None	Low

7. Resource Constraints

7.1 Location Constraints

# Prefer running on specific nodes
$ sudo pcs constraint location VirtualIP prefers ha-node1=100
$ sudo pcs constraint location VirtualIP prefers ha-node2=50

# Forbid running on a specific node
$ sudo pcs constraint location VirtualIP avoids ha-node2=INFINITY

# Rule-based constraint
$ sudo pcs constraint location VirtualIP rule \
    score=500 '#uname' eq ha-node1

# List constraints
$ sudo pcs constraint show
Location Constraints:
  Resource: VirtualIP
    Enabled on:
      Node: ha-node1 (score:100)
      Node: ha-node2 (score:50)
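
With the two `prefers` scores above, the resource lands on the node with the higher score. The sketch below models only that comparison; it is a simplification, since the real scheduler also weighs stickiness, colocation scores, and failure counts. `preferred_node` is a hypothetical helper, not a pcs or Pacemaker command.

```bash
#!/usr/bin/env bash
# preferred_node NODE=SCORE...: print the node with the highest location score.
# Simplified model: integer scores only, no stickiness or INFINITY handling.
preferred_node() {
    local best_node="" best_score="" pair node score
    for pair in "$@"; do
        node=${pair%%=*}     # text before the first "="
        score=${pair#*=}     # text after the first "="
        if [ -z "$best_score" ] || [ "$score" -gt "$best_score" ]; then
            best_node=$node
            best_score=$score
        fi
    done
    echo "$best_node"
}

preferred_node ha-node1=100 ha-node2=50
```

Called with the scores from this section it prints `ha-node1`; if ha-node1 becomes unavailable, ha-node2 is the only remaining candidate.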

7.2 Colocation Constraints

# Run two resources on the same node
$ sudo pcs constraint colocation add WebServer with VirtualIP INFINITY

# Run two resources on different nodes
$ sudo pcs constraint colocation add DBSlave with DBMaster -INFINITY

# Check colocation constraints
$ sudo pcs constraint colocation show
Colocation Constraints:
  WebServer with VirtualIP (score:INFINITY)

7.3 Ordering Constraints

# Define the start order of resources
$ sudo pcs constraint order VirtualIP then WebServer

# Ordering with explicit actions
$ sudo pcs constraint order start VirtualIP then start WebServer

# Specifying options
$ sudo pcs constraint order VirtualIP then WebServer kind=Mandatory
$ sudo pcs constraint order VirtualIP then WebServer kind=Optional
$ sudo pcs constraint order VirtualIP then WebServer kind=Serialize

# Check ordering constraints
$ sudo pcs constraint order show
Ordering Constraints:
  start VirtualIP then start WebServer (kind:Mandatory)

8. DRBD (Distributed Replicated Block Device)

DRBD replicates a block device to a peer node over the network in real time.

# Installation
$ sudo dnf install drbd drbd-utils kmod-drbd    # RHEL (ELRepo)
$ sudo apt install drbd-utils                     # Ubuntu

# DRBD configuration file
$ cat /etc/drbd.d/data.res
resource data {
    protocol C;           # synchronous replication (A = asynchronous, B = semi-synchronous, C = synchronous)

    net {
        after-sb-0pri discard-younger-primary;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }

    disk {
        on-io-error detach;
    }

    on ha-node1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.1.101:7789;
        meta-disk internal;
    }

    on ha-node2 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.1.102:7789;
        meta-disk internal;
    }
}

# Initialize the DRBD metadata (both nodes)
$ sudo drbdadm create-md data

# Bring up DRBD (both nodes)
$ sudo drbdadm up data

# Promote to primary (on node1, first time only)
$ sudo drbdadm primary --force data

# Create the file system (on the primary node)
$ sudo mkfs.ext4 /dev/drbd0

# Mount
$ sudo mount /dev/drbd0 /mnt/data

# Check DRBD status
$ sudo drbdadm status
data role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate

$ cat /proc/drbd
version: 9.0.32-1 (api:2/proto:86-121)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1048576 nr:0 dw:0 dr:1049244 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1

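Comparison of DRBD replication modes

Protocol	Mode	Description	Performance	Data safety
Protocol A	Asynchronous	Acknowledges once the local write completes	Highest	Low
Protocol B	Semi-synchronous	Acknowledges once the data reaches the remote buffer	High	Medium
Protocol C	Synchronous	Acknowledges once the remote write completes	Low	Highest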

9. Floating/Virtual IP Addresses

# Create a VIP resource in Pacemaker
$ sudo pcs resource create VirtualIP ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 \
    cidr_netmask=24 \
    nic=eth0 \
    op monitor interval=30s

# Check the VIP resource status
$ sudo pcs status
  * VirtualIP  (ocf:heartbeat:IPaddr2): Started ha-node1

# Confirm the VIP on the interface
$ ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
    inet 192.168.1.101/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.100/24 brd 192.168.1.255 scope global secondary eth0

# Move the VIP manually
$ sudo pcs resource move VirtualIP ha-node2

# Remove the constraint left behind by the move (important)
$ sudo pcs resource clear VirtualIP

10. Load Balancing with HAProxy

# Installation
$ sudo dnf install haproxy    # RHEL family
$ sudo apt install haproxy    # Debian/Ubuntu

# HAProxy configuration
$ cat /etc/haproxy/haproxy.cfg
global
    log         /dev/log local0
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode        http
    log         global
    option      httplog
    option      dontlognull
    option      http-server-close
    option      forwardfor except 127.0.0.0/8
    retries     3
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 503 /etc/haproxy/errors/503.http

# Statistics page
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 5s
    stats auth admin:password

# Load balancing for web servers
frontend web_front
    bind *:80
    bind *:443 ssl crt /etc/haproxy/certs/site.pem
    default_backend web_back
    option httplog

backend web_back
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    server web1 192.168.1.111:80 check inter 5s fall 3 rise 2
    server web2 192.168.1.112:80 check inter 5s fall 3 rise 2
    server web3 192.168.1.113:80 check inter 5s fall 3 rise 2 backup

# Load balancing for databases (L4)
frontend db_front
    bind *:3306
    mode tcp
    default_backend db_back

backend db_back
    mode tcp
    balance leastconn
    option mysql-check user haproxy
    server db-master 192.168.1.121:3306 check inter 5s
    server db-slave1 192.168.1.122:3306 check inter 5s backup
    server db-slave2 192.168.1.123:3306 check inter 5s backup

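# Start HAProxy
$ sudo systemctl enable --now haproxy

# Validate the configuration
$ haproxy -c -f /etc/haproxy/haproxy.cfg
Configuration file is valid

# Manage HAProxy as a Pacemaker resource
# (when Pacemaker manages the service, leave the systemd unit disabled so that only the cluster starts it)
$ sudo pcs resource create HAProxy systemd:haproxy \
    op monitor interval=30s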
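
The `check inter 5s fall 3 rise 2` options in the backend definitions determine how quickly HAProxy reacts to a server failure or recovery. As a rough estimate that ignores check timeouts, the reaction time is the check interval multiplied by the number of consecutive results required. `check_window` below is a hypothetical helper written for this illustration.

```bash
#!/usr/bin/env bash
# check_window INTER COUNT: seconds covered by COUNT consecutive health checks
# spaced INTER seconds apart (check timeouts are ignored in this estimate).
check_window() {
    echo $(( $1 * $2 ))
}

inter=5 fall=3 rise=2
echo "marked DOWN after about $(check_window "$inter" "$fall")s of failed checks"
echo "marked UP after about $(check_window "$inter" "$rise")s of passing checks"
```

With the values used in this guide, a dead server is taken out of rotation after roughly 15 seconds and returns after roughly 10 seconds of healthy responses; lowering `inter` shortens both at the cost of more check traffic.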

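Load-balancing algorithm comparison

Algorithm	Description	Use case
roundrobin	Distributes requests to the servers in turn	Servers with similar capacity
leastconn	Sends to the server with the fewest connections	Long-lived connections
source	Hashes the source IP address	Session persistence
uri	Hashes the request URI	Cache optimization
first	Uses the first server with free connection slots	Active-Standby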

11. Keepalived (VRRP)

# Installation
$ sudo dnf install keepalived    # RHEL family
$ sudo apt install keepalived    # Debian/Ubuntu

# Keepalived configuration - MASTER node
$ cat /etc/keepalived/keepalived.conf
global_defs {
    router_id LVS_MASTER
    enable_script_security
    script_user root
}

vrrp_script chk_haproxy {
    script "/usr/bin/systemctl is-active haproxy"
    interval 2
    weight -20
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass mypassword
    }

    virtual_ipaddress {
        192.168.1.100/24
    }

    track_script {
        chk_haproxy
    }

    notify_master "/usr/local/bin/keepalived-notify.sh master"
    notify_backup "/usr/local/bin/keepalived-notify.sh backup"
    notify_fault  "/usr/local/bin/keepalived-notify.sh fault"
}

# Keepalived configuration - BACKUP node
$ cat /etc/keepalived/keepalived.conf
global_defs {
    router_id LVS_BACKUP
    enable_script_security
    script_user root
}

vrrp_script chk_haproxy {
    script "/usr/bin/systemctl is-active haproxy"
    interval 2
    weight -20
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 90
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass mypassword
    }

    virtual_ipaddress {
        192.168.1.100/24
    }

    track_script {
        chk_haproxy
    }
}

# Keepalived notification script
$ cat /usr/local/bin/keepalived-notify.sh
#!/bin/bash
# Called by keepalived with the new VRRP state as the first argument.
STATE=$1
case "$STATE" in
    "master")
        logger "Keepalived: Transitioning to MASTER state"
        systemctl start haproxy
        ;;
    "backup")
        logger "Keepalived: Transitioning to BACKUP state"
        ;;
    "fault")
        logger "Keepalived: Transitioning to FAULT state"
        systemctl stop haproxy
        ;;
esac

# Start Keepalived
$ sudo systemctl enable --now keepalived
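
The failover in this configuration depends on the numbers: the MASTER starts at priority 100, the BACKUP at 90, and a failing `chk_haproxy` script applies `weight -20`. The sketch below works through that arithmetic for the negative-weight case only; `effective_priority` is a hypothetical helper, not a keepalived command.

```bash
#!/usr/bin/env bash
# effective_priority BASE WEIGHT CHECK_OK
# Models a track_script with a negative weight: while the check fails
# (CHECK_OK=0) the weight is added to the base priority, lowering it.
effective_priority() {
    local base=$1 weight=$2 check_ok=$3
    if [ "$check_ok" -eq 1 ]; then
        echo "$base"
    else
        echo $(( base + weight ))
    fi
}

master=$(effective_priority 100 -20 0)   # HAProxy down on the MASTER node
backup=$(effective_priority 90 -20 1)    # HAProxy healthy on the BACKUP node
if [ "$backup" -gt "$master" ]; then
    echo "BACKUP wins the election: $backup > $master"
fi
```

The weight has to be larger than the gap between the two priorities (here 20 > 10); with `weight -5` the MASTER would stay at 95 and keep the VIP even though HAProxy is down.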

12. Active-Passive vs Active-Active Clusters

┌────────────────────────────────────────────────────────┐
│  Active-Passive (hot standby)                          │
│                                                        │
│  ┌──────────┐     ┌──────────┐                         │
│  │  Node1   │     │  Node2   │                         │
│  │ [Active] │     │[Standby] │                         │
│  │ VIP: ✓   │     │ VIP: ✗   │                         │
│  │ App: ✓   │     │ App: ✗   │                         │
│  └──────────┘     └──────────┘                         │
│                                                        │
│  Pros: simple; data consistency is easy to maintain    │
│  Cons: 50% of resources sit idle                       │
└────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────┐
│  Active-Active (load balancing)                        │
│                                                        │
│  ┌──────────┐            ┌──────────┐                  │
│  │  Node1   │            │  Node2   │                  │
│  │ [Active] │            │ [Active] │                  │
│  │ VIP1: ✓  │            │ VIP2: ✓  │                  │
│  │ App: ✓   │            │ App: ✓   │                  │
│  └──────────┘            └──────────┘                  │
│       ↑                       ↑                        │
│       └──── load balancer ────┘                        │
│                                                        │
│  Pros: efficient resource use, high throughput         │
│  Cons: complex; needs shared storage or data sync      │
└────────────────────────────────────────────────────────┘

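Aspect	Active-Passive	Active-Active
Configuration complexity	Low	High
Resource efficiency	50%	100%
Scalability	Low	High
Data consistency	Easy	Requires extra measures
Typical uses	Databases, file servers	Web servers, API servers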

13. Shared File Systems (GFS2/OCFS2)

# GFS2 (Global File System 2) - Red Hat
# A cluster file system that multiple nodes can mount at the same time

# Installation
$ sudo dnf install gfs2-utils dlm

# Configure DLM (Distributed Lock Manager)
$ sudo pcs resource create dlm ocf:pacemaker:controld \
    op monitor interval=30s clone interleave=true ordered=true

# Create the GFS2 file system
$ sudo mkfs.gfs2 -p lock_dlm -t ha-cluster:gfs2data \
    -j 2 /dev/drbd0
# -p lock_dlm: use DLM for locking
# -t cluster_name:fs_name: cluster name and file system name
# -j 2: number of journals (same as the number of nodes)

# Create the GFS2 resource
$ sudo pcs resource create GFS2Data ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/mnt/shared fstype=gfs2 \
    op monitor interval=30s \
    clone interleave=true

# Configure constraints
$ sudo pcs constraint order dlm-clone then GFS2Data-clone
$ sudo pcs constraint colocation add GFS2Data-clone with dlm-clone

# OCFS2 (Oracle Cluster File System 2) - Oracle
$ sudo apt install ocfs2-tools
$ sudo mkfs.ocfs2 -N 2 /dev/drbd0

14. Fencing Configuration

# List the available fencing agents
$ sudo pcs stonith list
fence_apc           - Fence agent for APC
fence_idrac         - Fence agent for Dell iDRAC
fence_ilo           - Fence agent for HP iLO
fence_ipmilan       - Fence agent for IPMI
fence_vmware_rest   - Fence agent for VMware REST API
fence_xvm           - Fence agent for virtual machines

# Configure IPMI fencing
$ sudo pcs stonith create ipmi-fence-node1 fence_ipmilan \
    ipaddr=192.168.1.201 \
    login=admin \
    passwd=password \
    lanplus=1 \
    pcmk_host_list=ha-node1

$ sudo pcs stonith create ipmi-fence-node2 fence_ipmilan \
    ipaddr=192.168.1.202 \
    login=admin \
    passwd=password \
    lanplus=1 \
    pcmk_host_list=ha-node2

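# Enable fencing
$ sudo pcs property set stonith-enabled=true

# Test fencing (this actually fences ha-node2)
$ sudo pcs stonith fence ha-node2
Node: ha-node2 fenced

# Check fencing status (newer pcs releases also provide `pcs stonith status`)
$ sudo pcs stonith show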
 ipmi-fence-node1	(stonith:fence_ipmilan):	Started ha-node2
 ipmi-fence-node2	(stonith:fence_ipmilan):	Started ha-node1

# Fencing in a VMware environment
$ sudo pcs stonith create vmware-fence fence_vmware_rest \
    ipaddr=vcenter.example.com \
    ssl_insecure=1 \
    login=admin@vsphere.local \
    passwd=password \
    pcmk_host_map="ha-node1:vm-node1;ha-node2:vm-node2"

15. Cluster Monitoring

# One-shot cluster status (omit -1 for a continuously updating view)
$ sudo crm_mon -1
# or
$ sudo pcs status

# Watch cluster resources
$ watch -n 5 'sudo pcs status'

# Corosync membership status
$ sudo corosync-cmapctl | grep members
runtime.members.1.config_version (u64) = 0
runtime.members.1.ip (str) = r(0) ip(192.168.1.101)
runtime.members.1.join_count (u32) = 1
runtime.members.1.status (str) = joined
runtime.members.2.ip (str) = r(0) ip(192.168.1.102)
runtime.members.2.status (str) = joined

# Quorum status
$ sudo corosync-quorumtool
Quorum information
------------------
Date:             Mon Jan 15 10:00:00 2024
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1.8
Quorate:          Yes

# Cluster logs
$ sudo journalctl -u pacemaker -f
$ sudo journalctl -u corosync -f

# Resource failure history
$ sudo pcs resource failcount show

# Inspect the CIB (XML format)
$ sudo cibadmin --query

16. Practical HA Scenarios

16.1 Web Server Cluster

# Build an Active-Passive web server cluster

# 1. Create the resources
# VIP
$ sudo pcs resource create VirtualIP ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

# Apache/Nginx
$ sudo pcs resource create WebServer systemd:httpd \
    op monitor interval=30s timeout=60s \
    op start timeout=60s \
    op stop timeout=60s

# Shared storage (when using DRBD)
$ sudo pcs resource create WebData ocf:linbit:drbd \
    drbd_resource=webdata op monitor interval=30s \
    promotable promoted-max=1 promoted-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true

$ sudo pcs resource create WebFS ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/var/www fstype=ext4 \
    op monitor interval=30s

# 2. Create the resource group
$ sudo pcs resource group add WebGroup VirtualIP WebFS WebServer

# 3. Configure constraints
$ sudo pcs constraint colocation add WebGroup with WebData-clone \
    INFINITY with-rsc-role=Master

$ sudo pcs constraint order promote WebData-clone \
    then start WebGroup

# 4. Verify
$ sudo pcs status
  * Resource Group: WebGroup:
    * VirtualIP  (ocf:heartbeat:IPaddr2): Started ha-node1
    * WebFS      (ocf:heartbeat:Filesystem): Started ha-node1
    * WebServer  (systemd:httpd): Started ha-node1
  * Clone Set: WebData-clone [WebData] (promotable):
    * Promoted:  [ ha-node1 ]
    * Unpromoted: [ ha-node2 ]

16.2 Database Cluster

# Active-Passive PostgreSQL cluster

# 1. PostgreSQL resource agent
$ sudo pcs resource create PgSQL ocf:heartbeat:pgsql \
    pgctl="/usr/bin/pg_ctl" \
    psql="/usr/bin/psql" \
    pgdata="/var/lib/pgsql/data" \
    pgport="5432" \
    rep_mode="sync" \
    node_list="ha-node1 ha-node2" \
    restore_command="cp /backup/wal/%f %p" \
    primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
    master_ip="192.168.1.100" \
    repuser="replicator" \
    op start timeout=60s \
    op stop timeout=60s \
    op promote timeout=60s \
    op demote timeout=60s \
    op monitor interval=15s timeout=10s \
    op monitor interval=10s timeout=10s role=Master \
    promotable promoted-max=1 promoted-node-max=1

# 2. VIP resource
$ sudo pcs resource create PgVIP ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

# 3. Constraints
$ sudo pcs constraint colocation add PgVIP with PgSQL-clone \
    INFINITY with-rsc-role=Master

$ sudo pcs constraint order promote PgSQL-clone then start PgVIP \
    symmetrical=false

$ sudo pcs constraint order demote PgSQL-clone then stop PgVIP \
    symmetrical=false

17. Troubleshooting

Common problems and solutions

# 1. A resource does not start
$ sudo pcs resource debug-start WebServer    # start in debug mode
$ sudo pcs resource cleanup WebServer        # reset the failure counter
$ sudo journalctl -u pacemaker | tail -50    # check the logs

# 2. Split-brain
# Verify that fencing is working correctly
$ sudo pcs stonith show
$ sudo stonith_admin -L    # list fencing devices

# 3. Quorum cannot be obtained
$ sudo corosync-quorumtool
$ sudo pcs property set no-quorum-policy=ignore    # two-node clusters only

# 4. A resource does not move
$ sudo pcs constraint show    # check the constraints
$ sudo pcs resource clear VirtualIP    # remove move constraints
$ sudo pcs resource cleanup VirtualIP

# 5. Corosync communication errors
$ sudo corosync-cmapctl | grep members
$ sudo corosync-cfgtool -s    # check ring status
$ sudo firewall-cmd --list-all    # check the firewall

# 6. DRBD synchronization errors
$ sudo drbdadm status
$ sudo drbdadm disconnect data
$ sudo drbdadm connect data
# Force a resync (resolves data inconsistency)
$ sudo drbdadm invalidate-remote data    # run on the primary; the peer's copy is overwritten

# 7. Reset the cluster configuration (last resort)
$ sudo pcs cluster destroy --all
# Rebuild the cluster from scratch

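18. Best Practices

Design best practices

- Always configure fencing: in production, STONITH must always be enabled. A cluster without fencing risks data corruption.
- Build clusters with an odd number of nodes: this keeps the quorum calculation simple and reduces the risk of split-brain, because a partition can never split the votes evenly.
- Make the network redundant: use a dedicated, redundant network for cluster communication.
- Set resource monitoring intervals appropriately: intervals that are too short add load, and intervals that are too long delay failure detection.
- Establish maintenance procedures: document how to use maintenance mode and how to move resources manually.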

Operational best practices

# 1. Regular failover tests
$ sudo pcs resource move WebGroup ha-node2
# Remove the constraint after the test
$ sudo pcs resource clear WebGroup

# 2. Before cluster-wide maintenance
$ sudo pcs property set maintenance-mode=true
# Perform the maintenance work
$ sudo pcs property set maintenance-mode=false

# 3. Node maintenance
$ sudo pcs node standby ha-node1
# Perform the maintenance work
$ sudo pcs node unstandby ha-node1

# 4. Back up the configuration
$ sudo pcs config backup ha-config-$(date +%Y%m%d)

# 5. Review the cluster logs regularly
$ sudo journalctl -u pacemaker --since "1 hour ago" --no-pager

Command	Description
`pcs status`	Show cluster status
`pcs resource`	Manage resources
`pcs constraint`	Manage constraints
`pcs stonith`	Manage fencing
`pcs property`	Set cluster properties
`pcs node`	Manage nodes
`crm_mon`	Real-time monitoring
`corosync-quorumtool`	Check quorum status
`drbdadm`	Manage DRBD