diff --git a/docs/node-restore.md b/docs/node-restore.md
new file mode 100644
index 0000000..8ea6a03
--- /dev/null
+++ b/docs/node-restore.md
@@ -0,0 +1,153 @@
+# Proxmox Node Restore
+
+Restores a Proxmox node exactly as it previously existed, allowing it to rejoin the cluster without using `pvecm delnode`.
+It works by reconstructing the node’s identity vectors so Corosync and pmxcfs accept it as the same entity.
+
+This method is **not guaranteed** to work and depends on every identity vector matching exactly.
+If any identity vector differs (hostname, node ID, ring IP, certificates), the node may:
+
+- fail to mount pmxcfs
+- fail to join Corosync
+- appear as a ghost node
+- destabilize the cluster
+
+Use this approach only when you fully understand the identity requirements and the cluster is otherwise healthy.
+
+- [1. Hostname Identity](#1-hostname-identity)
+- [2. Local Name Resolution](#2-local-name-resolution)
+- [3. System Users and Groups](#3-system-users-and-groups)
+- [4. Network Topology](#4-network-topology)
+- [5. VFIO Bindings (If Using Passthrough)](#5-vfio-bindings-if-using-passthrough)
+- [6. Kernel Flags and Module Loading (If Restoring GRUB)](#6-kernel-flags-and-module-loading-if-restoring-grub)
+- [7. Disk Configuration](#7-disk-configuration)
+- [8. SSH Identity](#8-ssh-identity)
+- [9. Corosync Identity](#9-corosync-identity)
+  - [Single-node cluster](#single-node-cluster)
+  - [Multi-node cluster](#multi-node-cluster)
+- [10. Node‑Specific Artifacts](#10-nodespecific-artifacts)
+- [11. Finalize](#11-finalize)
+
+## 1. Hostname Identity
+
+Preserve the node’s cluster identity.
+
+- Restore `/etc/hostname`
+
+---
+
+## 2. Local Name Resolution
+
+Ensure the node can resolve itself and its peers.
+
+- Restore `/etc/hosts`
+
+---
+
+## 3. System Users and Groups
+
+Restore OS‑level identity and UID/GID mapping.
+
+- Restore `/etc/passwd`
+- Restore `/etc/group`
+
+---
+
+## 4. Network Topology
+
+Restore bridges, VLANs, MTU, and IP assignments.
+
+- Restore `/etc/network/interfaces`
+- Restore `/etc/network/interfaces.d/`
+
+---
+
+## 5. VFIO Bindings (If Using Passthrough)
+
+Restore PCI passthrough behavior.
+
+- Restore `/etc/modprobe.d/`
+
+---
+
+## 6. Kernel Flags and Module Loading (If Restoring GRUB)
+
+Restore passthrough‑related kernel parameters.
+
+- Restore `/etc/default/grub`
+- Restore `/etc/modules`
+- Restore `/etc/modules-load.d/`
+
+---
+
+## 7. Disk Configuration
+
+Restore storage layout and ZFS pools.
+
+- Restore `/etc/fstab`
+- Import ZFS pools (example: `zpool import fastcore`)
+
+---
+
+## 8. SSH Identity
+
+Restore admin access and host identity.
+
+- Restore `/root/.ssh/*`
+- Restore `/etc/ssh/*`
+
+---
+
+## 9. Corosync Identity
+
+### Single-node cluster
+
+In some cases, a single Proxmox node can be resurrected simply by restoring the entire
+`/etc/pve` directory from backup. This works because `/etc/pve` (pmxcfs) contains:
+
+- Corosync identity (node ID, ring IPs, cluster membership)
+- Cluster certificates
+- Node‑specific configuration
+- Storage definitions
+- VM/CT configuration metadata
+
+If the restored node has **the same hostname, same IPs, same Corosync ring addresses, and the cluster still contains
+the old node entry**, then copying the full `/etc/pve` state can allow the node to rejoin the cluster as if nothing happened.
+
+### Multi-node cluster
+
+Reconstruct the node’s cluster identity so it can rejoin without `pvecm delnode`.
+
+- Restore `/etc/pve/corosync.conf`
+- Restore `/etc/pve/corosync.pub`
+- Restore `/etc/corosync/authkey`
+
+Together, these files define:
+
+- cluster topology
+- node ID
+- ring IPs
+- shared authentication
+
+---
+
+## 10. Node‑Specific Artifacts
+
+Restore semantic overlays and local scripts.
+
+- Restore `/home/`
+- Restore `/root/`
+- Restore `/opt/`
+
+---
+
+## 11. Finalize
+
+If GRUB was restored:
+
+- Run `update-grub`
+
+Then:
+
+- Reboot
+
+---