Redoing My Progress WhatsUp Gold Home Lab with Proxmox: A Journey of Failover, Backup and Recovery

Introduction

Greetings, tech enthusiasts! I hope you’re all doing well. Today, I’m thrilled to share the story of my recent adventure in rearchitecting my home lab with Proxmox. This journey has been a rollercoaster of unexpected challenges, valuable lessons, and rewarding successes. I built a resilient and efficient setup that exceeded my initial expectations by leveraging modern virtualization and storage technologies.

Setting Up the Lab

In my quest to elevate my home lab, I chose Proxmox for its powerful high availability (HA), failover and backup capabilities. To support this, I configured ZFS with replication across local arrays on each server and integrated NFS as an additional storage layer. My initial plan was to test failover scenarios manually, but fate had other ideas, leading to an unplanned but highly educational trial of Proxmox’s resilience.

The Unexpected Failover Test

Working with older, repurposed hardware (HPE DL360 Gen9) presented its challenges. A flaky port on my 10GbE network card led to instability in the Linux bond, triggering a real-world failover test. Proxmox responded impressively, automatically migrating the affected VMs to the other host in just three seconds. The VMs rebooted, and my WhatsUp Gold services were back online in under 40 seconds.

One fascinating observation was how ZFS with replication became available before NFS storage, likely due to the difference in network speeds (10GbE vs. 2GbE). Watching my environment self-heal and continue operating seamlessly was a testament to Proxmox’s robust design.

Leveraging GenAI for Setup and Management

To streamline the setup process, I leaned on GenAI for guidance in configuring the cluster, tailoring it to my hardware, and establishing HA and replication. Additionally, I developed a PowerShell script that utilized the Proxmox API to fetch detailed information about all hosts and guests. This data was then integrated into the WhatsUp Gold solution for monitoring. Furthermore, I began creating Proxmox device templates for WhatsUp Gold to monitor critical services on Proxmox hosts via SNMP.
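
The PowerShell script itself is not shown here, so the following is only a minimal Python sketch of the same idea, not the script I used. It assumes an API token is used for authentication and reads the cluster-wide resource list from the Proxmox API; the function names, the host name, and the token values in the usage note are hypothetical.

```python
import json
import ssl
import urllib.request


def fetch_cluster_resources(host, token_id, token_secret, verify_tls=True):
    """Return the resource list reported by GET /api2/json/cluster/resources."""
    url = f"https://{host}:8006/api2/json/cluster/resources"
    request = urllib.request.Request(
        url,
        # API tokens are sent as: PVEAPIToken=USER@REALM!TOKENID=SECRET
        headers={"Authorization": f"PVEAPIToken={token_id}={token_secret}"},
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: a home lab with a self-signed certificate. This is insecure.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)["data"]


def summarize(resources):
    """Split the resource list into hosts and guests for a monitoring import."""
    hosts = [
        {"node": r["node"], "status": r.get("status", "unknown")}
        for r in resources
        if r.get("type") == "node"
    ]
    guests = [
        {
            "vmid": r["vmid"],
            "name": r.get("name", ""),
            "node": r["node"],
            "kind": r["type"],
            "status": r.get("status", "unknown"),
        }
        for r in resources
        if r.get("type") in ("qemu", "lxc")
    ]
    return {"hosts": hosts, "guests": guests}
```

A call might look like summarize(fetch_cluster_resources("pve1.example.lan", "monitor@pve!wug", "<secret>")), where every argument is a placeholder. Keeping the summarizing step separate from the HTTP call makes it easy to try out against saved JSON before pointing it at a live cluster.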

Validating High Availability

One of the significant milestones during this journey was successfully verifying Proxmox’s HA functionality. Pinpointing the root cause of kernel panics, validating log messages, and tracking precise timestamps were instrumental. GenAI was pivotal in troubleshooting and identifying specific log messages, allowing me to prove my HA setup’s effectiveness confidently. This achievement underscored the robustness and reliability of my new architecture.

The NFS Storage Failure: A Harsh Wake-Up Call

Early one Friday morning, I received an alert from Progress WhatsUp Gold 360 indicating that my connector was unreachable. I realized that my production website, https://wug.ninja, was offline, too. Upon further investigation, I discovered that all VMs hosted on NFS storage, including my primary WhatsUp Gold server, were inaccessible. Thankfully, the internet connection monitoring in WhatsUp Gold 360 alerted me to the problem even though most of my infrastructure was inaccessible.

The NFS system was completely frozen: I could not ping it on the network or log in via SSH, HTTPS, or any other means. This incident served as a harsh reminder of the importance of robust backup and recovery plans.

Recovery from Backups: A Test of Resilience

With no other option, I initiated a forced reboot of the NFS storage system, fully aware of the lengthy recovery time. I restored backups from the USB disk attached to the storage system to minimize downtime to my production systems. Here, GenAI was a game-changer, guiding me through the step-by-step recovery process.

For Linux VMs, I created new virtual machines via the command line, copied the disk files using WinSCP, and converted them using qemu-img. This process worked seamlessly for AlmaLinux and Ubuntu servers. However, my Windows VMs presented a unique challenge: they refused to boot. Troubleshooting with recovery ISOs yielded no progress, forcing me to explore alternative solutions.
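
For readers who have not used qemu-img, the conversion step can be sketched as below. This is an illustration under assumptions, not the exact commands I ran: the file names and the vmdk and qcow2 formats in the usage note are placeholders, and the helper functions are hypothetical wrappers around the real qemu-img convert command.

```python
import subprocess


def build_convert_command(source, destination, output_format="qcow2", source_format=None):
    """Build a qemu-img command that converts a disk image to output_format."""
    command = ["qemu-img", "convert", "-p"]  # -p prints a progress bar
    if source_format:
        # Without -f, qemu-img tries to detect the source format on its own.
        command += ["-f", source_format]
    command += ["-O", output_format, str(source), str(destination)]
    return command


def convert_disk(source, destination, **options):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.run(build_convert_command(source, destination, **options), check=True)
```

For example, build_convert_command("web01.vmdk", "web01.qcow2", source_format="vmdk") produces the argument list for qemu-img convert -p -f vmdk -O qcow2 web01.vmdk web01.qcow2. The converted disk still has to be attached to the new VM afterwards.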

This led me to discover guestmount, a tool that mounts virtual disks so their files can be browsed. Using guestmount, I mounted my Windows NTFS partitions on one of the Proxmox hosts, connected with WinSCP, downloaded the raw database files (MDF and LDF), and dropped them onto the WhatsUp Gold test server on ZFS, which had stayed up the whole time. The approach was unconventional, but I was curious to see whether it would work and how I might make it work. A normal recovery should use the .bak file generated by SQL Server.
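
The mount-and-locate step can be sketched roughly as follows. This is a hedged illustration rather than my exact procedure: it assumes the libguestfs tools (guestmount and guestunmount) are installed on the host, it mounts the image read-only, and the helper names and paths are hypothetical. In my case the files were then pulled off the host with WinSCP rather than listed by a script.

```python
import subprocess
from pathlib import Path


def build_guestmount_command(disk_image, mount_point):
    """Build a read-only guestmount command that auto-detects the guest's filesystems."""
    # -a adds the disk image, -i inspects it for an OS, --ro avoids writing to it.
    return ["guestmount", "-a", str(disk_image), "-i", "--ro", str(mount_point)]


def find_database_files(root):
    """Return SQL Server data (.mdf) and log (.ldf) files found under root."""
    wanted = {".mdf", ".ldf"}
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )


def mount_and_list(disk_image, mount_point):
    """Mount the image, list database files, and always unmount afterwards."""
    subprocess.run(build_guestmount_command(disk_image, mount_point), check=True)
    try:
        return find_database_files(mount_point)
    finally:
        subprocess.run(["guestunmount", str(mount_point)], check=True)
```

Mounting read-only is a deliberate choice here: the disk image is the only surviving copy of the data at that point, so nothing in the sketch is allowed to modify it.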

Lessons Learned and Future Plans

This experience offered invaluable lessons, particularly regarding the importance of redundancy, failover, and backup planning. The VMs on ZFS remained unaffected throughout these challenges, highlighting their reliability. With this knowledge, I rearchitected my setup to prioritize ZFS with replication for all production VMs. Once the NFS system was stable, I migrated the VMs off it and repurposed it for scheduled backups. Because the backup on the external device ran only weekly, I deleted the recovered VMs and kept the migrated ones, which held more recent data.

To enhance my backup plan further, I configured multiple backup jobs from within Proxmox. One of them copies to internal ZFS storage on a mirrored disk set. The other copies to the NFS system and keeps one weekly and three daily backups. I also made a WinSCP script that copies backups from the NFS system to my Windows PC, where they sync to OneDrive. This is a multilayered backup strategy (on three sets of disks and the cloud) that protects against hardware failures and unforeseen incidents.
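
To make the "one weekly and three daily" retention rule concrete, here is a small Python sketch of how such a policy picks which backups survive. It is a simplified illustration and does not claim to reproduce Proxmox's own pruning logic; the function name and parameters are hypothetical.

```python
from datetime import datetime


def select_backups_to_keep(timestamps, keep_daily=3, keep_weekly=1):
    """Pick the newest backup for each of the last keep_daily days, then the
    newest backup from each of keep_weekly older weeks not already covered."""
    newest_first = sorted(timestamps, reverse=True)

    kept = []
    seen_days = set()
    for ts in newest_first:
        day = ts.date()
        if day in seen_days:
            continue  # an even newer backup from this day is already kept
        if len(seen_days) >= keep_daily:
            break
        seen_days.add(day)
        kept.append(ts)

    # ISO (year, week) pairs that the daily selection already covers.
    covered_weeks = {ts.isocalendar()[:2] for ts in kept}
    seen_weeks = set()
    for ts in newest_first:
        if ts in kept:
            continue
        week = ts.isocalendar()[:2]
        if week in covered_weeks or week in seen_weeks:
            continue
        if len(seen_weeks) >= keep_weekly:
            break
        seen_weeks.add(week)
        kept.append(ts)

    return sorted(kept, reverse=True)
```

With nightly backups from January 1 to January 10, 2024, this keeps the backups from January 10, 9, and 8 as the three dailies and the January 7 backup as the weekly, since it is the newest one from the previous week.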

Conclusion

This journey has been a blend of challenges, learning, and triumphs. Moving to Proxmox has transformed my home lab into a resilient and efficient environment equipped to handle real-world scenarios. I’m grateful for the insights gained and excited to continue refining my setup.

Stay tuned for more updates as I explore new ways to optimize and innovate in my home lab. Thank you for joining me on this journey—may your own tech adventures be just as rewarding!
