What Is OpenAPS and Why Redundancy Matters

OpenAPS (Open Artificial Pancreas System) is a community-driven, open-source platform that enables people with diabetes to build a personalized automated insulin delivery system. It runs on a small single-board computer, typically a Raspberry Pi or BeagleBone Black, that reads continuous glucose monitor (CGM) data, runs a dosing algorithm, and wirelessly controls a compatible insulin pump. While a single-device setup works well for many, relying on one board introduces a single point of failure. A corrupted SD card, a power glitch, a network dropout, or a broken configuration file can stop the loop without warning, leaving the pump on its pre-programmed basal rates with no automated adjustments. For a life‑critical system, that risk is unacceptable.

Running multiple OpenAPS devices creates a redundant, fault‑tolerant rig that keeps your control loop running even when individual components fail. This guide walks through every aspect: choosing hardware, synchronizing data across devices, implementing automatic failover, testing your setup, and maintaining it over time. By the end, you’ll have a production‑ready approach to 24/7 reliable control.

The Core Need for Redundancy in a DIY Closed Loop

In any automated system, the weakest link determines overall reliability. For a DIY closed loop, that weak link is often the single-board computer. Hardware can fail, software can crash, and network connectivity can drop. Without redundancy, a failure forces the user back to manual (open‑loop) control until the issue is resolved. That can mean hours or even days of less optimal glucose management. Redundancy provides:

  • Continuous operation: If the primary board dies, a secondary board takes over once the heartbeat timeout expires, typically within a minute or two.
  • Maintenance windows: Update one device while the other handles therapy, without any interruption.
  • Peace of mind: Caregivers and users know there is a hot standby ready to act.

Redundancy is not just about failover—it also improves data integrity. With two devices logging the same events independently, you have a double‑checked record of insulin delivery and glucose readings. This can be invaluable for post‑hoc analysis and troubleshooting.

Benefits of a Multi‑Device OpenAPS Rig

Beyond simple failover, a multi-device setup offers several practical advantages:

  • Hardware resilience: Using two different board types (e.g., a Raspberry Pi 4 and a BeagleBone Black) protects against component‑specific failures. If a widespread SD card corruption issue affects one model, the other is likely unaffected.
  • Network diversity: Each device can connect to a different Wi‑Fi access point or use a wired Ethernet connection. If one network segment goes down, the other rig keeps its connection to Nightscout.
  • Load distribution: You can assign the primary device to handle the loop (CGM reading + algorithm + pump commands) while the secondary device acts as a dedicated uploader to Nightscout and a monitoring station. This reduces latency on the critical path.
  • Isolated troubleshooting: When an issue arises, you can switch to the backup and investigate the primary offline without stopping therapy.
  • Graceful degradation: Even if one device fails, the backup has access to the last synchronized data, so it can make informed decisions immediately.

Choosing Compatible Hardware

Supported Single‑Board Computers

The OpenAPS project officially supports several platforms. For a redundant setup, choose two boards that are either identical or sufficiently similar in capabilities. Popular choices include:

  • Raspberry Pi 3B+/4B/5 – Widely available, excellent community support, and many guides exist. Use a high‑endurance SD card (e.g., Samsung Pro Endurance) or boot from a USB‑attached SSD to minimize corruption risk.
  • BeagleBone Black (or BeagleBone Green) – Includes built‑in eMMC flash storage, making it less susceptible to SD card issues. It also has better real‑time capabilities for direct pump communication.
  • Intel NUC or ODROID – More powerful, but community support is thinner. They can be useful if you need extra processing for advanced features like machine learning.

For a multi‑device system, using two of the same board simplifies configuration and failover scripts. However, mixing a Pi and a BeagleBone is also feasible if you keep the algorithm and software versions identical.

Power and Connectivity

Reliable power is critical. Each device should be powered by a dedicated, regulated 5V power supply that can handle the board’s peak draw (the official Pi 4 supply is rated at 3A; a BeagleBone Black needs at least 1A, more with peripherals attached). Consider a small UPS (uninterruptible power supply) or a high‑capacity battery pack that can keep both boards running through short power outages. Additionally:

  • Use wired Ethernet for at least one device to reduce latency and avoid Wi‑Fi interference.
  • If using Wi‑Fi, connect each device to a different access point (ideally on a different channel or band) so that a single access point failure does not take both offline at once.
  • Assign static IP addresses or use DHCP reservations to ensure the devices always have the same addresses for heartbeat monitoring.

Setting Up Multiple Devices

Initial Installation

Follow the official OpenAPS documentation for each board. Steps to tailor for multi‑device:

  • Unique hostnames: Name one device rig-primary and the other rig-backup (or similar) to avoid network conflicts and make logs clear.
  • Same software version: Install the same OpenAPS release (e.g., oref0 version 0.7.x) on both devices. Mismatched versions can lead to different algorithm behavior during failover.
  • Separate configuration files: Each device needs its own pump.ini, cgm.ini, and preferences.json. Do not symlink them across devices—keep copies to avoid accidental overwrites.
  • Version‑controlled settings: Store configuration files in a private Git repository (e.g., on GitHub or a personal server). This tracks changes and allows you to quickly sync settings between devices.
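
One way to confirm that both rigs really carry the same settings is to compare file hashes. The sketch below is a minimal illustration, assuming each rig's configuration directory has been copied or checked out to a local path; the function names and the directory layout are hypothetical and not part of OpenAPS.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_drift(dir_a, dir_b, names):
    """List the file names that differ, or are missing, between two directories."""
    drift = []
    for name in names:
        a, b = Path(dir_a) / name, Path(dir_b) / name
        # A file absent on either side counts as drift, not as a match.
        if not a.exists() or not b.exists() or file_digest(a) != file_digest(b):
            drift.append(name)
    return drift
```

Calling find_drift("primary-config", "backup-config", ["pump.ini", "cgm.ini", "preferences.json"]) returns an empty list when the two copies match, which makes it easy to run from a scheduled job and alert on any non-empty result.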

Synchronizing Data Across Devices

For seamless failover, both devices must share real‑time data. The goal: the standby device always knows the last glucose value, recent insulin delivery history, and pump status. Several methods exist, each with trade‑offs:

  • Nightscout (cloud‑based): Both devices upload to the same Nightscout site. The backup can fetch the last few hours of data via the Nightscout API on startup and then poll periodically. This works well but requires internet access and introduces latency.
  • Local MQTT broker (recommended): Run a Mosquitto MQTT server on a Raspberry Pi Zero or on your home network. Each OpenAPS device publishes topics (e.g., rig/glucose, rig/enacted) and subscribes to those topics from the other device. MQTT is fast, lightweight, and works offline.
  • Shared filesystem (NFS/SMB): Mount a network share where both devices write status files. Use atomic writes (write to a temp file, then rename) to avoid partial reads. This method can be slower and may suffer from file locking issues.
  • Database replication (advanced): Set up InfluxDB on a separate server. Both rigs write measurements to the same database. The backup can query InfluxDB for the latest data. This offers robust historical data but adds complexity.
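
The atomic-write pattern mentioned for the shared-filesystem option (write to a temp file, then rename) can be sketched as follows. This is a minimal illustration, not OpenAPS code: the function name and the JSON status format are assumptions, and rename semantics on a network share depend on the server, so treat the atomicity guarantee as something to verify on your own mount.

```python
import json
import os
import tempfile


def write_status_atomically(path, status):
    """Write a JSON status file so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same filesystem) as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(status, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # swap the new file into place in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that opens the target path sees either the previous complete file or the new complete file, never a mixture of the two.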

Practical recommendation: Start with Nightscout synchronization because it’s already part of most OpenAPS configurations. Then add MQTT for low‑latency local data sharing. Test both to ensure each new record reaches the backup within 30 seconds of the primary publishing it.
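
Because CGM readings typically arrive only every few minutes, the 30-second target is most naturally measured as sync lag: the time between the primary publishing a record and the backup receiving it. The sketch below assumes each record carries two hypothetical fields, sent_ms and received_ms (epoch milliseconds), and that both rigs keep their clocks synchronized, for example with NTP.

```python
def sync_lags(records):
    """Return the delivery lag in seconds for each synchronized record."""
    return [(r["received_ms"] - r["sent_ms"]) / 1000.0 for r in records]


def sync_is_fresh(records, max_lag_s=30.0):
    """True only if at least one record exists and every lag is under the limit."""
    return bool(records) and max(sync_lags(records)) < max_lag_s
```

An empty record list is treated as a failure on purpose: a backup that has received nothing should not be considered in sync.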

Implementing Automatic Failover

Heartbeat Monitoring

Failover relies on a heartbeat: each device periodically sends an "I am alive" signal. Simple implementation:

  • Create a lightweight HTTP endpoint on each device (e.g., http://rig-primary:8080/heartbeat) that returns a timestamp and status.
  • Alternatively, use MQTT with a reserved topic: heartbeat/primary and heartbeat/backup. Have each rig publish a JSON message every 15–30 seconds.
  • On the backup device, run a script that checks for the primary’s heartbeat. If three consecutive heartbeats are missed (e.g., 90 seconds without a signal), the backup declares the primary dead.
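
The missed-heartbeat rule above can be sketched as a small class. This is an illustrative outline under stated assumptions: the class and method names are hypothetical, the transport (HTTP or MQTT) is left out, and whatever receives a heartbeat is expected to call beat().

```python
import time


class HeartbeatMonitor:
    """Track the primary's heartbeats and decide when it should be declared dead."""

    def __init__(self, interval_s=30.0, max_missed=3, clock=time.monotonic):
        self.interval_s = interval_s
        self.max_missed = max_missed
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.last_seen = clock()

    def beat(self):
        """Call this whenever a heartbeat from the primary arrives."""
        self.last_seen = self.clock()

    def missed(self):
        """Number of whole heartbeat intervals elapsed since the last signal."""
        return int((self.clock() - self.last_seen) // self.interval_s)

    def primary_is_dead(self):
        return self.missed() >= self.max_missed
```

With the defaults shown (a 30-second interval and three missed beats), the primary is declared dead after 90 seconds of silence, matching the example in the list. A monotonic clock is used so that system time adjustments do not trigger a false failover.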

Automatic Switchover

When the backup decides the primary is unreachable, it must assume control without user intervention. Steps for a reliable switch:

  1. Claim control: Write a flag to a shared location (e.g., an MQTT topic or a file on the network share) indicating that the backup is now active. The primary, when it recovers, will read this flag and remain in standby.
  2. Take over pump communication: Because both devices can reach the same physical pump, ensure only one device controls it at a time. On switchover, the backup should discard any stale pump session of its own and establish a new one.
  3. Continue the loop: The backup uses the most recent synchronized data to restart the OpenAPS loop. It should log the event to Nightscout as a treatment note (e.g., "Failover: backup now active").
  4. Send alert: Use Pushover, email, SMS, or a local buzzer to notify the user that failover occurred.
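
The four steps above can be sketched as one function that runs them in order. Every callable passed in is a hypothetical hook that you would implement for your own rig (none of them is an OpenAPS API); the point of the sketch is the ordering and the rule that the pump is never touched unless the claim succeeds.

```python
def perform_failover(claim_control, connect_pump, start_loop, log_event, send_alert):
    """Run the switchover steps in order and report the resulting state."""
    # Step 1: claim control. If the claim fails, stay in standby and leave the pump alone.
    if not claim_control():
        return "standby"
    try:
        connect_pump()  # Step 2: take over pump communication
        start_loop()    # Step 3: continue the loop from synchronized data
    except Exception as exc:
        send_alert(f"Failover failed: {exc}")
        return "failed"
    log_event("Failover: backup now active")   # Step 3: record the event
    send_alert("Failover: backup now active")  # Step 4: notify the user
    return "active"
```

Alerting on a failed takeover matters as much as alerting on a successful one, since in that case neither rig is looping.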

Preventing Flapping

Flapping occurs when the primary recovers and immediately retakes control, causing oscillations. Use a dead‑time timer: once the backup takes over, it should not relinquish control for at least 5–10 minutes, even if the primary’s heartbeat reappears. After that period, the backup can demote itself if the primary is stable. A better approach: require manual confirmation to switch back. The user can then choose to restore the primary after verifying it is fully healthy.
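
A hold-down timer combined with optional manual confirmation might look like the sketch below. The class name, method names, and the 600-second default are illustrative assumptions chosen to match the 5–10 minute range described above.

```python
import time


class FailbackGuard:
    """Decide when the backup may hand control back to the primary."""

    def __init__(self, hold_down_s=600.0, require_manual=True, clock=time.monotonic):
        self.hold_down_s = hold_down_s
        self.require_manual = require_manual
        self.clock = clock
        self.took_over_at = None
        self.manual_ok = False

    def record_takeover(self):
        """Start the hold-down period and clear any earlier confirmation."""
        self.took_over_at = self.clock()
        self.manual_ok = False

    def confirm_manually(self):
        """Record that the user has approved switching back."""
        self.manual_ok = True

    def may_fail_back(self, primary_healthy):
        if self.took_over_at is None or not primary_healthy:
            return False
        if self.clock() - self.took_over_at < self.hold_down_s:
            return False  # still inside the dead time, ignore the recovered primary
        return self.manual_ok or not self.require_manual
```

Setting require_manual=False gives the purely timer-based behavior; leaving it at True implements the stricter approach in which the user decides when the primary is trusted again.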

Manual Override

Despite automation, always provide a manual override. Implement a simple web interface (e.g., using Node‑Red or a Flask web app) with a button to designate which device is active. Also, a physical switch that cuts power to one device can serve as a last resort. Document the manual procedure clearly and test it regularly.

Testing and Validation

Your failover system is only as good as its testing. Create a schedule to simulate failures every 2–4 weeks. Use a checklist:

  • Hardware failure simulation: Remove power from the primary. In a separate run, pull its SD card while it is powered to simulate card corruption, ideally with a spare card, since this can corrupt it. Verify in each case that the backup activates within the expected timeout (typically 60–90 seconds).
  • Network failure simulation: Disconnect the primary’s network cable. Confirm the backup can still access Nightscout and communicate with the pump.
  • Data sync test: Intentionally stop data flow to the backup (e.g., by pausing its MQTT subscription). Re‑enable it and verify the backup catches up with the last 15 minutes of data before assuming control.
  • Recovery test: Bring the primary back online. Ensure it recognizes the backup is active and remains in standby mode without triggering another switch.
  • Alert test: Verify that the failover alert reaches you (push notification, email, or whatever method you use) within 30 seconds of the switch.

Record each test in a log or in Nightscout notes. If you have a caregiver, involve them in the tests so they know what to expect.

Maintenance and Updates

Keeping both devices on the same software version is essential for predictable behavior. Follow these practices:

  • Staged updates: Always update one device first, leaving the other running. After 24–48 hours of stable operation, update the second. If an issue arises, you can revert the updated device without disrupting therapy.
  • Use a staging board: Maintain a third “spare” board that you update first. After two weeks of stability, apply changes to both production rigs.
  • Automate updates with caution: A cron job can check for new OpenAPS releases nightly, but only download and apply if a human‑approved flag is set. Never auto‑update a life‑critical system.
  • Monitor logs: Set up log rotation and forward critical errors from the systemd journal or rsyslog to your phone through the same alert channel you use for failover. Tools like logwatch can send daily summaries.
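
The human-approved update gate can be sketched as a check that a nightly job runs before doing anything else. The function names and the idea of an approval flag file are assumptions for illustration; the sketch also assumes purely numeric dotted version strings and does not download or install anything.

```python
from pathlib import Path


def parse_version(text):
    """Turn '0.7.1' into (0, 7, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def should_apply_update(installed, available, approval_flag):
    """Apply only when a newer release exists AND a human has created the flag file."""
    newer = parse_version(available) > parse_version(installed)
    return newer and Path(approval_flag).exists()
```

Removing the flag file after each applied update means every new release needs a fresh, deliberate approval.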

Securing Your Multi‑Device Setup

More devices mean more attack surfaces. Implement these security measures:

  • Strong authentication: For SSH, disable password authentication entirely and rely on SSH keys protected by passphrases. For web interfaces, use unique, long passwords.
  • Network segmentation: Put both rigs on an isolated VLAN if your router supports it. They should only have access to Nightscout, update servers, and the MQTT broker. Block all inbound connections from the internet.
  • Encryption: Enforce HTTPS for all Nightscout communication. Use MQTT over TLS (port 8883). A self‑signed certificate is sufficient for a broker that stays on your local network; use Let’s Encrypt only if the broker is exposed to the internet, which is not recommended.
  • Physical security: Mount the boards in a padded case or enclosure. Ensure SD cards or SSDs are not easily dislodged. Use strain relief for power and network cables.
  • Regular audits: Review open ports with netstat or ss. Disable any services that are not strictly necessary (e.g., Bluetooth, Wi‑Fi if using Ethernet).
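
A port audit can be partly automated by comparing listening ports against an allowlist. The sketch below parses text in the column layout printed by ss -tln (state, two queue columns, then the local address and port); the function names and the allowlist are hypothetical, and the parsing should be checked against the output on your own system.

```python
def listening_ports(ss_output):
    """Extract listening TCP port numbers from `ss -tln` style output."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a LISTEN row.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        # The local address is the fourth column; the port follows the last colon.
        port = fields[3].rsplit(":", 1)[-1]
        if port.isdigit():
            ports.add(int(port))
    return ports


def unexpected_ports(ss_output, allowed):
    """Return listening ports that are not on the allowlist, sorted."""
    return sorted(listening_ports(ss_output) - set(allowed))
```

On a rig, the input text could come from subprocess.run(["ss", "-tln"], capture_output=True, text=True).stdout, with any non-empty result from unexpected_ports treated as something to investigate.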

Community Resources and Support

You are not building this alone. The OpenAPS community is active, knowledgeable, and welcoming.

When asking for help, provide your hardware details, software versions, failover method, and any error logs. The community values detailed, respectful questions.

Conclusion

Running OpenAPS with multiple devices transforms a good automated insulin delivery system into a truly robust one. By selecting compatible hardware, synchronizing data effectively, implementing heartbeat‑based failover, and testing rigorously, you create a setup that can weather hardware failures, network glitches, and human errors. The initial effort—choosing boards, writing failover scripts, and testing thoroughly—pays off with continuous, reliable control day and night.

Start small: get two identical boards, set up Nightscout sync, and build a simple heartbeat script that sends you a notification when the primary goes offline. Then gradually add automatic failover. The OpenAPS community has paved the way with years of real‑world experience. Now you can build on their work for a safer, more resilient system that you can trust with your life.