Health Module

The Health module provides comprehensive monitoring and diagnostics for the Datum system's operational status.

Overview

Health monitoring is crucial for maintaining reliable data synchronization. The Health module tracks the status of all adapters, managers, and sync operations to ensure the system is functioning properly.

Key Components

DatumHealth

Represents the health status of a Datum component.

Properties:

  • status: Current health status (healthy, degraded, unhealthy)
  • message: Human-readable status description
  • timestamp: When the health check was performed
  • details: Additional diagnostic information (Map<String, dynamic>)
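
As a rough sketch, the shape implied by these properties looks like the class below. This is inferred from the list above rather than copied from the package source, so the actual declaration may differ:

// Hypothetical shape of DatumHealth, inferred from the documented properties.
class DatumHealth {
  final HealthStatus status;           // healthy, degraded, or unhealthy
  final String message;                // human-readable description
  final DateTime timestamp;            // when the check was performed
  final Map<String, dynamic>? details; // optional diagnostic information

  const DatumHealth({
    required this.status,
    required this.message,
    required this.timestamp,
    this.details,
  });
}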

HealthStatus Enum

Defines the possible health states:

  • healthy: Component is functioning normally
  • degraded: Component has issues but can still operate
  • unhealthy: Component is not functioning properly
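
The enum itself is small; a declaration consistent with the states listed above (a sketch, not the package source) would be:

// Hypothetical declaration matching the three documented states.
enum HealthStatus { healthy, degraded, unhealthy }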

Health Monitoring

Manager Health

Each DatumManager provides health monitoring:

// Check health of a specific manager
final taskHealth = await Datum.manager<Task>().checkHealth();

// Get current health status
final currentHealth = Datum.manager<Task>().currentStatus.health;

// Watch health changes reactively
final healthStream = Datum.manager<Task>().health;
healthStream.listen((health) {
  switch (health.status) {
    case HealthStatus.healthy:
      print('Tasks manager is healthy');
      break;
    case HealthStatus.degraded:
      print('Tasks manager is degraded: ${health.message}');
      break;
    case HealthStatus.unhealthy:
      print('Tasks manager is unhealthy: ${health.message}');
      break;
  }
});

Global Health Monitoring

Monitor health across all managers:

// Get health status of all managers
final allHealths = await Datum.instance.allHealths.first;

allHealths.forEach((entityType, health) {
  print('${entityType}: ${health.status} - ${health.message}');
});

// Watch global health changes
Datum.instance.allHealths.listen((healthMap) {
  final unhealthyCount = healthMap.values
      .where((health) => health.status == HealthStatus.unhealthy)
      .length;

  if (unhealthyCount > 0) {
    print('Warning: $unhealthyCount managers are unhealthy');
  }
});

Health Checks

Automatic Health Checks

Health checks run automatically during:

  • Manager initialization
  • Sync operations
  • Periodic intervals (configurable)

They can also be triggered on demand; see Manual Health Checks below.

Manual Health Checks

Trigger health checks manually:

// Check health of all managers
final results = await Future.wait([
  Datum.manager<Task>().checkHealth(),
  Datum.manager<User>().checkHealth(),
  Datum.manager<Post>().checkHealth(),
]);

// Check if any manager is unhealthy
final hasUnhealthy = results.any((health) => health.status == HealthStatus.unhealthy);

Adapter Health

Adapters implement their own health checks:

// Local adapter health
final localHealth = await taskManager.localAdapter.checkHealth();

// Remote adapter health
final remoteHealth = await taskManager.remoteAdapter.checkHealth();

Health Diagnostics

Health Details

Health checks provide detailed diagnostic information:

final health = await Datum.manager<Task>().checkHealth();

print('Status: ${health.status}');
print('Message: ${health.message}');
print('Timestamp: ${health.timestamp}');

// Access detailed diagnostics
final details = health.details;
if (details != null) {
  print('Connection status: ${details['connection']}');
  print('Last sync: ${details['lastSyncTime']}');
  print('Pending operations: ${details['pendingCount']}');
  print('Storage size: ${details['storageSize']} bytes');
}

Common Health Issues

Local Adapter Issues:

  • Database connection failures
  • Storage quota exceeded
  • File system permissions
  • Corruption detection

Remote Adapter Issues:

  • Network connectivity problems
  • Authentication failures
  • API rate limiting
  • Service unavailability

Sync Issues:

  • Long-running sync operations
  • High conflict rates
  • Large pending operation queues
  • Memory pressure
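
When triaging, the details map returned by a health check can help narrow a problem to one of these buckets. The sketch below is illustrative only; the detail keys mirror the earlier example ('connection', 'pendingCount') and their exact values are adapter-specific:

// Rough triage using the diagnostic keys shown earlier.
final health = await Datum.manager<Task>().checkHealth();
final details = health.details ?? <String, dynamic>{};

final pending = details['pendingCount'] as int? ?? 0;
if (pending > 1000) {
  print('Sync issue: large pending queue ($pending operations)');
} else if (details.containsKey('connection')) {
  print('Remote adapter issue: connection = ${details['connection']}');
} else if (health.status != HealthStatus.healthy) {
  print('Local adapter issue: ${health.message}');
}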

Health-Based Actions

Automatic Recovery

Configure automatic recovery actions:

final config = DatumConfig(
  // Automatic recovery settings
  errorRecoveryStrategy: DatumErrorRecoveryStrategy(
    maxRetries: 3,
    backoffStrategy: ExponentialBackoffStrategy(),
  ),

  // Health check intervals
  healthCheckInterval: Duration(minutes: 5),
);

Manual Recovery

Implement manual recovery logic:

Future<void> recoverFromHealthIssues() async {
  final allHealths = await Datum.instance.allHealths.first;

  for (final entry in allHealths.entries) {
    final entityType = entry.key;
    final health = entry.value;

    if (health.status == HealthStatus.unhealthy) {
      print('Attempting to recover ${entityType}...');

      // Try to reinitialize the manager
      try {
        final manager = Datum.managerByType(entityType);
        await manager.dispose();
        // Reinitialize logic here
        print('Recovered ${entityType}');
      } catch (e) {
        print('Failed to recover ${entityType}: $e');
      }
    }
  }
}

Health Metrics

Performance Metrics

Track performance-related health metrics:

// Get detailed health with performance metrics
final health = await Datum.manager<Task>().checkHealth();

final details = health.details;
if (details != null) {
  final avgSyncTime = details['averageSyncDuration'];
  final syncSuccessRate = details['syncSuccessRate'];
  final storageUtilization = details['storageUtilizationPercent'];

  print('Avg sync time: ${avgSyncTime}ms');
  print('Success rate: ${(syncSuccessRate * 100).round()}%');
  print('Storage usage: ${storageUtilization}%');
}

Trend Analysis

Monitor health trends over time:

class HealthMonitor {
  final List<DatumHealth> _healthHistory = [];

  void recordHealth(DatumHealth health) {
    _healthHistory.add(health);

    // Keep only recent history
    if (_healthHistory.length > 100) {
      _healthHistory.removeAt(0);
    }

    // Analyze the most recent entries (up to the last 10) for a trend
    final start = _healthHistory.length > 10 ? _healthHistory.length - 10 : 0;
    final recentHealth = _healthHistory.sublist(start);
    final unhealthyCount = recentHealth
        .where((h) => h.status == HealthStatus.unhealthy)
        .length;

    if (unhealthyCount > 5) {
      print('Warning: Health deteriorating');
    }
  }
}
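
Hooking the monitor up to the reactive health stream shown earlier takes one line per manager (shown here for Task only):

final monitor = HealthMonitor();
Datum.manager<Task>().health.listen(monitor.recordHealth);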

Health Alerts

Alert Configuration

Set up health-based alerts:

class HealthAlertSystem {
  void setupAlerts() {
    // Monitor all managers
    Datum.instance.allHealths.listen((healthMap) {
      for (final entry in healthMap.entries) {
        final entityType = entry.key;
        final health = entry.value;

        if (health.status == HealthStatus.unhealthy) {
          sendAlert(
            title: '${entityType} Manager Unhealthy',
            message: health.message,
            details: health.details,
          );
        }
      }
    });
  }

  void sendAlert({
    required String title,
    required String message,
    Map<String, dynamic>? details,
  }) {
    // Send alert via email, Slack, etc.
    print('ALERT: $title - $message');
  }
}

Alert Types

Critical Alerts:

  • Complete system failure
  • Data corruption detected
  • Authentication failures

Warning Alerts:

  • Degraded performance
  • High error rates
  • Storage capacity warnings

Info Alerts:

  • Recovery actions taken
  • Configuration changes
  • Maintenance notifications
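
One way to keep these tiers consistent in code is to map health status to a severity before dispatching. The Severity enum and mapping below are illustrative and not part of Datum:

// Illustrative severity mapping; adjust the policy to your own alerting rules.
enum Severity { critical, warning, info }

Severity classify(DatumHealth health) {
  if (health.status == HealthStatus.unhealthy) return Severity.critical;
  if (health.status == HealthStatus.degraded) return Severity.warning;
  return Severity.info; // healthy, e.g. a recovery-to-healthy notification
}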

Best Practices

Health Check Design

  1. Make health checks fast: Keep checks lightweight to avoid impacting performance
  2. Provide actionable information: Include specific details for troubleshooting
  3. Use appropriate timeouts: Don't let health checks hang indefinitely (see the sketch after this list)
  4. Check dependencies: Verify all required services are accessible
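
For the timeout point, a guard can be applied at the call site with Dart's Future.timeout. The fallback DatumHealth construction below follows the property sketch earlier and is an assumption about the constructor, not the package API:

// Treat a hanging check as unhealthy rather than waiting indefinitely.
final health = await Datum.manager<Task>().checkHealth().timeout(
      const Duration(seconds: 5),
      onTimeout: () => DatumHealth(
        status: HealthStatus.unhealthy,
        message: 'Health check timed out after 5 seconds',
        timestamp: DateTime.now(),
      ),
    );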

Monitoring Strategy

  1. Monitor continuously: Set up ongoing health monitoring
  2. Alert on degradation: Catch issues before they become critical
  3. Automate recovery: Implement automatic recovery where possible
  4. Log health changes: Maintain history for trend analysis

Alert Management

  1. Avoid alert fatigue: Only alert on actionable issues
  2. Escalate appropriately: Different severity levels for different issues
  3. Include context: Provide enough information to diagnose issues
  4. Test alerts: Ensure alerts work and reach the right people

Performance Impact

  1. Minimize overhead: Health checks should not significantly impact performance
  2. Use sampling: For high-frequency metrics, consider sampling
  3. Cache results: Cache health check results when appropriate (see the sketch after this list)
  4. Async checks: Run health checks asynchronously to avoid blocking
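
For the caching point, a thin wrapper that reuses a recent result is usually enough. The wrapper below and its 30-second default TTL are illustrative, not part of Datum:

// Reuse a recent health result instead of re-running the check on every call.
class CachedHealthCheck {
  CachedHealthCheck({this.ttl = const Duration(seconds: 30)});

  final Duration ttl;
  DatumHealth? _last;
  DateTime? _lastAt;

  Future<DatumHealth> check(Future<DatumHealth> Function() run) async {
    final now = DateTime.now();
    if (_last != null && _lastAt != null && now.difference(_lastAt!) < ttl) {
      return _last!;
    }
    _last = await run();
    _lastAt = now;
    return _last!;
  }
}

// Usage: cache.check(() => Datum.manager<Task>().checkHealth());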

Troubleshooting

Common Health Issues

Database Connection Issues:

// Check local adapter health
final localHealth = await manager.localAdapter.checkHealth();
if (localHealth.status == HealthStatus.unhealthy) {
  // Try to reconnect or reinitialize
  await manager.dispose();
  await manager.initialize();
}

Network Connectivity Issues:

// Check remote adapter health
final remoteHealth = await manager.remoteAdapter.checkHealth();
if (remoteHealth.status == HealthStatus.unhealthy) {
  // Wait for connectivity to recover
  await connectivityChecker.onStatusChange
      .where((connected) => connected)
      .first;
}

Sync Performance Issues:

// Check for large pending queues
final pendingCount = await manager.getPendingCount('user-id');
if (pendingCount > 1000) {
  print('Warning: Large pending queue may cause performance issues');
}

Health Check Debugging

// Enable detailed logging
final config = DatumConfig(
  enableLogging: true,
  // ... other config
);

// Manually run health checks with timing
final stopwatch = Stopwatch()..start();
final health = await manager.checkHealth();
stopwatch.stop();

print('Health check took ${stopwatch.elapsedMilliseconds}ms');
print('Health details: ${health.details}');