# ZephyrFS Coordinator

The coordination server for the ZephyrFS distributed storage network, written in Go.

## Overview

The ZephyrFS Coordinator is a centralized service that manages:

- **Node Discovery & Registration**: Track active storage nodes in the network
- **File & Chunk Metadata**: Coordinate file registration and chunk placement
- **Network Health**: Monitor node health and network statistics
- **Replication Management**: Ensure proper chunk replication across nodes
## Architecture
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  ZephyrFS Node  │──────│   Coordinator   │──────│  ZephyrFS Node  │
│                 │      │                 │      │                 │
│ • Register      │      │ • Node Registry │      │ • Register      │
│ • Heartbeat     │      │ • Chunk Tracker │      │ • Heartbeat     │
│ • Report Stats  │      │ • Health Monitor│      │ • Report Stats  │
└─────────────────┘      └─────────────────┘      └─────────────────┘
         │                        │                        │
         └───── File Storage ─────┼───── File Storage ─────┘
                                  │
                         ┌─────────────────┐
                         │   Web Client    │
                         │ • File Upload   │
                         │ • Download      │
                         │ • Management    │
                         └─────────────────┘
```
## Features

### Core Functionality

- **Node Management**: Registration, heartbeat processing, health tracking
- **File Coordination**: Metadata storage, chunk placement optimization
- **Network Monitoring**: Real-time statistics and health metrics
- **High Availability**: Support for multiple coordinator instances
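Chunk placement optimization can be pictured as selecting the replication-factor best candidates among the active nodes. This is only a sketch of the general idea, assuming a naive most-free-space policy — the coordinator's actual placement logic may weigh other signals such as health, load, or locality:

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical summary of a storage node's free capacity.
type candidate struct {
	ID   string
	Free int64 // bytes available
}

// placeChunk returns up to replicationFactor node IDs, preferring the
// nodes with the most free capacity.
func placeChunk(nodes []candidate, replicationFactor int) []string {
	sorted := append([]candidate(nil), nodes...) // don't mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Free > sorted[j].Free })
	if replicationFactor > len(sorted) {
		replicationFactor = len(sorted)
	}
	ids := make([]string, 0, replicationFactor)
	for _, n := range sorted[:replicationFactor] {
		ids = append(ids, n.ID)
	}
	return ids
}

func main() {
	nodes := []candidate{{"n1", 100}, {"n2", 500}, {"n3", 300}}
	fmt.Println(placeChunk(nodes, 2)) // the two roomiest nodes
}
```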
### APIs

- **gRPC API**: High-performance binary protocol for node communication
- **REST API**: HTTP/JSON interface for web clients and management
- **Health Endpoints**: Kubernetes-compatible health checks

### Storage Options

- **BBolt**: Embedded key-value database (default)
- **PostgreSQL**: Production-ready relational database

### Monitoring

- **Prometheus Metrics**: Built-in metrics collection
- **Health Checks**: Liveness, readiness, and detailed health status
- **Performance Tracking**: Request times, error rates, resource usage
## Quick Start

### Prerequisites

- Go 1.21+ for building from source
- Docker for containerized deployment
- PostgreSQL (optional, for production)

### Development

```bash
# Clone the repository
git clone https://github.com/ZephyrFS/zephyrfs-coordinator
cd zephyrfs-coordinator

# Install dependencies
go mod download

# Run with the default configuration
go run cmd/coordinator/main.go

# Or with a custom config
go run cmd/coordinator/main.go -config config.yaml
```
### Docker Deployment

```bash
# Build the image
docker build -t zephyrfs/coordinator .

# Run with default settings
docker run -p 8080:8080 -p 8090:8090 -p 8091:8091 zephyrfs/coordinator

# Run with a custom configuration (docker run requires absolute host paths for bind mounts)
docker run -v $(pwd)/config.yaml:/config/config.yaml \
  -v $(pwd)/data:/data \
  -p 8080:8080 -p 8090:8090 -p 8091:8091 \
  zephyrfs/coordinator
```
### Docker Compose

```yaml
version: '3.8'

services:
  coordinator:
    image: zephyrfs/coordinator:latest
    ports:
      - "8080:8080"   # gRPC
      - "8090:8090"   # HTTP API
      - "8091:8091"   # Metrics
    volumes:
      - ./data:/data
      - ./config.yaml:/config/config.yaml
    environment:
      - LOG_LEVEL=info
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:8091/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```
## Configuration

### Basic Configuration

```yaml
# config.yaml
database:
  type: "bbolt"
  path: "./coordinator.db"

grpc:
  port: 8080

http:
  enabled: true
  port: 8090

coordinator:
  replication_factor: 3
  node_timeout: "30s"
  heartbeat_interval: "10s"

health:
  metrics_enabled: true
  metrics_port: 8091
```
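`node_timeout` and `heartbeat_interval` are Go duration strings, and for nodes not to flap between active and inactive, the heartbeat interval must be comfortably shorter than the timeout. A small validation sketch — the "at least 3 heartbeats per timeout window" rule of thumb is an assumption for illustration, not the coordinator's documented behavior:

```go
package main

import (
	"fmt"
	"time"
)

// validateTimings parses the two duration settings and checks that a node
// can miss a couple of heartbeats before being declared inactive.
func validateTimings(nodeTimeout, heartbeatInterval string) error {
	timeout, err := time.ParseDuration(nodeTimeout)
	if err != nil {
		return fmt.Errorf("node_timeout: %w", err)
	}
	interval, err := time.ParseDuration(heartbeatInterval)
	if err != nil {
		return fmt.Errorf("heartbeat_interval: %w", err)
	}
	if timeout < 3*interval {
		return fmt.Errorf("node_timeout %s should be at least 3x heartbeat_interval %s",
			timeout, interval)
	}
	return nil
}

func main() {
	// The defaults above: a 30s timeout allows exactly 3 x 10s heartbeats.
	fmt.Println(validateTimings("30s", "10s"))
}
```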
### Environment Variables

| Variable | Description | Default |
|---|---|---|
| `CONFIG_PATH` | Path to configuration file | `config.yaml` |
| `LOG_LEVEL` | Logging level (`debug`/`info`/`warn`/`error`) | `info` |
| `DATA_PATH` | Data directory path | `./data` |
| `DATABASE_URL` | PostgreSQL connection URL | - |
| `GRPC_PORT` | gRPC server port | `8080` |
| `HTTP_PORT` | HTTP API server port | `8090` |
| `METRICS_PORT` | Metrics server port | `8091` |
### Production Configuration

```yaml
database:
  type: "postgres"
  url: "${DATABASE_URL}"

grpc:
  port: 8080
  max_message_size: 16777216  # 16MB

coordinator:
  replication_factor: 5
  cleanup_interval: "10m"
  node_inactive_after: "120s"

health:
  check_interval: "60s"
  metrics_enabled: true
```
## API Reference

### gRPC API

Node Management:

```protobuf
service CoordinatorService {
  rpc RegisterNode(RegisterNodeRequest) returns (RegisterNodeResponse);
  rpc UnregisterNode(UnregisterNodeRequest) returns (UnregisterNodeResponse);
  rpc NodeHeartbeat(NodeHeartbeatRequest) returns (NodeHeartbeatResponse);
  rpc GetActiveNodes(GetActiveNodesRequest) returns (GetActiveNodesResponse);
}
```

File & Chunk Management:

```protobuf
rpc RegisterFile(RegisterFileRequest) returns (RegisterFileResponse);
rpc GetFileInfo(GetFileInfoRequest) returns (GetFileInfoResponse);
rpc FindChunkLocations(FindChunkLocationsRequest) returns (FindChunkLocationsResponse);
rpc UpdateChunkLocations(UpdateChunkLocationsRequest) returns (UpdateChunkLocationsResponse);
```
### REST API

Node Management:

- `POST /api/v1/nodes/register` - Register a new node
- `GET /api/v1/nodes/active` - Get active nodes
- `POST /api/v1/nodes/{id}/heartbeat` - Send heartbeat
- `POST /api/v1/nodes/{id}/unregister` - Unregister node

File Management:

- `POST /api/v1/files/register` - Register a file
- `GET /api/v1/files/{id}` - Get file information
- `DELETE /api/v1/files/{id}` - Delete file

Network Status:

- `GET /api/v1/network/status` - Get network status
- `GET /api/v1/network/stats` - Get network statistics

Health & Monitoring:

- `GET /health` - Health check
- `GET /ready` - Readiness check
- `GET /live` - Liveness check
- `GET /metrics` - Prometheus metrics
### Example Usage

Register a Node (REST):

```bash
curl -X POST http://localhost:8090/api/v1/nodes/register \
  -H "Content-Type: application/json" \
  -d '{
    "addresses": ["127.0.0.1:8080"],
    "storage_capacity": 1000000000,
    "capabilities": {"version": "1.0.0"}
  }'
```

Get Network Status:

```bash
curl http://localhost:8090/api/v1/network/status
```

Health Check:

```bash
curl http://localhost:8091/health
```
## Monitoring

### Metrics

The coordinator exposes Prometheus-compatible metrics at `/metrics`:

```
# HELP coordinator_nodes_total Total number of registered nodes
# TYPE coordinator_nodes_total gauge
coordinator_nodes_total{status="active"} 5
coordinator_nodes_total{status="inactive"} 1

# HELP coordinator_files_total Total number of registered files
# TYPE coordinator_files_total gauge
coordinator_files_total 150

# HELP coordinator_chunks_total Total number of tracked chunks
# TYPE coordinator_chunks_total gauge
coordinator_chunks_total 1500
```
### Health Checks

Kubernetes Liveness Probe:

```yaml
livenessProbe:
  httpGet:
    path: /live
    port: 8091
  initialDelaySeconds: 30
  periodSeconds: 10
```

Kubernetes Readiness Probe:

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8091
  initialDelaySeconds: 5
  periodSeconds: 5
```
### Logging

Structured JSON logging with configurable levels:

```json
{
  "level": "info",
  "time": "2024-01-15T10:30:45Z",
  "msg": "Node registered",
  "nodeID": "node-123",
  "addresses": ["127.0.0.1:8080"],
  "capacity": 1000000000
}
```
## Development

### Building

```bash
# Build the binary
go build -o coordinator cmd/coordinator/main.go

# Build the Docker image
docker build -t zephyrfs/coordinator .

# Run tests
go test ./...

# Run with race detection
go test -race ./...

# Generate protobuf code
make proto
```

### Testing

```bash
# Unit tests
go test ./internal/...

# Integration tests
go test -tags=integration ./...

# Benchmark tests
go test -bench=. ./internal/coordinator/

# Coverage report
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
```
## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Write tests for your changes
4. Run the tests: `go test ./...`
5. Commit your changes: `git commit -m "Add amazing feature"`
6. Push the branch: `git push origin feature/amazing-feature`
7. Create a Pull Request
## Deployment

### Production Checklist

- Configure PostgreSQL database
- Set up TLS certificates
- Configure monitoring and alerting
- Set resource limits and requests
- Configure backup strategy
- Set up log aggregation
- Configure service discovery
- Set up load balancing (for multiple instances)
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: zephyrfs-coordinator
spec:
replicas: 2
selector:
matchLabels:
app: zephyrfs-coordinator
template:
metadata:
labels:
app: zephyrfs-coordinator
spec:
containers:
- name: coordinator
image: zephyrfs/coordinator:latest
ports:
- containerPort: 8080
name: grpc
- containerPort: 8090
name: http
- containerPort: 8091
name: metrics
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: coordinator-secrets
key: database-url
livenessProbe:
httpGet:
path: /live
port: 8091
readinessProbe:
httpGet:
path: /ready
port: 8091
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
## Troubleshooting

### Common Issues

**Database Connection Failed:**

```
Error: failed to open database: connection refused
```

- Check the database configuration
- Verify the database server is running
- Check network connectivity

**High Memory Usage:**

```
Warning: memory usage above 80%
```

- Monitor node count and file metadata
- Consider increasing memory limits
- Check for memory leaks in logs

**Slow Response Times:**

```
Warning: API response time > 1s
```

- Check database performance
- Monitor active connections
- Consider database indexing
### Debug Mode

Enable debug logging for troubleshooting:

```bash
./coordinator -log-level debug
```

Or set the environment variable:

```bash
export LOG_LEVEL=debug
./coordinator
```
### Performance Tuning

**Database Optimization:**

- Use PostgreSQL for production workloads
- Configure appropriate connection pooling
- Add database indexes for frequently queried fields

**Resource Limits:**

- Set appropriate memory limits based on node count
- Monitor CPU usage during peak operations
- Configure garbage collection settings
## License

MIT License - see the LICENSE file for details.

## Support

- Documentation: ZephyrFS Docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Security: security@zephyrfs.io