Monitoring & Observability

Proper monitoring is essential for maintaining a healthy Vemetric deployment. This guide covers logging, metrics, queue monitoring, and troubleshooting.

Logging

Application Logs

Vemetric uses Pino for structured JSON logging across all services.
# View logs (Docker)
docker logs -f vemetric-app

# View logs (local)
bun dev
# Logs appear in terminal
Log levels:
  • trace: Detailed debug info
  • debug: Debug information
  • info: General information
  • warn: Warning messages
  • error: Error messages
  • fatal: Fatal errors

Log Format

Pino outputs structured JSON logs:
{
  "level": 30,
  "time": 1709582400000,
  "pid": 12345,
  "hostname": "app-server",
  "msg": "Event received",
  "projectId": "abc123",
  "eventName": "page_view",
  "userId": "user-123"
}
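When reading raw production logs without pino-pretty, it can help to map the numeric level back to its label. The following is a minimal sketch, not part of Vemetric: `parseLogLine` and `ParsedLog` are hypothetical names, and the level numbers are Pino's defaults (they would differ if custom levels are configured).

```typescript
// Pino's default numeric levels and their labels.
const PINO_LEVELS: Record<number, string> = {
  10: "trace",
  20: "debug",
  30: "info",
  40: "warn",
  50: "error",
  60: "fatal",
};

// Fields Pino adds to every line; everything else is treated as context.
const STANDARD_FIELDS = new Set(["level", "time", "msg", "pid", "hostname"]);

interface ParsedLog {
  level: string;
  time: Date;
  msg: string;
  context: Record<string, unknown>;
}

// Parse one JSON log line. Returns null for lines that are not Pino-style JSON.
function parseLogLine(line: string): ParsedLog | null {
  let raw: unknown;
  try {
    raw = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return null;

  const record = raw as Record<string, unknown>;
  const level = record["level"];
  const time = record["time"];
  const msg = record["msg"];
  if (typeof level !== "number" || typeof time !== "number") return null;

  const context: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    if (!STANDARD_FIELDS.has(key)) context[key] = value;
  }

  return {
    level: PINO_LEVELS[level] ?? `level-${level}`,
    time: new Date(time),
    msg: typeof msg === "string" ? msg : "",
    context,
  };
}
```

Applied to the sample line above, this would yield the label `info`, the message `Event received`, and a context object holding `projectId`, `eventName`, and `userId`.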

Pretty Printing (Development)

In development, logs are formatted with pino-pretty:
[2024-03-04 12:00:00.000] INFO: Event received
  projectId: "abc123"
  eventName: "page_view"
  userId: "user-123"

Log Aggregation (Production)

For production, ship logs to a centralized service. Vemetric includes an optional Axiom integration, which is configured through environment variables:
.env
AXIOM_DATASET=vemetric-logs
AXIOM_TOKEN=your-axiom-token
The @axiomhq/pino transport is already included in the dependencies, so no separate installation is needed.

Queue Monitoring

BullBoard UI

Vemetric includes Bull Board for real-time queue monitoring.
1. Access BullBoard

Navigate to:
http://localhost:4100
Login with credentials from .env:
  • Username: BULLBOARD_USERNAME (default: bullboard)
  • Password: BULLBOARD_PASSWORD (default: password)
2. Monitor Queues

BullBoard shows all queues:
  • event-queue: Event processing
  • session-queue: Session aggregation
  • user-queue: User updates
  • device-queue: Device tracking
  • email-queue: Email delivery
  • first-event-queue: First event handling
  • enrich-user-queue: User enrichment
  • merge-user-queue: User merging
3. View Job Details

For each queue, you can:
  • View active, waiting, completed, and failed jobs
  • Inspect job data and results
  • View error stack traces for failed jobs
  • Retry or delete individual jobs
  • Pause/resume queues
Secure BullBoard with strong credentials and restrict network access in production. It provides full access to job data and queue controls.

Queue Metrics

Monitor queue health with Redis CLI:
# Connect to Redis
docker exec -it vemetric-redis redis-cli

# Count jobs in event queue
> LLEN bull:event-queue:wait

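# View queue metadata
> HGETALL bull:event-queue:meta

# List all waiting-list keys (SCAN iterates incrementally, whereas KEYS blocks Redis on a large keyspace)
# Repeat with the returned cursor until it is 0
> SCAN 0 MATCH bull:*:wait COUNT 100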
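The same count checks can be generated for every queue. This sketch assumes BullMQ's default `bull` key prefix and its usual key layout (Redis lists for waiting and active jobs, sorted sets for delayed and failed jobs), which is an internal detail that may change between versions. `countCommand` and `backlogCommands` are hypothetical helpers.

```typescript
type JobState = "wait" | "active" | "delayed" | "failed";

// Lists are counted with LLEN, sorted sets with ZCARD.
const COUNT_COMMAND: Record<JobState, "LLEN" | "ZCARD"> = {
  wait: "LLEN",
  active: "LLEN",
  delayed: "ZCARD",
  failed: "ZCARD",
};

// Build the redis-cli command that counts jobs in one state of one queue.
function countCommand(queue: string, state: JobState, prefix = "bull"): string {
  return `${COUNT_COMMAND[state]} ${prefix}:${queue}:${state}`;
}

// Build the commands worth checking when a backlog is suspected.
function backlogCommands(queues: readonly string[]): string[] {
  const states: JobState[] = ["wait", "delayed", "failed"];
  const commands: string[] = [];
  for (const queue of queues) {
    for (const state of states) {
      commands.push(countCommand(queue, state));
    }
  }
  return commands;
}
```

For example, `countCommand("event-queue", "wait")` produces the `LLEN bull:event-queue:wait` command shown above.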

Failed Jobs

Failed jobs are automatically stored in PostgreSQL:
SELECT 
  id,
  queueName,
  createdAt,
  error,
  data
FROM failed_queue_job
ORDER BY createdAt DESC
LIMIT 10;
This helps debug persistent failures.

Database Monitoring

PostgreSQL

Count connections by state:
SELECT 
  count(*) as connections,
  state
FROM pg_stat_activity
WHERE datname = 'vemetric'
GROUP BY state;
Check the database size:
SELECT 
  pg_size_pretty(pg_database_size('vemetric')) as size;
Check table sizes:
SELECT 
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
Enable slow query logging in postgresql.conf:
log_min_duration_statement = 1000  # Log queries > 1s
View the slowest queries (requires the pg_stat_statements extension):
SELECT 
  query,
  calls,
  total_exec_time,
  mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

ClickHouse

Table sizes and part counts:
SELECT
  table,
  formatReadableSize(sum(bytes)) AS size,
  formatReadableQuantity(sum(rows)) AS rows,
  count() AS parts
FROM system.parts
WHERE database = 'vemetric' AND active
GROUP BY table
ORDER BY sum(bytes) DESC;
Slowest queries today:
SELECT
  query,
  formatReadableSize(memory_usage) AS memory,
  query_duration_ms AS duration_ms,
  read_rows,
  formatReadableSize(read_bytes) AS read_size
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_date = today()
ORDER BY duration_ms DESC
LIMIT 10;
Merges in progress:
SELECT
  table,
  elapsed,
  progress,
  formatReadableSize(total_size_bytes_compressed) AS size
FROM system.merges
WHERE database = 'vemetric';
Disk usage:
SELECT
  name,
  path,
  formatReadableSize(free_space) AS free,
  formatReadableSize(total_space) AS total
FROM system.disks;

Redis

# Connect to Redis
docker exec -it vemetric-redis redis-cli

# Memory stats
> INFO memory

# Keyspace stats
> INFO keyspace

# Client connections
> CLIENT LIST

# Slow log
> SLOWLOG GET 10

Health Checks

Service Health Endpoints

curl http://localhost:4000/api/health
Response:
{
  "status": "ok",
  "timestamp": "2024-03-04T12:00:00.000Z"
}
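A health probe can be scripted around this endpoint. The sketch below is an assumption-laden example rather than Vemetric code: `isHealthyBody` and `checkHealth` are hypothetical names, the only field inspected is `status`, and `fetch` and `AbortSignal.timeout` are assumed to be available globally (as in Bun and recent Node.js versions).

```typescript
// True when the response body is JSON with status "ok".
function isHealthyBody(body: string): boolean {
  try {
    const parsed: unknown = JSON.parse(body);
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as Record<string, unknown>)["status"] === "ok"
    );
  } catch {
    return false; // Not JSON at all
  }
}

// Call the health endpoint and report whether the service looks healthy.
// Network errors and timeouts count as unhealthy.
async function checkHealth(url: string, timeoutMs = 5000): Promise<boolean> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok && isHealthyBody(await res.text());
  } catch {
    return false;
  }
}
```

A call such as `checkHealth("http://localhost:4000/api/health")` resolves to a boolean, which makes it easy to reuse in a cron job or deployment script.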

Docker Health Checks

Add health checks to your Docker Compose:
docker-compose.yml
services:
  app:
    image: vemetric-app
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
  
  hub:
    image: vemetric-hub
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4004/health"]
      interval: 30s
      timeout: 10s
      retries: 3
  
  postgres:
    image: postgres:17-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
  
  clickhouse:
    image: clickhouse/clickhouse-server:23.10-alpine
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8123/ping"]
      interval: 30s
      timeout: 10s
      retries: 3
  
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
Check health status:
docker-compose ps

Metrics & Dashboards

ClickHouse Metrics

ClickHouse can expose Prometheus metrics once the prometheus endpoint is enabled in its server configuration (port 9363 is the conventional choice):
curl http://localhost:9363/metrics

Prometheus + Grafana Setup

Add Prometheus and Grafana to your stack:
prometheus:
  image: prom/prometheus:latest
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
    - prometheus_data:/prometheus
  ports:
    - "9090:9090"

grafana:
  image: grafana/grafana:latest
  volumes:
    - grafana_data:/var/lib/grafana
  ports:
    - "3000:3000"
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin  # Change this default before exposing Grafana

Key Metrics to Monitor

Event Ingestion Rate

Track events per second through the Hub service. Set up alerts for sudden drops or spikes.
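As a rough illustration of what a drop or spike check involves, the sketch below derives a rate from two readings of a monotonically increasing event counter and compares it with a baseline. The names, the counter itself, and the 50% tolerance are all assumptions; Vemetric's Hub may not expose such a counter directly.

```typescript
interface CounterSample {
  timestampMs: number; // When the counter was read
  totalEvents: number; // Cumulative events received so far
}

// Events per second between two counter readings.
function eventsPerSecond(prev: CounterSample, curr: CounterSample): number {
  const seconds = (curr.timestampMs - prev.timestampMs) / 1000;
  if (seconds <= 0) throw new RangeError("samples must be in chronological order");
  // A counter that went backwards suggests a restart; count only the new total.
  const delta =
    curr.totalEvents >= prev.totalEvents
      ? curr.totalEvents - prev.totalEvents
      : curr.totalEvents;
  return delta / seconds;
}

type RateStatus = "ok" | "drop" | "spike";

// Flag rates that deviate from the baseline by more than the tolerance fraction.
function classifyRate(rate: number, baseline: number, tolerance = 0.5): RateStatus {
  if (rate < baseline * (1 - tolerance)) return "drop";
  if (rate > baseline * (1 + tolerance)) return "spike";
  return "ok";
}
```

With a baseline of 100 events per second and the default tolerance, anything below 50 is reported as a drop and anything above 150 as a spike.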

Queue Depth

Monitor BullMQ queue sizes. A persistently high depth usually indicates worker saturation.

Database Query Time

Track P95/P99 query latency for PostgreSQL and ClickHouse.
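For readers unfamiliar with percentile latency, here is a small sketch using the nearest-rank method: sort the samples and take the value at rank ceil(p × n / 100). The function name is hypothetical, and monitoring systems such as Prometheus typically estimate percentiles from histograms instead, so their results can differ slightly.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError("at least one sample is required");
  if (p <= 0 || p > 100) throw new RangeError("p must be in the range (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1]!;
}
```

Given query times of 80, 100, 110, 120, and 950 ms, the median is 110 ms while P95 is 950 ms, which is why tail percentiles surface slow outliers that an average would hide.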

Error Rate

Monitor 4xx/5xx error rates in the App and Hub services.

Memory Usage

Track Redis and ClickHouse memory usage.

Disk Space

Monitor disk usage for PostgreSQL, ClickHouse, and Redis volumes.

Error Tracking

Sentry Integration

Vemetric includes optional Sentry integration for error tracking:
.env
SENTRY_DSN=https://your-sentry-dsn@sentry.io/project-id
Sentry is already integrated via @sentry/bun in:
  • App service
  • Hub service
  • Worker service
Errors and exceptions are automatically reported to Sentry.

Alerting

Set up alerts for critical conditions:
Alert when queue depth exceeds a threshold:
prometheus-alerts.yml
groups:
  - name: vemetric
    rules:
      - alert: HighQueueDepth
        expr: bull_queue_waiting_jobs > 1000
        for: 5m
        annotations:
          summary: "High queue depth detected"
Alert when disk usage exceeds 80%:
- alert: LowDiskSpace
  expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.2
  for: 10m
  annotations:
    summary: "Low disk space on database server"
Alert when service health checks fail:
- alert: ServiceDown
  expr: up{job="vemetric-app"} == 0
  for: 2m
  annotations:
    summary: "Vemetric App service is down"
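The LowDiskSpace expression compares the free fraction of the filesystem against 0.2, which is equivalent to usage above 80%. The arithmetic can be sketched as follows; the function names are hypothetical and the byte values stand in for the `node_filesystem_avail_bytes` and `node_filesystem_size_bytes` metrics.

```typescript
// Fraction of the filesystem in use, between 0 and 1.
function diskUsedFraction(availBytes: number, sizeBytes: number): number {
  if (sizeBytes <= 0) throw new RangeError("sizeBytes must be positive");
  return 1 - availBytes / sizeBytes;
}

// Mirrors the alert expression: fires when less than minFreeFraction is free.
function lowDiskSpace(availBytes: number, sizeBytes: number, minFreeFraction = 0.2): boolean {
  if (sizeBytes <= 0) throw new RangeError("sizeBytes must be positive");
  return availBytes / sizeBytes < minFreeFraction;
}
```

A 100 GB volume with 15 GB free is 85% used and would trigger the alert, while the same volume with 25 GB free would not.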

Troubleshooting

Common Issues

Symptoms: BullBoard shows thousands of waiting jobs
Causes:
  • Worker service not running
  • Worker overwhelmed by job volume
  • Database connection issues
Solutions:
  1. Check worker logs: docker logs vemetric-worker
  2. Verify database connectivity
  3. Scale worker horizontally (run multiple instances)
  4. Increase worker concurrency in worker configuration
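To decide how far to scale, it helps to estimate how long a backlog takes to drain. The sketch below uses a simple steady-state model (constant arrival rate, constant per-worker throughput); the function names and all numbers are illustrative assumptions, so measure real throughput before relying on it.

```typescript
// Seconds needed to clear the backlog, or Infinity if workers cannot keep up.
function drainTimeSeconds(
  backlog: number,
  workers: number,
  jobsPerSecondPerWorker: number,
  arrivalRate: number,
): number {
  const netRate = workers * jobsPerSecondPerWorker - arrivalRate;
  return netRate > 0 ? backlog / netRate : Infinity;
}

// Smallest worker count that clears the backlog within targetSeconds.
function workersNeeded(
  backlog: number,
  jobsPerSecondPerWorker: number,
  arrivalRate: number,
  targetSeconds: number,
): number {
  if (jobsPerSecondPerWorker <= 0 || targetSeconds <= 0) {
    throw new RangeError("throughput and target must be positive");
  }
  // Workers must absorb new arrivals plus the backlog spread over the target window.
  const requiredRate = arrivalRate + backlog / targetSeconds;
  return Math.max(1, Math.ceil(requiredRate / jobsPerSecondPerWorker));
}
```

For instance, with 12,000 waiting jobs, 80 new jobs per second, and 50 jobs per second per worker, one worker never catches up, two workers need 600 seconds, and three are needed to finish within 300 seconds.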
Symptoms: Events sent but not visible in analytics
Debugging:
  1. Check Hub logs for event receipt: docker logs vemetric-hub
  2. Verify project token is correct
  3. Check BullBoard for job processing
  4. Query ClickHouse directly:
    SELECT * FROM event ORDER BY createdAt DESC LIMIT 10;
    
  5. Check Worker logs for errors
Symptoms: Dashboard takes >5 seconds to load
Solutions:
  1. Check ClickHouse query performance:
    SELECT query, query_duration_ms FROM system.query_log 
    WHERE type = 'QueryFinish' ORDER BY query_duration_ms DESC LIMIT 5;
    
  2. Optimize slow queries with materialized views
  3. Reduce date range for large datasets
  4. Add indexes if needed
  5. Scale ClickHouse vertically (more CPU/RAM)
Symptoms: “too many clients” or “connection pool timeout” errors
Solutions:
  1. Add PgBouncer for PostgreSQL connection pooling
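  2. Increase PostgreSQL max_connections (this setting only takes effect after a server restart, not a configuration reload):
    ALTER SYSTEM SET max_connections = 200;
    -- Restart PostgreSQL for the change to take effect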
    
  3. Review app connection pool settings
  4. Check for connection leaks in application code
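When reviewing pool settings, a quick budget check shows whether the configured pools can fit under max_connections at all. This is a sketch with hypothetical names and pool sizes; the default of 3 reserved connections corresponds to PostgreSQL's `superuser_reserved_connections` default.

```typescript
interface ServicePool {
  service: string;   // For example "app", "hub", or "worker"
  instances: number; // Number of running instances
  poolSize: number;  // Maximum connections per instance
}

// Worst-case number of connections if every pool is fully used.
function totalConnections(pools: readonly ServicePool[]): number {
  return pools.reduce((sum, pool) => sum + pool.instances * pool.poolSize, 0);
}

// True when all pools fit under max_connections minus the reserved slots.
function fitsConnectionLimit(
  pools: readonly ServicePool[],
  maxConnections: number,
  reserved = 3,
): boolean {
  return totalConnections(pools) <= maxConnections - reserved;
}
```

Two App instances and two Hub instances with pools of 10, plus four Worker instances with pools of 20, can open up to 120 connections, which exceeds a limit of 100 but fits within 200.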
Symptoms: ClickHouse queries fail with memory errors
Solutions:
  1. Increase ClickHouse memory limit in config.xml
  2. Optimize queries to process less data
  3. Add LIMIT clauses to queries
  4. Use sampling for large datasets (this requires the table to be created with a SAMPLE BY clause):
    SELECT ... FROM event SAMPLE 0.1
    
  5. Scale ClickHouse vertically

Debug Mode

Enable verbose logging:
.env
LOG_LEVEL=debug
This increases log verbosity across all services.

Performance Tuning

Worker Concurrency

Increase worker job concurrency:
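import { Worker } from 'bullmq';

const worker = new Worker('queue-name', processor, {
  connection,      // Redis connection settings, assumed to be defined elsewhere
  concurrency: 10, // Process up to 10 jobs concurrently
});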

Redis Maxmemory

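Configure Redis memory limits. Because BullMQ keeps its queue state in Redis, use the noeviction policy so that job data is never evicted under memory pressure:
maxmemory 2gb
maxmemory-policy noeviction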

ClickHouse Compression

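ClickHouse compresses column data by default (LZ4), and storage_policy only controls which disks a table uses. To trade some CPU for smaller storage, set a stronger codec on individual columns, for example:
ALTER TABLE event MODIFY COLUMN createdAt CODEC(ZSTD);
The new codec applies to newly written parts, and existing parts are recompressed as they are merged.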

Database Indexes

Add indexes for frequent queries:
CREATE INDEX idx_project_created 
ON event(projectId, createdAt);

Next Steps

Configuration

Review environment variable configuration

Architecture

Understand Vemetric’s system architecture