Monitoring & Observability
Proper monitoring is essential for maintaining a healthy Vemetric deployment. This guide covers logging, metrics, queue monitoring, and troubleshooting.

Logging
Application Logs
Vemetric uses Pino for structured JSON logging across all services:

- App Service
- Hub Service
- Worker Service

The following log levels are available:

- trace: Detailed debug info
- debug: Debug information
- info: General information
- warn: Warning messages
- error: Error messages
- fatal: Fatal errors
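For example, you can tail each service's logs in a Docker-based deployment (the container names below follow the `vemetric-hub`/`vemetric-worker` naming used in the troubleshooting section; adjust them to your Compose project):

```bash
# Tail structured logs per service; adjust container names to your deployment
docker logs -f vemetric-app
docker logs -f vemetric-hub
docker logs -f vemetric-worker
```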
Log Format
Pino outputs structured JSON logs:
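An illustrative line in Pino's default format (field layout is Pino's standard output; the values are made up):

```json
{"level":30,"time":1718000000000,"pid":1,"hostname":"hub","msg":"event received"}
```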
Pretty Printing (Development)

In development, logs are formatted with pino-pretty:
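One common way to get human-readable output is piping stdout through the pino-pretty CLI (a sketch; the `bun run dev` script name is an assumption, and Vemetric's dev scripts may wire this differently):

```bash
# Pipe a service's stdout through pino-pretty for colorized, readable logs
bun run dev | npx pino-pretty --colorize
```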
Log Aggregation (Production)
For production, ship logs to a centralized service:

- Axiom
- Elasticsearch
- CloudWatch
Vemetric includes optional Axiom integration, configured through environment variables in .env:
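A sketch of the relevant .env entries (the variable names AXIOM_TOKEN and AXIOM_DATASET are assumptions; verify the exact names against Vemetric's .env.example):

```ini
# .env — variable names are assumptions, check .env.example
AXIOM_TOKEN=xaat-your-token
AXIOM_DATASET=vemetric-logs
```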
The logging transport uses @axiomhq/pino (already included in dependencies).

Queue Monitoring
BullBoard UI
Vemetric includes Bull Board for real-time queue monitoring.

Access BullBoard

Navigate to the BullBoard URL for your deployment and log in with the credentials from .env:

- Username: BULLBOARD_USERNAME (default: bullboard)
- Password: BULLBOARD_PASSWORD (default: password)
Monitor Queues
BullBoard shows all queues:
- event-queue: Event processing
- session-queue: Session aggregation
- user-queue: User updates
- device-queue: Device tracking
- email-queue: Email delivery
- first-event-queue: First event handling
- enrich-user-queue: User enrichment
- merge-user-queue: User merging
Queue Metrics
Monitor queue health with the Redis CLI:
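A sketch using BullMQ's default key layout (the `bull:` prefix and key names are BullMQ defaults; adjust if your deployment sets a custom prefix):

```bash
# Waiting jobs in the event queue (BullMQ keeps them in a list)
redis-cli LLEN bull:event-queue:wait

# Failed jobs (stored in a sorted set)
redis-cli ZCARD bull:event-queue:failed
```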
Failed Jobs

Failed jobs are automatically stored in PostgreSQL for later inspection.
Database Monitoring

PostgreSQL
Connection Count
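A standard catalog query, valid on any PostgreSQL instance:

```sql
-- Current connections, grouped by state
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
```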
Database Size
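A sketch (the database name `vemetric` is an assumption; use your configured name):

```sql
SELECT pg_size_pretty(pg_database_size('vemetric')) AS db_size;
```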
Table Sizes
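The ten largest tables, including index and TOAST data:

```sql
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```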
Slow Queries
Enable query logging in postgresql.conf:
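For example, to log every statement slower than one second (the threshold is in milliseconds; tune it for your workload):

```ini
# postgresql.conf
log_min_duration_statement = 1000
```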
View slow queries:
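A sketch using the pg_stat_statements extension (it must be listed in shared_preload_libraries and created in the database; the column names below match extension version 1.8+):

```sql
-- Requires: CREATE EXTENSION pg_stat_statements;
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```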
ClickHouse

Table Statistics
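A sketch against ClickHouse's system.parts (the `vemetric` database name is an assumption):

```sql
SELECT table,
       sum(rows) AS rows,
       formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active AND database = 'vemetric'
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC;
```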
Query Performance
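Recent query latency from ClickHouse's built-in query log (enabled by default):

```sql
SELECT event_time, query_duration_ms, query
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY query_duration_ms DESC
LIMIT 10;
```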
Merge Performance
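In-flight merges and their progress:

```sql
SELECT database, table, elapsed, round(progress * 100, 1) AS pct_done
FROM system.merges;
```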
Disk Usage
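Disk capacity as ClickHouse sees it:

```sql
SELECT name,
       formatReadableSize(free_space) AS free,
       formatReadableSize(total_space) AS total
FROM system.disks;
```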
Redis
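Redis exposes health and memory information through the INFO command:

```bash
# Memory usage summary (used memory, fragmentation, peak)
redis-cli INFO memory

# Keyspace statistics (keys per database)
redis-cli INFO keyspace

# Runtime stats (ops/sec, connections, evictions)
redis-cli INFO stats
```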
Health Checks
Service Health Endpoints
- App Service
- Hub Service
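Both services expose a health endpoint; a hedged probe example (the /health path and the ports are assumptions — substitute your deployment's actual endpoints):

```bash
# Paths and ports below are assumptions; check your deployment configuration
curl -f http://localhost:3000/health   # App service
curl -f http://localhost:4004/health   # Hub service
```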
Docker Health Checks
Add health checks to your Docker Compose services:
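A minimal sketch for one service (the service name, port, and /health path are assumptions; the test command requires curl inside the image):

```yaml
# docker-compose.yml
services:
  hub:
    # ...existing service definition...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4004/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
```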
Metrics & Dashboards
ClickHouse Metrics
ClickHouse exposes Prometheus metrics on port 9363:
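The endpoint is enabled through ClickHouse's standard <prometheus> server block:

```xml
<!-- config.xml -->
<prometheus>
    <endpoint>/metrics</endpoint>
    <port>9363</port>
    <metrics>true</metrics>
    <events>true</events>
    <asynchronous_metrics>true</asynchronous_metrics>
</prometheus>
```

Verify it responds with `curl http://localhost:9363/metrics`.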
Prometheus + Grafana Setup
Add Prometheus and Grafana to your stack:
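A minimal sketch (image tags and host ports are illustrative; prometheus.yml must define your scrape targets):

```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000" # host port 3001 avoids clashing with the app
```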
Key Metrics to Monitor

Event Ingestion Rate
Track events per second through the Hub service. Set up alerts for drops or spikes.
Queue Depth
Monitor BullMQ queue sizes. High depth indicates worker saturation.
Database Query Time
Track P95/P99 query latency for PostgreSQL and ClickHouse.
Error Rate
Monitor 4xx/5xx error rates in App and Hub services.
Memory Usage
Track Redis memory and ClickHouse memory usage.
Disk Space
Monitor disk usage for PostgreSQL, ClickHouse, and Redis volumes.
Error Tracking
Sentry Integration
Vemetric includes optional Sentry integration for error tracking, configured via .env:
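A sketch of the .env entry (SENTRY_DSN is Sentry's conventional variable name; verify against Vemetric's .env.example):

```ini
# .env
SENTRY_DSN=https://examplePublicKey@o0.ingest.sentry.io/0
```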
The integration uses @sentry/bun in:
- App service
- Hub service
- Worker service
Alerting
Set up alerts for critical conditions:

Queue Depth Alerts
Alert when queue depth exceeds a threshold:
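A sketch (the bullmq_queue_waiting gauge assumes a custom exporter that publishes BullMQ queue depths; the threshold is illustrative):

```yaml
# prometheus-alerts.yml
groups:
  - name: vemetric
    rules:
      - alert: HighQueueDepth
        expr: bullmq_queue_waiting > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "BullMQ queue depth above 10k for 10 minutes"
```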
Database Disk Space
Alert when disk usage exceeds 80%:
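A sketch using node_exporter's filesystem metrics (requires node_exporter in the stack; append the rule to the group above):

```yaml
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.2
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Less than 20% disk space remaining"
```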
Service Down
Alert when service health checks fail:
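A sketch built on Prometheus's up metric (assumes each service is a scrape target):

```yaml
      - alert: ServiceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} target is down"
```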
Troubleshooting
Common Issues
High queue depth
Symptoms: BullBoard shows thousands of waiting jobs.

Causes:
- Worker service not running
- Worker overwhelmed by job volume
- Database connection issues
Solutions:

- Check worker logs: docker logs vemetric-worker
- Verify database connectivity
- Scale worker horizontally (run multiple instances)
- Increase worker concurrency in worker configuration
Events not appearing in dashboard
Symptoms: Events sent but not visible in analytics.

Debugging:
- Check Hub logs for event receipt: docker logs vemetric-hub
- Verify the project token is correct
- Check BullBoard for job processing
- Query ClickHouse directly (see the sketch after this list)
- Check Worker logs for errors
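A hedged query for confirming events are landing (the event table and created_at column are assumptions about Vemetric's ClickHouse schema; adjust to yours):

```sql
-- Hypothetical schema: substitute your actual table and column names
SELECT count(*) AS recent_events
FROM event
WHERE created_at > now() - INTERVAL 1 HOUR;
```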
Slow dashboard queries
Symptoms: Dashboard takes >5 seconds to load.

Solutions:
- Check ClickHouse query performance (see the sketch after this list)
- Optimize slow queries with materialized views
- Reduce date range for large datasets
- Add indexes if needed
- Scale ClickHouse vertically (more CPU/RAM)
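The system.query_log inspection from the Query Performance section applies here, filtered to long-running statements:

```sql
SELECT event_time, query_duration_ms, query
FROM system.query_log
WHERE type = 'QueryFinish' AND query_duration_ms > 5000
ORDER BY event_time DESC
LIMIT 20;
```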
Database connection pool exhausted
Symptoms: “too many clients” or “connection pool timeout” errors.

Solutions:
- Add PgBouncer for PostgreSQL connection pooling
- Increase PostgreSQL max_connections (see the sketch after this list)
- Review app connection pool settings
- Check for connection leaks in application code
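A sketch of the postgresql.conf change (200 is illustrative; each connection consumes memory, so size the limit with your RAM and PgBouncer setup in mind):

```ini
# postgresql.conf — requires a server restart
max_connections = 200
```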
ClickHouse out of memory
Symptoms: ClickHouse queries fail with memory errors.

Solutions:
- Increase the ClickHouse memory limit in config.xml (see the sketch after this list)
- Optimize queries to process less data
- Add LIMIT clauses to queries
- Use sampling for large datasets
- Scale ClickHouse vertically
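A sketch of the relevant settings (values are illustrative): the server-wide cap lives in config.xml, while the per-query limit belongs in a users.xml profile:

```xml
<!-- config.xml: server-wide memory cap (bytes) -->
<max_server_memory_usage>10000000000</max_server_memory_usage>

<!-- users.xml: per-query limit in the default profile -->
<profiles>
    <default>
        <max_memory_usage>8000000000</max_memory_usage>
    </default>
</profiles>
```

Sampling additionally requires the table to be created with a SAMPLE BY clause, after which queries can add, for example, SAMPLE 0.1.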
Debug Mode
Enable verbose logging via .env:
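A sketch (LOG_LEVEL as the variable name is an assumption; the level values are Pino's):

```ini
# .env — variable name is an assumption
LOG_LEVEL=debug
```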
Performance Tuning
Worker Concurrency
Increase worker job concurrency:
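A sketch using BullMQ's Worker API (the queue name and handler are illustrative; how Vemetric's worker service actually wires its processors may differ):

```ts
import { Worker, type Job } from "bullmq";

// Hypothetical job handler — the real processors live in the Worker service.
async function processJob(job: Job): Promise<void> {
  console.log(`processing ${job.name}`, job.data);
}

const worker = new Worker("event-queue", processJob, {
  connection: { host: "localhost", port: 6379 },
  concurrency: 10, // process up to 10 jobs in parallel per worker instance
});
```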
Redis Maxmemory
Configure Redis memory limits:
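Set a cap and keep the eviction policy at noeviction, since evicting BullMQ job data would corrupt the queues:

```bash
# Apply at runtime; persist the same values in redis.conf for restarts
redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy noeviction
```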
ClickHouse Compression
Enable compression for better storage:
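A sketch of server-wide ZSTD compression in config.xml (the thresholds follow ClickHouse's documented example; column-level CODEC(ZSTD(n)) in table DDL is an alternative):

```xml
<!-- config.xml -->
<compression>
    <case>
        <min_part_size>10000000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>zstd</method>
    </case>
</compression>
```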
Database Indexes
Add indexes for frequent queries:
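A hedged PostgreSQL example (the table and column are illustrative, not Vemetric's actual schema; CONCURRENTLY avoids blocking writes during the build):

```sql
-- Hypothetical example: index a frequently filtered column
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_project_created_at
  ON project (created_at);
```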
Next Steps
Configuration
Review environment variable configuration
Architecture
Understand Vemetric’s system architecture