Show HN: Cordon – Reduce large log files to anomalous sections
4 days ago
- #anomaly-detection
- #log-analysis
- #machine-learning
- Cordon uses transformer-based embeddings and density-based scoring for semantic anomaly detection in log files.
- Key principle: Repetitive patterns are considered normal; unusual, rare, or clustered events are highlighted.
- Features include semantic analysis, density-based scoring, noise reduction, and multiple backends (sentence-transformers or llama.cpp).
- GPU acceleration requires NVIDIA GPUs with Pascal architecture or newer; CPU mode is always available.
- Installation options include pip, uv, and cloning the repository for development.
- Basic usage involves running Cordon on log files with options for window size, k-neighbors, and anomaly percentile.
- Advanced configurations allow for GPU acceleration, anomaly range filtering, and detailed output.
- Cordon reduces large log files to their semantically significant sections, with reductions of up to 98% in some cases.
- Workflow includes ingestion, segmentation, vectorization, scoring, thresholding, merging, and formatting.
- Parameters like window_size, k_neighbors, and anomaly_percentile can be adjusted for different log types.
- Use cases include LLM pre-processing, initial triage, anomaly detection, and exploratory analysis.
- GPU acceleration provides significant speedups for large log files, with PyTorch used for k-NN scoring.
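The core scoring idea above (repetitive patterns score low, rare ones score high) can be sketched as follows. This is an illustrative reconstruction, not Cordon's actual code: `knn_density_scores` and `anomalous_windows` are hypothetical names, plain NumPy stands in for the PyTorch k-NN step, and random vectors stand in for the transformer embeddings.

```python
import numpy as np

def knn_density_scores(embeddings: np.ndarray, k: int = 3) -> np.ndarray:
    """Score each log window by its mean distance to its k nearest
    neighbours: dense (repetitive) regions score low, isolated
    (rare/unusual) regions score high."""
    # Pairwise Euclidean distances between all window embeddings.
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude self-distance
    # Mean distance to the k nearest neighbours is the anomaly score.
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

def anomalous_windows(scores: np.ndarray, percentile: float = 90.0) -> list[int]:
    """Keep only the windows whose score exceeds the given percentile,
    mirroring the anomaly_percentile parameter described above."""
    threshold = np.percentile(scores, percentile)
    return [i for i, s in enumerate(scores) if s > threshold]

# Toy data: nine near-identical windows plus one clear outlier.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.01, (9, 4)), [[5.0, 5.0, 5.0, 5.0]]])
flagged = anomalous_windows(knn_density_scores(emb, k=3), percentile=90.0)
```

Raising `k` smooths the scores (a window needs more nearby look-alikes to count as normal), while raising the percentile tightens the cutoff and reports fewer sections.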
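The merging step in the workflow (adjacent flagged windows joined into one reported section) might look like the minimal sketch below. `merge_windows` is a hypothetical helper, and the half-open line ranges are an assumption for illustration, not Cordon's actual output format.

```python
def merge_windows(indices: list[int], window_size: int,
                  gap: int = 0) -> list[tuple[int, int]]:
    """Merge flagged window indices into contiguous half-open
    (start_line, end_line) sections; windows within `gap` lines of
    the previous section are joined into it."""
    sections: list[tuple[int, int]] = []
    for i in sorted(indices):
        start, end = i, i + window_size  # lines covered by this window
        if sections and start <= sections[-1][1] + gap:
            # Overlaps or touches the previous section: extend it.
            sections[-1] = (sections[-1][0], max(sections[-1][1], end))
        else:
            sections.append((start, end))
    return sections

# Windows 0 and 1 overlap and fuse; window 8 stands alone.
sections = merge_windows([0, 1, 8], window_size=3)
```

Merging is what turns scattered per-window hits into the few readable "anomalous sections" the tool reports, and is a large part of the noise reduction mentioned above.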