Hasty Briefsbeta

All tags

#llm

310 stories total

Bilingual

Show HN: Using DSPy to enrich a dataset of the Nobel laureate network
9 months ago
- Combining data from multiple sources to construct knowledge graphs presents challenges in disambiguating similar-looking entities.
- A two-part workflow involving vector embeddings and an 'LLM-as-a-judge' is introduced for entity disambiguation.
- DSPy, a declarative framework for building compound AI pipelines, is showcased for its ability to program LLMs without manual prompting.
- The workflow is applied to merge datasets of Nobel laureates and their mentorship relationships with enriched data from the Nobel Prize API.
- Vector indexing and search in Kuzu are used to find similar entities, followed by LLM-based disambiguation to merge datasets.
- The merged data enables answering complex questions about Nobel laureates, their mentors, affiliations, and more.
- DSPy's approach eliminates the need for manual prompt writing, focusing instead on declaring intent through programming.
- The methodology is cost-effective and scalable, with potential applications in various domains beyond the presented use case.
An LLM does not need to understand MCP
9 months ago
- Model Context Protocol (MCP) standardizes tool calling for agents, but LLMs don't inherently understand or need MCP.
- LLMs generate tool call snippets based on provided context but don't execute the calls themselves; the developer handles execution.
- MCP simplifies tool integration by acting as a universal adapter, reducing custom code for each tool.
- Tool calling involves providing tool definitions (name, description, inputs) in the prompt, and the LLM predicts the appropriate tool call.
- Context engineering is crucial for effective AI systems, ensuring the LLM has the right inputs to generate useful outputs.
- MCP benefits developers by managing tool complexity, enabling reuse, and maintaining consistent formats without LLM awareness.
- The separation of concerns—LLMs predict tool calls, while developers execute them—is key to efficient agentic loops.
Show HN: Octofriend, a cute coding agent that can swap between GPT-5 and Claude
9 months ago
- Octofriend is a coding assistant that works with OpenAI-compatible or Anthropic-compatible LLM APIs.
- It supports switching models mid-conversation and uses custom-trained ML models to handle tool call and code edit failures.
- Octofriend is compatible with GPT-5, Claude 4, GLM-4.5, and Kimi K2, among others.
- It manages thinking tokens effectively to ensure optimal performance.
- Octofriend has zero telemetry, ensuring privacy, and can be used with privacy-focused LLM providers.
- It can be run with `--unchained` to skip tool and edit confirmations.
- Octofriend looks for instruction files like OCTO.md, CLAUDE.md, and AGENTS.md, merging rules from project-specific and global files.
- It supports integration with MCP servers for richer functionality, configurable via `~/.config/octofriend/octofriend.json5`.
GPU-rich labs have won: What's left for the rest of us is distillation
9 months ago
- OpenAI spends over $50M daily on LLM training, making competition without massive resources futile.
- 2024 saw wasteful AI spending by enterprises, with their models quickly becoming outdated by new releases from major labs.
- Open-source models are catching up through distillation of large proprietary models, exemplified by Deepseek.
- The gap between open-source and proprietary models is widening due to GPU wealth disparity.
- 2025 focuses on agents and the application layer, with enterprises shifting to smaller, task-specific LLMs.
- Distillation allows training smaller models using outputs from large models, conserving performance while reducing costs.
- Distillation is key for reducing latency and costs post-product-market fit.
- Inference.net offers end-to-end distillation and inference solutions for founders focused on the application layer.
Apple taught an LLM to predict tokens up to 5x faster in math and coding tasks
9 months ago
- Apple's research introduces a 'multi-token prediction' (MTP) framework to speed up LLM responses while maintaining output quality.
- Traditional LLMs generate text one token at a time, which is slow due to autoregressive decoding.
- MTP allows models to predict multiple tokens at once using special 'mask' tokens in prompts.
- The model speculates on upcoming words and verifies them against standard autoregressive decoding, reverting if guesses fail.
- Testing with Tulu3-8B showed speedups of 2–3× for general tasks and up to 5× for predictable domains like coding and math.
- No degradation in generation quality was reported, thanks to 'gated LoRA adaptation.'
Yet Another LLM Rant
9 months ago
- ChatGPT 5 provided incorrect information about ZSTD compression support in Swift on iPhone, claiming it works on iOS 16+ when it has never been supported by Apple.
- LLMs like ChatGPT generate responses based on statistical likelihood, not factual accuracy or logical reasoning, leading to confident but incorrect answers.
- The author emphasizes that LLMs cannot think, learn, or reason; they only produce text that seems plausible based on their training data.
- A human analogy illustrates how humans can question, test, and update their beliefs based on new evidence, unlike LLMs which stick to statistically likely responses.
- The post advises against relying on LLMs for critical tasks and encourages using human judgment, creativity, and expertise instead.
GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2
9 months ago
- OpenAI released new open-weight LLMs: gpt-oss-120b and gpt-oss-20b, their first since GPT-2 in 2019.
- The architecture includes optimizations like MXFP4 quantization, allowing models to run locally on single GPUs.
- Key architectural changes from GPT-2 include removing dropout, using RoPE for positional embeddings, and replacing GELU with Swish/SwiGLU.
- Mixture-of-Experts (MoE) replaces single feed-forward modules, increasing model capacity while keeping inference efficient.
- Grouped Query Attention (GQA) and sliding window attention improve computational efficiency.
- RMSNorm replaces LayerNorm for better training efficiency.
- Comparison with Qwen3 shows differences in width vs. depth and expert configurations.
- gpt-oss models support adjustable reasoning effort levels (low/medium/high) via system prompts.
- Benchmarks show gpt-oss is competitive with proprietary models and Qwen3, despite being smaller.
- GPT-5 was released shortly after gpt-oss, with gpt-oss performing surprisingly well in comparison.
Nexus: An Open-Source AI Router for Governance, Control and Observability
9 months ago
- Introduction of Nexus, an AI router optimizing interactions between AI agents, MCP tools, and LLMs.
- Nexus serves as a central hub for MCP server aggregation and intelligent LLM routing.
- Solves challenges: MCP Server Aggregation and Intelligent LLM Routing.
- Addresses pain points: Context management, cost efficiency, observability, and security.
- Acts as a proxy layer, handling authentication, connection management, and response aggregation.
- Factors in model selection: Task type, latency, context length, and availability.
- Benefits: Simplifies system complexity, enables real-time monitoring, and ensures operational resilience.
- Future features include advanced routing algorithms, analytics dashboards, and enhanced security.
- Designed for seamless integration into existing AI workflows across various applications.
Can modern LLMs count the number of b's in "blueberry"?
9 months ago
- OpenAI released GPT-5, which did not meet expectations, particularly in answering simple questions like counting letters in words.
- GPT-5 incorrectly stated there are three 'b's in 'blueberry' when there are only two, a mistake replicated by multiple users.
- The issue may stem from tokenization, where LLMs process text numerically rather than as individual letters, making letter counting difficult.
- Despite tokenization challenges, some LLMs like Claude models correctly counted letters, showing variability in performance across different models.
- Testing various LLMs revealed GPT-5's consistent errors in counting 'b's in 'blueberry', while other models like Claude and Gemini showed mixed results.
- The post concludes that while LLMs can count letters, their accuracy is inconsistent, raising questions about their reliability for such basic tasks.
LLM Hallucination Seems Like a Big Problem, Not a Mere Speedbump
9 months ago
- LLMs like GPT-5 and Gemini 2.5 Flash frequently hallucinate nonexistent sources when asked for specific citations.
- Despite claims of reduced hallucinations, synthetic benchmarks fail to capture real-world frequency, and many examples of GPT-5 hallucinations exist.
- LLMs often insist their hallucinated sources are real, misleading users who lack skepticism.
- The need for constant human verification undermines the efficiency and value proposition of LLMs.
- LLMs do not think or reason; they are sophisticated next-character prediction engines.
- The hype around LLMs ignores their profound limitations, creating a dangerous bubble in both media and markets.
- Marketing around LLMs is misleading, as they construct responses based on scoring patterns rather than truth.
- In professional domains, verifying LLM outputs often takes longer than doing the work independently.
- Limited research shows engineers are slower when using LLMs, contradicting claims of efficiency.
- The Gell-Mann amnesia effect explains societal over-trust in LLMs despite evident flaws.
Writing an LLM from scratch, part 17 – the feed-forward network
9 months ago
- The feed-forward network in LLMs is crucial for processing context vectors after attention mechanisms.
- It consists of two linear layers with a GELU activation function, expanding and then reducing dimensions.
- Attention mechanisms gather information, but feed-forward networks perform the 'thinking' or pattern-matching.
- Feed-forward networks contain more parameters than attention mechanisms, indicating their importance.
- A single hidden layer in the feed-forward network acts as a universal approximator.
- The author initially underestimated the role of feed-forward networks in LLMs.
- Future posts may explore deeper networks and related research papers.
LLMs are not like you and me – and never will be
9 months ago
- LLMs are fundamentally different from humans and do not think like humans despite some superficial similarities.
- LLMs lack proper world models, leading to errors in tasks requiring temporal reasoning, common sense, and factual accuracy.
- Examples of LLM errors include incorrect historical facts, inability to play chess reliably, and failure to account for inflation or time.
- LLMs operate based on autocomplete mechanisms rather than genuine understanding or reasoning.
- The field of AI has not adequately addressed fundamental reasoning frameworks like time, space, and causality.
- Using LLMs as agents for complex tasks is unreliable due to their lack of proper world models and reasoning capabilities.
- Critics argue that claims of LLMs thinking like humans are based on ignorance or denial of how both LLMs and human brains work.
- Despite improvements, LLMs remain limited and cannot be trusted for tasks requiring deep reasoning or factual accuracy.
Rubberduck: Emulate OpenAI/Anthropic locally with caching and failure injection
9 months ago
- Rubberduck is a local LLM caching reverse proxy server that emulates major LLM providers.
- Supports OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, and Google Vertex AI.
- Features include caching, failure simulation, rate limiting, per-user proxy instances, and detailed logging.
- Perfect request/response compatibility with official SDKs and transparent header/authentication passthrough.
- SHA-256 cache keys based on normalized request bodies; only successful responses (2xx) are cached.
- Manual cache invalidation per proxy instance and respects upstream provider caching headers.
- Failure simulation includes timeouts, error injection (429, 500, 400), IP filtering, and rate limiting.
- Real-time request logging with metadata, exportable logs (CSV/JSON), and rolling metrics aggregation.
- Dashboard for live system stats, proxy monitoring, and cost tracking with token usage.
- Stripe-inspired UI with clean, modern, responsive design and JWT-based authentication.
- Supports email/password + social login (Google/GitHub), email verification, and password reset.
- Backend built with FastAPI (Python 3.11+), frontend with React + TypeScript (Node.js 18+).
- Easy setup with git clone, virtual environment, and npm install for frontend dependencies.
- Proxy management via web interface or SDK configuration with base URL changes.
- Testing includes unit, integration, and coverage tests for both backend and frontend.
- Development tools include Black, isort, mypy for Python, and linting for TypeScript.
- Open-source under MIT License with contributions welcome via fork, feature branch, and PR.
Show HN: langdiff – Stream valid JSON from LLMs with type-safe callbacks
9 months ago
- LangDiff is a Python library for streaming structured LLM outputs to frontends with intelligent partial parsing.
- It provides type-safe callbacks (on_append, on_update, on_complete) as tokens stream in.
- Supports Pydantic-style models for defining schemas and seamless interop with existing libraries like OpenAI SDK.
- Automatically generates JSON Patch diffs for efficient frontend synchronization.
- Enables building responsive AI applications with independent evolution of backend and frontend.
- Includes features like tracking mutations without changing code patterns and handling incomplete JSON tokens.
- Example use case: streaming multi-section articles with progressive updates.
- Apache-2.0 licensed with an example.py for a runnable demo.
Eca: Editor Code Assistant – AI pair programming capabilities agnostic of editor
9 months ago
- Editor-agnostic protocol for integrating LLMs with any editor.
- Single configuration setup works globally or locally across all editors.
- Features include chat interface, agentic capabilities, and context support.
- Supports multiple models like OpenAI, Anthropic, and Ollama.
- Server written in Clojure, inspired by LSP protocol.
- Benefits include tool call management, multiple LLM interaction, and telemetry.
- Easy installation with automatic server download for supported editors.
- Debugging options available with --log-level debug or --verbose.
- Open for contributions and sponsorships to support project growth.
Show HN: Yet Another Memory System for LLM's
9 months ago
- Persistent memory for LLMs with content-addressed storage, deduplication, and semantic search.
- Features include SHA-256 based storage, block-level deduplication, Zstandard/LZMA compression, and full-text/semantic search.
- Supports crash recovery via write-ahead logging and achieves high performance (100MB/s+ throughput).
- Available on Linux x86_64, ARM64, and macOS (Intel/Apple Silicon).
- Docker and Homebrew installation options provided.
- Requires C++20 compiler, CMake 3.20+, and Python 3.8+ for Conan builds.
- Includes CLI with TUI browser, MCP server, and support for PDF text extraction.
- Designed for seamless integration with LLMs, supporting storage, retrieval, and search operations.
- Optimization options for large files and memory usage adjustments available.
- Licensed under Apache-2.0.
I let LLMs write an Elixir NIF in C; it mostly worked
9 months ago
- The author built a cross-platform Elixir NIF in C to get on-demand disk-usage stats without relying on os_mon and its disksup service.
- Initial C code and Makefile were generated by Grok 3, with iterative reviews by Gemini 2.5 Flash and GPT-5 to ensure cross-platform compatibility (Linux, macOS, Windows, BSDs).
- The NIF provides functions to check disk space for a specific path, returning total, used, free, and available bytes.
- The project highlights the strengths and weaknesses of LLMs in coding, emphasizing the need for human oversight.
- The resulting Elixir package, DiskSpace, is available on Hex.pm and GitHub, with support for multiple OSs and Elixir/OTP versions.
- Testing was conducted on various BSDs (NetBSD, FreeBSD, OpenBSD), with DragonFlyBSD being the exception due to Erlang/OTP version constraints.
- The author reflects on the limitations of LLMs, noting overconfidence and occasional inaccuracies despite their utility in accelerating development.
UWU – generate CLI commands without leaving the terminal
9 months ago
- uwu is a lightweight CLI tool that converts natural language into shell commands using LLMs like GPT-5.
- It focuses solely on generating shell commands quickly without replacing comprehensive development tools.
- Users can edit generated commands before execution to add flags or modifications.
- Installation involves cloning the repository, installing dependencies with Bun, and building the binary.
- Configuration is done via a config.json file, supporting multiple AI providers like OpenAI, Claude, Gemini, and custom endpoints.
- Features include command history context for better command generation, configurable via the config file.
- Shell integration allows for seamless command generation and editing within the shell environment.
- Usage example: 'uwu generate a new ssh key called uwu-key and add it to the ssh agent'.
- Contributions are encouraged through pull requests.
LLMs tell bad jokes because they avoid surprises
9 months ago
- LLMs struggle with comedy, art, journalism, research, and science because they are designed to minimize surprise.
- Jokes require a balance of surprise and inevitability in hindsight, which LLMs cannot achieve due to their predictive nature.
- Good stories and journalism also rely on surprising yet inevitable elements, making them challenging for LLMs to generate effectively.
- LLMs perform better in areas like software development where minimizing surprises is beneficial.
- Future AI advancements may require hybrid architectures that incorporate curiosity to seek and validate surprising truths.
MCP Tools and Dependent Types
9 months ago
- The author experimented with creating an MCP server for the Defold editor to allow Claude (an AI) to evaluate Lua code within the editor's scripting context.
- The AI's approach was entertaining but inaccurate, preferring trial-and-error over documentation, which may become frustrating as LLM accuracy expectations rise.
- Structured outputs via JSON schemas are a common solution for improving LLM accuracy, where tools define input schemas that the AI must adhere to.
- A challenge arises with dynamically resolving JSON schemas for complex domains, such as editing 3D models in Defold, where data shapes depend on runtime context.
- The proposed solution involves a two-step process: first, the AI selects a resource, then the system constructs a JSON schema based on the resource's data shape for the AI to use.
- Current MCP lacks support for dependent types or dynamic schema resolution, which would enable more complex interactions like the proposed edit_resource tool with schema lookup.

first prev7next

About|Login

#llm

Show HN: Using DSPy to enrich a dataset of the Nobel laureate network

An LLM does not need to understand MCP

Show HN: Octofriend, a cute coding agent that can swap between GPT-5 and Claude

GPU-rich labs have won: What's left for the rest of us is distillation

Apple taught an LLM to predict tokens up to 5x faster in math and coding tasks

Yet Another LLM Rant

GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2

Nexus: An Open-Source AI Router for Governance, Control and Observability

Can modern LLMs count the number of b's in "blueberry"?

LLM Hallucination Seems Like a Big Problem, Not a Mere Speedbump

Writing an LLM from scratch, part 17 – the feed-forward network

LLMs are not like you and me – and never will be

Rubberduck: Emulate OpenAI/Anthropic locally with caching and failure injection

Show HN: langdiff – Stream valid JSON from LLMs with type-safe callbacks

Eca: Editor Code Assistant – AI pair programming capabilities agnostic of editor

Show HN: Yet Another Memory System for LLM's

I let LLMs write an Elixir NIF in C; it mostly worked

UWU – generate CLI commands without leaving the terminal

LLMs tell bad jokes because they avoid surprises

MCP Tools and Dependent Types