The Vesuvius Challenge demonstrated how a fast-moving community can achieve breakthroughs, inspiring the application of this approach to AI agents.
Kilo Code aims to create the most user-friendly AI coding agent quickly, leveraging community feedback and rapid development.
The team, assembled in a week, includes experienced professionals like JP Posma, Justin Halsall, and Johan Otten, focusing on speed and innovation.
Recent improvements include no need for OpenRouter accounts, free tier with Claude 3.7 Sonnet, good defaults, and multiple onboarding enhancements.
Future plans include instant app creation, up-to-date docs, a browser IDE, local models, live collaboration, parallel agents, and more advanced AI agent capabilities.
Kilo Code is currently available in VS Code with a free tier offering $15 in tokens monthly, encouraging user feedback via GitHub and Discord.
Mosaic is an agentic video editing paradigm that allows users to create and run their own multimodal video editing agents in a node-based canvas.
The role involves accelerating the development of the core agentic video editing paradigm, building scalable pipelines for video processing and inference, creating evaluations, and making high-level design decisions.
Mosaic's initial prototype won the $25,000 grand prize in the Google Gemini Kaggle competition and best demo in the Y Combinator W25 batch.
The team consists of ex-Tesla engineers and is looking for a Founding Engineer to help accelerate video editing from hours to seconds.
High-agency tasks require agents to act competently, reliably, and consistently, especially in high-value use cases like customer support.
Customer support is challenging due to knowledge gaps, impatient users, and time constraints, contrasting with ideal environments where agents have complete knowledge and forgiving conditions.
Agentic systems like Anthropic's 'computer use' and OpenAI's Deep Research show advances in high-agency tasks, but real-world applications like Fin still face reliability issues.
Customers expect high reliability and control from agents, especially for sensitive tasks like subscription management, refunds, and context gathering.
Measuring agent performance involves simulating tasks with predefined outcomes, user prompts, and stopping conditions to assess reliability and consistency.
The 'pass^k' metric is stricter than 'pass@k': pass@k counts a task as solved if at least one of k attempts succeeds, while pass^k requires all k attempts to succeed, which is crucial for customer support reliability.
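The difference between the two metrics can be made concrete with a small sketch (the trial results below are invented for illustration):

```python
# Sketch of pass@k vs pass^k over k repeated trials of one task.
# Each trial result is True (success) or False (failure).

def pass_at_k(trials: list[bool]) -> bool:
    """pass@k: the agent succeeds in at least one of the k attempts."""
    return any(trials)

def pass_hat_k(trials: list[bool]) -> bool:
    """pass^k: the agent succeeds on every one of the k attempts."""
    return all(trials)

# Hypothetical results for one task repeated k=5 times.
trials = [True, True, False, True, True]

print(pass_at_k(trials))   # True  — at least one success
print(pass_hat_k(trials))  # False — a single failure breaks consistency
```

A single lucky success inflates pass@k; pass^k only rewards agents that behave the same way every time, which is what a support customer actually experiences.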
Agency and reliability are inversely related; high-agency agents often perform inconsistently, especially in complex tasks.
The 'Give Fin a Task' (GFAT) agent balances agency and control by using step-based instructions, improving reliability for simple and moderate tasks.
GFAT's composability allows breaking complex tasks into simpler, more reliable steps, enhancing overall performance and customer satisfaction.
Early benchmarks show GFAT significantly improves reliability, especially for simple and moderate tasks, by constraining agency and emphasizing structured execution.
LLMs are initially trained to predict the next token in a sequence, a process known as the next-token objective.
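The next-token objective is the standard autoregressive cross-entropy loss (notation assumed here):

```latex
% Next-token objective: maximize the log-likelihood of each token
% given its prefix, summed over a sequence x_1, ..., x_T.
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)
```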
Instruction finetuning adapts LLMs to follow natural-language instructions by training them on datasets of instruction-response pairs, improving zero-shot performance on unseen tasks.
Reinforcement Learning from Human Feedback (RLHF) is a key training step where LLMs are optimized to produce outputs that humans prefer, moving beyond simple next-token prediction.
RLHF involves two main steps: reward modeling, where a model learns to predict human preferences, and proximal policy optimization (PPO), which adjusts the LLM to maximize these rewards while staying close to its original behavior.
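The two steps above have a standard formulation; the notation below is the commonly used one, not taken verbatim from the article:

```latex
% Step 1 — reward modeling: Bradley-Terry loss on human preference
% pairs, where y_w is the preferred and y_l the rejected completion.
\mathcal{L}_{\mathrm{RM}}(\phi) =
  -\mathbb{E}_{(x,\, y_w,\, y_l)}
  \left[ \log \sigma\!\left( r_\phi(x, y_w) - r_\phi(x, y_l) \right) \right]

% Step 2 — policy optimization (PPO): maximize the learned reward
% while a KL penalty keeps the policy close to the reference model.
\max_\theta \;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta}
  \left[ r_\phi(x, y) \right]
  - \beta \, \mathbb{E}_{x \sim \mathcal{D}}
  \left[ \mathrm{KL}\!\left( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right) \right]
```

The KL term is what "staying close to its original behavior" means formally: without it, the policy drifts toward outputs that exploit the reward model.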
LLMs can be viewed as agents that take actions (producing tokens) to maximize rewards, similar to how chess-playing models choose moves to win games.
The concept of AI agents extends LLMs by mapping their token outputs to real-world actions, enhancing their utility beyond text generation.
Despite their capabilities, LLMs trained with RLHF can sometimes produce outputs that seem good to humans but are actually flawed, a phenomenon known as reward hacking.
The training and capabilities of LLMs suggest they are more than just next-token predictors; they are complex systems optimized for various objectives, including human appeal and task performance.
MCP (Model Context Protocol) is a standard API for exposing sets of Tools that can be integrated with LLMs.
An Agent can be implemented as a simple while loop on top of an MCP client, making Agentic AI simpler.
The article demonstrates a Tiny Agent implementation in TypeScript, connecting to local MCP servers for tools like file system access and web browsing.
Recent LLMs support function calling (tool use) natively, simplifying the integration of tools without manual prompt engineering.
The MCP client connects to servers, formats their tools for LLM use, and handles tool calls and responses.
The Agent's control flow includes tools for task completion and user questions, breaking the loop when needed.
Future steps include experimenting with different models and inference providers, and contributions are encouraged.
Firecrawl, a Y-Combinator backed startup, is hiring AI Agents for content generation, coding, and customer support roles, offering a monthly salary of $5000.
The Agent2Agent (A2A) protocol, proposed by Google, is an open communication standard enabling interoperability between independent AI agents, treating them as discoverable 'black boxes'.
A2A is designed around a Client-Server model, allowing client agents to access remote agent functionalities without knowing their implementation details.
Core components of A2A include AgentCards (agent business cards), Tasks (work instructions), Artifacts (response content), Messages (conversation tracking), and Push Notifications (async processing).
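An AgentCard is just a JSON document, typically served at a well-known URL. The sketch below follows the field names in the published A2A draft, but the agent itself is invented for illustration:

```python
import json

# Sketch of an A2A AgentCard — the agent's "business card" that makes
# it discoverable by client agents. Field names follow the A2A draft;
# the values are illustrative.
agent_card = {
    "name": "Video Summarizer",
    "description": "Summarizes uploaded videos into text.",
    "url": "https://agents.example.com/video-summarizer",
    "version": "1.0.0",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "skills": [
        {
            "id": "summarize",
            "name": "Summarize video",
            "description": "Produce a short text summary of a video.",
        }
    ],
}

print(json.dumps(agent_card, indent=2))
```

A client agent only ever sees this card and the task API; the remote agent's internals stay a black box.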
A2A RPC methods include tasks/send (synchronous processing), tasks/sendSubscribe (streaming), and tasks/get (retrieving task status).
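These methods ride on JSON-RPC 2.0; a `tasks/send` request might look like the sketch below (shape per the A2A draft, values illustrative):

```python
import json
import uuid

# Sketch of a JSON-RPC 2.0 request for A2A's tasks/send method.
request = {
    "jsonrpc": "2.0",
    "id": 1,  # JSON-RPC request id (distinct from the task id)
    "method": "tasks/send",
    "params": {
        "id": str(uuid.uuid4()),  # task id chosen by the client agent
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Summarize this video."}],
        },
    },
}

print(json.dumps(request, indent=2))
```

`tasks/sendSubscribe` uses the same params but streams status updates back, and `tasks/get` later retrieves the task by its id.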
The A2A Memory Layer manages queued messages, status tracking, and result communication, with TaskStore/TaskManager at its core.
A2A implementation involves handlers for different request types and a TaskManager to invoke agent logic and update task statuses.
A comparison between A2A and MCP (another protocol) is teased for a future article.