LangExtract: Python library for extracting structured data from text using language models
9 months ago
- #Python
- #LLM
- #Text Extraction
- LangExtract is a Python library for extracting structured information from unstructured text using LLMs.
- Key features include precise source grounding, reliable structured outputs, optimized long document processing, interactive visualization, flexible LLM support, and adaptability to any domain.
- Supports cloud-based models such as Google Gemini, which require an API key, and local models via Ollama.
- Quick start involves defining a prompt, providing examples, and running extraction with a few lines of code.
- Installation is straightforward via pip, with options for development mode and Docker.
- API key setup can be done via environment variables, .env files, or directly in code (not recommended for production).
- Examples include processing full texts like Romeo and Juliet and extracting medical information from clinical notes.
- Contributions are welcome, with guidelines provided in CONTRIBUTING.md.
- Testing can be done locally with pytest or tox, with instructions for handling dependencies.
- Disclaimer: LangExtract is not an officially supported Google product, and its use is subject to the Apache 2.0 License.
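
The API key options listed above (environment variable, .env file, or a key passed directly in code) could be resolved in a fixed order of precedence. What follows is a minimal standard-library sketch, not LangExtract's own loader: `resolve_api_key` and `read_dotenv` are hypothetical helpers, and the variable name `LANGEXTRACT_API_KEY` is an assumption that should be checked against the project's README.

```python
import os
from pathlib import Path
from typing import Dict, Optional

# Assumed variable name; confirm it against the project's README.
ENV_VAR = "LANGEXTRACT_API_KEY"


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(
    explicit: Optional[str] = None, dotenv_path: str = ".env"
) -> Optional[str]:
    """Hypothetical helper: explicit argument, then environment, then .env file."""
    if explicit:
        return explicit  # key passed in code: fine for experiments only
    from_env = os.environ.get(ENV_VAR)
    if from_env:
        return from_env
    return read_dotenv(Path(dotenv_path)).get(ENV_VAR)


if __name__ == "__main__":
    key = resolve_api_key()
    print("API key found" if key else "No API key configured")
```

Keeping the key in the environment or in an untracked .env file means it never lands in version control, which is why passing it directly in code is discouraged for production.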