
We collected 10k hours of neuro-language data in our basement

3 days ago
  • #AI-research
  • #data-collection
  • #neuro-language
  • Collected ~10k hours of neuro-language data from thousands of unique individuals, the largest dataset of its kind.
  • Trained thought-to-text models to decode semantic content from noninvasive neural data, including zero-shot decoding examples.
  • Participants engaged in freeform conversations with an LLM for two hours, producing multimodal neural data aligned with text and audio.
  • Improved participant engagement by personalizing LLM interactions and implementing a token quality scoring system (see the scoring sketch after this list).
  • Designed and optimized multimodal headsets by combining the best single-modality headsets and 3D printing custom parts.
  • Switched data format to Zarr 3 for unified storage, improving real-time quality checks and reducing marginal data cost by ~30% (see the storage sketch after this list).
  • Found that beyond ~4k-5k hours, additional data quantity outweighs further noise reduction, making extreme noise-reduction efforts less critical.
  • Implemented dynamic pricing and overbooking in a custom booking suite to maximize headset occupancy (see the overbooking sketch after this list).
  • Capped participant sessions at 10 to ensure dataset diversity, balancing unique participants against total hours.
  • Reduced marginal cost per usable hour by ~40% through backend optimizations, real-time data checks, and improved session management.
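
The post doesn't describe the token quality scoring system in detail. A minimal sketch of one plausible heuristic, assuming scoring runs on a participant's transcribed turn and using illustrative (not actual) thresholds:

```python
from collections import Counter

def token_quality_score(tokens: list[str]) -> float:
    """Heuristic engagement score for a transcribed participant turn.

    Hypothetical sketch: rewards vocabulary diversity and reasonable turn
    length, penalizes heavy repetition. Thresholds are illustrative only.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Type-token ratio: fraction of distinct tokens (1.0 = no repetition).
    diversity = len(counts) / len(tokens)
    # Length factor: saturates at 1.0 once the turn reaches ~50 tokens.
    length = min(len(tokens) / 50.0, 1.0)
    return diversity * length

print(token_quality_score("tell me more about how that idea came up".split()))
```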
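For the Zarr 3 switch, here is a minimal sketch of how one session's aligned modalities could live in a single store. Array names, shapes, sample rates, and metadata fields are assumptions, not the post's actual schema:

```python
import numpy as np
import zarr  # assumes zarr-python >= 3; older versions use create_dataset instead

# Hypothetical per-session layout: 64-channel neural data at 500 Hz plus
# aligned transcript token IDs. One minute of zeros shown for brevity.
neural = np.zeros((500 * 60, 64), dtype="float32")
tokens = np.zeros(2_000, dtype="int32")

root = zarr.open_group("session_0001.zarr", mode="w")
root.attrs["participant_id"] = "p-0001"  # assumed metadata field

# Chunk along time so real-time quality checks can read recent windows cheaply.
eeg = root.create_array("neural", shape=neural.shape, dtype="float32",
                        chunks=(500 * 10, 64))
eeg[:] = neural

txt = root.create_array("tokens", shape=tokens.shape, dtype="int32",
                        chunks=(1024,))
txt[:] = tokens
```

Keeping all modalities in one chunked store is what makes streaming quality checks cheap: a checker can read only the most recent time chunks instead of reopening separate per-modality files.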
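The booking logic isn't spelled out in the post. A minimal sketch of one way overbooking could be sized from a historical show rate; the independent-Bernoulli model and the 5% bump-risk cap are assumptions:

```python
from math import comb

def max_bookings(capacity: int, show_rate: float, max_bump_prob: float = 0.05) -> int:
    """Largest number of bookings to accept so that the probability of more
    than `capacity` participants showing up stays below `max_bump_prob`.

    Models show-ups as independent Bernoulli trials (an assumption).
    """
    def prob_overflow(n: int) -> float:
        # P(more than `capacity` of n booked participants show up)
        return sum(comb(n, k) * show_rate**k * (1 - show_rate)**(n - k)
                   for k in range(capacity + 1, n + 1))

    n = capacity
    while prob_overflow(n + 1) <= max_bump_prob:
        n += 1
    return n

# e.g. 8 headsets and a 70% historical show rate -> accept 9 bookings per slot
print(max_bookings(capacity=8, show_rate=0.7))
```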