Large Feeds and RFC 5005

4 months ago
  • #Performance Optimization
  • #Feed Processing
  • #SQLite
  • Xobaque is being used to back Search indieblog.page, importing nearly 5000 feeds.
  • The single SQLite writer is a bottleneck because every item goes through an inefficient pattern: 'select foo', then 'update foo' if found, else 'insert foo'.
  • UPSERT cannot be used because full-text search is implemented via a virtual table that doesn't allow constraints or indexes.
  • Some blogs have feeds with an excessive number of pages (e.g., 12000, 4000, 2000), which seems unnecessary for update feeds.
  • RFC 5005 is mentioned as a solution for feed paging and archiving.
  • The current architecture runs 10 goroutines that fetch feeds with If-Modified-Since and If-None-Match headers and skip any feed that answers 304 Not Modified.
  • A lock is acquired before writing to disk to ensure only one SQLite writer at a time, which is slow.
  • Splitting the select, insert, and update steps into chunks of 1000 items improved performance, reducing runtime to about 12 hours.
  • Potential next steps include using global variables for prepared statements and filtering feeds based on Last-Modified headers to skip unchanged pages.
  • A log analysis showed 1782 feeds processed with SQL and 2179 skipped due to HTTP caching, with most pages unchanged.
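
The chunking step mentioned above can be illustrated with a minimal Go sketch. It is not the author's code: the helper name `chunk`, the use of generics, and the integer item IDs are assumptions made for illustration. It only shows how a list of items might be split into batches of at most 1000, so that each batch could go through its select, insert, and update steps while the single writer lock is held.

```go
package main

import "fmt"

// chunk is a hypothetical helper (not from the original post). It splits
// items into consecutive batches of at most size elements. The batches
// share the backing array of items; nothing is copied.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	// Stand-in for feed items waiting to be written; real items would
	// carry a URL, title, and content rather than a bare integer.
	ids := make([]int, 2500)
	for i := range ids {
		ids[i] = i
	}
	// Each batch would be handled in one pass over the database.
	for i, batch := range chunk(ids, 1000) {
		fmt.Printf("batch %d: %d items\n", i, len(batch))
	}
}
```

With 2500 items and a batch size of 1000, this yields two full batches and one of 500. Larger batches mean fewer round trips per feed, while the batch size itself stays a tuning knob.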