Hasty Briefs (beta)


The last six months in LLMs, illustrated by pelicans on bicycles

a year ago
  • #AI
  • #LLMs
  • #Benchmarking
  • The speaker presented a keynote on the last six months in LLMs at the AI Engineer World’s Fair in San Francisco.
  • More than 30 significant LLMs were released in the past six months, making it challenging to evaluate and compare them.
  • The speaker introduced an informal benchmark for comparing LLMs: asking each model to generate an SVG of a pelican riding a bicycle.
  • Notable model releases include Amazon's Nova models, Meta's Llama 3.3 70B, and DeepSeek's open-weight models.
  • DeepSeek's R1 reasoning model caused a significant stock market drop, wiping $600 billion from NVIDIA's valuation.
  • Mistral Small 3, a 24B model, was highlighted for its efficiency and capability, running on a laptop with limited RAM.
  • Anthropic's Claude 3.7 Sonnet remained the speaker's favorite, while OpenAI's GPT-4.5 was criticized for its high cost and underwhelming performance.
  • OpenAI's 'GPT-4o native multimodal image generation' feature was a massive success, attracting 100 million new users in a week.
  • The speaker criticized ChatGPT's new memory feature for compromising user control over context.
  • Recent trends in LLMs include the integration of tools and reasoning, enhancing their capabilities and applications.
  • The speaker highlighted risks associated with LLMs, such as prompt injection and the 'lethal trifecta' of private data access, malicious instructions, and data exfiltration mechanisms.
  • The pelican benchmark was humorously acknowledged by Google during their I/O keynote, prompting the speaker to consider a new benchmark.
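The pelican benchmark above amounts to prompting a model for raw SVG and inspecting the result. A minimal sketch of the checkable half of that loop, with the model's reply stubbed out (the prompt text, helper names, and stub are illustrative, not from the talk):

```python
import xml.etree.ElementTree as ET

PELICAN_PROMPT = "Generate an SVG of a pelican riding a bicycle"


def extract_svg(reply: str) -> str:
    """Pull the <svg>...</svg> fragment out of a model reply,
    which may wrap it in prose or a Markdown code fence."""
    start = reply.find("<svg")
    end = reply.rfind("</svg>")
    if start == -1 or end == -1:
        raise ValueError("no SVG found in reply")
    return reply[start : end + len("</svg>")]


def is_well_formed_svg(fragment: str) -> bool:
    """Check that the fragment parses as XML with an <svg> root."""
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return root.tag.endswith("svg")


# Stub standing in for an actual model call:
reply = (
    "Here you go:\n```svg\n"
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle cx="50" cy="40" r="10"/>'   # pelican body, optimistically
    '<circle cx="30" cy="80" r="15"/>'   # front wheel
    '<circle cx="70" cy="80" r="15"/>'   # rear wheel
    "</svg>\n```"
)
svg = extract_svg(reply)
print(is_well_formed_svg(svg))  # True
```

Well-formedness is of course the easy part; the talk's actual scoring is visual, by rendering each SVG and judging how much it resembles a pelican on a bicycle.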