Evolutionary Tree, ChatGPT Behavior, Adam Instability, Neural-Cracking-Machines, AI Risk Assessment Techniques

Joe H.
July 20, 2023

In today’s deep dive, we trace the lineage of Large Language Models (LLMs) with the Constellation web app, look at behavior changes in GPT-3.5 and GPT-4, and examine Adam’s instability in large-scale machine learning. We also cover a universal password-guessing model and risk assessment techniques for AGI companies. These items, drawn from the latest arXiv papers and Hacker News discussions, touch on AI evolution, optimization algorithms, cybersecurity, and risk management in AGI. So buckle up, because we’re about to take a tour through today’s AI research.

Top Papers

1) Evolutionary Tree and Graph for Large Language Models

Summary:

The authors created Constellation, a web app that visualizes the hierarchical relationships among Large Language Models (LLMs) such as ChatGPT and Bard, addressing the lack of a comprehensive index of these models.

Unraveling the Evolution of Large Language Models (LLMs)

Source: arxiv.org (PDF, 2,384 words)

Hacker News:

The paper “An Evolutionary Tree and Graph for Large Language Models” explores the evolution of large language models and has received 15 points on Hacker News.

  • The discussion centers on an evolutionary tree and graph for large language models.
  • The paper behind the concept is available on arxiv.org.
  • The paper is titled “An Evolutionary Tree and Graph for 15,821 Large Language Models.”
  • The post has 15 points on Hacker News.
  • The concept is also discussed on the website https://constellation.sites.stanford.edu/.

2) Behavior Changes in GPT-3.5 and GPT-4

Summary:

The paper documents behavior changes in GPT-3.5 and GPT-4 over time, to help users understand and use these large language models.

Source: arxiv.org (PDF, 5,229 words)

Hacker News:

The discussion covers the evolving behavior of ChatGPT and its effectiveness at mathematics, with suggestions to use GPT-4 with a Wolfram plugin and debate about how efficient language models are for math.

  • ChatGPT’s ability to handle mathematics and solve math problems is a topic of discussion.
  • Some users suggest using GPT-4 with a Wolfram plugin for math problems, while others argue that using language models for math is inefficient.
  • Tokenization of digits poses challenges for ChatGPT in solving math problems.
  • ChatGPT’s behavior changes over time, and there are concerns about model regressions and optimization schemes.
  • OpenAI denies intentionally degrading ChatGPT’s behavior for cost-saving purposes.
  • The use of mixture-of-experts routing in the GPT-4 architecture may contribute to the changing behavior of ChatGPT.
  • Transparency about technical details of cloud products like ChatGPT is welcomed, but exposing every detail can hinder development.
  • OpenAI invests in models like ChatGPT and aims to build an ecosystem of companies around them.

3) Adam Instability in Large-Scale Machine Learning

Summary:

The paper explores the cause of instability in training large language models, identifying the Adam optimization algorithm as the main contributor.

Source: arxiv.org (PDF, 10,736 words)

Hacker News:

The discussion covers Adam instability in large-scale machine learning and the use of black-box optimization for unconventional architectures and for objective functions without good gradients, and mentions ongoing efforts to improve optimization algorithms.

  • Adam instability in large-scale machine learning is a topic of interest.
  • Gradients in machine learning can become auto-correlated, leading to instability.
  • The name of the Adam optimization algorithm is derived from “adaptive moment estimation.”
  • There is ongoing research on using derivative-free/black-box optimizers for training large networks.
  • Training language models without gradient descent is challenging and time-consuming.
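For readers unfamiliar with the optimizer under discussion, here is a minimal sketch of the standard textbook Adam update for a single scalar parameter. It is not the paper’s analysis and does not reproduce the instability it studies. The function names `adam_step` and `minimize_quadratic` are hypothetical, and the hyperparameter values are the commonly used defaults, assumed here for illustration.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (textbook form, illustrative only)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero (t starts at 1).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # The step is the ratio of the two moment estimates, scaled by lr.
    new_param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return new_param, m, v


def minimize_quadratic(steps=2000, lr=0.01):
    """Toy example: minimize f(x) = (x - 3)^2 starting from x = 0."""
    x, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (x - 3.0)  # derivative of (x - 3)^2
        x, m, v = adam_step(x, grad, m, v, t, lr=lr)
    return x
```

Because the step is a ratio of moment estimates, the very first update has magnitude close to `lr` regardless of how large the gradient is. That ratio is the quantity the moment estimates are meant to keep well behaved.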

4) Universal Neural-Cracking-Machines: Self-Configurable Password Models

Summary:

The paper presents a universal password model that adjusts its guessing strategy to the target system. The results indicate that both seeded and tailored models outperform the baseline, with seeded models being slightly more effective.

Universal Neural-Cracking-Machines: The Future of Password Models

Source: arxiv.org (PDF, 21,020 words)

5) Risk Assessment for AGI Companies: Techniques and Recommendations

Summary:

The paper suggests that AGI companies like OpenAI and Google DeepMind should improve their risk management practices by adopting techniques from safety-critical industries and by considering the role of humans in their control structure models.

Source: arxiv.org (PDF, 24,728 words)
