Scaling Transformers to 1B Tokens, Practical Rowhammer Fingerprinting, Conservation Laws for Gradient Flows, Mixture-of-Experts with Instruction Tuning Win

Joe H.
July 07, 2023

Welcome back to another deep dive into the cutting-edge world of research papers. Today, we’re tackling everything from the LongNet Transformer variant’s unprecedented ability to handle a whopping 1 billion tokens, to the intriguing technique of Rowhammer fingerprinting with Centauri, and the geometric complexities of gradient descent in machine learning. We’re also delving into the benefits of instruction tuning for Mixture-of-Experts models in large language models. As always, we’ll be spicing things up with a dash of discussion from the ever-insightful Hacker News community. So, buckle up and prepare for an intellectual adventure through the latest in tech research.

Top Papers

1) Scaling Transformers to 1,000,000,000 Tokens

Summary:

The LongNet Transformer variant uses dilated attention to process sequences of up to 1 billion tokens while still performing well on shorter sequences.
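
To make "dilated attention" a little more concrete, here is a minimal sketch of the sparsity pattern only, assuming the usual description of the technique: the sequence is cut into fixed-length segments, and within each segment only every r-th position is kept before attention is computed. The function names and parameters (`segment_len`, `dilation`, `offset`) are illustrative assumptions, not the paper's code, and the pair count ignores causal masking.

```python
def dilated_segments(seq_len, segment_len, dilation, offset=0):
    """Return, for each segment, the token positions kept after dilation.

    Assumes 0 <= offset < dilation. Each segment covers `segment_len`
    consecutive positions, of which every `dilation`-th one is kept.
    """
    segments = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        segments.append(list(range(start + offset, end, dilation)))
    return segments


def attended_pairs(seq_len, segment_len, dilation):
    """Count query-key pairs when attention runs only inside each
    sparsified segment (no causal mask, single dilation pattern)."""
    kept = dilated_segments(seq_len, segment_len, dilation)
    return sum(len(positions) ** 2 for positions in kept)


if __name__ == "__main__":
    print(dilated_segments(16, 8, 2))
    print("dilated pairs:", attended_pairs(16, 8, 2))
    print("dense pairs:  ", attended_pairs(16, 16, 1))
```

With a fixed segment length and dilation, the number of pairs grows linearly with sequence length rather than quadratically, which is the intuition behind reaching very long contexts. A full model would combine several segment-length and dilation settings; that part is left out of this sketch.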

Scaling Transformers to 1,000,000,000 Tokens

Source: arxiv.org (PDF, 6,326 words)

Hacker News:

Commenters see scaling transformers to 1 billion tokens as important for capturing long-range dependencies in text, and some tie it to progress toward AGI, although whether current computational scale is adequate remains a topic of debate.

  • The scaling of transformers to 1 billion tokens is discussed.
  • Concerns are raised about the effectiveness of attention mechanisms in capturing long-range dependencies in text sequences.
  • As a point of comparison, the human brain has about 150 trillion synapses (loosely treated as parameters), while GPT-3 has 175 billion parameters.
  • There is an ongoing debate about the computational scale for models like GPT-3 and the need for further scaling.
  • The number of tokens a language model can process at once defines the length of its context window.

(Illustration) A powerful robotic or mechanized creature, possibly a mecha, exuding energy.

2) Centauri: Practical Rowhammer Fingerprinting

Summary:

Centauri is a Rowhammer fingerprinting technique that exploits manufacturing process variations to build fingerprints that are distinct across devices and consistent for the same device.

Centauri Practical Rowhammer Fingerprinting: Building Unique and Stable Fingerprints

Source: arxiv.org (PDF, 14,522 words)

Hacker News:

Centauri uses Rowhammer attacks to fingerprint computers so that each one can be uniquely identified.

  • Centauri: Practical Rowhammer Fingerprinting is a method to obtain a fingerprint of a computer using a Rowhammer attack.
  • This fingerprint can uniquely identify a computer, even among those with identical hardware and software.
  • The technique can be implemented in native code and possibly in JavaScript, though less reliably and more slowly.
  • There is currently no widespread and effective mitigation for Rowhammer techniques, making devices more vulnerable over time.
  • The design defect that allows Rowhammer to work has not been corrected, despite being known for almost a decade.
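
One way to picture how such a fingerprint could be matched is sketched below. It assumes a fingerprint is simply the set of memory locations where bit flips were observed, and it compares sets with Jaccard similarity. This is an illustrative simplification, not Centauri's actual matching method; the `(row, bit)` tuples, the device names, and the `threshold` value are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of bit-flip locations."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(probe, enrolled, threshold=0.5):
    """Return (device_id, score) for the most similar enrolled fingerprint,
    or (None, score) if no fingerprint reaches the threshold."""
    best_id, best_score = None, 0.0
    for device_id, fingerprint in enrolled.items():
        score = jaccard(probe, fingerprint)
        if score > best_score:
            best_id, best_score = device_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


if __name__ == "__main__":
    # Hypothetical (row, bit) flip locations, made up for illustration.
    enrolled = {
        "device-A": {(1, 3), (1, 17), (4, 2), (9, 30)},
        "device-B": {(2, 5), (4, 2), (7, 11), (8, 8)},
    }
    probe = {(1, 3), (1, 17), (4, 2)}
    print(best_match(probe, enrolled))
```

The point of the sketch is that identical hardware models can still be told apart if the locations of their flips differ from chip to chip and repeat across measurements of the same chip.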

(Illustration) A person with futuristic elements, likely a cyborg, against a vibrant orange background.

4) Conservation Laws for Gradient Flows

Summary:

The article examines the geometric aspects of gradient descent in machine learning, focusing on conservation laws and the preservation of functions during optimization.
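
A standard toy case shows what a conserved function looks like. For a loss that depends on two scalar weights only through their product, 0.5 * (u*v - target)^2, the gradient flow keeps u^2 - v^2 constant, because d/dt(u^2) and d/dt(v^2) are both equal to -2 * (u*v - target) * u * v. The sketch below checks this numerically with small-step gradient descent; the function name, starting values, and step size are assumptions chosen for illustration, and discrete steps conserve the quantity only approximately.

```python
def simulate(u, v, target, lr=1e-3, steps=5000):
    """Small-step gradient descent on 0.5 * (u*v - target)**2.

    With a small learning rate this approximates the gradient flow,
    under which u**2 - v**2 stays constant.
    """
    for _ in range(steps):
        residual = u * v - target
        grad_u = residual * v  # d(loss)/du
        grad_v = residual * u  # d(loss)/dv
        u, v = u - lr * grad_u, v - lr * grad_v
    return u, v


if __name__ == "__main__":
    u0, v0 = 2.0, 0.5
    u, v = simulate(u0, v0, target=3.0)
    print("u^2 - v^2 before:", u0 * u0 - v0 * v0)
    print("u^2 - v^2 after: ", u * u - v * v)
    print("u * v after:     ", u * v)
```

The loss is driven to zero while the difference of squares barely moves, so the starting point determines which of the many minimizers the optimization reaches.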

The Geometric Aspects of Gradient Descent in Machine Learning

Source: arxiv.org (PDF, 19,160 words)

(Illustration) A person with short dark hair, depicted in a vibrant, geometric style.

5) Mixture-of-Experts Meets Instruction Tuning

Summary:

The paper discusses how instruction tuning benefits Mixture-of-Experts models compared with dense models among large language models.
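
For readers unfamiliar with the distinction, here is a minimal sketch of what separates a Mixture-of-Experts layer from a dense one: a gate scores the experts and only the top-k are evaluated for a given input, whereas a dense layer runs all of its parameters every time. The scalar experts, the `gate_logits` argument (a real router would compute these from the input), and `k` are simplifying assumptions, not the paper's architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_logits, experts, k=2):
    """Route x to the top-k experts and mix their outputs.

    Returns (output, chosen_indices). Gate weights are renormalized
    over the chosen experts so they sum to 1.
    """
    probs = softmax(gate_logits)
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen


if __name__ == "__main__":
    # Hypothetical scalar "experts" standing in for feed-forward blocks.
    experts = [lambda x: x, lambda x: 2 * x, lambda x: -x]
    print(moe_forward(3.0, [0.0, 0.0, -100.0], experts, k=2))
```

Only k of the experts run per input, so total parameter count can grow without a matching growth in per-token compute.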

Mixture-of-Experts Meets Instruction Tuning

Source: arxiv.org (PDF, 15,911 words)

(Illustration) Two side-by-side depictions of the same woman, with glasses and short hair, under different lighting and color schemes.