Top arXiv Papers: Orca, AGI, Information Extraction, Causal Inference, Instruction Tuned Models
In today’s edition, we look at Microsoft’s Orca model, which outperforms comparable open-source models; an estimate of the likelihood of transformative AGI by 2043; a weakly supervised approach to extracting medicine names from handwritten medical prescriptions; and the limitations of large language models in causal inference. Where available, we also note the related discussion on Hacker News.
Top Papers
1) Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Summary:
Orca is a machine learning model developed by Microsoft Research that outperforms other open-source models in instruction following and TruthfulQA tasks, using imitation learning and diverse imitation data with rich signals from GPT-4, while covering 29 distinct skills and respecting user privacy and consent.
- Microsoft Research has developed Orca, a 13-billion parameter machine learning model that imitates the reasoning process of large foundation models (LFMs) through imitation learning.
- Orca outperforms conventional instruction-tuned models such as Vicuna-13B on the BBH benchmark, reaches parity with ChatGPT there, and shows competitive performance in professional and academic examinations like the SAT, LSAT, GRE, and GMAT.
- Orca learns from diverse imitation data with judicious sampling and selection, guided by teacher assistance from rich signals from GPT-4 including explanation traces, step-by-step thought processes, and other complex instructions.
- Orca addresses challenges such as small scale homogeneous training data and a lack of rigorous evaluation by tapping into large-scale and complex reasoning benchmarks like Big-Bench Hard (BBH) and AGIEval.
- Orca's progressive learning trains a smaller pre-trained language model on complex instructions and explanations produced by large foundation models (LFMs).
- Orca outperforms Vicuna in instruction following and TruthfulQA tasks, but trails behind ChatGPT and GPT-4.
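To make the idea of learning from explanation traces more concrete, here is a minimal sketch of how one imitation example might be represented: a system message, a user query, and a teacher response that explains its reasoning. The class name `ImitationExample`, the function `to_training_text`, and the `###` prompt template are assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ImitationExample:
    """One teacher-generated training example (hypothetical structure)."""
    system_message: str    # instruction asking the teacher to explain itself
    user_query: str        # the task posed to the teacher
    teacher_response: str  # the teacher's answer, including its explanation


def to_training_text(ex: ImitationExample) -> str:
    # Hypothetical template: the section markers are illustrative only.
    return (
        f"### System:\n{ex.system_message}\n\n"
        f"### User:\n{ex.user_query}\n\n"
        f"### Assistant:\n{ex.teacher_response}"
    )


example = ImitationExample(
    system_message="You are a helpful assistant. Think step by step and justify your answer.",
    user_query="A train travels 120 km in 2 hours. What is its average speed?",
    teacher_response="Average speed is distance divided by time: 120 km / 2 h = 60 km/h.",
)
text = to_training_text(example)
print(text)
```

The point of the sketch is that the student model sees the explanation along with the final answer, which is the "rich signal" the summary refers to.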
2) Estimating Likelihood of Transformative AGI by 2043
3) Weakly Supervised Information Extraction from Handwritten Documents
Summary:
The article proposes a weakly supervised approach to extracting medicine names from handwritten medical prescriptions using a domain-specific medicine language model and weakly supervised segmentation, which significantly enhances the performance of existing OCR systems.
- A weakly supervised approach is proposed for extracting medicine names from handwritten medical prescriptions using a domain-specific language model and weakly supervised segmentation.
- The model achieves 78% pixel mIoU using weak labels and enhances the performance of existing OCR systems.
- The approach involves using an OCR labeling function and a segmentation labeling function, which improves over iterations.
- The authors use a medicine name vocabulary and a dataset of 9645 handwritten prescriptions written by 117 doctors.
- The algorithm developed can selectively infuse domain knowledge and correct errors caused by misinterpreting similar-looking medicines or OCR errors.
- The paper reviews various methods for weakly supervised information extraction from handwritten documents, emphasizing the importance of weak supervision in training models and highlighting potential for further research.
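For readers unfamiliar with the pixel mIoU figure quoted above, the sketch below shows one common way to compute mean intersection-over-union between a predicted mask and a reference mask. It is a generic illustration, not the authors' evaluation code; the function name `mean_iou`, the toy masks, and the class labels are assumptions.

```python
from typing import List


def mean_iou(pred: List[List[int]], target: List[List[int]], num_classes: int) -> float:
    """Average per-class intersection-over-union for two label masks."""
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_pred = p == c
                in_target = t == c
                inter += int(in_pred and in_target)
                union += int(in_pred or in_target)
        if union > 0:  # skip classes that appear in neither mask
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 masks: 0 = background, 1 = medicine-name pixels (hypothetical labels)
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 1],
          [0, 1, 1]]
score = mean_iou(pred, target, num_classes=2)
print(round(score, 4))
```

In this toy case the background class has IoU 2/3 and the medicine-name class has IoU 3/4, so the mean is 17/24.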
4) Large Language Models and Causal Inference
Summary:
This article introduces a new dataset, Corr2Cause, that tests whether large language models can infer causation from correlation. It describes how the dataset is generated and evaluates existing models on it, finding that their causal inference skills are limited.
- Large language models have limited causal inference skills and perform poorly on the Corr2Cause task.
- A new dataset of 400K samples is proposed to test causal reasoning abilities, and it is used to evaluate the performance of 17 LLMs.
- Directed graphical causal models (DGCMs) are used to represent causal relationships among variables.
- RoBERTa-Large MNLI is the best-performing model for causal inference, but identifying V-structures remains challenging.
- The authors suggest future work to enhance LLMs’ skills with out-of-distribution perturbations and connect the benchmark to real-world false beliefs.
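A V-structure (also called a collider) is a pattern X → Z ← Y in which two independent causes share an effect. The simulation below is a rough sketch of why such structures are tricky: X and Y are uncorrelated overall, but become correlated once we look only at cases with a large Z. It is a generic illustration and not the dataset's generation process; the variable names, sample size, and threshold are assumptions.

```python
import math
import random
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]  # first independent cause
y = [random.gauss(0, 1) for _ in range(n)]  # second independent cause
z = [a + b for a, b in zip(x, y)]           # common effect: X -> Z <- Y

# Marginally, X and Y are (close to) uncorrelated.
marginal = pearson(x, y)

# Conditioning on the collider (keeping only large Z) induces a dependence.
selected = [(a, b) for a, b, c in zip(x, y, z) if c > 1.0]
conditional = pearson([a for a, _ in selected], [b for _, b in selected])

print(f"corr(X, Y)         = {marginal:.3f}")
print(f"corr(X, Y | Z > 1) = {conditional:.3f}")
```

The marginal correlation should come out near zero and the conditional one clearly negative, which is the signature that separates a collider from a chain or a fork.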
5) Instruction Tuned Models for Quick Learning