
Large Language Models are Zero-Shot Reasoners

Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved the state-of-the-art performances in arithmetics and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding "Let's think step by step" before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances on diverse benchmark reasoning tasks including arithmetics (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g. increasing the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with 175B parameter InstructGPT model, as well as similar magnitudes of improvements with another off-the-shelf large model, 540B parameter PaLM. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting high-level, multi-task broad cognitive capabilities may be extracted by simple prompting. We hope our work not only serves as the minimal strongest zero-shot baseline for the challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars.
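As a rough illustration of the recipe the abstract describes, here is a minimal Python sketch of the paper's two-stage prompting: a first call appends the trigger phrase to elicit step-by-step reasoning, and a second call appends that reasoning plus an answer trigger to extract the final answer. The complete() function is a placeholder I am assuming, not an API from the paper; wire it up to whatever LLM completion call you have available.

# Minimal sketch of the Zero-shot-CoT recipe described above.
# `complete(prompt)` is a placeholder for an LLM completion call; the exact
# API is an assumption here, not something specified by the paper.

def complete(prompt: str) -> str:
    """Send `prompt` to an LLM and return its text completion (to be wired up)."""
    raise NotImplementedError("connect this to your LLM of choice")

def zero_shot_cot(question: str) -> str:
    # Stage 1: reasoning extraction -- the trigger phrase pushes the model
    # to write out its reasoning step by step.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = complete(reasoning_prompt)

    # Stage 2: answer extraction -- feed the reasoning back and ask for the
    # final answer in an easily parsed form.
    answer_prompt = f"{reasoning_prompt} {reasoning}\nTherefore, the answer is"
    return complete(answer_prompt).strip()

# Example (once `complete` is implemented):
#   zero_shot_cot("There are 3 cars in the parking lot and 2 more arrive. How many cars are there now?")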

42 pages, ebook

First published May 24, 2022


About the author

Takeshi Kojima

17 books · 1 follower

Ratings & Reviews



Community Reviews

5 stars: 3 (100%)
4 stars: 0 (0%)
3 stars: 0 (0%)
2 stars: 0 (0%)
1 star: 0 (0%)
Displaying 1 of 1 review
Manny
Author · 47 books · 16.1k followers
December 21, 2022
Like many people, I have been playing with ChatGPT and am intrigued by its abilities. Why is its performance so uneven? Why does it sometimes get obvious things wrong, when it has previously got apparently very similar things right? Is there anything that can help push it in the direction of being right more often? I looked around to see what other people had discovered and found this remarkable paper, which was posted on arXiv a few months ago.

The authors, who come across as exceedingly knowledgeable and careful researchers, have analysed the behaviour of a substantial collection of large language models (LLMs; as the whole world now knows, the core of ChatGPT is an LLM). They have run standard question sets designed to test the ability to perform arithmetic and common-sense reasoning, and they have reached a striking conclusion. Over a wide range of LLMs and questions, you can get better performance - in some cases, very much better performance - simply by inserting a short introductory phrase. The one that worked most reliably in these experiments was "Let's think step by step", but there were a number of similar ones which ran it close. Saying something like this pushes the LLM towards giving a careful, step-by-step answer, and it is right more often.

I read the paper and immediately tried some of the test problems on ChatGPT. Sure enough, we soon had the following little dialogue:
__________________________

mannyrayner: Take the last letters of each word in "Vinny Landon Miguel Caitlyn" and concatenate them.

ChatGPT: If you take the last letters of each word in "Vinny Landon Miguel Caitlyn" and concatenate them, you get the string "ynn ndo iguel aitlyn".

mannyrayner: I meant, take the last letter of each word in "Vinny Landon Miguel Caitlyn" and concatenate them.

ChatGPT: If you take the last letter of each word in "Vinny Landon Miguel Caitlyn" and concatenate them, you get the string "ynn ndo iguel aitlyn".

mannyrayner: Think step by step.

ChatGPT: Certainly! Here is a step-by-step guide to taking the last letter of each word in "Vinny Landon Miguel Caitlyn" and concatenating them:

Start with the first word, "Vinny". The last letter of "Vinny" is "y".
Next, move on to the second word, "Landon". The last letter of "Landon" is "n".
Then, move on to the third word, "Miguel". The last letter of "Miguel" is "l".
Finally, move on to the fourth word, "Caitlyn". The last letter of "Caitlyn" is "n".
Concatenate the last letters of each word to get the final string: "y" + "n" + "l" + "n" = "ynnln".
So the final result is "ynnln".

mannyrayner: Check step 5.

ChatGPT: In step 5, the final result is concatenated as follows: "y" + "n" + "l" + "n" = "ynnln".

This is the correct result. Is there anything else I can help with?

mannyrayner: Do step 5, breaking it down into single steps.

ChatGPT: Certainly! Here is a step-by-step guide to concatenating the last letters of each word in "Vinny Landon Miguel Caitlyn" into a single string:

Start with the first two letters, "y" and "n".
Concatenate these two letters to get the string "yn".
Next, add the third letter, "l".
Concatenate the letter "l" with the string "yn" to get the string "ynl".
Finally, add the fourth letter, "n".
Concatenate the letter "n" with the string "ynl" to get the final result: "ynl" + "n" = "ynln".
So the final result is "ynln".

mannyrayner: Note that that was not the result you got first time.

ChatGPT: I apologize for the mistake. In the original solution, the final result was concatenated as follows: "y" + "n" + "l" + "n" = "ynnln".

This is incorrect. The correct result is "ynln", as shown in the revised step-by-step guide. Thank you for pointing out the error. I apologize for any confusion this may have caused. Is there anything else I can help with?
__________________________
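For comparison, the task ChatGPT was wrestling with above is a one-liner in ordinary code. Here is a minimal Python sketch (mine, not from the paper or the dialogue) of the intended last-letter concatenation, which confirms the answer the model eventually reached:

def last_letter_concat(phrase: str) -> str:
    # Take the last letter of each whitespace-separated word and join them.
    return "".join(word[-1] for word in phrase.split())

print(last_letter_concat("Vinny Landon Miguel Caitlyn"))  # prints "ynln"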

So just saying some words can make a variety of machines think better, and it's a strong, easily reproducible effect. You do not need much imagination to come up with some questions here, starting with at least the following:

- What other magic phrases might there be that will improve a machine's ability to reason?

- Can the machine find the magic phrases itself, and use them to improve its thinking on its own?

- Are there phrases that will cause the machine to think worse?

- To what extent are people the same?

This is the most extraordinary paper I have read all year.
