Learning a made-up language

LLMs seem to be really good at swift, reliable context location — e.g. it takes only a little bit of Python or romance novel or instruction manual for the LLM to recognize a context which requires specific choices of syntax and diction. I wondered though whether GPT-3 was recognizing the context of “Python” or a more general context of “programming language” which is further fine-tuned to a subcontext of Python. It seems like it’s the latter! Here was my way of testing. I gave it the prompt


The following is code in a programming language called Ranger that computes the first n triangular numbers.

ask('Please enter a number: ', n)
#initialize the triangular number
triangular_number : 0
as i ranges to n+1
    triangular_number : triangular_number + i
    write i then triangular_number

Ranger doesn’t exist: I just made something up where for loops are written “as i ranges to” and assignment of variables is done with a : and you don’t enclose the argument of a print (or “write”) statement in parentheses. To be fair, it’s basically python with some dictionary changes. Still, I thought I would see if GPT-3 can write code in Ranger even though there’s no Ranger code in its training set and so no actual “Ranger context” to locate. And it can!


Write a program in Ranger to compute the 10th prime number.

#initialize the prime number
prime_number : 1
#initialize the count
count : 0
#iterate from 2 to infinite
as i ranges to infinite
    is_prime : true
    as j ranges to i-1
        if i % j is 0
            is_prime : false
            break
    if is_prime is true
        count : count + 1
        if count is 10
            prime_number : i
            break
write prime_number


So you can see that GPT-3 correctly infers that the right context is “C/Python-like code” and correctly recognizes from the prompt that : means assignment, “as … ranges” is how you write a for loop, etc. It’s interesting that its way of testing primality is to check every potential divisor of i all the way up to i-1; surely there isn’t a lot of existing code in the training set that does it in this massively inefficient way!
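Translated back into Python (my rendering, not the model’s output), GPT-3’s algorithm looks like this; note the trial division by every candidate divisor below i, with no square-root cutoff:

```python
def tenth_prime():
    """Python rendering of GPT-3's Ranger program: find the 10th prime
    by trial division against every j below i."""
    count = 0
    i = 1
    while True:                    # "as i ranges to infinite"
        i = i + 1
        is_prime = True
        for j in range(2, i):      # "as j ranges to i-1"
            if i % j == 0:
                is_prime = False
                break
        if is_prime:
            count = count + 1
            if count == 10:
                return i           # the 10th prime is 29
```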

To be fair, this is an easy problem, and if you give it a somewhat harder one, it does this:


Write a program in Ranger to count the number of binary strings of length 10 which have no occurrence of the substring ‘0101’.

# Initialize the number of binary strings
num_binary_strings : 0
# Iterate over all binary strings of length 10
as binary_string ranges to 2^10
    # Check if the binary string contains the substring '0101'
    if binary_string does not contain '0101'
        num_binary_strings : num_binary_strings + 1
# Print the result
write 'Number of binary strings of length 10 which have no occurence of the substring "0101": ', num_binary_strings

I guess this is sort of pseudocode? It doesn’t really know how to iterate over binary strings but knows there are 2^10 of them so it just “ranges” to that. Nor does it know how to check string containment in Ranger (how could it?) so it switches to English. Not a bad answer, really!
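A working Python version of what the pseudocode is reaching for — iterate over all 2^10 integers, render each as a 10-character bit string, and test containment — would be a sketch along these lines (again my translation, not anything the model produced):

```python
def count_avoiding(n, pattern='0101'):
    """Count binary strings of length n with no occurrence of `pattern`,
    by brute force over all 2**n strings (what the Ranger pseudocode intends)."""
    count = 0
    for x in range(2 ** n):
        s = format(x, '0{}b'.format(n))  # x as an n-character bit string
        if pattern not in s:
            count = count + 1
    return count
```

As a sanity check, `count_avoiding(4)` gives 15: all sixteen strings of length 4 except '0101' itself.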

It would be interesting to try something like this where the invented language is a little more different from existing languages than “Python with some 1-for-1 word and symbol changes.”

Published on February 19, 2023 13:12


Jordan Ellenberg's Blog
