Learning a made-up language

LLMs seem to be really good at swift, reliable context location — e.g. it takes only a little bit of Python or romance novel or instruction manual for the LLM to recognize a context which requires specific choices of syntax and diction. I wondered though whether GPT-3 was recognizing the context of “Python” or a more general context of “programming language” which is further fine-tuned to a subcontext of Python. It seems like it’s the latter! Here was my way of testing. I gave it the prompt


The following is code in a programming language called Ranger that computes the first n triangular numbers.

ask('Please enter a number: ', n)
#initialize the triangular number
triangular_number : 0
as i ranges to n+1
    triangular_number : triangular_number + i
    write i then triangular_number

Ranger doesn’t exist: I just made something up where for loops are written “as i ranges to” and assignment of variables is done with a : and you don’t enclose the argument of a print (or “write”) statement in parentheses. To be fair, it’s basically python with some dictionary changes. Still, I thought I would see if GPT-3 can write code in Ranger even though there’s no Ranger code in its training set and so no actual “Ranger context” to locate. And it can!


Write a program in Ranger to compute the 10th prime number.

#initialize the prime number
prime_number : 1
#initialize the count
count : 0
#iterate from 2 to infinite
as i ranges to infinite
    is_prime : true
    as j ranges to i-1
        if i % j is 0
            is_prime : false
            break
    if is_prime is true
        count : count + 1
        if count is 10
            prime_number : i
            break
write prime_number


So you can see that GPT-3 correctly infers that the right context is “C/Python-like code” and correctly recognizes from the prompt that : means assignment, “as … ranges” is how you write a for loop, etc. It’s interesting that its way of testing primality is to check every potential divisor of i all the way up to i-1; surely there isn’t a lot of existing code in the training set that does it in this massively inefficient way!
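Translated back into Python (my rendering, not the model’s output), GPT-3’s algorithm looks like this; note the trial division by every candidate divisor below i, with no square-root cutoff:

```python
def tenth_prime():
    """Python rendering of GPT-3's Ranger program: find the 10th prime
    by trial division against every j below i."""
    count = 0
    i = 1
    while True:                    # "as i ranges to infinite"
        i = i + 1
        is_prime = True
        for j in range(2, i):      # "as j ranges to i-1"
            if i % j == 0:
                is_prime = False
                break
        if is_prime:
            count = count + 1
            if count == 10:
                return i           # the 10th prime is 29
```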

To be fair, this is an easy problem, and if you give it a somewhat harder one, it does this:


Write a program in Ranger to count the number of binary strings of length 10 which have no occurrence of the substring ‘0101’.

# Initialize the number of binary strings
num_binary_strings : 0
# Iterate over all binary strings of length 10
as binary_string ranges to 2^10
    # Check if the binary string contains the substring '0101'
    if binary_string does not contain '0101'
        num_binary_strings : num_binary_strings + 1
# Print the result
write 'Number of binary strings of length 10 which have no occurence of the substring "0101": ', num_binary_strings

I guess this is sort of pseudocode? It doesn’t really know how to iterate over binary strings but knows there are 2^10 of them so it just “ranges” to that. Nor does it know how to check string containment in Ranger (how could it?) so it switches to English. Not a bad answer, really!
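A working Python version of what the pseudocode is reaching for — iterate over all 2^10 integers, render each as a 10-character bit string, and test containment — would be a sketch along these lines (again my translation, not anything the model produced):

```python
def count_avoiding(n, pattern='0101'):
    """Count binary strings of length n with no occurrence of `pattern`,
    by brute force over all 2**n strings (what the Ranger pseudocode intends)."""
    count = 0
    for x in range(2 ** n):
        s = format(x, '0{}b'.format(n))  # x as an n-character bit string
        if pattern not in s:
            count = count + 1
    return count
```

As a sanity check, `count_avoiding(4)` gives 15: all sixteen strings of length 4 except '0101' itself.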

It would be interesting to try something like this where the invented language is a little more different from existing languages than “Python with some 1-for-1 word and symbol changes.”

Published on February 19, 2023 13:12


Jordan Ellenberg's Blog
