Russell Atkinson's Blog, page 88
September 12, 2016
Computer cipher solving – Lesson 6: interactive solvers
I’ve had a fun three days at the ACA convention in Dallas but I’m back and ready to finish up this topic. There is way too much involved in computer solving to cover it all in my blog, so this is going to be my last post on the subject for now. I just hope that I’ve given you enough inspiration and maybe some clear understanding of how you can solve ciphers with computers.
Perhaps the most valuable and easiest computer solving program you can write is an interactive solver that mimics what you would do to solve it on paper. For example, with a simple cryptogram, a good interactive solver will allow you to input the ciphertext, display it in a handy way and have a way to input a letter you want to substitute for another, such as typing the letters in text fields or by clicking a mouse on an alphabet, etc. When you tell it that G=A it will go through the program and display above (or below if you prefer) an A over every G. It will also display a key alphabet equivalence. The advantage of using a computer program is that it makes all the substitutions or transpositions immediately and error-free . It should allow you to delete or undo some action you’ve tried. Such a program avoids all the transcription errors and messy erasures and can provide some useful statistics such as the frequency count. Even if you have an automatic solver (like a hillclimber) for a cipher type, you should probably also have an interactive solver since the automatic ones aren’t always so automatic.
I include in this class programs that perform a single useful task. A good example would be a crib placer. Many puzzlers provide a crib (short plaintext segment) along with the cipher, but usually the location is not known. A crib placer can tell you all the places it can fit. Another example would be a program that computes useful statistics such as letter frequency count, index of coincidence, Normor score, and so forth.
Many serious programmers are used to working in a command-line environment. That’s fine if that’s what you like, but I really like graphic user interfaces (GUIs). A decent GUI will allow you to see plaintext in one color and ciphertext in another. It will allow a large or small font. It provides nicely labeled input and output boxes. It will allow you to interrupt or pause execution when things look promising. It will display everything in nice neat rows or columns. It can highlight things of interest such as a possible placement for a crib, or a count of the number of trial decrypts it has done. As I mentioned earlier, I’m not much of a programmer. I’ve never worked as one and I don’t know the languages that are now popular, but I know that I solve ciphers better when I can see things displayed in the way I want them displayed. Cut, paste, click the button, and watch it go. I’ve learned enough to do that.
So how do you write one? For each cipher it will be different, but most programmers will not have trouble figuring out how to do that. For non-programmers, though, you can get much of this functionality with programs you already have available such as spreadsheets, word processors, and, of course, an Internet full of interactive solvers.
The post Computer cipher solving – Lesson 6: interactive solvers appeared first on OnWords.
September 8, 2016
Computer cipher solving – Lesson 5: Cribs
I use cribs in cipher solving at least four ways: 1) Research; 2) Tetragram scoring; 3) Length scoring; 4) Restrictive coding.
Research: I use the crib to determine the subject matter of the plaintext. That allows me to guess other words in the text or recognize likely letter sequences or keys. For example, if the crib is “his beard,” I might be inclined to look for or at least recognize the words Lincoln or Hemingway in the plaintext or key. I can use the crib content to try to extend the crib. Google Ngrams, for example, will tell you what words most often follow a sequence of other words. Sometimes I find the full plaintext online, although I rarely do this except for Xenocrypts since I don’t want to spoil the fun of solving. For some of the toughest ones, though, it may be the only way. Pencil and paper solvers use cribs the same way.
Tetragram scoring: I mentioned this in an earlier lesson. At the beginning of the program I load the tetragram frequency data into an array. After doing that I add points for each tetragram that appears in the crib. In addition to making my hillclimber or other program recognize a better decryption, it has the advantage of not requiring significant additional run time. The extra points don’t have to be added in during each tetragram lookup, only once at the very start. This method has a minor drawback. Sometimes the program may tend to lock in to a false solution that happens to produce the crib, or some portion of the crib, but this is rare and is usually short-lived. You can always rerun the program without a crib being entered. It has another disadvantage: it will not recognize a close match if there are no full identical tetragrams in the matching section. For example, if the crib is “hisbeard” and a trial decryption produces “hixbeaqd” the tetragram scores will be the default ones and not recognize this as coming close to the crib.
Length scoring: I’ve found this to be a quite effective improvement to tetragram scoring, although they can be used together. Like tetragram scoring it has the advantage of not requiring any additional programming on individual ciphertexts, but unlike tetragram scoring, it does use up a bit of extra run time. It solves the problem I just mentioned in the previous paragraph. What I do is run the crib down the decryption and in each spot count the number of letters that are in the same place in both crib and decrypt. In the example above hisbeard and hixbeaqd have six letters in common. I then take the highest-scoring instance for the length of a decryption, 6 in this example. I typically take that number, subtract 3 (assuming it is at least 3), and square the result, then add that to my score. In this example it would add 9 points (6-3 squared) to the score, the equivalent of a high-scoring tetragram. I use this method mostly on cipher types that have longer cribs. It has a good ability to hold hillclimbers close when they get close. It works well with a wide variety of cipher types, but not as well on transposition types or combination tramp/sub types like Bazeries or Myszkowskis. Those types may have the crib letters in close proximity to each other, but not in the right order, or with an extra letter or two between. I’ve considered writing something that will give extra points for those situations, but I haven’t been industrious enough to do that yet.
Restrictive coding: This term refers to the use of information from the crib to restrict the search space or execution time on a solving program. It can take many forms. For example, if you know the crib and its placement, you can write code into your solver that ignores trial decryptions that do not have that crib in that location, thus saving the time of scoring them and the problem of high-scoring false solutions crowding out the correct solution. I have a polybius square program I use to produce possible keys for many cipher types like Bifids, Two-squares, Playfairs, etc. I have a section in the source code where I program in the various letter relationships that I learned from the crib placement, such as requiring specific letters to be in the same row or column, etc. Thus it can be used to search for keys, not just on trial decryptions. The obvious disadvantage is that it requires programming for each individual cipher that is attacked, which can be fun but also is subject to the usual frustrations of bugs in the code and the time it takes to get to a solution. It helps if you can restrict the area of the code that is modified to a single compact module so that you don’t have to find (and later find to undo) all the scattered modifications.
The post Computer cipher solving – Lesson 5: Cribs appeared first on OnWords.
September 7, 2016
Computer cipher solving – Lesson 4: Simulated Annealing
Hillclimbing has an inherent problem: the local maximum. I’ve described this in previous lessons. One way to deal with it is is simply to start over again with a new scrambled key repeatedly. Scryer calls that Shotgun Hillclimbing, I think because a shotgun scatters its shot randomly over a target surface.
Another approach is Simulated Annealing. If that Wikipedia description looks intimidating or unclear, I’ll simplify it as best I can. For our purposes in hillclimbing the idea is really pretty simple. When you compare a new key to the previous one, instead of swapping keys only when the new one is better, swap sometimes when it is the same or even a bit worse. At first you can do this a significant number of times, but as your loop progresses, you must gradually reduce the number of times this happens and the size of the downward gap you are willing to accept. For example you might start by randomly accepting 20% of the time swaps that score 5% worse. After a few hundred or a few thousand iterations, though, those numbers might be 3% worse 10% of the time, and so on. Eventually you will end up not accepting worse-scoring swaps at all. In other words, you will be doing straight hillclimbing at the end. The progression from many to few “bad” swaps is made by reference to a table and a variable called the Temperature (T). When T is high, many relatively large “bad” swaps are accepted, but as T is lowered, fewer such swaps are accepted. The table contains values used to specify the random breakpoint numbers for the scoring gap. Every X number of iterations T is lowered until eventually you are just hillclimbing. You will be required to experiment to see the range from high to low of T based on your cipher type and scoring method.
This approach allows you to jump around the landscape a lot more at the beginning trying to get onto the right mountain before settling in on a steady climb up. The improvement is quite marked for some ciphers. After a while, though, if no solution comes, you will probably need to shoot that shotgun again and start over with a newly scrambled key and T boosted back to a high number.
My description of simulated annealing may not comport with a strict mathematical description. There are other, similar methods that use other terms for technical reasons but I lump them all together as simulated annealing. The concept is the same: allow your program to make some key changes that appear bad at first, but as you improve your result, limit those allowances.
The post Computer cipher solving – Lesson 4: Simulated Annealing appeared first on OnWords.
September 6, 2016
Computer cipher solving – Lesson 3: scoring decryptions
A hillclimbing program needs to know when a decryption is better than another. So do other types of attacks. So your program swaps two letters of a key and decrypts, but the decryption looks like gibberish to the naked eye, just as the previous one did using the old key. Which key do you save? There are several methods. I mentioned locating the crib in the previous lesson. But the crib and other normal text is likely to appear only at the end of a successful hill climb, i.e. as you near the top. A better approach to start with is to use statistics.
There are various statistical measures that can be used. They include the Index of Coincidence, the Digraphic Index of Coincidence, the Normor score, word list scoring (i.e. counting how many words appear in the trial decryption, or what percentage of the letters are contained in the words that appear), and others. But the most useful measure I have found is n-gram (or n-graph) frequency scoring. An n-gram is a sequence of n adjacent characters where n is an integer. I have successfully used digram and trigram scoring in the past, especially when I was using Turbo Pascal on a 16-bit machine and the language was not capable of a full tetragram array structure. There were workarounds, but the big breakthrough in efficiency and effectiveness came with the advent of 32-bit (and now 64-bit) machines capable of holding frequency data for tetragrams, i.e. a 27x27x27x27 data structure. I include spaces as well as the 26-letter alphabet in my data, although most others I know in the ACA do not, so their arrays are a bit smaller. I have not found that using 5- or 6-gram frequency data is better than tetragram, so that seems to be the sweet spot.
The basic idea is to examine the decryption taking it in overlapping 4-letter (or 4-character) sections and adding points to the score based on the frequency of the tetragram in normal English. For example, if your trial decryption were ‘frumqxing…’ you would look up the score for “frum” then add the score of “rumq”, “umqx”, etc. until done. Although these are not words, some of them will have a non-zero frequency in English and thus some score. For example “frum” appears in the phrase “cupofrum” and the word “frumpy.” So how do you know what score to give each tetragram? There are tables of data out there on the Internet if you look hard enough, but I recommend collecting your own data. To translate raw frequency data into points you will have to decide your own method. I know some people add the logarithms of the frequencies. (Adding the actual frequencies tends to overweight the most frequent tetragrams). Since I do my cipher solving almost exclusively on ACA ciphers, I use my own collection of solved ACA ciphers to collect the data and I give each observed tetragram a score of 1 to 9. I don’t have a fixed algorithm for where the breakpoints are; I just picked ones that seemed to divide up the set into useful-sized chunks. Such data is available for other languages, too. Obviously the source of the data should match the type of text you expect, not only in the language used, but whether it is technical, military, dialogue, etc.
These methods are not mutually exclusive. When scoring a decryption you can use tetragram frequencies but also add some points if the crib appears, or you can boost the score of the tetragrams that appear in the crib. I’ll discuss use of the crib in scoring more in a later lesson. Use of the IC or DIC in combination with n-gram scoring sometimes is helpful, too.
September 5, 2016
Computer cipher solving – Lesson 2: Hillclimbing continued
Hillclimbing sounds fairly straightforward, but there are issues to be aware of. For example, step 1 is to pick a random key. Why not use a standard key, such as the regular alphabet A-Z for a cryptogram? After all, you end up swapping letters around randomly anyway. Actually, you can do this but it will usually take longer for statistical reasons too complicated to explain, and you may well get stuck on the same local maximum every time. See the last paragraph for more on this. More important is that you modify the key (step 4) randomly. For example, you cannot just swap every letter with every other letter in an orderly fashion, i.e. first with second, then first with third, fourth, etc. That first comparison you made will only be a valid test for that one arrangement of all the other letters. If you were to compare those same two letters in the same two places after many of the other letters have been swapped around, they may not produce the same result. If you don’t want to take my word for it, just try it yourself and you’ll see that choosing two letters randomly works better.
Another issue is how to modify the key. I’ve used swapping two letters in the key as an example, and that works fine for simple substitution ciphers because we know what the key is – 26 different letters, A-Z, in some order. But a different type of modification may be necessary for other cipher types. For a Playfair or other polybius-square based key you may need to swap entire columns or rows sometimes instead of letter pairs or use a mix of methods. Sometimes making a three-letter switch works better. For a Myszkowski, swapping doesn’t work because the key contains an unknown number of repeated letters. You might need to pick a random letter in the right range (but not necessarily another key letter) and replace one of the key letters for step 4. You need to experiment and adapt to the specific cipher type.
A very important issue is how to tell if you’re program is producing better keys. Metaphorically speaking, what do you use for your altimeter as you climb the hill? For that matter, how do you know that you’ve solved the cipher? This method produces thousands or even millions of decryptions in a matter of minutes. You can’t examine them all by eye and make a personal judgment call. There are several useful methods. One simple method applies when you have a crib, i.e. bit of known plaintext. Suppose you know the word “nuclear” appears in the text of the message, but you don’t know where. That’s called a crib. You can test your decryptions by checking to see if the word nuclear appears in the decryption and either save or display those decryptions and the keys that produced them. That might work pretty well with a short simple cipher like a Baconian, since if the crib is produced, usually the rest of the decryption will also be correct or close enough that you can figure it all out. But for other types this isn’t enough. You can produce that one word many ways and the rest of the decryption may be useless gibberish. Not only that, you may have the entire solution almost perfect but for the word “nuclear” misspelled as “unclear.” Your program will skip over that and never show it to you since the crib isn’t there. What you need for hillclimbing and most other forms of computer attack is a scoring method. You need to give points for improvements. For step 6 you will save a key if it produces a decryption with a higher score and reject it if it doesn’t. I’ll discuss in another lesson how to do this.
Before we leave hillclimbing, though, there is another big issue to deal with: local maxima, i.e. those incorrect decryptions that can’t be improved further by the incremental changes to the key. Remember how I mentioned that the paratrooper can end up on the top of the wrong hill or rise and not be able to climb any further? That happens constantly in hillclimbing, which is why I put in step 8. You can simply give up and go back and scramble your key completely and start over. If you always start with a straight A-Z alphabet or other “standard” key, that’s like always dropping onto the same exact starting point, the small hill next to the correct mountain and you will never get to the radar station by climbing the hill. Start over with a random key. Computers can repeat a process millions of times very quickly, so this method generally works pretty well, but there are more sophisticated, more effective methods of getting off, or avoiding altogether, the hill. That, too, is a topic for a future lesson.
September 4, 2016
Computer cipher solving – Lesson 1: Hillclimbing
Since I’m going to the ACA convention in a few days, cryptography is on my mind. I thought I’d depart from my regular book reviews and give a few brief lessons on computer solving of ciphers. I only apply these methods to simple ACA ciphers like cryptograms, Playfair, etc., but the principles and general techniques apply to many sophisticated encryption methods. I’ll only use pseudocode since programmers use many different languages, and I only know a bit of some older types (BASIC, Pascal, Delphi). First, I’ll discuss a common method of attack: hillclimbing.
Before I describe how to do it, you’ll understand it better if you know why it’s called hillclimbing. Imagine you’re a paratrooper who is dropping onto an enemy’s mountain range. Your mission is to get to the top of the mountain to disable his radar station and anti-aircraft guns. You parachute down in the dead of night. It’s foggy and pitch black. You can’t turn on a light and can’t see exactly where you are or which direction the mountaintop is. Your only tool is a very sensitive altimeter with a lighted screen. You decide just to go up. You measure your altitude and then take a step, then look at it again. If you’re higher, you repeat the process from there. If you’re lower or the same altitude, you step back to where you were and step a different direction. By always going higher you will eventually reach the top. But the top of what? You may be on a foothill or ridge or side bluff or even the wrong mountain. After all, the mountain isn’t a smooth cone rising from a smooth plain. However, you aren’t the only paratrooper. If hundreds of your fellow paratroopers land all over the area, scattered widely, one or more is bound to land close enough to the top so that this method of always climbing will get him to the radar station. That’s the principle behind hillclimbing. Take a random shot at a solution, and improve it gradually until you can’t improve it any more. Do that many times until one of the attempts produces a solution.
I’m assuming you know what type of cipher you are attacking, or at least think you know, have a cipher to decrypt, and have a decryption engine for it. All you need is the right key. Hillclimbing code will:
Pick a random key
Decrypt using that key
Measure how good the decryption is ( a future lesson will discuss how to do this)
Make a small change in the key (e.g. swap two letters)
Repeat steps 2 and 3
If the new key produces a better decryption, keep it, otherwise keep the first one
Repeat from step 4 until you get no more improvement
Go back to step 1 and repeat the whole process
The process is a looping procedure that ends only when you stop it or it meets some test or limit you have programmed into it. If you’re lucky, or good, you stop because you have found a solution. That’s it for today. You now understand the idea behind hillclimbing.
New American Cryptogram Association (ACA) website
The ACA just launched a redesigned website. The main domain name hasn’t changed but the files are in new directories and many have been renamed, so you cipher fans may encounter broken links. Please update your own if you have any on your puzzle caches, etc. Here’s the URL for the new Cipher Types page: http://www.cryptogram.org/resources/cipher-types/
Speaking of the ACA, I will be attending the convention in Irving, Texas next weekend and presenting a talk on a method for solving Foursquare ciphers. I hope to see some of my long-time ACA friends and make some new ones.
August 31, 2016
100 Likes
My Facebook Cliff Knowles Mysteries page has reached 100 likes. A big thank you to all my fans.
August 29, 2016
The Next Pandemic by Ali S. Khan
The Next Pandemic: On the Front Lines Against Humankind’s Gravest Dangers by Ali S. Khan
My rating: 5 of 5 stars
This memoir by a soldier of a different kind made me appreciate all the people who protect us from harm every day. As a retired FBI agent I am well aware how government officials and employees are sometimes revered without good reason but just as often disparaged, resented, or even reviled for being less than impossibly perfect. Dr. Khan is one of those adventurous epidemiologists who has spent a career charging into Ebola-infested regions of Africa, SARS hotbeds in Asia, and, closer to home, outbreaks of West Nile Virus, hantavirus, and many other threatening diseases here in the U.S. The book will fascinate anyone with an interest in science in general and the excitement and challenges of medical field work in particular. We all owe a great debt to Dr. Khan and his colleagues.
Most interesting to me, largely because of the FBI involvement, was his account of the anthrax attacks that hit Washington D.C., some media outlets, and a few other spots in the U.S. in 2001 right after the 9/11 World Trade Center attack. I have no qualms in saying the FBI bungled that investigation from day 1, while I still maintain that it is the finest law enforcement agency in the world. Khan is a Muslim American and was placed under suspicion briefly at an airport, an incident he recounts with surprisingly good humor, but the real miscarriage of justice was how the FBI quickly focused their investigation on an innocent scientist named Hatfill, who worked only with viruses (anthrax is a bacterium, not a virus), while allowing the actual perpetrator, an anthrax expert named Ivins, to inject himself into the investigation. It took seven years for them to finally come down hard on Ivins, who promptly committed suicide once they got the goods on him. He was clearly mentally disturbed yet held a sensitive position at a military base in bioweapons research. The author justifiably jabs the FBI hard, but later in the book describes the many missteps that various elements in the medical and public health infrastructure have made in many if not all of the outbreaks described. The reality is that when you don’t know the who, what, or how of some crisis, it’s easy to make mistakes and neither doctors nor FBI agents are immune. In all these cases, though, eventually the government agencies managed to contain the problem, whether criminal or epidemiological.
That’s something else I really appreciated about the book. He skewers so many myths, rumors, and outright lies about various epidemics or outbreaks that it makes one wonder if we can believe anything the news agencies report. Scary headlines or TV teasers bring readers and viewers so even the most ludicrous rumors get trumpeted without checking. I’ve certainly seen my share of inaccurate reporting about the FBI cases I worked.
The writing is very readable and lighthearted in style without seeming patronizing or lacking in seriousness. The tribulations of travel to and in third world countries is the subject of many humorous anecdotes. Khan has a ghost writer, William Patrick, who in this case is given credit on the front cover and even a photo credit on the inside back flap, something I like to see. As a writer myself I know that having a good story is not enough in an of itself; one needs good writing skills to make it come alive for the reader. Patrick deserves credit for a top-notch job. I greatly enjoyed this book.
August 28, 2016
Open Season by C.J. Box
Open Season by C.J. Box
My rating: 5 of 5 stars
C.J. Box has a new fan. This is a terrific mystery, full of good, solid detective work, a likeable main character, and a great setting. Joe Pickett is a Wyoming State game warden, a bit different for a mystery leading man, and he’s as decent and wholesome as you’re ever going to find between two book covers. He’s a family man with a lovely wife and two beautiful daughters. He’s surrounded by gun-toting louts, bureaucrats, and dilettante environmentalists, among others. One of those louts turns up dead on his woodpile in the middle of the night.
I’ve had the privilege of living in Wyoming very near the fictional town where Joe lives. My grandfather was a county sheriff there and when he wasn’t sheriff he sometimes led elk hunters into the mountains, too, so this setting and story line hit very close to home. Box has the local demographic and terrain absolutely nailed. I really like that Joe is no super-hero. He’s not a good shot with a pistol, although he wields a mean shotgun. He’s good-sized if I remember right, but he doesn’t get into fist fights because he’s too laid back and peaceable. He’s a straight arrow who knows it’s his duty to enforce the game laws and the endangered species act even when he doesn’t seem to think they’re all that sensible. He’s just a nice guy who loves the mountains and is trying to do his job and support his family.
There’s enough action and suspense to keep you reading (or listening, as I did), yet there was a minimum of gore and no cursing at all that I recall. The only real criticism I have is the author’s decision to make the murderer a pedophile, too. It wasn’t consistent with anything else in the plot and we had plenty of reason to hate him already without that. I’ve seen other authors do that with their first book. Hey, we get it, he’s the bad guy, you don’t need that fondling and nasty language.
The narrator, David Chandler, was excellent.


