Randall Hyde's Blog
March 13, 2021
The iMac Pro/M1
Recently I decided it was about time to translate my “Art of Assembly Language” book from the 80x86 CPU to the ARM processor. Of course, the first question I had to answer was “which ARM CPU to use?” There are a wide variety of ARM CPUs out there, with different instruction sets. There are 32-bit ARM microcontrollers that are very popular in embedded systems (such as the Cortex-M0 and Cortex-M4 CPUs, typically found on boards like the Raspberry Pi Pico and the Teensy). There are tons of ARM-based 32-bit microcontroller boards out there; ST makes a huge number, and Adafruit, SparkFun, and Seeed Studio produce a bunch of their own. Of course, the Raspberry Pi (prior to the Pi 3 & 4) used 32-bit CPUs (the ARM11, among others). Today the Pi Zero and Zero W still use 32-bit CPUs. For this reason, the Raspberry Pi OS (a Linux variant) runs in 32-bit mode, even on the 64-bit Pi 3 & 4 machines. Tons of 32-bit ARM machines are still being sold today.
However, 32 bits is so… 1990s. Ever since Apple introduced 64-bit CPUs in their iPhone and iPad lines (and Android followed within a few years), 64 bits has been “where it’s at.” Then, of course, Apple announced the Apple Silicon M1 series of CPUs, whose system-on-a-chip performance was blowing away contemporary x86 PCs. With all the buzz, it became clear to me that I had to leave 32 bits behind and concentrate on 64-bit ARM assembly language.
Mind you, I’m not interested in writing “The Art of ARM Assembly” strictly for the Apple Mx-series computer systems. There are a large number of 64-bit ARM machines lying around, all of which can be programmed using the 64-bit variant of the GNU “as” assembler (hereafter, “gas”). My current plan is to ensure that all the example programs will compile (assemble) and run on the following systems:
- Mac systems based on the Mx-series CPUs (e.g., the M1)
- Raspberry Pi 3 & 4 systems running a 64-bit version of Pi OS or Linux
- Jetson Nano boards running a 64-bit version of Linux
- Pine 64/RockPro/PineTop 64-based systems running a 64-bit version of Linux

As I write this, I’m receiving the final edits and layouts of my new book “The Art of 64-Bit Assembly Language” from No Starch Press (https://www.nostarch.com). This book is a rewrite of my earlier “The Art of Assembly Language” using 64-bit x86-64 assembly language on a Windows PC (with the Microsoft MASM assembler). As the chapters come in for the x86 book, I’m going to rewrite them using ARM assembly language. Now, I already own a couple of Pi 3 & 4 systems, the Pine 64 systems (Pine 64, RockPro 64, a Pine Top, and a Pine Top Pro), and a Jetson Nano board. I also own an iPhone and an iPad, so I can test ARM assembly language running on those devices. The only thing missing was an Apple Mx-based system.
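One of the attractions of standardizing on gas across all these machines is that a trivial AArch64 routine looks essentially the same everywhere. Here’s a minimal sketch of my own (not an excerpt from the book); the only wrinkle I know of is that macOS expects C-callable symbols to carry a leading underscore:

    // sum2.s: a trivial 64-bit ARM (AArch64) function in gas syntax.
    // Illustrative sketch only.
    //
    // Linux:  gcc -c sum2.s     (symbol is sum2)
    // macOS:  clang -c sum2.s   (rename the symbol to _sum2)

            .text
            .p2align 2
            .global sum2            // int64_t sum2(int64_t a, int64_t b)
    sum2:
            add     x0, x0, x1      // arguments arrive in x0/x1; the
            ret                     // result returns in x0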
My original plan was to wait for the next generation of Apple Silicon systems to arrive. My 16″ MacBook Pro has 64 GB of RAM; my iMac Pro also has 64 GB. The M1 Mini, with a maximum of 16 GB, is a bit anemic for my tastes; I feel that 32 GB is the smallest “maximum RAM complement” a system should have. However, I really couldn’t afford to wait for the M2 (or whatever it will be) to arrive, so I broke down and bought a 16 GB/2 TB M1 Mac Mini so I could get started on the book.
The only problem with the M1 Mini is that I just don’t have the desk space for yet another Apple computer. I have an iMac Pro sitting on my desk already, and the other tables around my workspace are filled with electronics gear and instrumentation (I develop PCBs for the nuclear industry). Furthermore, all my authoring tools run under Windows/Parallels (Adobe FrameMaker and Canvas X being the primary ones). Generally, I’m running one or two Parallels windows on my iMac Pro (hence the need for 64 GB of RAM) while working on a book: one for FrameMaker and one for compiling and running programs (assuming I’m not compiling and running them under macOS). Trying to juggle a separate computer on my desk for code development isn’t practical.
The solution was quite simple: the iMac Pro/M1. I’m already used to juggling multiple “computers” on my desktop (I have 14 different Parallels setups on my iMac Pro, allowing me to run various versions of Windows, macOS, Linux, and FreeBSD); why not just add another window for an ARM-based macOS? Of course, Parallels cannot emulate an ARM (M1) Mac running macOS (nor would I want to use it if it did), but it is possible to add a window on the iMac Pro providing access to my M1 Mini using Apple’s Remote Desktop application (about $80 on the App Store). The reviews were terrible (1.8 out of 5 stars), so I was a bit concerned about spending that kind of money only to find that the application doesn’t do the job. However, I bought it and installed it, and it works well enough for my purposes that I’m not going to complain too loudly about it. Yes, the screen refresh is a teensy bit slow; I wouldn’t want to edit video or play games on the M1 while operating from the iMac Pro, but for code development purposes it works just fine.
Some might question the sanity of using the M1 as a peripheral computer for the iMac Pro. After all, don’t all the benchmarks show the M1 blowing away Intel machines (like the several-year-old Xeon CPU in the iMac Pro)? Well, the iMac Pro still has many advantages over the M1 Mini:
- The iMac Pro has an incredibly beautiful screen. You have to spend a lot of money to get a screen this good.
- The iMac Pro, with an x86-64 CPU, runs x86-based operating systems (like Windows, Linux, and FreeBSD). Sure, most of those OSes have ARM variants, but often the applications are only available on the x86 (especially under Windows).
- The iMac Pro has 64 GB of memory (need I say more?). Granted, I use most of that memory running Parallels VMs (which aren’t available on the M1 Mini, so it doesn’t need quite as much memory), but there are some things I do where 16 GB just isn’t quite enough.
- The iMac Pro runs a lot of commercial software that I don’t feel like paying to upgrade for the M1. For example, cheap guy that I am, I’m still running Adobe CS6 (in a Parallels VM running Yosemite) because I’m morally opposed to subscription software. For the little that I use Photoshop and GoLive these days, it’s not worth the money to upgrade (especially to a subscription-based system). I also have lots of orphaned software that will never run on an M1 (it’s 32-bit and won’t even run on the latest macOS).

Of course, the M1 Mini has lots of advantages as well. For example, it’s supposedly faster than the iMac Pro (I don’t know; I haven’t run any applications where the speed difference was important). The main advantage, to me, is that it has an ARM-based CPU that executes 64-bit ARM assembly code, so I can develop and test ARM assembly code for my book.
So my ultimate goal was to create an iMac Pro machine capable of compiling, running, and testing ARM assembly language code. Here’s how I did it:
1. I set up the M1 Mini, updated the OS, and did all the normal stuff you do with a new machine. I did not bother copying any of my applications to the M1, as I’m perfectly happy running most of my code on the iMac Pro.
2. I installed Xcode on the M1 Mini (to gain access to the gas assembler and other tools).
3. I installed Apple’s Remote Desktop application on my iMac Pro.
4. I connected the M1 Mini to my local Ethernet (thereby connecting it to the iMac Pro).
5. I booted the M1 Mini, started the Remote Desktop app, and logged on to the M1 Mini from the Remote Desktop window.
6. On the M1 Mini, I used the Go > Connect to Server Finder menu option to connect to my shared drive on the iMac Pro (where I keep all my data, including my book files and source code).

Voila! I’m in business developing source code for The Art of ARM Assembly Language.

The cool thing about this approach is that I can now use different writing and development tools across multiple machines and operating systems. For example, I generally prefer editing program text using the CodeWright text editor under Windows (a really ancient program from the 1990s) or BBEdit under macOS (versus Xcode, which is a bit bloated and not the most appropriate tool for writing ARM assembly language). Because I’ve got Windows running in one (or two) windows, macOS/x86 running on the iMac Pro, and macOS/ARM (M1) running in another window, all sharing the same information on my “data” network drive, it’s trivial to switch between systems, applications, and CPUs, all using my original, comfortable workflow.
Although I can’t recommend that anyone buy an iMac Pro just to create their own iMac Pro/M1, this combination does demonstrate how to upgrade a (now discontinued) system. Of course, there is nothing iMac Pro-specific about my setup; you could pull the same trick with any iMac, or any Mac for that matter. Someday, of course, we’ll see Apple Silicon Mx-series CPUs in the iMac form factor. Until then, this is a convenient way to get the power of an M1 running on an existing x86 iMac system.
Some additional comments:
What about Windows on ARM?
I mentioned a set of ARM-based machines I intend to support when writing The Art of ARM Assembly. Noticeably absent from this list are Microsoft’s ARM-based Windows machines, such as the Surface Pro X. I personally have nothing against Microsoft (I actually own two x86-based Surface machines). The problem with trying to support Microsoft’s ARM machines is that Microsoft uses its own assembler, with a syntax sufficiently different from gas that including such code in a book full of gas code would be very confusing. If I get my hands on a Surface Pro X (or another MS ARM machine), I might add an appendix describing the differences and put the MS-compatible source code online. However, given the cost and low acceptance of MS-based ARM machines in the marketplace, this won’t be a very high priority.
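To give a feel for the gap, here is the same trivial routine written both ways. (This is my own sketch; the armasm64 directives shown are the classic ARM-assembler ones, so treat the details as an assumption rather than gospel.)

    // GNU gas (AArch64): "//" comments, dot-directives, and a colon
    // after each label
            .text
            .global sum2
    sum2:
            add     x0, x0, x1
            ret

    ; Microsoft armasm64: ";" comments, AREA/EXPORT/END directives,
    ; and labels start in column one with no colon
            AREA    |.text|, CODE, READONLY
            EXPORT  sum2
    sum2
            add     x0, x0, x1
            ret
            END

The machine instructions are the same; it’s everything around them that differs, which is exactly what would make mixing the two in one book confusing.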
What about 32-bit code?
Why not teach both 32-bit and 64-bit ARM assembly in the same book? The problem with that idea is that the 32-bit instruction set is substantially different from the 64-bit instruction set (it’s not like the x86, where you rename a few registers and you’re good to go). Attempting to pack both instruction sets into the same book would simply confuse beginning assembly language programmers. So 32 bits, being the “old-fashioned” stuff, got nixed.
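Just one concrete example of that gap (a sketch of my own): 32-bit ARM lets you attach a condition to almost any instruction, while 64-bit ARM dropped general predication in favor of dedicated conditional instructions like csel:

    @ 32-bit ARM (A32): nearly any instruction can be made conditional
            cmp     r0, #0
            moveq   r1, #1          @ executes only if r0 == 0
            movne   r1, #2          @ executes only if r0 != 0

    // 64-bit ARM (A64): no general predication; select between registers
            cmp     x0, #0
            mov     x2, #1
            mov     x3, #2
            csel    x1, x2, x3, eq  // x1 = (x0 == 0) ? x2 : x3

Multiply differences like that across the whole instruction set and you can see why one book can’t reasonably teach both.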
February 5, 2021
The 65000: Yet Another 6502 Dream Machine
The 6502 was the very first microprocessor I ever programmed. Way back in 1978 I was using an Apple II to program embedded applications for a “SuperKIM” 6502-based development board. All in assembly language, of course. Back then I wrote a fairly well-received assembler for the Apple II called “LISA: Lazer’s Interactive Symbolic Assembler.” I wrote lots of articles about the 6502 in various magazines of that era and even wrote a book on the subject (see https://www.randallhyde.com/AssemblyL...). Needless to say, I had invested a lot in the good old 6502.
Then the Mac and PC came along. I stuck with the 65xx family up until the day the Apple //GS was finally put out to pasture by Apple (every now and then I dig my //GS out of storage and start it up, then immediately put it back when I’m reminded how slow it is). I haven’t written a line of 65xx code in over 30 years (as I write this). But there’s still a soft spot in my heart for that little old device.
Every now and then I wonder, “Gee, how would I design a 65xx device today for an embedded system?” I’m not talking about building the latest and greatest 64-bit CPU, mind you, just something that would sit somewhere between an Atmel AVR and an ARM on an Arduino-class SBC.
Finally, I got bored enough with work that I decided to take the day off and work through the encoding for a 65xx-class machine. At first I thought I’d call it the 65020 (it seemed like a cool play on the 6800 -> 68000 -> 68020 CPUs of years gone by). Naturally, somebody beat me to that name with their own little thought experiment. So I went with the more boring name “65000.”
Well, without further ado, here’s a link to the page I created for this mental toy:
https://www.randallhyde.com/FunProjects/65000/65000.html
Now I’ll have to think about writing an emulator and assembler for it, or maybe really go crazy and figure out how to create a live piece of hardware using an FPGA board like the SparkFun/Alchitry Au+. In any case, enjoy.
January 17, 2021
CISC vs. RISC: The Definition
The term RISC was first coined in the early 1980s. RISC architectures were a reaction to the ever-increasing complexity of architecture design (epitomized by the DEC VAX-11 architecture). RISC rapidly became the darling architecture of academia, and almost every popular computer architecture textbook since that period has trumpeted its design philosophy. Those textbooks (and numerous scholarly and professional papers and articles) claimed that RISC would quickly supplant the “CISC” architectures of that era, offering faster and lower-cost computer systems. A funny thing happened, though: the x86 architecture rose to the top of the performance pile and (until recently) refused to give up the performance throne. Could those academic researchers have been wrong?
Before addressing this issue, it is appropriate to first define what the acronyms “RISC” and “CISC” really mean. Most (technical) people know that these acronyms stand for “Reduced Instruction Set Computer” and “Complex Instruction Set Computer,” respectively. However, these terms are slightly ambiguous, and that ambiguity confuses many people.
Back when RISC designs first started appearing, RISC architectures were relatively immature and the designs (especially those coming out of academia) were rather simplistic. The early RISC instruction sets, therefore, were rather small (it was rare to have integer division instructions, much less floating-point instructions, in these early RISC machines). As a result, people began to interpret RISC to mean that the CPU had a small instruction set; that is, the instruction set was reduced. I denote this interpretation as “Reduced (Instruction Set) Computer,” with the parentheses eliminating the ambiguity in the phrase. However, the reality is that RISC actually means “(Reduced Instruction) Set Computer”; that is, it is the individual instructions that are simplified, rather than the whole instruction set. In a similar vein, CISC actually means “(Complex Instruction) Set Computer,” not “Complex (Instruction Set) Computer” (although the latter is often true as well).
The core concept behind RISC is that each instruction doesn’t do very much, which makes it far easier to implement that instruction in hardware. A direct result is that the hardware runs much faster, because there are fewer gates decoding the instruction and acting on its semantics. Here are the core tenets of a RISC architecture:
- A load/store architecture (software accesses data memory only via load and store instructions; see the short example below)
- A large register bank (with most computations taking place in registers)
- Fixed-length instructions/opcodes (typically 32 bits)
- One-instruction-per-cycle (or better) execution times (that is, instructions cannot be so complex that they require multiple clock cycles to execute)
- Compilers handle the optimization tasks, so there is no worry about difficult-to-write machine code

Early RISC designs (and, for the most part, the new RISC-V design) stuck to these rules quite well. The problem with RISC designs, just as with CISC before them, is that as time passed the designers found new instructions they wanted to add to the instruction set, and the fixed-size (and very limited) RISC instruction encodings worked against them. Today’s most popular RISC CPU (the ARM) has suffered greatly from the kludges needed to handle modern software (this was especially apparent in the transition from 32 bits to 64). Just as the relatively well-designed PDP-11 architecture begat the VAX-11, and just as the relatively straightforward 8086 begat the 80386 (and then the x86-64), kludges to the ARM instruction set architecture have produced some very non-RISC-like changes. Sometimes I wonder whether those 1980s researchers would view today’s ARM architecture with the same disdain they held for the CISC architectures of yesterday. This is, perhaps, the main reason the RISC-V architecture has given up on the fixed-instruction-length tenet: a fixed encoding makes it impossible to cleanly “future-proof” the CPU.
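To make the load/store tenet concrete, here’s a quick sketch of my own (x86 MASM syntax on one side, AArch64 gas syntax on the other). A CISC can operate on memory in a single instruction; a RISC must load, compute in a register, and store:

    ; x86-64 (CISC, MASM syntax): one instruction reads, modifies,
    ; and writes memory
            inc     qword ptr [rbx]

    // AArch64 (RISC, gas syntax): only loads and stores touch memory,
    // and each instruction below is a fixed 32 bits wide
            ldr     x1, [x0]        // load the value
            add     x1, x1, #1      // bump it in a register
            str     x1, [x0]        // store the result

Three simple instructions instead of one complex one: that trade-off is the whole RISC bet in miniature.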
The original premise of RISC is that you design a new, clean architecture (like RISC-V) when time passes and you need something better than the 30-year-old design you’ve been using (i.e., the ARM). Of course, the big problem with starting over (which is why the x86 has been king for so long) is that all that old, wonderful software won’t run on the new CPUs. For all its advantages, it’s unlikely you’ll see many designers flocking to the RISC-V CPU anytime soon; there’s just no software for it. Today, RISC-V mainly finds use in embedded projects where the engineers are writing all the software for their device; they don’t depend on thousands or millions of “apps” out there for the success of their product.
When RISC CPUs first became popular, they actually didn’t outperform the higher-end CISC machines of the day; it was always about the promise of what RISC CPUs could do as the technology matured. Those VAX-11 machines (and the Motorola 680x0 and National Semiconductor 32000 series machines) still outperformed the early RISC machines. FWIW, the 80x86 family *was* slower at the time; it wasn’t until the late 1980s and early 1990s that Intel captured the performance crown (in particular, the Alpha and SPARC CPUs were tough competitors for a while). However, once the x86 started outrunning the RISC machines (a position it held until some of the very latest Apple Silicon SoCs came along), there was no looking back. RISCs, of course, made their mark in two areas where Intel’s CISC technology just couldn’t compete very well: power and price. The explosion in mobile computing gave RISC the inroads to succeed where the x86 was a non-starter (all that extra complexity costs money and watts, the poison pill for mobile systems). Today, of course, RISC owns the mobile market.
In the 1980s and 1990s, there was a big war in the technical press between believers in CISC and RISC. All the way through the 2000s (and even the 2010s), Intel’s prowess kept the RISC adherents at bay. They could claim that the x86 was a giant kludge and its architecture was a mess, but Intel kept eating their lunch and producing faster (if sometimes frighteningly expensive) CPUs.
Unfortunately, Intel seems to have lost its magic touch in the late 2010s and early 2020s. For whatever reason, it has been unable to build devices using the latest processes (3 to 5 nm, as I write this), and other semiconductor manufacturers (who build RISC machines, specifically the ARM) have taken the opportunity to zoom past the x86 in performance. Intel’s inability to improve its die manufacturing processes likely has nothing to do with the RISC vs. CISC debate, but this hiccough on its part may be the final nail in the coffin for the x86’s performance title, and it is likely to settle the debate once and for all.
January 14, 2021
Hype 4 Pro
I created my first web page back at the dawn of the World Wide Web while teaching at UC Riverside in the 1990s. Back then, all the coding was done strictly in HTML using a plain text editor. Twenty-five years later, my web pages still looked like they were hand-written in HTML, despite the fact that I’d switched to Adobe GoLive and then Dreamweaver along the way. My website is currently too big for a complete overhaul (25 years of kludging stuff onto it). However, after separating www.randallhyde.com from plantation-productions.com, I decided I ought to start cleaning it up, beginning with the main web pages.
As I wanted a tool that supports HTML 5 animations (to accompany some of my books with tutorials), I settled on Hype 4 Pro (https://tumult.com/hype/pro/). While I’m busy proving that a great piece of software is no match for my poor graphic design skills, I have been very pleased with how well Hype lets me lay out pages the way I want, rather than the way HTML insists on displaying them despite my best intentions.
Though I’ve yet to get into the animation features, I can definitely rate Hype 4 Pro as “very good” at this point.
Cheers,
Randy Hyde
January 13, 2021
Introduction
I first started putting information on the Web way back in the early-to-mid 1990s while teaching Computer Science at UC Riverside. Back then you hand-coded Web pages in HTML with a text editor. Internet connections were slow, so you avoided graphics and other objects that took too long to download. My, how things have changed.
At the time, I created “Webster” (www.webster.cs.ucr.edu). That website became quite well known in the assembly language programming community, as I hosted a copy of my course notes, “The Art of Assembly Language Programming,” on it. In the early 2000s, No Starch Press approached me about publishing the book. At that point the book was a bit too large, so a lot of material was cut; that material became the bulk of “Write Great Code, Volume 1: Understanding the Machine.” Volume 2 (“Thinking Low-Level, Writing High-Level”) followed shortly, with a promise of a soon-to-arrive Volume 3 on software engineering.
About that time (2004-2005), however, I started working with General Atomics on new software for their TRIGA nuclear reactor digital consoles. For the next 10-12 years I was kept extremely busy working on that software (and traveling all over the world to install and deploy it). Volume 3 fell by the wayside.
I didn’t completely forgo work on my books. Around 2010 I did a second edition of “The Art of Assembly Language.” However, it wasn’t until around 2017-2018 that I had time to work on the Write Great Code series. Unfortunately, by then computer technology had progressed to the point that Volumes 1 and 2 were in desperate need of an update, so I cranked out second editions of the first two volumes (which finally appeared in 2019). Volume 3 turned out to be a whole lot larger than I initially expected; it got broken up into three volumes of its own: Volume 3: Engineering Software, covering models, methodologies, and system documentation; Volume 4: Great Design, covering software analysis and design; and Volume 5: Great Coding, covering the actual construction process (all in addition to Volume 6: Testing, Debugging, and Quality Assurance, which was always planned for the series).
Volume 3 finally hit the shelves in late 2020.
As I write this, I actually have four separate books in the editorial process:
First of all, it was time to do something about “The Art of Assembly Language.” The original version of this book first appeared in 1989 (as a set of notes for a course I was teaching at Cal Poly Pomona). That book was geared towards the 8-/16-bit 8088 microprocessor in the original IBM PC. The first published edition (No Starch, 2003) covered the 32-bit variants of the x86 using the High-Level Assembler (HLA). Within 10 years, 64-bit variants of the x86 were becoming popular. However, porting HLA to support 64-bit code was a huge undertaking (it took about four man-years to get HLA working in the first place, plus many years of support thereafter). At the time, my attitude was “learn 32-bit code; after that, 64 bits is nearly trivial.” That was an okay attitude until new operating systems (especially Apple’s) stopped running 32-bit code altogether. Though HLA still ran on Windows (at least as of Windows 10), the writing was on the wall: it was time to update “The Art of Assembly Language.” Unfortunately, the issue of updating HLA still remained. So, with a sad heart, I chose to abandon HLA and return to Microsoft’s MASM assembler (the assembler used by the original version of the book back in the 8088 days). After considerable effort and rewriting, “The Art of 64-Bit Assembly Language” was written. As I write this, it is in the middle of editing and technical review; it should head into production towards the middle of 2021.
The original class notes I wrote for “The Art of Assembly Language Programming” yielded a bit more than “The Art of Assembly Language” (No Starch Press) and “Write Great Code” (also No Starch Press). There was a ton of really advanced stuff that never made it beyond the electronic version appearing on Webster. At least, not until now. I’ve taken that material, updated it to 64 bits and modern versions of Windows, and written “The Art of 64-Bit Assembly Language, Volume 2.” I’ve completed the first draft of this book, and it’s sitting in the editing queue at No Starch waiting for the production of Volume 1 to complete.
Since the very first days I began programming for a living (way back in 1977, actually), I’ve been working with embedded systems and programming hardware. Though I’ve taught lots of hardware classes at the university level, I’ve never written a hardware-related book. Until now. I just finished the first draft of “The Book of I2C,” an in-depth treatment of the Inter-IC bus (also known as the I2C, or I-squared-C, bus). Anyone who has connected hardware to an Arduino or Raspberry Pi has probably had to deal with this bus. Soon (figure 2022), No Starch will have a great reference for programmers dealing with I2C.
“Write Great Code, Volume 4: Great Design” is in the middle of the rough-draft stage (about ⅓ done). I’ll have more to say about WGC4 as I get further into it. You can probably expect it around the middle to end of 2022.
Well, this is a long post so I’ll stop here.
Cheers,
Randy Hyde