Andrew Huang's Blog
July 31, 2024
Name that Ware, July 2024
The Ware for July 2024 is shown below.




Thanks again to jackw01 for contributing this ware! The last two images might be killer clues that give away the ware, but they are also so cool I couldn’t not include them as part of the post.
Winner, Name that Ware June 2024
The Ware for June 2024 is a hash board from an Antminer S19 generation bitcoin miner, with the top side heatsinks removed. I’ll give the prize to Alex, for the thoughtful details related in the comments. Congrats, email me for your prize!

I chose this portion of the miner to share for the ware because it clearly shows the “string of pearls” topology of the miner. Each chip’s ground is the VDD of the chip to the left. So, the large silvery bus on the left is ground, then the next rank is VDD, the next is VDD*2, VDD*3, and so forth, until you reach about 12 volts. The actual value of VDD itself is low – around 0.3V, if I recall correctly. Miners focus on efficiency, not performance, so the mining chips are operated at a lower core voltage to improve computational efficiency per joule of energy consumed, at the expense of having to throw more chips at the problem for the same computational throughput.
Along the bottom side of the image, you can see that the signals between chips cross the voltage domains without a level shifter. This is one of the things that I thought was kind of crazy, because there is a risk of latch-up whenever you feed voltages in that are above or below the power supply rails. However, because the core VDD is so low (0.3V), the ground difference between adjacent chips is small enough that the parasitic diode necessary to initiate the latch-up cascade isn’t triggered.
The whole arrangement relies on each chip keeping its core voltage below the latch-up threshold. But what controls that voltage? The answer is that it’s pretty much unregulated – if you were to, say, run hashing on one chip but leave another chip idle, the core voltages would diverge rapidly. The idle chip has a higher impedance than the active chip, but, being wired in series, all of the active chip’s current must flow through it, so the idle chip’s voltage rises until its idle leakage becomes big enough to feed the active chips in the chain.
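To put some toy numbers on that, below is a minimal sketch that treats each chip as nothing more than an effective resistance in the string. Every value in it (string voltage, chip count, resistances) is an illustrative assumption of mine, not an Antminer specification – the point is just to show how badly the rank voltages skew when one chip idles.

```python
# Toy model of the unregulated series ("string of pearls") power chain.
# Each chip is approximated as a resistance seen by the string: an idle chip
# draws less current at a given voltage, i.e. it looks like a higher resistance.
# All values are illustrative assumptions, not Antminer specifics.

V_STRING = 12.0     # volts across the whole string (assumption)
N_CHIPS = 40        # 40 chips x ~0.3 V nominal each (assumption)
R_ACTIVE = 0.01     # effective ohms of a chip hashing at full rate (assumption)
R_IDLE = 0.05       # effective ohms of an idle, leakage-only chip (assumption)

def rank_voltages(resistances, v_string=V_STRING):
    """Voltage across each chip when one common current flows through all of them."""
    current = v_string / sum(resistances)
    return [current * r for r in resistances]

# Every chip hashing: the string voltage divides evenly, ~0.3 V per chip.
uniform = rank_voltages([R_ACTIVE] * N_CHIPS)
print(f"all active: {uniform[0]:.2f} V per chip")

# One chip idles: its share of the string voltage balloons while the rest droop.
mixed = rank_voltages([R_IDLE] + [R_ACTIVE] * (N_CHIPS - 1))
print(f"idle chip: {mixed[0]:.2f} V, active chips: {mixed[1]:.2f} V")
```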
Of course, the intention is that all the chips are simultaneously ramped up in terms of computational rate, so the energy consumption is equally distributed across the string and you don’t get any massive voltage shifts between ranks.
You’ll note the supplemental metal sheets soldered onto the power busses on the board; I had removed one of the sheets in the top right of the image. These bus bars are added to the board to reduce the resistive (I·R) drop of the power distribution. Each string pushes tens of amps of current, so reducing the resistive losses of the copper traces by adding these bus bars in between chips measurably improves the efficiency of the hasher. If you look at a more zoomed-out version of the image, the distance between the chips actually increases as you go across the board. I’m not sure exactly why this is the case, but I think it’s because one side is farther from the fans, and so they adjust the density of the chips to even out the air temperature as it makes its way across the board.
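For a sense of scale on why those bus bars earn their keep, here’s a back-of-the-envelope I²R estimate. The link resistances and link count below are my own guesses for illustration, not measurements from this board.

```python
# Back-of-the-envelope for why bus bars matter: at tens of amps, even a
# fraction of a milliohm per inter-chip power link dissipates real power.
# All numbers are illustrative assumptions, not measured values.

I_STRING = 30.0           # amps flowing through the string (assumption)
R_TRACE = 0.5e-3          # ohms of a bare copper link between ranks (assumption)
R_WITH_BUSBAR = 0.1e-3    # same link with a soldered-on bus bar (assumption)
N_LINKS = 40              # one link per chip-to-chip hop (assumption)

def string_loss(r_link, i=I_STRING, n=N_LINKS):
    # P = I^2 * R per link, times the number of links in series
    return i * i * r_link * n

print(f"trace only : {string_loss(R_TRACE):.1f} W lost in distribution")
print(f"with busbar: {string_loss(R_WITH_BUSBAR):.1f} W lost in distribution")
```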
Anyways, at this massive current draw, any chip failure gets…”exciting”, to use a term of art. It turns out that a miner can operate with one busted chip in the chain. Since leakage goes up super-linearly with temperature and voltage, they basically rely on physics to turn the broken chip into a pass-through heating element. The voltage shift between ranks does go up slightly in that case, but apparently still not by enough to trigger latch-up.
That being said, latch-up is a big concern. Power cycling one of these beasts takes minutes — there is an interlock in the system that will wait a very long time with the power off to ensure that every capacitor in the chain has fully discharged. In order to prevent latch-up on a power-cycle, I imagine that every capacitor’s voltage has to be discharged to something quite a bit less than 0.3V. If you’ve ever poked around an “idle” board (powered off, but possibly with some live peripheral connector plugged in), you’ll know that even the tiniest leakages can easily develop 0.1V across a node, and 0.3V+ levels can persist for a very long time without explicit discharge clamps (I’ve seen some laptops designed with discrete FETs to discharge voltage rails internally so one can do a quick suspend/resume cycle without triggering latch-up). Since the design has no clamps to discharge the internal nodes (that I can see) I presume it just relies on the minutes of idle time on the reboot to ensure that the inherent leakage paths (which become exponentially less leaky as the voltages go down) discharge the massive amount of capacitance in the string of chips.
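To see why the interlock plausibly needs minutes, here’s a crude single-RC estimate of bleeding a rank’s bulk capacitance down through leakage alone. Every number is an assumption of mine, and real leakage falls off steeply as the voltage drops, which stretches the tail out even further than this simple model suggests.

```python
import math

# Rough RC estimate of how long a rank's bulk capacitance takes to bleed down
# through leakage alone after power-off. All values are assumptions for
# illustration only.

C_RANK = 0.05      # farads of bulk capacitance per voltage rank (assumption)
R_LEAK = 2000.0    # ohms of effective leakage path at low voltage (assumption)
V_START = 0.30     # volts, roughly one core VDD
V_SAFE = 0.05      # volts, a comfortably latch-up-safe residual (assumption)

tau = R_LEAK * C_RANK
t_discharge = tau * math.log(V_START / V_SAFE)
print(f"tau = {tau:.0f} s, time to bleed below {V_SAFE} V ~ {t_discharge / 60:.1f} minutes")
```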
Of course, all these shenanigans are done in the name of efficiency and low cost. Providing a buck regulator per chip would be cost-prohibitive and less efficient; you can’t beat the string-of-pearls topology for efficiency (this is what all home LED lighting uses for a reason).
But, as a “regular” digital engineer who strives for everything to go to ground, and if ground isn’t ground you fix it so that the ground plane doesn’t move, it’s kind of mind blowing that you can get such a complicated circuit to work with no fixed ground between chips. The ground of each chip floats and finds its truth through the laws of physics and the vagaries of yield and computation-dependent energy usage patterns. It’s a really brilliant example that reinforces the notion that even though one may abstract ground to be a “global 0 voltage”, in reality, ground is always local and voltages only exist in relation to a locally measurable reference point – ground bounce is “just fine” so long as your entire ground bounces. If you told me the idea of stringing chips together VDD-to-GND without showing me an implementation, I would have vehemently asserted it’s not possible, and you’d have all sorts of field failures and yield problems. Yet, here we are: I am confronted with proof that my intuition is incorrect. Not only is it possible, you can do it at kilowatt-scales with hundreds of chips, tera-hashes of throughput, with countless deployments around the world. Hats off to the engineers who pulled this together! I have to imagine that on the way to getting this right, there had to be some lab somewhere that got filled with quite the puff of blue smoke, and the acrid smell that heralds new things to learn.
June 30, 2024
Name that Ware, June 2024
The Ware for June 2024 is shown below.

This one will probably be a super-easy guess for some folks, but the details of the design of this type of ware are interesting from an engineering standpoint. Some of the tricks used here are kind of mind blowing; on paper, I wouldn’t think it would work, yet clearly it does. I’ll go into it a bit more next month, when we name the winner. Or, if you’re already an expert in this type of ware and have circuit-level insights to share, be my guest!
Winner, Name that Ware May 2024
The Ware from May 2024 is a Generac RXSC100A3 100-amp automated load transfer switch. It senses when utility power fails and automatically throws a switch to backup power. Thanks to Curtis Galloway for contributing this ware; he has posted a nice write-up about his project using it. The total function of the ware was a bit difficult to guess because the actual load switch wasn’t shown – the relays don’t do the power switching; they merely control a much larger switch.
A lot of questions in the comments about why a 50/60Hz jumper. I think it’s because as a backup power switch, a deviation of the mains frequency by 10Hz is considered a failure, and thus, backup power should be engaged. I also think the unit does zero-crossing detection to minimize arcing during the switch-over. The system drives a pair of rather beefy solenoids, which can be seen to the left of the load switch in the image below.

And this is the inside of the 100-amp load switch itself:

For better images hit up Curtis’ blog!
I think Ben got pretty close in his analysis of the ware, noting the 50/60Hz jumper could be for mains failure detection. Congrats, email me for your prize!
An administrative note: my ISP upgraded the blog’s database server this past month. Because this blog was created before UTF-8 was widely adopted, the character sets got munged, which required me to muck around with raw SQL queries to fix things up. I think almost everything made it through, but if anyone notices some older posts that are messed up or missing, drop a comment here.
I’m also working on trying to configure a fail-over for my image content server (bunniefoo.com), which is still running on a very power-efficient Novena board of my own design from over a decade ago. It’s survived numerous HN hugs of death, but the capacitor electrodes are starting to literally oxidize away from a decade of operating in salty, humid tropical air. So, I think it might be time to prep a fail-safe backup server — a bit like the load switch above, but for bits not amps. However, I think I’m more in my element configuring a 100A load switch than an IP load balancer. The Internet has only gotten more hostile over the past decades, so I’m going to have some learning experiences in the next couple of months. If you see something drop out for more than a day or so, give me a shout!
June 2, 2024
Formlabs Form 4 Teardown
Formlabs has recently launched the fourth edition of their flagship SLA printer line, the Form 4. Of course, I jumped on the chance to do a teardown of the printer; I’m grateful that I was able to do the same for the Form 1, Form 2, and Form 3 generations. In addition to learning a lot from the process of tearing down a single printer, I am also gaining a unique perspective on how a successful hardware startup matures into an established player in a cut-throat industry.
Financial interest disclosure: Formlabs provided me with two printers for this teardown, with no contingencies on the contents or views expressed in this post. I am also a shareholder of Formlabs.
A Bit of Background
The past few years have seen a step-function improvement in the competitiveness of products out of China, and SLA 3D printers have been no exception. The Form 4 sits at a pivotal moment for Formlabs, and has parallels to the larger geopolitical race for technological superiority. In general, Chinese products tend to start from a low price point with fewer features and less reliability, focusing on the value segment and iterating their way towards up-market opportunities; US products tend to start at a high price point, with an eye on building (or defending) a differentiated brand through quality, support, and features, and iterate their way down into value-oriented models. We now sit at a point where both the iterate-up and iterate-down approaches are directly competing for the same markets, setting the stage for the current trade war.
Being the first to gain momentum in a field also results in the dilemma of inertia. Deep financial and human capital investments into older technology often create a barrier to adopting newer technology. Mobile phone infrastructure is a poster child for the inertia of legacy: developing countries often have spectacular 5G service compared to the US, since they have no legacy infrastructure on the balance sheet to depreciate, and can thus leapfrog directly into mature, cost-reduced, modern mobile phone infrastructure.
In the case of 3D printers, Formlabs was founded in 2011, back when UV lasers were expensive, UV LEDs weren’t commercially viable, and full HD LCD screens (1080p) were just becoming mainstream. The most viable technology for directing light at the time was the galvanometer: a tiny mirror mounted on a motor that can adjust its angle with parts-per-thousand accuracy, tens of thousands of times a second. They invested in full-custom galvanometer technology to create a consumer-priced SLA 3D printer – a technology that remained a gold standard for almost a decade, and powered three generations of Form printers.

Above is the light path architecture of the Form 1 and Form 2 printers. Both used a laser plus a pair of galvanometers to direct light in a 2-D plane to cure a single layer of resin through raster scanning.

The Form 3, introduced in 2019, pushed galvanometer technology to its limit. It used a laser and a single galvanometer to scan one axis, bouncing the beam off a curved mirror and into the resin tank, all in a self-contained module called a “light processing unit” (LPU). The second axis came from sliding the LPU along a rail, creating the possibility of parallel LPUs for higher throughput, and “unlimited” printing volume in the Y direction.
In the decade since Formlabs’ founding, LCD and LED technology have progressed to the point where full-frame LCD printing has become viable, with the first devices available in the range of 2015-2017. I got to play with my first LCD printer in 2019, around the time that the Form 3 was launched. The benefits of an LCD architecture were readily apparent, the biggest being build speed: an LCD printer can expose the entire build area in a single exposure, instead of having to trace a slice of the model with a single point-like laser beam. LCD technology still had a learning curve to climb, but manufacturers climbed it quickly, iterating rapidly and introducing new models much faster than I could keep up with.
Five years later and one pandemic in between, the Form 4 is being launched, with the light processing engine fully transitioned from galvanometer-and-laser to an LCD-and-LED platform. I’m not privy to the inside conversations at Formlabs about the change, but I imagine it wasn’t easy, because transitioning away from a decade of human capital investment into a technology platform can be a difficult HR challenge. The LPU was a truly innovative piece of technology, but apparently it couldn’t match the speed and cost of parallel light processing with an LCD. However, I do imagine that the LPU is still indispensable for high-intensity 3D printing technologies such as Selective Laser Sintering (SLS).

Above is the architecture of the Form 4 – “look, ma, no moving parts!” – an entirely solid state design, capable of processing an entire layer of resin in a single go.
I’m definitely no expert in 3D printing technology – my primary exposure is through doing these teardowns – but from a first-principles perspective I can see several obvious challenges around using LCDs as a light modulator for UV, such as reliability, uniformity, and build volume.
As their name implies, LCDs (liquid crystal displays) are built around cells filled with liquid crystal. Liquid crystals are organic molecules; 5CB is a textbook example of an LC compound.

Above is the structure of 5CB, snagged from its Wikipedia page; a few other LC molecules I looked up share a similar structure. I’m no organic chemist, but if you asked me “do you think a molecule like this might interact with intense ultraviolet light”, my answer is “definitely yes” – look at those aromatic rings!
A quick search seems to indicate that LCDs as printer elements can have a lifetime as short as a few hundred hours – which, given that the UV light is only on for a fraction of the time during a print cycle, probably translates to a few hundred prints. So, I imagine some conversations were had at Formlabs on how to either mitigate the lifetime issues or to make the machine serviceable so the LCD element can be swapped out.
Another challenge is the uniformity of the underlying LEDs. This challenge comes in two flavors. The first is that the LEDs themselves don’t project a uniform light cone – LEDs tend to have structural hot spots and artifacts from things such as bond wires that can create shadows; diffusers and lenses can smooth this out, but they incur optical losses which reduce the effective intensity of the light. The second is that the LEDs have variance between devices and drift over time as they age, particularly high power devices that operate at elevated temperatures. This can be mitigated in part with smart digital controls, feedback loops, and cooling. This is helped by the fact that the light source is not on 100% of the time – in tests of the Form 4 it seems to be much less than 25% duty cycle, which gives ample time for heat dissipation.
Build volume is possibly a toss-up between galvo and LCD technologies. LCD panels can be made cheaply at extremely high resolutions today, so scaling up doesn’t necessarily mean coarser prints. However, you have to light up an even bigger area, and if your average build only uses a small fraction of the available volume, you’re wasting a lot of energy. On the other hand, with the Form 3’s LPU, energy efficiency scales with the volume of the part being printed: you only illuminate what you need to cure. And, the Form 3 can theoretically print an extremely wide build volume because the size in one dimension is limited only by the length of the rail for the LPU to sweep. One could conceivably 3D print the hood of a car with an LPU and a sufficiently wide resin tank! However, in practice, most 3D prints focus on smaller volumes – perhaps due to the circular reasoning that people simply don’t make 3D models for build volumes that aren’t available in volume production.
Despite the above challenges, Formlabs’ transition to LCD technology comes with the reward of greatly improved printing times. Prints that would have run overnight on the Form 3 now finish in just over an hour on the Form 4. Again, I’m not an expert in 3D printers, so I don’t know how that speed compares against the state of the art today; searching around a bit, there are some speed-focused printers that advertise some incredible peak build rates. But an hour and change to run the test print below is, on net, faster than my workflow can keep up with: at this speed, the sum of the time I spend on 3D model prep, print cleaning, and finishing exceeds the time it takes to run the print, so for a shop like mine, where I’m both the engineer and the operator, the printer is no longer the bottleneck.

A Look around the Exterior of the Form 4
Alright, let’s have a look at the printer!

The printer comes in a thoughtfully designed box, with ample padding and an opaque dust cover.

I personally really appreciate the detail of the dust cover – as I only have the bandwidth to design a couple products a year, the device can go for months without use. This is much easier on the eyes and likely more effective at blocking stray light than the black plastic trash bags I use to cover my printers.

The Form 4 builds on the design language of previous generation Form printers, with generous use of stamped aluminum, clean curves, and the iconic UV-blocking orange acrylic tank. I remember the first time I saw the orange tank and thought to myself, “that has to be really hard to manufacture…” I guess that process is now fully mature, as even the cheapest SLA printers incorporate that design touchstone.
A new feature that immediately leaps out to me is the camera. Apparently, this is for taking time lapses of builds. It increases the time to print (because the workpiece has to be brought up to a level for the camera to shoot, instead of just barely off the surface of the tank), but I suppose it could be useful for diagnostics or just appreciating the wonder of 3D printing. Unfortunately, I wasn’t able to get the feature to work with the beta software I had – it took the photos, but there’s something wrong with my Formlabs dashboard such that my Form 4 doesn’t show up there, and I can’t retrieve the time-lapse images.
Personally, I don’t allow any unattended, internet-connected cameras in my household – unused cameras are covered with tape, lids, or physically disabled if the former are not viable. I had to disclose the presence of the camera to my partner, which made her understandably uncomfortable. As a compromise, I re-positioned the printer so that it faces a wall, although I do have to wonder how many photos of me exist in my boxer shorts, checking in on a print late at night. At least the Form 4 is very up-front about the presence of the camera; one of the first things it does out of the box is ask you if you want to enable the camera, along with a preview of what’s in its field of view. It has an extremely wide angle lens, allowing it to capture most of the build volume in a single shot; but it also means it captures a surprisingly large portion of the room behind it as well.

Kind of a neat feature, but I think I’ll operate the Form 4 with tape over the camera unless I need to use the time-lapse feature for diagnostics. I don’t trust ‘soft switches’ for anything as potentially intrusive as a camera in my private spaces.

The backside of the Form 4 maintains a clean design, with another 3D printed access panel.

Above is a detail of the service plug on the back panel – you can see the tell-tale nubs of 3D printing on the inner surface. Formlabs is walking the walk by using their own printers to fabricate parts for their shipping products.

The bottom of the printer reveals a fan and a massive heatsink underneath. This assembly keeps the light source cool during operation. The construction of the bottom side is notably lighter-duty than the Form 3. I’m guessing the all solid-state design of the Form 4 resulted in a physically lighter device with reduced mechanical stiffness requirements.

Inside the printer’s cover, we get our first look at the pointy end of the stick – the imager surface. Unlike its predecessors, we’re no longer staring into a cavity filled with intricate mechanical parts. Instead we’re greeted with a textured LCD panel, along with some intriguing sensors and mounting features surrounding it. A side effect of no longer having to support an open-frame design is that the optical cavity within the printer is semi-sealed, with a HEPA-filtered centrifugal fan applying positive pressure to the cavity while cooling the optics. This should improve reliability in dusty environments, or urban centers where fine soot from internal combustion engines somehow finds its way onto every optical surface.

One minor detail I personally appreciate is the set of allen keys that come with the printer, hidden inside a nifty 3D printed holder. I’m a fan of products that ship with screwdrivers (or screwdriver-oids); it’s one of the feature points of the Precursor hardware password manager that I currently market.
The Allen key holder is just one of many details that make the Form 4 far easier to repair than the Form 3. I recall the Form 3 being quite difficult to take apart; while it used gorgeous body panels with compound splines, the situation rapidly deteriorated from “just poking around” to “this thing is never going back together again”. The Form 4’s body panels are quite easy to remove, and more importantly, to re-install.
A Deep Dive into the Light Path
Pulling off the right body panel reveals the motherboard. Four screws and the panel is off – super nice experience for repair!

And below, a close-up of the main board (click for a larger version):

A few things jump out at me at first glance.
First up, the Raspberry Pi 4 compute module. From a scrappy little “$35 computer” put out by a charity originally for the educational market, Raspberry Pi has taken over the world of single board computers, socket by socket. Thanks to the financial backing it had from government grants and donations as well as tax-free status as a charity, it was able to kickstart an unusually low-margin hardware business model into a profitable and sustainable (and soon to be publicly traded!) organization with economies of scale filling its sails. It also benefits from awesome software support due to the synergy of its charitable activities fostering a cozy relationship with the open source community. Being able to purchase modules like the Raspberry Pi CM with all the hard bits like high-speed DDR memory routing, emissions certification, and a Linux distro frees staff resources in other hardware companies (like Formlabs and my own) to focus on other aspects of products.
The next thing that attracted my attention is the full-on HDMI cable protruding from the mainboard. My first thought was to try and plug a regular monitor into that port and see what happens – turns out that works pretty much exactly as you’d expect:

In the image above, the white splotches correspond to the areas being cured for the print – in this case, the base material for the supports for several parts in progress.

Above is a view of the layer being printed, as seen in the Preform software.

And above is a screenshot showing some context of the print itself.
The main trick is to boot the Form 4 with the original LCD plugged in (it needs to read the correct EDID from the panel to ensure it’s there, otherwise you get an error message), and then swap out the cable to an external monitor. I didn’t let the build run for too long for fear of damaging the printer, but for the couple layers I looked at there were some interesting flashes and blips. They might hint at some image processing tricks being played, but for the most part what’s being sent to the LCD is the slice of the current model to be cured.
I also took the LCD panel out of the printer and plugged it into my laptop and dumped its EDID. Here are the parameters that it reports:

I was a little surprised that it’s not a 4K display, but we have to remember that each “color pixel” is really three monochrome elements – so the panel probably has 2520 elements vertically, and 4032 elements horizontally. While resolutions can go higher than that, there are likely trade-offs on fill-factor (the portion of a pixel that is available for light transmission versus reserved for switching circuitry) that are negatively impacted by trying to push resolution unnecessarily high.

The LCD panel itself is about as easy to repair as the side body panels; just 8 accessible screws, and it’s off.

Another minor detail I really enjoyed on the LCD panel is the 3D-printed retaining clip for the HDMI cable. I think this was probably made out of nylon on one of Formlabs’ own SLS printers.

Turn the LCD panel assembly over, and we see a few interesting details. First, the entire assembly is built into a robust aluminum frame. The frame itself has a couple of heating elements bonded to it, in the form of PCBs with serpentine traces. This creates an interesting conflict in engineering requirements:
- The resin in the tank needs to be brought to temperature for printing
- The LCD needs to be kept cool for reliability
- Both need to be in intimate contact with the resin

Formlabs’ solution relies on the intimate contact with the resin to preferentially pull heat out of the heating elements, while avoiding overheating of the LCD panel, as shown below.

The key bit that’s not obvious from the photo above is that the resin tank’s lower surface is a conformal film that presses down onto the imaging assembly’s surface, allowing heat to go from the heater elements almost directly into the resin. During the resin heating phase of the print, a mixer turns the resin over continuously, ensuring that conduction is the dominant mode of heat transfer into the resin (as opposed to a still resin pool relying on natural convection and diffusion). The resin is effectively a liquid heat sink for the heater elements.
Of course, aluminum is an excellent conductor of heat, so to prevent heat from preferentially flowing into the LCD, gaps are milled into the aluminum panel that go along the edges of the panel, save the corners which still touch for good alignment. Although the gaps are filled with epoxy, the thermal conduction from the heating elements into the LCD panel is presumably much lower than that into the resin tank itself, thus allowing heating elements situated mere millimeters away from an LCD panel to heat the resin, without overheating the LCD.
One interesting and slightly puzzling aspect of the LCD is the textured film applied to the top of the LCD assembly. According to the Formlabs website, this is a “release texture” which prevents a vacuum between the film and the LCD panel, thus reducing peel forces and improving print times. The physics of print release from the tank film is not at all obvious to me, but perhaps during release phase, the angle between the film and the print in progress plays a big role in how fast the film can be peeled off. LCDs are extremely flat and I can see that without such a texture, air bubbles could be trapped between the film and the LCD; or if no air bubbles were there, there could be a significant amount of electrostatic attraction between the LCD and the film that can lead to inconsistent release patterns.
That being said, the texture itself creates a bunch of small lenses that should impact print quality. Presumably, this is compensated in the image processing pipeline by pre-distorting the image such that the final projected image is perfect. I tried to look for signs of such compensation in the print layers when I hooked the internal HDMI cable to an external monitor, but didn’t see it – but also, I was looking at layers that already had some crazy geometry to it, so it’s hard to say for sure.
The LCD driver board itself is about what you’d expect: an HDMI to flat panel converter chip, plus an EDID ROM.

As a side note, an LCD panel – a thing we think of typically as the finished product that we might buy from LG, Innolux, Sharp, CPT, etc. – is an assembly built from a liquid crystal cell (LC cell) and a backlight, along with various films and a handful of driver electronics. A lot of the smaller players who sell LCD panels are integrators who buy LC cells, backlights, and electronics from other specialty vendors. The cell itself consists of the glass sheets with the transparent ITO wires and TFT transistors, filled with LC material. This is the “hard part” to make, and only a few large factories have the scale to produce them at a competitive cost. The orange thing we’re looking at in the Form 4 is more precisely described as an LC cell plus some polarizing films and a specialized texture on top. Building a custom LC cell isn’t profitable unless you have millions of units per year volume, so Formlabs had to source the LC cells from a vendor specialized in this sort of thing.
Hold the panel up to a neutral light source (e.g., the sun), and we can see some interesting behaviors.
The video above was taken by plugging the Form 4’s LCD into my laptop and typing “THIS IS A TEST” into a word processor (so it appears as black text on a white background on my laptop screen). The text itself looks wider than on my computer screen because the Formlabs panel is probably using square pixels for each of the R, G, and B channels. For their application, there is no need for color filters; it’s just monochrome, on or off.
I suspect the polarizing films are UV-optimized. I’m not an expert in optics, but from the little I’ve played with it, polarizing films have a limited bandwidth – I encountered this while trying to polarize IR light for IRIS. I found that common, off-the-shelf polarizing films seemed ineffective at polarizing IR light. I also suspect that the liquid crystal material within the panel itself is tailored for UV light – the contrast ratio is surprisingly low in visible light, but perhaps it’s much better in UV.
I’m also a bit puzzled as to why rotating the polarizer doesn’t cause light to be entirely blocked in one of the directions; instead, the contrast inverts, and at 45 degrees there’s no contrast. When I try this in front of a conventional IPS LCD panel, one direction is totally dark, the other is normal. After puzzling over it a bit, the best explanation I can come up with is that this is an IPS panel, but only one of the two polarizing films has been applied to the panel. Thus an “off” state would rotate the incoming light’s polarization, and an “on” state would still polarize the light, but in a direction 90 degrees from the “off” state.

Above is a diagram illustrating the function of an IPS panel from Wikipedia.
I can see how there might be a benefit to removing the incoming-light polarizer from the LCD, because this polarizer would have to absorb, by definition, 50% of the energy of the incident unpolarized light, converting that intense incoming light into heat that could degrade the panel.
However, I couldn’t find any evidence of a detached polarizer anywhere in the backlight path. Perhaps someone with a bit more experience in liquid crystal panels could illuminate this mystery for me in the comments below!
Speaking of the backlight path – let’s return to digging into the printer!

About an inch behind the LCD is a diffuser – a clear sheet of plastic with some sort of textured film on it. In the photo above, my hand is held at roughly the exit plane of the LED array, demonstrating the diffusive properties of the optical element. My crude tests couldn’t pick up any signs of polarization in the diffuser.
Beneath the diffuser is the light source. The light source itself is a sandwich consisting of a lens array with another laminated diffuser texture, a baffle, an aluminum-core PCB with LED emitters, and a heat sink. The heat sink forms a boundary between the inside and outside of the printer, with the outside surface bearing a single large fan.
Below is a view of the light source assembly as it comes out of the printer.

Below is some detail of the lens array. Note the secondary diffuser texture film applied to the flat surface of the film.

Below is a view of the baffle that is immediately below the lens array.

I was a bit baffled by the presence of the baffle – intuitively, it should reduce the amount of light getting to the resin tank – but after mating the baffle to the lens assembly, it becomes a little more clear what its function might be:

Here, we can see that the baffle is barely visible in between the lens elements. It seems that the baffle’s purpose might be to simply block sidelobe emissions from the underlying dome-lensed LED elements, thus improving light uniformity at the resin tank.
Beneath the baffle is the LED array, as shown below.

And here’s a closer look at the drive electronics:

There are a few interesting aspects of the drive electronics, which I call out in the detail below.

The board is actually two boards stacked on top of each other. The lower board is an aluminum-core PCB. If you look at the countersunk mounting hole, as highlighted by the buff-colored circle, you can see the shiny inner aluminum core reflecting light.
If you’re not already familiar with metal-core PCBs, my friends at King Credie have a nice description. From their site:

The most economical (and most common) metal-core stack-up is a sheet of aluminum that has a single-layer PCB bonded to it. This doesn’t have as good thermal performance as a copper-core board with direct thermal heat pads, but for most applications it’s good enough (and much, much cheaper).
However, because the aluminum board is single-layer, routing is a challenge. Again, referring to the detail photo of the board above, the green circle calls out a big, fat 0-ohm jumper – you’ll see many of them in the photo, actually. Because of this topological limitation, it’s typical to see conventional PCBs soldered onto a metal-core PCB to instantiate more complicated bits of circuitry. The cyan circle calls out one of the areas where the conventional PCB is soldered down to the metal-core PCB using edge-plated castellations. This arrangement works, but can be a little bit tricky due to differences in the thermal coefficient of expansion between aluminum and FR-4, leading to long-term reliability issues after many thermal cycles. As one can see from this image, a thick blob of solder is used to connect the two boards. The malleability of solder helps to absorb CTE mismatch-induced thermal stresses.
The light source itself uses the MAX25608B, a chip capable of individually dimming up to 12 high-current LEDs in series (incidentally, I recently made a post covering the theory behind this kind of lighting topology for IRIS). This is not a cheap chip, given the Maxim brand and the AEC-Q100 automotive rating (although, the automotive rating means it can operate at up to 125 °C – a great feature for a chip mounted to a heat sink!), but I can think of a couple reasons why it might be worth the cost. One is that the individual dimming control could give Formlabs the ability to measure each LED in the factory and match brightness across the array, through a per-printer unique lookup table to dim the brightest outliers. Another is that Formlabs could simply turn off LEDs that are in “dark” regions of the exposure field, thus reducing wear and tear on the LCD panel. The PreForm software could track which regions of the LCD have been used the least, and automatically place prints in those zones to wear-level the LCD. Perhaps yet another reason is that the drivers are capable of detecting and reporting LED faults, which is helpful from a long-term customer support perspective.
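As a sketch of the first idea – factory brightness matching through a per-printer lookup table – the procedure could be as simple as the following. The measured values and the 0-to-255 dimming scale are hypothetical; I have no visibility into what Formlabs actually does.

```python
# Hypothetical per-LED brightness matching: measure each LED at the factory,
# then dim every channel down to the dimmest one so the array is uniform.
# The measurements and the 0-255 dim scale are invented for illustration.

measured_mw = [512, 498, 505, 520, 490, 515, 508, 501, 495, 511, 517, 503]  # hypothetical

target = min(measured_mw)                                    # match everyone to the dimmest LED
dim_table = [round(255 * target / m) for m in measured_mw]   # per-channel dim codes

for ch, (mw, code) in enumerate(zip(measured_mw, dim_table)):
    print(f"LED {ch:2d}: measured {mw} mW -> dim code {code}/255")
```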
To investigate the light uniformity a bit more, I defeated the tank-close sensors with permanent magnets, and inserted a sheet of white paper in between the resin tank and the LCD to capture the light exiting the printer just before it hits the resin.
However, a warning: don’t try this without eye protection, as the UV light put out by the printer can quickly damage your eyes. Fortunately, I happen to have a pair of these bad boys in my lab since I somewhat routinely play with lasers:

Proper eye safety goggles will have their protection bandwidths printed on them: keep in mind that regular sunglasses may not offer sufficient protection, especially in non-visible wavelengths!
With the resin tank thus exposed, I was able to tell the printer to “print a cleaning sheet” (basically a single-layer, full-frame print) and capture images that are indicative of the uniformity of the backlighting:

Looks pretty good overall, but with a bit of exposure tweaking on the camera, we can see some subtle non-uniformities:

The image above has some bubbles in the tank from the mixer stirring the resin. I let the tank sit overnight and captured this the next day:

The uniformity of the LEDs changes slightly between the two runs, which is curious. I’m not sure what causes that. I note that the “cleaning pattern” doesn’t cause the fan to run, so possibly the LEDs are uncompensated in this special mode of operation.
The other thing I’d re-iterate is that without manually tweaking the exposure of the camera, the exposure looks pretty uniform: I cherry-picked a couple images so that we can see something more interesting than a solid bluish rectangle.
Other Features of Note
I spent a bit longer than I thought I would poking at the light path, so I’ll just briefly touch on a few other features I found noteworthy in the Form 4.

I appreciated the new foam seal on the bottom of the case lid. This isn’t present on the Form 3. I’m not sure exactly why they introduced it, but I have noticed that there is less smell from the printer as it’s running. For a small urban office like mine, the odor of the resin is a nuisance, so this quality of life improvement is appreciated.

I mentioned earlier in this post the replaceable HEPA filter cartridge on the intake of a blower that creates a positive pressure inside the optics path. Above is a photo of the filter. I was a little surprised at how loose-fitting the filter is; usually for a HEPA filter to be effective, you need a pretty tight fit, otherwise, particulates just go around the filter.

The small plastic protrusion that houses the camera board (shown above) also contains the resin level sensor (shown below).

The shape of the transducer on the sensor makes me think that it uses an ultrasonic time-of-flight mechanism to detect the level of the liquid. I’m impressed at the relative simplicity of the circuit – assuming my guess about the sensor is correct, it seems that the STM32F303 microcontroller is directly driving the transducer, and the sole external analog circuit is presumably an LNA (low noise amplifier) for capturing the return echo.
The use of the STM32 also indicates that Formlabs probably hand-rolled the DSP pipeline for the ultrasound return signal processing. I would note that I did have a problem with the printer overfilling a tank with resin once during my evaluation. This could be due to inaccuracy in the sensor, but it could also be due to the fact that I keep the printer in a pretty warm location so the resin has a lower viscosity than usual, and thus it flows more quickly into the tank than their firmware expected. It could also be due to the effect of humidity and temperature on the speed of sound itself – poking around the speed of sound page on Wikipedia indicates that humidity can affect sound speed by 0.1-0.6%, and 20C in temperature shifts things by 3% (I could find neither a humidity nor an air temperature sensor in the region of the ultrasonic device). This seems negligible, but the distance from the sensor to the tank is about 80mm and they are filling the tank to about 5mm depth +/- 1mm (?), so they need an absolute accuracy of around 2.5%. I suspect the electronics itself are more than capable of resolving the distance, as the time of flight from the transducer and back is on the order of 500 microseconds, but the environmental effects might be an uncompensated error factor.
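For a quick sanity check on those numbers, here’s the standard dry-air speed-of-sound approximation applied to an assumed 80mm path; the distances and temperatures are my estimates, and the printer may well compensate in ways I can’t see.

```python
import math

# Quick sanity check on the time-of-flight numbers above. The ~80 mm path is
# my estimate from the geometry; the temperatures are arbitrary test points.

def speed_of_sound(temp_c):
    # standard dry-air approximation: c ~ 331.3 * sqrt(1 + T/273.15) m/s
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)

D = 0.080  # meters from transducer to resin surface (estimate)

for t in (20.0, 40.0):
    c = speed_of_sound(t)
    print(f"{t:>4.0f} C: c = {c:.1f} m/s, round trip = {2 * D / c * 1e6:.0f} us")

# If the firmware assumes 20 C air but the air is really at 40 C, the computed
# distance comes out short by roughly:
err = D * (1.0 - speed_of_sound(20.0) / speed_of_sound(40.0))
print(f"uncompensated temperature error ~ {err * 1000:.1f} mm over an 80 mm path")
```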
Nevertheless, the problem was quickly resolved by simply pouring some of the excess resin back into the cartridge.

Speaking of which, the Form 4 inherits the Form 3’s load-cell for measuring the weight of the resin cartridge, as well as the DC motor-driven pincer for squishing the dispenser head. The image above shows the blind-mating seat of the resin cartridge, with the load cell on the right, and the dispenser motor on the left.

A pair of 13.56 MHz RFID/NFC readers utilizing the TRF7970 allow the Form 4 to track consumables. One is used to read the resin cartridge, and the other is used to read the resin tank.

Finally, for completeness, here’s some power numbers of the Form 4. On standby, it consumes around 27 watts – just a hair more than the Form 3. During printing, I saw the power spike as high as 250 watts, with a bit over 100 watts on average viewed at the plug; I think the UV lights alone consume over 100 watts when they are full-on!
Epilogue
Well, that’s a wrap for this teardown. I hope you enjoyed reading it as much as I enjoyed tearing through the machine. I’m always impressed by the thoroughness of the Formlabs engineering team. I learn a lot from every teardown, and it’s a pleasure to see the new twists they put on old motifs.
While I wouldn’t characterize myself as a hardcore 3D printing enthusiast, I am an occasional user for prototyping mechanical parts and designs. For me, the dramatically faster print time of the Form 4 and reduced resin odor go a long way towards reducing the barrier to running a 3D print. I look forward to using the Form 4 more as I improve and tune my IRIS machine!
May 30, 2024
Name that Ware, May 2024
The Ware for May 2024 is shown below.

This is a guest ware, but I’ll reveal the contributor when I reveal the ware next month, as the name and link would also lead to the solution.
Winner, Name that Ware April 2024
Last month’s ware was a “Z16 AI Voice Translator”. The name doesn’t really tell you much, so here’s some photos that give a better idea of what the device is about.

Above are the main parts of the device. The construction is simple and straightforward: big battery, screen with captouch, a rear-facing camera, and a speaker in a large chamber. The speaker chamber is the portion of the plastic mid-case that has the words “XP-Z6” emblazoned on it.
As many contestants observed, the device is based on a Mediatek chip that features a now mostly-obsolete 3G modem. Some correctly picked out that despite the potential for a cellular connection, it only has a wifi antenna.

This board design was almost certainly adapted by deleting parts from a reference design with 3G functionality, as evidenced by the footprint of the RF shielding containing blank areas that are just large enough to accommodate the analog front-end (AFE) companion chip required for 3G. You can see some images of such an AFE in a decade-old post I did about reverse engineering an MT6260A, a distant ancestor of the SoC in this device.

The OS is basically a stripped down Android phone that runs some real-time voice and image translation software, which feels descended from Google Translate, or possibly “borrowed” from Rabbit OS. Above is a snapshot of the home screen UI. I’m not sure who pays for the back-end service for the on-line translation — it doesn’t require a subscription or registration of any kind for the device to operate (I can almost hear the howls of pain from VCs from all the way across the Pacific) — but more importantly, the device is capable of offline translation.
I was initially highly suspicious of the offline claim, so I purchased a sample to try it out. In fact, the device works as advertised with no internet connection at all (although you do need to initially provision it with some translation dictionaries).

It does a surprisingly good job of picking up and translating voice in real-time, and it can also translate text from photos as well in offline mode (although not in real-time: you have to shoot, crop and wait a couple seconds).
The device is exactly what it needs to be and no more — from the oversized speaker (so you can hear the synthesized voice output in a noisy saloon) to the minimal rear-facing camera, to the pair of buttons on the side used by each speaker to toggle the language recognition mode. Personally, I’d prefer if the device had no on-line capability at all; perhaps I’ll strip out the wifi antenna entirely. The UI is similarly minimal, full of quirks that I’m fine “learning through” but any trendy tech reviewer would rip it apart in disgust: “ugh, the icon design is soooo last generation and the menu navigation require two swipes. This work is so derivative“.
At $40, it’s cheap enough to carry around as a backup translator when I’m on the road or for lending to local guides; I don’t lose sleep if it gets nicked at the bar or left in a taxi. On close inspection, some of the parts look like they were recycled (which imho is not a bad thing – old silicon finding new life), and the fit and finish is pretty good for $40 but nowhere near the standard of thousand-dollar smartphones. But, like most shanzhai products, it has already been cloned and is available in a wide variety of form factors and price points. I’m pretty sure the ones that sell for hundreds of dollars aren’t materially different from this one — maybe they use better quality parts, but more likely than not they are simply charging more for the same BOM because Western consumers expect a device like this to cost much more than it does. Smartphone advertising has done an amazing job of programming consumers to accept higher prices for minor differences.
I wonder how it does all of its processing offline – my guess is the small Mali GPU included on the 3G chipset is just powerful enough to do real-time audio inference, and/or some amount of character recognition processing on images. The fact that it can do this processing offline, and so cheaply, further implies that it’s now easy to embed highly effective speech-to-text listening devices in every kiosk and hallway, which makes mass surveillance/real-time ad targeting a lot more bandwidth efficient. It’s small enough to Trojan into a wall power adapter, and with 3G functionality you can efficiently exfiltrate text conversations over SMS after screening for key words. Without a screen or battery, the cost overhead would be just a few dollars per surveillance endpoint. A power adapter listening device would be super easy to bundle into almost any off the shelf consumer product, giving it a free pass into conference rooms, homes and hotel rooms around the world.
Ah yes, I should also probably pick a winner. This ware suffers from the “every contemporary gadget looks like a cell phone” problem, so it was tough to guess (but also tough to image search). For no particularly strong reason, I’ll pick AZeta as the winner – the response did identify that the device is wifi-only, and highlighted the computational potential of the SoC; and a baby monitor is a kind of surveillance device. Congrats, email me for your prize!
April 30, 2024
Name that Ware, April 2024
The ware for April 2024 is shown below:

In some ways, this is a much easier ware than last month’s, but I wonder if anyone will be able to name the precise function of this ware. Thanks to Ole for taking the photo, and for the adventures en route to the teardown!
Winner, Name that Ware March 2024
Last month’s ware was internals from a VCH-1006 passive hydrogen maser. KE5FX has published a great write-up about the unit, its history, and how it was repaired.

I’ll give the prize to Hessel. The guess given was about as close as anything I could have done myself — a pretty challenging ware, as I’ve never seen anything quite like this before. Congrats, email me for your prize!
April 22, 2024
Automated Stitching of Chip Images
This is the final post in a series about non-destructively inspecting chips with the IRIS (Infra-Red, in-situ) technique. Here are links to previous posts:
- IRIS project overview
- Methodology
- Light source electronics
- Fine focus stage
- Light source mechanisms
- Machine control & focus software

This post will cover the software used to stitch together smaller images generated by the control software into a single large image. My IRIS machine with a 10x objective generates single images that correspond to a patch of silicon that is only 0.8mm wide. Most chips are much larger than that, so I take a series of overlapping images that must be stitched together to generate a composite image corresponding to a full chip.
The un-aligned image tiles look like this:

And the stitching software assembles it into something like this:

The problem we have to solve is that even though we command the microscope to move to regularly spaced intervals, in reality, there is always some error in the positioning of the microscope. The accuracy is on the order of tens of microns at best, but we are interested in extracting features much smaller than that. Thus, we must rely on some computational methods to remove these error offsets.
At first one might think, “this is easy, just throw it into any number of image stitching programs used to generate panoramas!”. I thought that too.
However, it turns out these programs perform poorly on images of chips. The most significant challenge is that chip features tend to be large, repetitive arrays. Most panorama algorithms rely on a step of “feature extraction”, where some algorithm decides what’s an “interesting” feature and lines those features up between two images. These algorithms are tuned for aesthetically pleasing results on images of natural subjects, like humans or outdoor scenery; they get pretty lost trying to make heads or tails out of the geometrically regular patterns in a chip image. Furthermore, the alignment accuracy requirement for an image panorama is not as strict as what we need for IRIS. Most panorama stitchers rely on a later pass of seam-blending to iron out deviations of a few pixels, yielding aesthetic results despite the misalignments.
Unfortunately, we’re looking to post-process these images with an image classifier to perform a gate count census, and so we need pixel-accurate alignment wherever possible. On the other hand, because all of the images are taken by machine, we never have to worry about rotational or scale adjustments – we are only interested in correcting translational errors.
Thus, I ended up rolling my own stitching algorithm. This was yet another one of those projects that started out as a test program to check data quality, and suffered from “just one more feature”-itis until it blossomed into the heaping pile that it is today. I wouldn’t be surprised if there were already good quality chip stitching programs out there, but, I did need a few bespoke features and it was interesting enough to learn how to do this, so I ended up writing it from scratch.
Well, to be accurate, I copy/pasted lots of stackoverflow answers, LLM-generated snippets, and boilerplate from previous projects together with heaps of glue code, which I think qualifies as “writing original code” these days? Maybe the more accurate way to state it is, “I didn’t fork another program as a starting point”. I started with an actual empty text buffer before I started copy-pasting five-to-twenty line code snippets into it.
Sizing Up the Task
A modestly sized chip of a couple dozen square millimeters generates a dataset of a few hundred images, each around 2.8MiB in size, for a total dataset of a couple gigabytes. While not outright daunting, it’s enough data that I can’t be reckless, yet small enough that I can get away with lazy decisions, such as using the native filesystem as my database format.
It turns out that for my application, the native file system is a performant, inter-operable, multi-threaded, transparently memory caching database format. Also super-easy to make backups and to browse the records. As a slight optimization, I generate thumbnails of every image on the first run of the stitching program to accelerate later drawing operations for preview images.
Each file’s name is coded with its theoretical absolute position on the chip, along with metadata describing the focus and lighting parameters, so each file has a name something like this:
x0.00_y0.30_z10.00_p6869_i3966_t127_j4096_u127_a0.0_r1_f8.5_s13374_v0.998_d1.5_o100.0.png
It’s basically an underscore separated list of metadata, where each element is tagged with a single ASCII character, followed by its value. It’s a little awkward, but functional and easy enough to migrate as I upgrade schemas.
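For what it’s worth, the format is trivial to parse – here’s a minimal sketch of the idea (not the actual IRIS code):

```python
# Minimal sketch of parsing the tag-letter-plus-value filename scheme described
# above (not the actual IRIS code).

def parse_tile_name(fname: str) -> dict:
    """Split 'x0.00_y0.30_..._o100.0.png' into a {tag: value} dictionary."""
    stem = fname.rsplit(".png", 1)[0]
    return {field[0]: float(field[1:]) for field in stem.split("_")}

name = "x0.00_y0.30_z10.00_p6869_i3966_t127_j4096_u127_a0.0_r1_f8.5_s13374_v0.998_d1.5_o100.0.png"
meta = parse_tile_name(name)
print(meta["x"], meta["y"], meta["z"])   # theoretical tile position: 0.0 0.3 10.0
```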
Creating a Schema
All of the filenames are collated into a single Python object that tracks the transformations we do on the data, as well as maintains a running log of all the operations (allowing us to have an undo buffer). I call this the “Schema” object. I wish I knew about dataframes before I started this project, because I ended up re-implementing a lot of dataframe features in the course of building the Schema. Oh well.
The Schema object is serialized into a JSON file called “db.json” that allows us to restore the state of the program even in the case of an unclean shutdown (and there are plenty of those!).

The initial state of the program is to show a preview of all the images in their current positions, along with a set of buttons that control the state of the stitcher, select what regions to stitch/restitch, debugging tools, and file save operations. The UI framework is a mix of PyQt and OpenCV’s native UI functions (which afaik wrap PyQt objects).

Above: screenshot of the stitching UI in its initial state.
At startup, all of the thumbnails are read into memory, but none of the large images. There’s an option to cache the images in RAM as they are pulled in for processing. Generally, I’ve had no trouble just pulling all the images into RAM because the datasets haven’t exceeded 10GiB, but I suppose once I start stitching really huge images, I may need to do something different.
…Or maybe I just buy a bigger computer? Is that cheating? Extra stick of RAM is the hundred-dollar problem solver! Until it isn’t, I suppose. But, the good news is there’s a strong upper bound of how big of an image we’d stitch (e.g. chips rarely go larger than the reticle size) and it’s probably around 100GiB, which somehow seems “reasonable” for an amount of RAM to put in one desktop machine these days.
Again, my mind boggles, because I spend most of my time writing Rust code for a device with 16MiB of RAM.
Auto Stitching Flow
At the highest level, the stitching strategy uses a progressive stitch, starting from the top left tile and doing a “lawn mower” pattern. Every tile looks “left and up” for alignment candidates, so the very top left tile is considered to be the anchor tile. This pattern matches the order in which the images were taken, so the relative error between adjacent tiles is minimized.

Before lawn mowing, a manually-guided stitch pass is done along the left and top edges of the chip. This usually takes a few minutes, where the algorithm runs in “single step” mode and the user reviews and approves of each alignment individually. The reason this is done is if there are any stitching errors on the top or left edge, it will propagate throughout the process, so these edges must be 100% correct before the algorithm can run unattended. It is also the case that the edges of a chip can be quite tricky to stitch, because arrays of bond pads can look identical across multiple frames, and accurate alignment ends up relying upon random image artifacts caused by roughness in the die’s physical edges.
Once the left and top edges are fixed, the algorithm can start in earnest. For each tile, it starts with a “guess” of where the new tile should go based on the nominal commanded values of the microscope. It then looks “up” and “left” and picks the tile that has the largest overlapping region for the next step.
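Picking the reference boils down to an axis-aligned overlap computation on the nominal tile positions; a sketch, assuming positions and tile sizes are already expressed in pixels:
def overlap_area(pos_a, pos_b, tile_w, tile_h):
    # Area of the overlap between two same-sized tiles, given their nominal
    # top-left corners (from the commanded microscope positions).
    ox = max(0, tile_w - abs(pos_a[0] - pos_b[0]))
    oy = max(0, tile_h - abs(pos_a[1] - pos_b[1]))
    return ox * oy

def pick_reference(sample_pos, candidate_positions, tile_w, tile_h):
    # Among the already-placed "up" and "left" neighbors, choose the one
    # whose nominal footprint overlaps the incoming SAMPLE tile the most.
    return max(candidate_positions,
               key=lambda p: overlap_area(sample_pos, p, tile_w, tile_h))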

Above is an example of the algorithm picking a tile in the “up” direction as the “REF” (reference) tile with which to stitch the incoming tile (referred to as “SAMPLE”). The image above juxtaposes both tiles with no attempt to align them, but you can already see how the top of the lower image partially overlaps with the bottom of the upper image.
Template Matching
Next, the algorithm picks a “template” to do template matching. Template matching is an effective way to align two images that are already in the same orientation and scale. The basic idea is to pick a “unique” feature in one image, and convolve it with every point in the other image. The point with the highest convolution value is probably going to be the spot where the two images line up.

Above: an example of a template region automatically chosen for searching across the incoming sample for alignment.
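At its core, this is a single OpenCV call. A minimal sketch of the basic step (the real code also has to decide which template is worth matching, as described next):
import cv2

def match_template(ref_gray, template_gray):
    # Slide the template over the reference image; TM_CCOEFF_NORMED scores
    # every position from -1 to 1, higher is better. Returns the top-left
    # corner of the best match and its score.
    result = cv2.matchTemplate(ref_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    _min_val, max_val, _min_loc, max_loc = cv2.minMaxLoc(result)
    return max_loc, max_val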
In reality, the algorithm is slightly more complicated than this, because the quality of the match greatly depends on the quality of the template. Thus we first have to answer the question of which template to use before we can ask where it matches. This is especially true on chips, because there are often large, repeated regions that are impossible to uniquely match, and there is no general rule that can guarantee where a unique feature might end up within a frame.
Thus, the actual implementation also searches for the “best” template using brute force: it divides the nominally overlapping region into potential template candidates, computes the template match score for each of them, and picks the template that produces the best alignment of all the candidates. This is perhaps the most computationally intensive step in the whole stitching process, because we can have dozens of potential template candidates, each of which must be convolved over many of the points in the reference image. Computed sequentially on my desktop computer, the search can take several seconds per tile to find the optimal template. However, Python makes it pretty easy to spawn threads, so I spawn one thread per candidate template and let them duke it out for CPU time and cache space. Fortunately I have a Ryzen 7900X, so with 12 cores and 12MiB of L2 cache, the entire problem basically fits inside the CPU, and the multi-threaded search completes in the blink of an eye.
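A sketch of the thread-per-candidate search, reusing match_template() from the sketch above (OpenCV releases the GIL inside its C++ routines, which is why plain Python threads actually help here):
from concurrent.futures import ThreadPoolExecutor

def best_template(ref_gray, candidate_templates):
    # Score every candidate template region in parallel and keep whichever
    # one produces the strongest match against the reference tile.
    # Assumes a non-empty candidate list.
    with ThreadPoolExecutor(max_workers=len(candidate_templates)) as pool:
        scores = list(pool.map(lambda t: match_template(ref_gray, t),
                               candidate_templates))
    best = max(range(len(candidate_templates)), key=lambda i: scores[i][1])
    return best, scores[best]  # index of winning candidate, (offset, score)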
This is another one of those moments where I feel kind of ridiculous writing code like this, but somehow, it’s a reasonable thing to do today.
The other “small asterisk” on the whole process is that it works not on the original images, but on Gaussian-filtered, Laplacian-transformed versions of them. In other words, instead of matching against the continuous tones of an image, I do the template match against the edges of the image, making the algorithm less sensitive to artifacts such as lens flare or global brightness gradients.
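The preprocessing itself is just a couple of OpenCV calls; a sketch with illustrative kernel sizes (not necessarily the values the stitcher uses):
import cv2

def edge_view(img_gray):
    # Gaussian blur to knock down pixel noise, then a Laplacian so the
    # matcher sees edges rather than continuous tones; lens flare and
    # brightness gradients mostly disappear in this view.
    blurred = cv2.GaussianBlur(img_gray, (5, 5), 0)
    lap = cv2.Laplacian(blurred, cv2.CV_32F, ksize=3)
    # Re-normalize to 8-bit so downstream code can treat it like an
    # ordinary grayscale image.
    return cv2.normalize(lap, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")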

Above is an example of the output of the template matching algorithm. Most of the region is gray, which indicates a poor match. Towards the right, you start to see “ripples” that correspond to the matching features starting to line up. As part of the algorithm, I extract contours to ring regions with a high match, and pick the center of the largest matching region, highlighted here with the pink arrow. The whole contour extraction and center picking thing is a native library in OpenCV with pretty good documentation examples.
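That step looks roughly like this (a sketch; the threshold value is illustrative):
import cv2
import numpy as np

def best_match_center(match_result, thresh=0.8):
    # Threshold the matchTemplate output, ring the high-scoring regions
    # with contours, and return the centroid of the largest one.
    mask = (match_result >= thresh).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:  # degenerate (e.g. single-pixel) region
        x, y, w, h = cv2.boundingRect(largest)
        return (x + w // 2, y + h // 2)
    return (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))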
Minimum Squared Error (MSE) Cleanup
Template matching usually gets me a solution that aligns images to within a couple of pixels, but I need every pixel I can get out of the alignment, especially if my plan is to do a gate count census on features that are just a few pixels across. So, after template alignment, I do a “cleanup” pass using a minimum squared error (MSE) method.

Above: example of the MSE debugging output. This illustrates a “good match”, because most of the image is gray, indicating a small MSE. A poor match would have more image contrast.
MSE basically takes every pixel in the reference image, subtracts it from the corresponding pixel in the sample image, squares the difference, and sums all of them together. If the two images were identical and exactly aligned, the error would be zero, but because the images are taken with a real camera that has noise, we can only go as low as the noise floor. The cleanup pass starts with the initial alignment proposed by the template matching and computes the MSE of the current alignment, along with candidates for the image shifted one step up, left, right, and down. If any of the shifted candidates has a lower error, the algorithm picks that as the new alignment, and repeats until it finds an alignment where the center position has the lowest MSE. To speed things up, the MSE search is actually done at two levels of shifting: first a coarse search that steps several pixels at a time, and then a fine-grained search at single-pixel steps. There is also a heuristic to terminate the search after too many steps, because the algorithm is subject to limit cycles.
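In sketch form, the cleanup pass is a greedy hill-climb (simplified here: it glosses over the exact cropping of the overlap region, and the step sizes are illustrative):
import numpy as np

def mse(ref, sample, dx, dy):
    # Mean squared error over the region where ref and the (dx, dy)-shifted
    # sample overlap; both inputs are the edge-filtered views.
    h, w = ref.shape
    x0, y0 = max(dx, 0), max(dy, 0)
    x1, y1 = min(w + dx, w), min(h + dy, h)
    a = ref[y0:y1, x0:x1].astype(np.float32)
    b = sample[y0 - dy:y1 - dy, x0 - dx:x1 - dx].astype(np.float32)
    return float(np.mean((a - b) ** 2))

def mse_cleanup(ref, sample, dx, dy, coarse=4, max_iters=100):
    # Greedy search: try one-step shifts up/down/left/right, first at a
    # coarse step size and then at single-pixel steps. The iteration cap
    # guards against limit cycles.
    for step in (coarse, 1):
        for _ in range(max_iters):
            moves = [(0, 0), (step, 0), (-step, 0), (0, step), (0, -step)]
            err, mx, my = min((mse(ref, sample, dx + mx, dy + my), mx, my)
                              for mx, my in moves)
            if (mx, my) == (0, 0):
                break
            dx, dy = dx + mx, dy + my
    return dx, dy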
Because each step of the search depends upon results from the previous step, it doesn’t parallelize as well, and so sometimes the MSE search can take longer than the multi-threaded template matching search, especially when the template search really blew it and we end up having to search over dozens of pixels to find the true alignment (but if the template matching did its job, the MSE cleanup pass is barely noticeable).
Again, the MSE search works on the Gaussian-filtered Laplacian view of the image, i.e., it’s looking at edges, not whole tones.
After template matching and MSE cleanup, the final alignment goes through some basic sanity checks and, if all looks good, moves on to the next tile. If something doesn’t look right – for example, the proposed offsets for the images are much larger than usual, or the template matcher found too many good solutions (as is the case when stitching very regular arrays like RAM) – the algorithm stops and the user can manually select a new template and/or move the images around to find the best MSE fit. This usually happens a couple of times per chip, but can be more frequent if there were focusing problems or the chip has many large, regular arrays of components.

Above: the automatically proposed stitching alignment of the two images in this example. The bright area is the overlapping region between the two adjacent tiles. Note how there is a slight left-right offset that the algorithm detected and compensated for.
Once the stitching is all finished, you end up with a result that looks a bit like this:

Here, all the image tiles are properly aligned, and you can see how the Jubilee machine (Jubilee is the motion platform on which IRIS was built) has a slight “walk off” as evidenced by the diagonal pattern across the bottom of the preview area.
Potential Hardware Improvements
The Jubilee uses a CoreXY belt path, which optimizes for minimum flying mass. The original designers of the Jubilee platform wanted it to perform well in 3D printing applications, where print speed is limited by how fast the tool can move. However, any mismatch in belt tension leads to the sort of “walk off” visible here. I basically need to re-tension the machine every couple of weeks to minimize this effect, but I’m told that this isn’t typical. It’s possible that I have defective belts or, more likely, sloppy assembly technique; or it could be that I live in the tropics and the room sits at 60% relative humidity even with air conditioning, causing the belts to expand slightly over time as they absorb moisture. Or it could be that the mass of the microscope is pretty enormous, and that amplifies the effect of slight mismatches in tensioning.
Regardless of the root cause, the Jubilee’s design intent of performing well in 3D printing applications comes with a trade-off in the amount of maintenance required to sustain absolute accuracy. Since in the IRIS application microscope head speed is not important, tool mass is already huge, and precision is paramount, one of the mods I’m considering for my version of the platform is redoing the belt layout so that the drive is Cartesian instead of CoreXY. That should help minimize the walk-off and reduce the amount of maintenance needed to keep it running in top-notch condition.
Edge Blending for Aesthetics
You’ll note that in the above image the overlap of the individual tiles is readily apparent, due to slight variations in brightness across the imaging field. This can probably be improved by adding some diffusers, and also improving the alignment of the lights relative to the focal point (it’s currently off by a couple of millimeters, because I designed it around the focal point of a 20x objective, but these images were taken with a 10x objective). Even then, I suspect some amount of tiling will always be visible, because the human eye is pretty sensitive to slight variations in shades of gray.
My working hypothesis is that the machine learning driven standard cell census (yet to be implemented!) will not care so much about the gradient because it only ever looks at regions a few dozen pixels across in one go. However, in order to generate a more aesthetically pleasing image for human consumption, I implemented a blending algorithm to smooth out the edges, which results in a final image more like this:

Click the image to browse a full resolution version, hosted on siliconpr0n.
There are still four major stitch regions visible, and this is because OpenCV’s MultiBandBlender routine seems to be limited to handling 8GiB-ish of raw image data at once, so I can’t quite blend whole chips in a single go. I tried running the same code on a machine with a 24GiB graphics card and got the same out-of-memory error, so the limit isn’t total GPU memory. When I dug in a bit, it seemed like there was some driver-level limitation related to the maximum number of pointers to image buffers that I was hitting, and I didn’t feel like shaving that yak.
The underlying algorithm used to do image blending is actually pretty neat, and based on a paper from 1983(!) by Burt and Adelson titled “A Multiresolution Spline with Application to Image Mosaics”. I tried implementing this directly using OpenCV’s Image Pyramid feature, mainly because I couldn’t find any documentation on the MultiBandBlender routine. It was actually pretty fun and insightful to play around with image pyramids; it’s a useful idiom for extracting image features at vastly different scales, and for all its utility it’s pretty memory efficient (the full image pyramid consumes about 1.5x of the original image’s memory).
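For flavor, here's a sketch of the Burt and Adelson idea using OpenCV's pyramid calls, restricted to the easy case the paper covers: two equally-sized grayscale images whose dimensions divide evenly by 2**levels, with a mask that is 1.0 where the first image should win.
import cv2
import numpy as np

def pyramid_blend(img_a, img_b, mask, levels=5):
    # Build Laplacian pyramids of both images, blend each level using a
    # Gaussian pyramid of the mask, then collapse the blended pyramid back
    # into a single image.
    def gaussian_pyr(img):
        pyr = [img.astype(np.float32)]
        for _ in range(levels):
            pyr.append(cv2.pyrDown(pyr[-1]))
        return pyr

    def laplacian_pyr(gpyr):
        lpyr = [gpyr[i] - cv2.pyrUp(gpyr[i + 1], dstsize=gpyr[i].shape[1::-1])
                for i in range(levels)]
        lpyr.append(gpyr[-1])  # coarsest level is kept as-is
        return lpyr

    la = laplacian_pyr(gaussian_pyr(img_a))
    lb = laplacian_pyr(gaussian_pyr(img_b))
    gm = gaussian_pyr(mask)
    blended = [a * m + b * (1.0 - m) for a, b, m in zip(la, lb, gm)]
    out = blended[-1]
    for level in reversed(blended[:-1]):
        out = cv2.pyrUp(out, dstsize=level.shape[1::-1]) + level
    return np.clip(out, 0, 255).astype(np.uint8)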
However, it turns out that the 1983 paper doesn’t tell you how to deal with things like non-power-of-2 images, non-square images, or images that only partially overlap…and I couldn’t find any follow-up papers that go into these “edge cases”. Since the blending is purely for aesthetic appeal to human eyes, I decided not to invest the effort to chase down these last details, and settled for the MultiBandBlender, stitch lines and all.
Touch-Ups
The autostitching algorithm isn’t perfect, so I also implemented an interface for doing touch-ups after the initial stitching pass is done. The interface allows me to do things like flag various tiles for manual review, automatically re-stitch regions, and visualize heat maps of MSE and focus shifts.
The above video is a whistlestop tour of the stitching and touch-up interface.
All of the code discussed in this blog post can be found in the iris-stitcher repo on github. stitch.py contains the entry point for the code.
That’s a Wrap!
That’s it for my blog series on IRIS, for now. As of today, the machine and associated software is capable of reliably extracting reference images of chips and assembling them into full-chip die shots. The next step is to train some CNN classifiers to automatically recognize logic cells and perform a census of the number of gates in a given region.
Someday, I also hope to figure out a way to place rigorous bounds on the amount of logic that could be required to pass an electrical scan chain test while also hiding malicious Hardware Trojans. Ideally, this would result in some sort of EDA tool that one can use to insert an IRIS-hardened scan chain into an existing HDL design. The resulting fusion of design methodology, non-destructive imaging, and in-circuit scan chain testing may ultimately give us a path towards confidence in the construction of our chips.

And as always, a big shout-out to NLnet and to my Github Sponsors for allowing me to do all this research while making it openly accessible for anyone to replicate and to use.
