When is a ground not a ground?

The control board PCB design was finished but I had a nagging feeling there was something wrong with it. So one last time, I checked that every single trace was connected to all the right places. I found a nasty collision where some weeks back I had moved a chip sideways and not noticed that two traces were now overlapping. I fixed it and eventually got to the end of the checking and had the board manufactured. When it arrived, I started at the oscillator section, and populated it bit by bit, testing each bit as I went. If something was wrong, I would have a good chance of knowing where the problem was. But everything was working, so far. In the end I just went ahead and added all the remaining components. I tested it with just the memory board connected and confirmed that it could fetch instructions from ROM. I could see the PC and instruction register on the lights, confirm that the other signals were doing the right things, and I could see it doing jumps and branches. It was time to connect the whole thing together and see what happened.

I was using the test routine that I had written for the simulator. It seemed to be working, but sometimes it would halt, meaning a test had failed. This was random. Sometimes it would run from reset and sometimes it would halt, or crash. I couldn’t understand what was going on. It was so close to working, but randomly didn’t. Around this time I noticed that two of the lights, data bus bits 0 and 1 were flickering in a way that was not normal. I discovered while single stepping, that these two bits were doing something very weird. If I pressed reset, they would go off, but after a few seconds, one of them would come on. Or not. It was random. I couldn’t understand how this was possible when the machine was not freely running as nothing should happen until I pressed the step button. I removed some chips and did some multimeter testing, and that was when I noticed that there was a current path between the reset line and those two bits. There is no deliberate connection between the reset signal and the data bus, but I was seeing a 300kΩ resistance between them. And that was when I finally noticed that the reset signal was running right alongside the data bus ribbon cable connector, so close that it actually passed under the plastic connector block. I knew there had to be some foreign body under this component that I had missed while soldering. I had to take the thing off. Luckily, these are split into two sections and one is only 8 pins wide. I desoldered it, and found a tiny pool of brown goo which was touching both the data bus pins (0, and 1) and the reset line. This goo was conductive! It turned out to be dirt and flux that had oozed under the connector out of sight while I was soldering it. What was happening was that the reset signal (which is high normally and low during reset) was applying a certain amount of voltage to the data bus pins; just enough to upset the input to whatever chip was reading them. I cleaned the goo off, resoldered the connector, put everything back together — and the glitching was gone. Unbelievable! The lesson here is never to run a trace so close to a connector that it goes under the plastic, because then it is obscured and you can’t see if there’s something bad touching it.

Things started to get better after that, but the machine still sometimes crashed after reset. It was like starting a car; if it got started it ran for ages, but if not, it didn’t work at all. I started writing some more test routines, including a memory test. But I was still seeing some weird random problems. This time I homed in on register R7 which I was using as a stack pointer for subroutine calls. It seemed as if it was randomly being reset to zero. I was able to confirm this with the scope. It seems that sometimes, R7 got cleared when it was not supposed to. I soon discovered that this happened when R0 was counting down in a loop, in particular it happened when R0 went from 0000 to FFFF. I put a scope on R7 ‘s reset pin and this is what I saw:

DS1Z_QuickPrint6.png

Bear in mind that this pin is supposed to be high all the time! If you know how to look at this, you can actually see the fact that something is counting down, and at the point where that goes from 0000 to FFFF there is a sudden jump in the interference. That was just enough to trigger the reset pin. At this time, I couldn’t understand why there was so much interference on this line. There was much less on R1 and R2, so why was R7 so bad? It was almost as if the physical distance of the register from the top of the board had something to do with it. In fact, I discovered that R6 was suffering from the same problem. At this point, the simplest solution seemed to be to get rid of the reset function on the registers. There’s no need for it anyway and I don’t know why I bothered with it. Much safer is to just tie them high so they can never be activated. So I cut the traces and added little bridge wires to tie all the reset pins high. The problem went away. But I had fixed a bug without understanding it, and this was going to come back and bite me in the arse one more time.

I had almost got the memory test working but something odd was still going on. I narrowed this one down to R7 being cleared again, except this time only the high byte of it. And this time, it wasn’t being reset, it was being written to when a different register was the destination. But only the high byte. I put the scope on its write pin and saw the same kind of noise as earlier. Where was this noise coming from? I had been extremely careful with the write strobes because they are clocks and you have to be careful to keep clocks away from other traces so they don’t pick up noise. But here was a ton of noise coming out of nowhere. Could it be power supply noise? Ground bounce? Something else? I had organized the chips in groups of eight, with careful separation of power and grounds, each group having its own supply capacitor and each chip having its own bypass capacitor. I thought I had followed all the best practices. But I’d missed something. It was something to do with the distance of R7 from the other end of the board, I was sure. Perhaps the clock trace was too long? Well, no. It was only about 5 inches away from the chip that creates the clock signal. I looked at that chip and put the scope on the output pin that generated the noisy signal. It was clean. But when I measured it at R7, it wasn’t. But it was the same signal. How could it be clean at its source yet dirty 5 inches away? Then it hit me. It all depended on where I connected the scope’s ground; which ground I was referencing it to. The generator creates the signal referenced to its own ground, but when R7 receives it, it’s referenced to R7’s ground. These grounds are not the same, not at all the same. I’d seen this before on breadboards but thought it was just because breadboards are crappy and have bad connections. I’d assumed that proper PCBs would have a solid clean consistent ground. What a naive fool!

I had a flash of inspiration. I was able to see the problem happening in real time and I would know if it was fixed. What would happen if I just connected a wire from R7’s ground pin to the source chip’s ground pin? I tried it. Instantly, the problem was gone. Even though these two chip grounds were already electrically connected together, another dedicated connection solved the problem. And then, finally, I remembered something I’d read once but didn’t understand. When you have a clock signal that has to go a long way, it’s a good idea to run the ground return alongside it. I had been concentrating on power ground rather than signal ground. The ground return path from R7 back to the clock source had to go all the way around the edge of the board and then back down into the middle. Providing a direct path back fixes the problem. Now I think that this was exactly the reason for the reset problem as well. If I had fixed the reset glitch by doing this, I wouldn’t have had this problem at all.

Later I started getting a similar problem on R2 which drove me almost nuts, until I realized that just fixing R7 was a bit dumb. So I added dedicated ground returns to all the register chips. Since that moment, LEO-1 has been working flawlessly, even after reset. It just doesn’t crash anymore. And it’s working at 4MHz when my design goal was only 1MHz.

I almost gave up on this project so many times and all for the want of a bit more knowledge. I started doing it because I thought I knew enough about electronics to pull it off. But now I feel like I’ve learned everything I know about electronics by doing it!

LEO-1 working

LEO-1 working

Advertisements

The Good, the Bad and the Ugly

The next thing I got obsessed with was the keyboard. At first I had planned to get an old keyboard off eBay and reverse engineer it. So I bought an old VT-260 keyboard and removed its circuit board to expose the raw key matrix connections and I spent a fair bit of time with a multimeter working out how that matrix was organized. I got as far as having my own microcontroller reading the keyboard and displaying simple scan codes. But by the time I had done that, I had decided that the keyboard itself was not really what I wanted LEO-1 to have. I wanted something special.

You may recall my admiration of the ICL 2900 series mainframes. Well, the keyboards those things had were absolutely gorgeous and nothing like anything I’ve seen before or since. I wanted LEO-1 to have a keyboard like that, but sadly, not everything is available on eBay. The closest I could come to that would be to make my own keyboard with a similar layout and similar keycaps. So I started reading about custom keyboards and found that there is a whole subculture of people who are really into keyboards. I found a great YouTube channel by this funny bloke who calls himself Chyrosran22 and watched all his videos. After a month or two of this I think I knew more about keyboards than I know about my own mother so it was time to take a step back and regroup.

ICL 2900 keyboard

ICL 2900 keyboard

While looking for keyboards on eBay, I had discovered a cool device called a Rubidium Frequency Standard which piqued my clock-obsession — and so I spent a few months building my own atomic clock! I already have enough clocks in this room and didn’t need another one, but I couldn’t resist. I had lost my way. It seemed that I was just finding excuses to postpone finishing LEO-1 and looking back I can now see that I was terrified that I wouldn’t be able to make it work. If it didn’t work, it would be the biggest waste of time and money and I would look like such an idiot for even attempting something as absurd as trying to build a contraption as complicated as this.

Even so, I had been working on the PCB design for the Memory Board and the Control Board during the autumn and they were pretty much finished. Eventually after obsessively checking it over and over again, I had the Memory Board manufactured and I populated it. My initial tests with switches were promising, but then something terrible happened. In 2015 I had seen a very weird problem when testing the ROMs where the data bus would go into continual oscillation for no apparent reason. I thought I’d solved it with bus termination resistors, but now this problem came back to bite me in the arse once again, this time on my nicely made PCB. I couldn’t understand why all these people out there could build entire CPUs on breadboards with loads of untidy wires and have it all work perfectly, but I couldn’t even get a ROM to work on a neat PCB. At one point I just considered picking the whole thing up and placing it into the trash, but of course that was only a fleeting thought. I tried all kinds of hacks and guesses but really had no idea what was going on. This problem only happened on ROM not on RAM or any of the other devices. I blamed the ROM chips; they must be buggy. I managed to crowbar an old-school EPROM into the socket and would you believe it, the problem happened on those too. I checked the data sheet over and over again and couldn’t see anything that could explain it. Then I realized what I was doing. I was using switches to change the address bus for testing, and I was doing it while the ROM was chip-enabled and also output-enabled. My VDU does that and doesn’t have any trouble, but I wondered what would happen if I followed the process that the real CPU would. The address bus would only change when the ROM was not enabled, and the output-enable would only be on for a moment. Well, when I started testing it like that, the problem went away. I visualized the problem as this: when you change the address, the internal circuitry of the ROM will ‘rattle’ as the device finds the correct cell to output. If the device is output-enabled, this rattling will appear on the data bus as a randomly changing mess lasting some tens of nanoseconds. This is high-frequency business; you don’t want that noise getting onto the data bus in the first place. Possibly, the length and layout of the PCB traces could cause frequencies like that to start an oscillation, and if the changes are ‘evil’ enough, it could create power supply glitches, ground bounce or such, which would upset the ROM, cause it to rattle some more… and around you go, ad infinitum. This is all speculation. The real cause is still somewhat of a mystery, but the important thing was that I found that as long as the rattling doesn’t get onto the data bus, you’re safe. Unfortunately, I also found that this happens if the ROM’s chip-enable and output-enable are asserted at the same moment, and my CPU design actually did that. The only way it would work properly would be if the chip-enable came first, then the output-enable, on a different tick. I was going to have to change the design of the CPU’s sequencer. It was an easy fix, and a lucky catch at the eleventh hour, especially since I had not ordered the PCB yet.

Oscillation

Ugly things on the data bus

 

You may recall that I have a terrible habit of going off at a tangent and ordering things that I hope to be able to use in the future, even when I’m not sure that any of it will work. Well, I had ordered a small thermal receipt printer a month before, so with the memory board working, I decided to try it out. This also enabled me to test the device expansion ports. I hooked up a UART to the memory board and the printer, and manipulated it as a device using the test switches. I was able to manually make the printer print a single character. So this means LEO-1 will almost certainly be able to use that printer too. Finally happy with these results, I decided to press on and finish the control board.

“You’re all clear kid, now let’s blow this thing and go home!” — Han Solo