When is a ground not a ground?

The control board PCB design was finished, but I had a nagging feeling that something was wrong with it. So, one last time, I checked that every single trace was connected to all the right places. I found a nasty collision: some weeks earlier I had moved a chip sideways without noticing that two traces now overlapped. I fixed it, eventually got to the end of the checking, and had the board manufactured. When it arrived, I started with the oscillator section and populated the board bit by bit, testing each part as I went, so that if something was wrong I would have a good chance of knowing where the problem was. So far, everything was working. In the end I just went ahead and added all the remaining components. I tested it with just the memory board connected and confirmed that it could fetch instructions from ROM. I could see the PC and instruction register on the lights, confirm that the other signals were doing the right things, and watch it doing jumps and branches. It was time to connect the whole thing together and see what happened.

I was using the test routine that I had written for the simulator. It seemed to be working, but sometimes it would halt, meaning a test had failed. This was random: sometimes it would run from reset and sometimes it would halt, or crash. I couldn’t understand what was going on. It was so close to working, but randomly didn’t. Around this time I noticed that two of the lights, data bus bits 0 and 1, were flickering in a way that was not normal. While single-stepping, I discovered that these two bits were doing something very weird. If I pressed reset, they would go off, but after a few seconds one of them would come on. Or not. It was random. I couldn’t understand how this was possible, because when the machine is not running freely, nothing should happen until I press the step button.

I removed some chips and did some multimeter testing, and that was when I noticed a current path between the reset line and those two bits. There is no deliberate connection between the reset signal and the data bus, but I was measuring 300kΩ between them. And that was when I finally noticed that the reset signal ran right alongside the data bus ribbon cable connector, so close that it actually passed under the plastic connector block. I knew there had to be some foreign body under this component that I had missed while soldering. I had to take the thing off. Luckily, the connector is split into two sections, and one of them is only 8 pins wide. I desoldered it and found a tiny pool of brown goo touching both data bus pins (0 and 1) and the reset line. This goo was conductive! It turned out to be dirt and flux that had oozed under the connector, out of sight, while I was soldering it. The reset signal (which is normally high and goes low during reset) was leaking some voltage through the goo onto the data bus pins, just enough to upset the inputs of whatever chip was reading them.
I cleaned the goo off, resoldered the connector and put everything back together, and the glitching was gone. Unbelievable! The lesson here: never run a trace so close to a connector that it passes under the plastic, because then it is hidden and you can’t see whether anything bad is touching it.
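To get a feel for why a 300kΩ leak can matter, the goo can be treated as one resistor of a voltage divider between the 5V reset line and a data bus input. This is a minimal Python sketch under stated assumptions: the 100kΩ path to ground on the bus line is purely hypothetical (how firmly the bus was held while single-stepping isn’t known), and the thresholds are the usual 74HCT input limits.

```python
# Sketch of the leakage mechanism: the flux goo acts as a resistor from the
# 5 V reset line to a data-bus input. ASSUMPTIONS: the bus line is otherwise
# held low only by a hypothetical 100 kOhm path to ground, and the input
# thresholds are the usual 74HCT figures (0.8 V max for a 0, 2.0 V min for a 1).

V_RESET_HIGH = 5.0   # reset line when not in reset (volts)
R_LEAK = 300e3       # resistance measured through the goo (ohms)
R_PULLDOWN = 100e3   # hypothetical weak path to ground on the bus line (ohms)

HCT_VIL_MAX = 0.8    # highest input voltage guaranteed to read as 0
HCT_VIH_MIN = 2.0    # lowest input voltage guaranteed to read as 1


def divider_voltage(v_top: float, r_top: float, r_bottom: float) -> float:
    """Voltage at the junction of two resistors between v_top and ground."""
    return v_top * r_bottom / (r_top + r_bottom)


def classify(v: float) -> str:
    """How a 74HCT input is guaranteed to interpret a voltage."""
    if v <= HCT_VIL_MAX:
        return "logic 0"
    if v >= HCT_VIH_MIN:
        return "logic 1"
    return "undefined"


v_bus = divider_voltage(V_RESET_HIGH, R_LEAK, R_PULLDOWN)
print(f"bus input sits at {v_bus:.2f} V: {classify(v_bus)}")
```

With these made-up values the input sits at 1.25V, between the two thresholds, where the chip may read either a 0 or a 1. A stiffer 10kΩ pull-down would give about 0.16V, safely a 0, which is why a leak like this can be harmless in one circuit and maddening in another.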

Things started to get better after that, but the machine still sometimes crashed after reset. It was like starting a car: if it started, it ran for ages, but if not, it didn’t work at all. I started writing some more test routines, including a memory test, but I was still seeing some weird random problems. This time I homed in on register R7, which I was using as a stack pointer for subroutine calls. It seemed to be randomly reset to zero, and I was able to confirm this with the scope: sometimes R7 got cleared when it was not supposed to. I soon discovered that this happened when R0 was counting down in a loop, specifically when R0 went from 0000 to FFFF. I put a scope on R7’s reset pin and this is what I saw:

Scope trace of R7’s reset pin

Bear in mind that this pin is supposed to be high all the time! If you know how to read this trace, you can actually see something counting down, and at the point where it goes from 0000 to FFFF there is a sudden jump in the interference. That was just enough to trigger the reset pin. At the time, I couldn’t understand why there was so much interference on this line. There was much less on R1 and R2, so why was R7 so bad? It was almost as if the register’s physical distance from the top of the board had something to do with it. In fact, I discovered that R6 was suffering from the same problem. At this point, the simplest solution seemed to be to get rid of the reset function on the registers. There’s no need for it anyway, and I don’t know why I bothered with it. It’s much safer to tie the reset pins high so they can never be activated. So I cut the traces and added little bridge wires to tie all the reset pins high. The problem went away. But I had fixed a bug without understanding it, and this was going to come back and bite me in the arse one more time.
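The wrap-around from 0000 to FFFF is the one step of a 16-bit down-count where every bit changes at once. This small Python sketch just counts how many bits toggle on each decrement; it assumes, as a simplification, that the switching noise grows roughly with the number of outputs changing together.

```python
# Count how many of the 16 bits change state on each step of a down-counter.
# Simplifying assumption: more simultaneous transitions means a bigger burst
# of switching current, and so more noise on nearby lines.

WIDTH = 16
MASK = (1 << WIDTH) - 1


def decrement(value: int) -> int:
    """16-bit decrement that wraps from 0x0000 to 0xFFFF."""
    return (value - 1) & MASK


def toggled_bits(old: int, new: int) -> int:
    """Number of bit positions that differ between two 16-bit values."""
    return bin((old ^ new) & MASK).count("1")


for start in (0x0003, 0x0001, 0x0100, 0x0000):
    nxt = decrement(start)
    print(f"{start:04X} -> {nxt:04X}: {toggled_bits(start, nxt)} bits change")
```

Most steps flip only one or two bits, a step like 0100 to 00FF flips nine, and 0000 to FFFF flips all sixteen, which matches where the jump in interference shows up on the trace.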

I had almost got the memory test working, but something odd was still going on. I narrowed this one down to R7 being cleared again, except this time only its high byte. And this time it wasn’t being reset; it was being written to when a different register was the destination. But only the high byte. I put the scope on its write pin and saw the same kind of noise as before. Where was this noise coming from? I had been extremely careful with the write strobes because they are clocks, and clock traces have to be kept away from other traces so they don’t pick up noise. But here was a ton of noise coming out of nowhere. Could it be power supply noise? Ground bounce? Something else? I had organized the chips in groups of eight, with careful separation of power and grounds, each group having its own supply capacitor and each chip having its own bypass capacitor. I thought I had followed all the best practices, but I’d missed something. I was sure it had something to do with the distance of R7 from the other end of the board. Perhaps the clock trace was too long? Well, no: R7 was only about 5 inches from the chip that creates the clock signal. I put the scope on that chip’s output pin, the one that generated the noisy signal. It was clean. But when I measured the same signal at R7, it wasn’t. How could it be clean at its source yet dirty 5 inches away? Then it hit me. It all depended on where I connected the scope’s ground, that is, which ground I was referencing it to. The generator creates the signal referenced to its own ground, but when R7 receives it, it’s referenced to R7’s ground. These grounds are not the same, not at all the same. I’d seen this before on breadboards but thought it was just because breadboards are crappy and have bad connections. I’d assumed that proper PCBs would have a solid, clean, consistent ground. What a naive fool!

I had a flash of inspiration. I could see the problem happening in real time, so I would know immediately if it was fixed. What would happen if I just connected a wire from R7’s ground pin to the source chip’s ground pin? I tried it. Instantly, the problem was gone. Even though these two chip grounds were already electrically connected, an extra dedicated connection solved the problem. And then, finally, I remembered something I’d read once but hadn’t understood: when a clock signal has to travel a long way, it’s a good idea to run its ground return alongside it. I had been concentrating on power ground rather than signal ground. The ground return path from R7 back to the clock source had to go all the way around the edge of the board and then back down into the middle. Providing a direct path back fixes the problem. Now I think this was exactly the cause of the reset problem as well. If I had fixed the reset glitch this way, I wouldn’t have had this second problem at all.
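A receiving chip judges its input relative to its own ground pin, so any voltage that develops across the ground path between the source chip and the receiver adds straight onto the signal it sees. A rough way to size that voltage is V = L × di/dt. The Python sketch below uses purely hypothetical numbers (40nH for the roundabout return, 15nH for an added direct wire, 0.1A changing in 5ns); none of them were measured on LEO-1.

```python
# Back-of-the-envelope ground-offset estimate: V = L * di/dt.
# ASSUMPTIONS (hypothetical values, not measurements from LEO-1):
#   - the roundabout ground return from R7 to the clock source: 40 nH
#   - an added direct ground wire: 15 nH
#   - return current changing by 0.1 A in 5 ns during a switching edge

def inductive_drop(inductance_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """Voltage across an inductance when its current changes linearly."""
    return inductance_h * delta_i_a / delta_t_s


def parallel(l1_h: float, l2_h: float) -> float:
    """Two inductances in parallel, ignoring any mutual coupling."""
    return l1_h * l2_h / (l1_h + l2_h)


L_ROUNDABOUT = 40e-9
L_DIRECT = 15e-9
DELTA_I = 0.1
DELTA_T = 5e-9

v_before = inductive_drop(L_ROUNDABOUT, DELTA_I, DELTA_T)
v_after = inductive_drop(parallel(L_ROUNDABOUT, L_DIRECT), DELTA_I, DELTA_T)
print(f"ground offset before: {v_before:.2f} V, after: {v_after:.2f} V")
```

With these invented values the offset drops from 0.80V to about 0.22V; 0.8V is also the highest voltage a 74HCT input is guaranteed to read as a 0. The parallel-inductance formula is a deliberately crude stand-in: in practice, a return routed right alongside the signal mainly helps by shrinking the area of the current loop, which lowers its inductance further than this calculation suggests.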

Later I started getting a similar problem on R2, which almost drove me nuts, until I realized that fixing only R7 was a bit dumb. So I added dedicated ground returns to all the register chips. Since then, LEO-1 has been working flawlessly, even after reset. It just doesn’t crash anymore. And it’s running at 4MHz, when my design goal was only 1MHz.

I almost gave up on this project so many times and all for the want of a bit more knowledge. I started doing it because I thought I knew enough about electronics to pull it off. But now I feel like I’ve learned everything I know about electronics by doing it!

LEO-1 working

Slow motion

I can’t believe it’s almost the end of May and the memory board is still not finished. LEO-1 seems to have been on the back burner for a while. This started in January, when my wife and I moved to a new apartment. For several weeks before and after the move, I had precious little time for soldering, with packing and organizing taking priority. During the move I hurt my back very badly by lifting my AKAI X-355 tape recorder incorrectly. The thing is built like a tank and weighs a ton. I was in terrible pain for about 3 weeks and couldn’t face any hobby stuff at all.

Somehow, I didn’t do any LEO-1 work in March either, having got out of the habit of even thinking about it. Then, towards the end of March, I bought an old Conn organ from a charity shop and spent several weeks fixing it up. During April, I decided I needed a proper amplifier and speakers in the living room, and decided to build the amplifier myself. It came out beautifully and sounds great, but this project once again kept me away from LEO-1.

I got back to it a couple of weeks ago, doing something that I should have done right at the start: I went through the schematics and labelled all the ICs with their numbers. The software I’m using, ExpressSCH, doesn’t do this very usefully when gates are involved, because it numbers each gate as a separate IC, which it is not. So I had to do it manually. When I finally finished, I found that I had 282 ICs in total, including the RAM, ROM and ZIF sockets. At the start, I had estimated there would be about 200. Believe it or not, I had gone ahead last autumn and ordered all the chips I thought I was going to need, without doing a proper count. Obviously I had counted certain chips like the bus drivers and registers, but for AND, OR and NOT gates, etc., I had not bothered; I just bought a load of them cheap on eBay. This means that I now have a large quantity of surplus chips, in particular 74HC14. I had bought 50 of them for about 6 bucks, and it turns out I only need three. I laughed out loud at that mistake, and I still can’t really work out how it happened.

I also found I needed to change an important design decision: not using IC sockets for the numerous small chips. I had initially decided against sockets because they are so expensive and I need a couple of hundred of them. But while working on the memory board, I saw the glaring truth. If one of those 74HC32s, for example, were to fail, either due to an error on my part or for some other reason, I would simply not be able to replace it without first desoldering all the wires connected to it and soldering them back afterwards. What is the likelihood that a chip would fail? Probably not very high, but accidents do happen; it’s easy to drop a screwdriver and short something out. I decided it just wasn’t worth the risk, so I bit the bullet and ordered sockets for all the chips on the Control, ALU and Register boards. I don’t trust the really cheap sockets that use flat blade contacts, so I had to get machine-tooled ones that have nice round holes with grips inside. I also decided to get the best quality I could afford, so I got ones with gold-plated contacts. By buying Jameco ValuePro parts I managed to halve the cost compared with what Mouser wanted for the brand-name parts. It sucks that in some cases a single socket can cost over a dollar, when I was able to buy nine ICs for the same money, so I wanted to cut the cost without sacrificing too much quality.

Design decisions: Electronics

By the time I had finished the first draft of the simulation, I really felt it would be possible to implement LEO-1 with real electronics. To make the simulator easier to test, I had spent a few days writing an assembler, so that I could copy and paste its output into Logisim instead of working out the 16-bit instruction codes on paper and typing them in. That also meant I now had a simple assembler that I could use to program the real thing. I had to try and build it for real.
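The last stage of that workflow is just turning assembled 16-bit words into hex text that Logisim will accept. Here is a minimal Python sketch of that stage, assuming Logisim’s "v2.0 raw" memory-image format (a header line followed by hex values separated by whitespace). The function name is hypothetical and the example words are placeholders, not real LEO-1 instruction encodings.

```python
# Hypothetical helper: turn assembled 16-bit words into text that Logisim
# can load into a ROM/RAM component. ASSUMPTION: Logisim's "v2.0 raw"
# memory-image format (a header line, then hex values separated by spaces
# or newlines). The example words are placeholders, not LEO-1 opcodes.

def to_logisim_image(words: list[int], per_line: int = 8) -> str:
    for w in words:
        if not 0 <= w <= 0xFFFF:
            raise ValueError(f"{w:#x} does not fit in 16 bits")
    lines = ["v2.0 raw"]
    for i in range(0, len(words), per_line):
        lines.append(" ".join(f"{w:04x}" for w in words[i:i + per_line]))
    return "\n".join(lines) + "\n"


program = [0x1234, 0xABCD, 0x0000, 0xFFFF]  # made-up example words
print(to_logisim_image(program), end="")
```

Rejecting out-of-range values rather than silently masking them is a deliberate choice: an assembler bug that produces a 17-bit value should fail loudly, not load a truncated instruction.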

As I’ve mentioned before, I’m not totally new to digital electronics. I designed and built a digital clock in 1978 or so, and I designed and built a decoder and display for the (now defunct) ‘Maplin Rugby Clock receiver’. I felt confident that I could get back into this fairly easily. I already had a soldering iron, multimeter, wire, basic tools and a cabinet full of spare electronic parts that I’d collected over the years. How hard could it be? I decided to find out by trying to design just one of the 16-bit registers that LEO-1 needs.

I had already decided that 74 series chips were the way to go, because I’ve always liked them (after I got over hating them), I’ve never really felt comfortable with the 4000 series CMOS parts, and many mini and mainframe computers, like the ones I worked with in the 80s, were made out of them. I read through this page to refamiliarise myself with some of the parts I was going to need, and started to realise that things have changed a bit in the last 20 years. Originally, in the 70s, I used plain old 74xx parts. These were real TTL: they got warm in use and were rather robust. Some were even packaged in ceramic instead of plastic.

Pictures I took of the clock project I did in 1993 show that I used 74HCxx parts. I’m not sure I knew at the time that these parts are actually implemented in CMOS and are not really TTL-compatible. The project worked because I used only one kind (HC), and 5 volts (the standard TTL supply) works for CMOS as well. What’s changed for the better is that there are now also 74HCTxx parts, which are implemented in CMOS but have fully TTL-compatible inputs. By using these, you get compatibility with old TTL parts like 74LSxx while using less power. I needed to choose between HC and HCT. My decision was influenced by something I should have expected but didn’t: many of the 74 series parts are now obsolete and either very difficult or impossible to get. If I went with HC parts (which are ‘preferred’ for new designs), I would be locked into them, and if a part was not available I would be royally screwed. If, however, I went with HCT parts, I would have the option of falling back on LS for difficult-to-get items. I decided to go with HCT for this reason. The only downside I could see was HCT’s ‘lower immunity to noise’. I’m not sure whether that will affect what I’m doing; I hope not. In retrospect I think I may have made a mistake, as I probably won’t need to interface to any LS chips and HC would have been a better choice, but I’m not going back now.
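Both the LS fallback and the ‘lower immunity to noise’ come down to logic thresholds: a driver is only guaranteed to work into a receiver if its worst-case high output is above the receiver’s minimum high input, and its worst-case low output is below the receiver’s maximum low input. The Python sketch below checks this using typical 5V datasheet limits (HC thresholds quoted at VCC = 4.5V, CMOS outputs at light load); treat the numbers as representative and check the datasheets for the exact parts.

```python
# Compare guaranteed output levels of a driving family with the guaranteed
# input thresholds of a receiving family (5 V supply). Figures are typical
# datasheet limits (HC/HCT outputs at light load, HC thresholds at
# VCC = 4.5 V); check the datasheet for the exact parts you use.

from dataclasses import dataclass


@dataclass(frozen=True)
class OutputLevels:
    voh_min: float  # lowest guaranteed "high" output voltage
    vol_max: float  # highest guaranteed "low" output voltage


@dataclass(frozen=True)
class InputLevels:
    vih_min: float  # input must reach this to count as high
    vil_max: float  # input must stay below this to count as low


LS_OUT = OutputLevels(voh_min=2.7, vol_max=0.5)
HC_OUT = OutputLevels(voh_min=4.4, vol_max=0.1)  # HCT outputs use the same CMOS stage
HC_IN = InputLevels(vih_min=3.15, vil_max=1.35)
HCT_IN = InputLevels(vih_min=2.0, vil_max=0.8)


def noise_margins(drv: OutputLevels, rcv: InputLevels) -> tuple[float, float]:
    """(high-level margin, low-level margin); negative means not guaranteed."""
    return drv.voh_min - rcv.vih_min, rcv.vil_max - drv.vol_max


def compatible(drv: OutputLevels, rcv: InputLevels) -> bool:
    high, low = noise_margins(drv, rcv)
    return high >= 0 and low >= 0


for name, drv, rcv in [("LS -> HC", LS_OUT, HC_IN), ("LS -> HCT", LS_OUT, HCT_IN),
                       ("HC -> HC", HC_OUT, HC_IN), ("HC -> HCT", HC_OUT, HCT_IN)]:
    high, low = noise_margins(drv, rcv)
    print(f"{name}: high {high:+.2f} V, low {low:+.2f} V, ok={compatible(drv, rcv)}")
```

An LS output is not guaranteed to reach an HC input’s high threshold (2.7V against 3.15V), but it does clear HCT’s 2.0V, which is what makes the LS fallback possible. The price shows up in the low-level margin: CMOS driving HC has about 1.25V to spare, while CMOS driving HCT has about 0.7V.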

Anyway, after choosing 74HCT chips I ordered a few parts for prototyping the register board. While I was at it, I looked into the kind of memory (RAM and ROM) I might want to try. I found a static RAM chip organised as 512K x 8 bits and figured I could use two of those to make a 16-bit-wide memory. As for ROM, I’d like to try EEPROM or Flash (electrically erasable ROM in a chip), but I’m still doing research on that because of the need for a programming device to get the code into the chip. There are two options for this: build my own, or buy one. I think I’m going to have to buy one, though I’m still thinking about it. Once I know which device I want to use, I can try to find an affordable programmer that supports it.
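Putting two byte-wide chips side by side works because both share the same address and control lines; one simply stores the low byte of every word and the other the high byte. This Python sketch models that arrangement and works out the address lines per chip. It deliberately assumes nothing about how much of the space LEO-1’s own address bus will reach.

```python
# Two 512K x 8 SRAM chips sharing the same address and control lines form a
# 512K x 16 memory: one chip stores bits 0-7, the other bits 8-15 of every word.
# Behavioural sketch only; how many of these addresses LEO-1 actually
# decodes is not assumed here.

WORDS = 512 * 1024                        # locations per chip
ADDRESS_LINES = WORDS.bit_length() - 1    # 2**19 == 524288, so 19 lines


class WordMemory:
    def __init__(self, words: int = WORDS) -> None:
        self.low_chip = bytearray(words)   # data bits 0-7
        self.high_chip = bytearray(words)  # data bits 8-15

    def write(self, addr: int, value: int) -> None:
        # Both chips see the same address and write strobe at the same time.
        self.low_chip[addr] = value & 0xFF
        self.high_chip[addr] = (value >> 8) & 0xFF

    def read(self, addr: int) -> int:
        return (self.high_chip[addr] << 8) | self.low_chip[addr]


mem = WordMemory()
mem.write(0x7FFFF, 0xBEEF)
print(ADDRESS_LINES, hex(mem.read(0x7FFFF)), 2 * WORDS, "bytes in total")
```

Each chip needs 19 address lines, and the pair holds 524,288 16-bit words, or 1MB in total.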

So, what do I need for a single register? It turns out that a pair of 74273 octal registers will do the trick. They latch the data on a positive-going clock edge, which is what my design needs, and they have ordinary two-state (not tri-state) outputs, so they can’t be connected directly to a shared bus. LEO-1 has four internal register buses, which I called RIN, ABUS, BBUS and CBUS. RIN is the register input bus, and it will be permanently connected to all register inputs. The other three are the register output buses: ABUS and BBUS go to the inputs of the ALU, and CBUS is used for writing a register to memory. This means that the output of every register has to be connected to three bus drivers, so that any register value can be put onto any of the A, B or C buses. The instruction decoder will ensure that only one register at a time can get onto a bus by selecting only one of the eight registers for each of A, B and C. I chose the 74244 octal bus driver chip for this purpose. Because the 273 and 244 are octal (8-bit) chips, I have to ‘bit-slice’, building 16 bits from pairs of 8-bit chips. So one half (the low 8 bits) of a register needs a 74273 register and three 74244 bus drivers, and the other half (the high 8 bits) needs the same. That makes eight chips per register, or 64 chips across all eight registers. While I was designing LEO-1 in the simulator, I didn’t give this kind of thing much thought. I’m now glad I didn’t try to design a 32-bit machine! Here’s a picture of the chips attached to a bit of static-proof foam:
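To check the chip count and the one-driver-per-bus rule, here is a small Python behavioural model of the arrangement described above. It is a sketch rather than a description of the real decoder logic: the names (decode, bus_value, RegisterFile) are made up for illustration, and the 74273’s clear input is left out.

```python
# Behavioural sketch of the register arrangement: eight 16-bit registers,
# all fed from RIN, each able to drive ABUS, BBUS or CBUS through its own
# tri-state drivers. A 3-to-8 decoder per bus enables exactly one register.
# Hypothetical names; the 74273 clear input is not modelled.

from typing import Optional

NUM_REGISTERS = 8
CHIPS_PER_HALF = 1 + 3        # one 74273 latch + three 74244 bus drivers
HALVES_PER_REGISTER = 2       # low byte and high byte (bit-sliced)


def decode(select: int) -> list[bool]:
    """One-hot output enables, as a 3-to-8 decoder would produce."""
    if not 0 <= select < NUM_REGISTERS:
        raise ValueError("register select out of range")
    return [i == select for i in range(NUM_REGISTERS)]


def bus_value(regs: list[int], enables: list[bool]) -> int:
    """Value on a bus; anything other than exactly one enabled driver is an error."""
    drivers = [value for value, enabled in zip(regs, enables) if enabled]
    if len(drivers) != 1:
        raise RuntimeError("bus contention or floating bus")
    return drivers[0]


class RegisterFile:
    def __init__(self) -> None:
        self.regs = [0] * NUM_REGISTERS

    def clock_edge(self, rin: int, dest: Optional[int]) -> None:
        """Rising clock edge: only the destination register latches RIN."""
        if dest is not None:
            self.regs[dest] = rin & 0xFFFF

    def buses(self, a_sel: int, b_sel: int, c_sel: int) -> tuple[int, int, int]:
        return (bus_value(self.regs, decode(a_sel)),
                bus_value(self.regs, decode(b_sel)),
                bus_value(self.regs, decode(c_sel)))


rf = RegisterFile()
rf.clock_edge(0x1234, dest=2)
rf.clock_edge(0xABCD, dest=7)
print([hex(v) for v in rf.buses(2, 7, 7)])
print(NUM_REGISTERS * HALVES_PER_REGISTER * CHIPS_PER_HALF, "chips")
```

The same register can appear on several buses at once (here R7 on both B and C), because each bus has its own set of drivers; what must never happen is two registers enabled onto the same bus, which the model treats as an error.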

Register chips