How to find an error in a microprocessor released thirty-five years ago

K1801VM1A







It's hard to believe, but errors in processors sometimes outlive the processors themselves. I was recently convinced of this by the example of the 16-bit 1801VM1 microprocessor, on which the BK-0010/11M family of home computers was once built in the USSR. This family has been written about on Habr more than once.







The BK's active life spanned the late 1980s and early 1990s. In those years, through the efforts of many lone enthusiasts, as well as hobby-club members and cooperatives, the bulk of the application software for the BK was developed: games, utilities, and various "DOSes" (disk operating systems). In parallel with the software, peripherals were created, each with its own system software. On the whole, the ecosystem of these 16-bit PDP-like computers evolved along much the same lines as the early open 8-bit architectures based on the Intel 8080 and the S-100 bus. Later, as the BK lost its utilitarian role, the focus of programming shifted toward the demoscene.







The volume of software for the BK can be estimated by visiting public sites with program collections. Of course, compared with, say, the ZX Spectrum, this volume is far more modest. Nevertheless, even that much software should, it would seem, have been enough to explore every conceivable nook and cranny of the machine code. Is it possible to find something unusual in the processor's behavior after more than thirty years of practical use? As it turned out, yes! That is what this article is about.







It probably makes sense to tell this story in chronological order. First, I should note that I am by no means a "veteran programmer", either by profession or by belonging to the cohort of BK enthusiasts I mentioned above. I came to the BK in a roundabout way, partly through nostalgia for the hobbies of my childhood and youth (analog and digital electronics, the magazine Young Technician, the UT-88 and other homemade projects, finished and unfinished), and partly through my interest in the PDP-11 architecture and instruction set. I don't have a BK "in hardware"; I usually run and debug BK programs in the bkemu emulator on an Android tablet.







Some time ago I became interested in the Kaleidoscope program written by Li-Chen Wang. The program was written in machine code in 1976 for the Intel 8080 microprocessor in an Altair 8800 computer with a Cromemco Dazzler graphics adapter. I wanted to analyze Li-Chen Wang's algorithm in detail and, at the same time, port it to the BK. I should say that demosceners had earlier expressed a wish to port Kaleidoscope to the BK, and there had even been attempts to work out the algorithm, but they were unsuccessful.







In my next article I will probably analyze this algorithm in detail (and for the impatient, I'll post a link to the sources of a cross-platform Kaleidoscope written in C with libSDL). For what follows, it is enough to say that the problem was solved and Kaleidoscope was successfully ported to the BK. Moreover, on the BK sound generation was added to the algorithm, and since both the picture and the sound are generated by the same code, one can say that the picture itself makes the sound. The whole demo fits in less than 256 bytes of machine code and, I hope, will be presented to the public at CAFe Demoparty 2019 in Kazan in late October.







Having finished writing and debugging my program in the emulator, I asked Damir ("Adamych") Nasyrov (one of the organizers of CAFe Demoparty and a very well-known figure among demosceners) to check how the program runs on a real BK. I was especially interested in sound reproduction, since timings in the emulator could differ from timings on real hardware. Imagine my disappointment when Damir told me that on a real BK there was a picture but no sound!







The next few nights were spent trying to work out, from the system documentation for the BK-0011M and the circuit diagram, where the sound error could be. Sound on the BK is organized quite simply: bit 6 of the I/O register at octal address 177716 (the tape recorder control register) is output through a buffer to a piezoelectric speaker (beeper). In addition to bit 6, bits 2 and 5 of the same register are connected to a simple digital-to-analog converter made of 4 resistors. From the output of this converter the sound can go to a tape recorder. Everything is perfectly clear and logical, but the sound stubbornly refused to appear on a real BK, whatever combinations of bit masks I tried applying to the data written to this register. In parallel, I installed and tested every BK emulator I knew of, and the sound worked in all of them!
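The bit layout just described can be written down as a few masks. The Python sketch below is only an illustration: the constant and function names are hypothetical, and the bit assignments are taken from the description above rather than checked against the BK documentation.

```python
from typing import List

# Bit layout of the BK tape recorder control register, as described above.
# All names are hypothetical; only the address and bit numbers come from
# the text.
TAPE_CONTROL_REG = 0o177716       # octal address of the I/O register

BEEPER_BIT = 1 << 6               # bit 6 drives the piezo speaker
DAC_BITS = (1 << 2) | (1 << 5)    # bits 2 and 5 feed the resistor DAC


def beeper_level(value: int) -> int:
    """Return the state (0 or 1) of the speaker line for a written value."""
    return (value >> 6) & 1


def square_wave(half_periods: int) -> List[int]:
    """Register values that toggle only the speaker bit, one per half-period."""
    return [BEEPER_BIT if i % 2 else 0 for i in range(half_periods)]


if __name__ == "__main__":
    print(oct(BEEPER_BIT), oct(DAC_BITS))
    print([beeper_level(v) for v in square_wave(4)])
```

A value whose bit 6 never changes produces no clicks on the beeper at all, however the other bits vary.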







At some point I had almost convinced Damir that his BK was faulty, but the behavior was reproduced on another real BK-0011M, as well as on a BK-0010. I ran out of ideas, and the members of the Telegram channel devoted to the BK could not suggest anything either... As usual, chance came to the rescue. During one of the experiments, Damir ran the demo in an emulator to make sure the sound was there. And then he noticed that not only was there sound in the emulator and none on the BK, but the pictures in the emulator and on the real BK were different too! Let me remind you that in my program both the picture and the sound are generated by the same code. So all this time I had been looking for the cause in the wrong place: the cause was in the code that generated the data for the screen contents.







Damir sent me a screenshot, and it became clear that the algorithm was producing bytes whose upper 4 bits were zero and that, by coincidence, it was exactly these bits that were sent to the sound output (that is, always zeros). However, why the algorithm behaved this way remained unclear. Here is the relevant place in the code (MACRO-11 assembler for the PDP-11; registers r0-r5 are renamed!):







    ; renamed registers
    a = %0
    b = %1
    c = %2
    d = %3
    e = %4
    h = %5
            ...
            ...
            asr     b               ; sets CF
            bic     #177760, b
            bis     b, c
            bis     (h)+, c         ; screen address in c
            movb    (c), a          ; get a byte from screen RAM
            bcc     1$              ; check CF
            bic     #177760, a      ; keep bits 0-3, clear rest
            bisb    d, a            ; fill bits 4-7
            br      2$
    1$:     bic     #177417, a      ; keep bits 4-7, clear rest
            bisb    e, a            ; fill bits 0-3
    2$:     ...
            ...





For some reason, on a real BK the conditional branch to label 1$ was always taken. That is, the BCC instruction always saw the carry flag as cleared, although the ASR shift instruction could set this flag to either 0 or 1. How could this be, when according to the processor documentation neither BIC, nor BIS, nor MOVB should affect the carry flag?!







Moreover, in all emulators (which were written according to the processor documentation!) this is exactly the case: these instructions do not touch flag C. It became clear that in this case the real 1801VM1A processor does not behave according to the documentation. It remained to confirm this.
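To see how a carry flag that always reads as clear leads to silence, the two paths after `bcc 1$` in the fragment above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not a port of the demo: the function names and fill values are hypothetical, `d` is assumed to carry a value only in bits 4-7 and `e` only in bits 0-3 (as the comments in the listing suggest), and the screen byte is assumed to start out as zero.

```python
LOW_NIBBLE = 0o017    # bits 0-3, what `bic #177760` leaves intact
HIGH_NIBBLE = 0o360   # bits 4-7, what `bic #177417` leaves intact
SPEAKER_BIT = 0o100   # bit 6, the bit that reaches the beeper


def merge(screen_byte: int, carry: bool, d: int, e: int) -> int:
    """Model of the two paths after `bcc 1$` (low byte only)."""
    if carry:
        # Fall-through path: keep bits 0-3, fill bits 4-7 from d.
        return (screen_byte & LOW_NIBBLE) | (d & 0o377)
    # Path at label 1$: keep bits 4-7, fill bits 0-3 from e.
    return (screen_byte & HIGH_NIBBLE) | (e & 0o377)


def speaker_bit_ever_set(carries, carry_stuck_clear: bool) -> bool:
    """Feed a sequence of real carry values through merge().

    With carry_stuck_clear=True the branch sees C == 0 every time,
    which is what the real BK appeared to do.
    """
    byte = 0  # assumption: the screen byte starts out cleared
    for carry in carries:
        seen = False if carry_stuck_clear else carry
        byte = merge(byte, seen, d=0o120, e=0o005)  # hypothetical fill values
        if byte & SPEAKER_BIT:
            return True
    return False


if __name__ == "__main__":
    pattern = [True, False, True, False]
    print("documented:", speaker_bit_ever_set(pattern, carry_stuck_clear=False))
    print("real BK:   ", speaker_bit_ever_set(pattern, carry_stuck_clear=True))
```

In this model, once the branch to `1$` is always taken, only bits 0-3 are ever written, so bits 4-7 (including bit 6) stay at zero.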







To begin with, an obvious quick fix:







            ...
            asr     b               ; sets CF
            mfps    -(sp)           ; store PSW on stack
            bic     #177760, b
            bis     b, c
            bis     (h)+, c         ; screen address in c
            movb    (c), a          ; get a byte from screen RAM
            mtps    (sp)+           ; restore PSW from stack
            bcc     1$              ; check CF
            ...





Saving the flags on the stack right after the shift instruction and restoring them before the conditional branch immediately solved the problem, which showed that I was on the right track. It remained to narrow down the "circle of suspects". To test the hypothesis, I first wrote the following synthetic test (the registers are not renamed here; initialization is omitted so as not to clutter the code; emt 64 is a software trap for printing a string):







            ...
            mov     #1, r1
            jsr     pc, test
            clr     r1
            jsr     pc, test
            halt

    test:   mov     #40000, r2      ; r2 points to screen RAM
            mov     #dummy, r5      ; r5 points to dummy = 200
            ; *** begin ***
            asr     r1              ; affects CF
            bic     #177760, r1
            bis     r1, r2
            bis     (r5)+, r2
            movb    (r2), r0
            ; *** end ***
            jsr     pc, prt
            rts     pc

    prt:    mov     #msg1, r0
            bcs     l1
            mov     #msg2, r0
    l1:     emt     64
            rts     pc

    msg1:   .asciz  /Flag CF set/
    msg2:   .asciz  /Flag CF clear/
    dummy:  .word   200
            ...





And the test... did not reproduce the bug! The program printed:







Flag CF set

Flag CF clear







What did this show? It turned out that the initial assumption, that the code fragment between begin and end simply corrupts flag C, is wrong and needs refining. How does this test differ from the original code? In that other instructions now sit between the block of "suspicious" instructions and the conditional branch: instructions that do not affect flag C but still change the internal state of the processor. So the next test looked like this:







            ...
            mov     #1, r1
            jsr     pc, test
            clr     r1
            jsr     pc, test
            halt

    test:   mov     #40000, r2
            mov     #dummy, r5
            ; *** begin ***
            asr     r1              ; affects CF
            bic     #177760, r1
            bis     r1, r2
            bis     (r5)+, r2
            movb    (r2), r0
            bcc     l1
            ; *** end ***
            mov     #msg1, r0
            emt     64
            rts     pc
    l1:     mov     #msg2, r0
            emt     64
            rts     pc

    msg1:   .asciz  /Flag CF set/
    msg2:   .asciz  /Flag CF clear/
    dummy:  .word   200
            ...





And on a real BK-0011M this test printed:







Flag CF clear

Flag CF clear







In the emulator, as before:







Flag CF set

Flag CF clear







The rest was a matter of technique. Through gradual simplification I arrived at the following minimal test that reproduces the bug (I quote the source in full):







            .title  test
            .psect  code
            .=.+1000

            mov     #15, r0
            emt     63
            sec
            jsr     pc, test
            clc
            jsr     pc, test
            halt

    test:   movb    r0, r0
            bcc     l1
            mov     #msg1, r0
            emt     64
            rts     pc
    l1:     mov     #msg2, r0
            emt     64
            rts     pc

    msg1:   .asciz  /Flag CF set/
    msg2:   .asciz  /Flag CF clear/
            .end





On a real BK-0011M, this test displays







Flag CF clear

Flag CF clear







That is, the culprit was the MOVB instruction immediately preceding the conditional branch, and the form of its first operand does not matter. If, for example, a NOP is inserted between MOVB and BCC, the behavior returns to the documented one, and the program prints







Flag CF set

Flag CF clear







That made it possible to formulate a refined hypothesis (I quote myself from a telegram channel):







... Regarding the bug: the behavior seems to have become clear. As I picture it, MOVB src, dst (by the way, the operands don't seem to matter), due to some architectural quirk, temporarily corrupts flag C inside the processor, but not fatally, because the CPU apparently keeps a copy of this flag. As a result, if there are other instructions (not affecting C) between the MOVB and the conditional branch, for example NOP, then the behavior is as described in the documentation.

What happened next? Colleagues from the channel helped bring Vyacheslav (@K1801BM1, the legendary person who had earlier reverse-engineered this processor at the transistor level) into the discussion. Here is the reaction of Vyacheslav (Yuot) when he checked the behavior on a test bench with a real 1801VM1A (spelling and punctuation preserved):







Stanislav Maslovski:

minimal reproduction requires two instructions

movb and conditional jump on C

Well, before that, set flag C to a known state



Yuot:

Flag C always turns out cleared



Stanislav Maslovski:

Yes

now insert nop



Yuot:

Now never



Yuot:

Alternating 0 1

This is some kind of disgrace

With Vyacheslav's help the details were established: the cause of the bug is that, besides the PSW, the processor has another 4-bit register that normally holds a copy of the flags from the PSW. This register is connected to the microprogram state machine, and conditional branches take the flag values from it. When the MOVB, SXT, and MFPS instructions are executed, because of the way sign extension is handled and because of an error in the microcode, the copy of flag C in this register is cleared, and conditional branches that use this flag work incorrectly. However, when subsequent instructions are executed, the value of this temporary register is restored from the PSW. That is why inserting a NOP restores correct behavior.
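The mechanism described above can be captured in a toy model, which may be a useful starting point for anyone adjusting an emulator. The Python sketch below is a deliberate simplification and not a description of the real microcode: the class and function names are hypothetical, only flag C is tracked, and the exact moment at which the copy is resynchronized from the PSW is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Instructions the article names as clearing the shadow copy of C.
QUIRKY_OPS = {"MOVB", "SXT", "MFPS"}


@dataclass
class CarryModel:
    psw_c: bool = False     # architectural C flag in the PSW
    shadow_c: bool = False  # copy that conditional branches consult
    quirk: bool = True      # False models the documented behaviour

    def step(self, op: str) -> Optional[bool]:
        """Run one instruction; return 'branch taken' for BCC/BCS."""
        if op in ("BCC", "BCS"):
            c = self.shadow_c            # branch decides from the shadow copy
            self.shadow_c = self.psw_c   # assumption: resynced afterwards
            return c if op == "BCS" else not c
        if op == "SEC":
            self.psw_c = True
        elif op == "CLC":
            self.psw_c = False
        if self.quirk and op in QUIRKY_OPS:
            self.shadow_c = False        # the bug: shadow C is cleared
        else:
            self.shadow_c = self.psw_c   # any other op restores the copy
        return None


def last_branch(ops: Iterable[str], quirk: bool = True) -> Optional[bool]:
    """Execute ops in order and report whether the last branch was taken."""
    cpu = CarryModel(quirk=quirk)
    taken = None
    for op in ops:
        result = cpu.step(op)
        if result is not None:
            taken = result
    return taken


if __name__ == "__main__":
    print(last_branch(["SEC", "MOVB", "BCC"]))         # True: taken although C is set
    print(last_branch(["SEC", "MOVB", "NOP", "BCC"]))  # False: NOP restores the copy
```

Calling `last_branch` with `quirk=False` gives the behavior that emulators written from the documentation show, so the same instruction sequences can serve as a quick regression check in both modes.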







In conclusion, I would like to thank the subscribers of the Telegram channel "BK0010/11M World" for taking part in the discussion of this bug and for their comments on the text of this article. The title photo is courtesy of Manwe_SandS. Interestingly, Manwe came close to discovering the same bug at almost the same time that Damir and I were struggling with the sound problem!







Now only a small matter remains (just kidding): bringing all the emulators into line with the actual behavior of the processor. After all, the processor itself, alas, can no longer be fixed.







That's all from me. I hope it was interesting.







