This project is a really old dream of mine. I first heard of the Transputer back in 1987, and promptly bought the INMOS book, "Occam Programming Manual". I fantasized about getting hold of one of the INMOS Transputer boards for PCs, but the price, of course, was way out of reach. Ever since then, I've looked at that book from time to time, and thought about how great it would be to get to play with real Transputers. I've even done a quick eBay search now and then, just in case...
...and then, quite recently, all of a sudden, there it was: a genuine INMOS B008 board for sale on eBay, and at a price I could convince myself that I could afford — even if I'd have to buy an actual Transputer processor module to plug into it, which it seemed would cost as much as the board itself. (As it turned out, that was not the case. More on this in a moment.)
The board purchased, I started planning for its actual use. I found an old Pentium III motherboard in my stash, with PCI and ISA buses, and 512 MiB of RAM. Perfect! Additionally, getting this thing up and running would once again give me somewhere to mount my old modified 5.25" Teac floppy drive and Adaptec AHA-1542B SCSI controller. The 1542B has a proper floppy controller, of the same kind that was in the original IBM PC, so it's perfect for handling just about any floppy format from old systems, like my Osborne 1 and DEC PDP-11 computers. I quickly decided that I'd want to have MINIX 3 running on the machine, because MINIX 1 was the first Unix I ever did any kernel hacking on, and I'd like to get a closer look at what's been happening there since.
Along the way, I discovered a couple of great web sites. Michael Brüstle has a truly impressive collection of Transputer related documentation. Axel Muhr has lots of interesting material on his site, including his own Transputer module (TRAM) for INMOS boards like the one I purchased, which he sells for a fraction of the going price for used, original modules on eBay. As for software, Ram Meenakshisundaram has a very useful collection, and there's another one at the WoTUG site.
Studying the B008 documentation on Michael's site, I decided that the right way to set it up in a reasonably modern machine would be at I/O address 0x150 (the traditional default for B004 compatible hardware), IRQ 5, and DRQ 1. Consequently, I configured the BIOS autoconfiguration to leave these available for the ISA bus, disabled the on-board floppy controller, and set up the 1542B as a floppy controller only. (MINIX doesn't do SCSI, so I had to go with IDE for both the CD drive and the system disk.)
Booting from a DOS 3 floppy, I could then look through other old stuff I had on diskettes, and run some tests of the hardware. I also downloaded the basic INMOS test and debugging tools (ispy and mtest) from Axel's site, and got them onto DOS floppies, to be ready when the B008 and TRAMs arrived.
The B008 is a really nice, flexible, way to operate Transputers in a PC. It contains a minimum of hardware, and is mostly a bus for supplying power to the TRAMs, and connecting the Transputers on them as a daisy chain. (Each Transputer has four bi-directional links on it, and two of these are used to build this chain through up to ten TRAMs on a B008.) Additionally, it has two special chips: the C004 chip is a programmable switch that lets you build a grid using the remaining two links on each Transputer, and the C012 handles the ISA interface, letting you talk to the first Transputer in the chain, either a byte at a time through I/O ports, or in chunks, using DMA.
On an IBM PC AT, which was contemporary with the B008, you could fit four of these boards, with the first one actively on the ISA bus, and the other three only taking power from the bus, while being internally interconnected so as to continue the daisy chain (and programmable grid), all under control of software on the host computer, which would typically serve as a console and storage server for software running on the Transputers. As each T800 Transputer was about ten times more powerful than the 80286 in the AT, and you could fit ten of those on each B008, this meant you could build a parallel supercomputer inside the AT, with about 400 times the computing power of the host system. Not too shabby!
As it turned out, the INMOS utilities were written for the hardware of their time, and my 350 MHz Pentium III was a bit on the fast side. As a result, the tools were decidedly flaky in use, but I did manage to verify that my B008 was working as it should. No worries; I was going to write a driver for, and port the tools to, MINIX 3.4, anyway.
Getting MINIX 3.4 going turned out to be a simple affair. Using the latest ISO image, and following the installation guide, I quickly had the system installed. The postinstall tasks completed, I could use the instructions for tracking current to get the source code onto the disk (with the small exception that I forked the repo on github, because I knew I was going to be adding to the system). I then rebuilt everything, to verify that all was working as expected.
MINIX 3.4 is an interesting system. It's come a long way, and the reliability features are really nice. I had a rather fun experience of that during installation: while pulling down the source code from github, there was what looked like a hardware glitch in the old ethernet interface I was using, and the driver crashed. Naturally, I thought I would have to reboot, delete the partially cloned repo, and start over — but after a very brief hang, git kept going, and completed correctly. The reincarnation server had restarted the ethernet driver, and the network software simply continued working as before.
Where this very latest MINIX is less than perfect, is userland. It has a partially integrated NetBSD 8 userland, and it's hard to tell what's working, and what isn't. The header files and man pages are there, but finding functionality seemingly supported doesn't mean it is; you have to try it and see. The internal APIs for the kernel, servers, and driver infrastructure are undocumented in the system itself (no manual section 9), leaving the book (which is out of date), and wiki articles on the web site. Programming for it can be frustrating. I have half a mind to set up MINIX 3.1 in parallel with my MINIX 3.4 installation, to see if the older version is more comfortable. That would bring the added bonus of having a system that agrees with the latest (third) edition of the book, too. (3.1.8 is the last version before the import of NetBSD started, BSD make having been adopted just before. The last version of MINIX 3 with the old MINIX make is 3.1.6.)
INMOS has a standard API that host system software expects to use, and it is quite simple. There must be functions to read and write, of course, and these must have support for optional timeouts. Functions must also exist to check whether the hardware is ready to receive or transmit, and to perform a couple of special operations, like resetting the board, and dropping the first Transputer into a special "analyse" mode for debugging:
int OpenLink(const char *Name);
int CloseLink(int LinkId);
int ReadLink(int LinkId, char *Buffer, int Count, int Timeout);
int WriteLink(int LinkId, char *Buffer, int Count, int Timeout);
int ResetLink(int LinkId);
int AnalyseLink(int LinkId);
int TestError(int LinkId);
int TestRead(int LinkId);
int TestWrite(int LinkId);
Other makers of host side software may have other requirements. For instance, the Helios I/O Server additionally needs functions to read and write single bytes. Its expected return values from the functions are also different. A single implementation of this little library will work with all INMOS software, and other software that adheres to their standard. Others, like Helios, will need variations.
The driver needs to have enough functionality to support these requirements, so the various library functions can be implemented for the client applications to use. This means it must support read and write with (optional) timeouts, and it must have some ioctl calls to support the more specialized functions.
MINIX 3 drivers are userland programs that operate by receiving, and answering, messages from client processes. These messages represent the basic operations (open, close, read, write, ioctl...), and there are nice libraries that automate the communication for you, so that all you have to do is populate a struct with pointers to your handler functions, and fire off the interaction loop. Each of your functions gets called on behalf of a received message, and is expected to handle the requested operation, and return a value to be sent back to the client process.
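The handler-table idea can be sketched in plain C, independent of MINIX. Everything in this sketch is hypothetical (the request types, the struct layout, and the names dispatch(), serve(), and demo_table are made up for illustration); the real MINIX driver libraries use their own message format and structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request types; MINIX's real messages carry much more. */
enum req_type { REQ_OPEN, REQ_CLOSE, REQ_READ, REQ_WRITE, REQ_IOCTL };

struct request {
    enum req_type type;
    int           arg;   /* e.g. a byte count or an ioctl code */
};

typedef int (*handler_fn)(const struct request *);

/* One handler per operation (hypothetical layout, not the MINIX one). */
struct handler_table {
    handler_fn open, close, read, write, ioctl;
};

#define ERR_UNSUPPORTED (-1)   /* made-up error value for this sketch */

/* Pick the handler for one request; its result is the reply value. */
int dispatch(const struct handler_table *tab, const struct request *req)
{
    handler_fn fn = NULL;

    assert(tab != NULL && req != NULL);
    switch (req->type) {
    case REQ_OPEN:  fn = tab->open;  break;
    case REQ_CLOSE: fn = tab->close; break;
    case REQ_READ:  fn = tab->read;  break;
    case REQ_WRITE: fn = tab->write; break;
    case REQ_IOCTL: fn = tab->ioctl; break;
    }
    return fn ? fn(req) : ERR_UNSUPPORTED;
}

/* The interaction loop: requests are handled strictly one at a time. */
void serve(const struct handler_table *tab,
           const struct request *queue, size_t n, int *replies)
{
    for (size_t i = 0; i < n; i++)
        replies[i] = dispatch(tab, &queue[i]);
}

/* Demo handlers: open succeeds, read "transfers" the requested count. */
static int demo_open(const struct request *req) { (void)req; return 0; }
static int demo_read(const struct request *req) { return req->arg; }

/* Operations left as NULL are reported as unsupported by dispatch(). */
const struct handler_table demo_table = {
    .open = demo_open,
    .read = demo_read,
};
```

Because serve() only picks up the next request after the previous handler has returned, no two handlers ever run at the same time in this model.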
This is, of course, familiar to anyone who's worked with device drivers on more traditional Unix systems, and that familiarity can be a little bit misleading. I initially worried about my interrupt handler function being called while one of the other functions was busy manipulating shared data structures. This can't happen, though: the messages are handled one at a time, so if an interrupt occurs while you're handling a client request, you won't know about it until you're done with the request, and your handler function has returned. Your interrupt handler will then be invoked to receive the message from the real interrupt handler in the kernel. This is important to realize, because many devices need to notify the driver, using an interrupt, that some sort of progress has occurred. In particular, this is the case with the early revision B008 I have: while it can operate polled in plain B004 mode, the only way you can know that a B008 DMA operation is done is by fielding the interrupt.
Because the driver is operating at a distance from the hardware, a function to read or write using DMA cannot return anything to its client until that interrupt has been received (in fact, it may need to go multiple rounds with the DMA controller if a large amount of data is to be moved) — but you won't know about the interrupt until you've returned from the handler function. This is solved by having a special return value (EDONTREPLY), that the library function calling your handler will take to mean that the responsibility for the request has been taken over, but no reply is to be sent. In this case, you have to keep track of the ongoing job yourself, so you can generate a response to the client when the operation really is completed.
Thus, while my driver is quite straightforward for the other operations, read and write are both implemented twice. If DMA is not available (the plain B004 situation), the whole requested operation is completed in a polled fashion before returning. If it is, the details are recorded, and a simple step function is called that performs one stage of the DMA process. On this first call, it will start the first (and only, if the transfer is short enough) DMA operation. The handler then returns EDONTREPLY. After this, the step function will be called one or more times from the interrupt handler, until the operation is complete, and it sends a reply to the waiting client process.
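The deferred-reply flow can be modelled in a few lines of portable C. This is a minimal sketch, not the code from b004.c: the struct job, the names request_read() and on_interrupt(), the MAX_CHUNK size, and the DEFER_REPLY value (standing in for EDONTREPLY) are all assumptions made for illustration, and no real hardware is touched.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHUNK   1024     /* hypothetical DMA buffer size, in bytes */
#define DEFER_REPLY (-1000)  /* stand-in for MINIX's EDONTREPLY */

struct job {
    size_t size;    /* bytes requested by the client */
    size_t done;    /* bytes completed so far */
    size_t chunk;   /* bytes in the transfer currently in flight */
    int    active;  /* nonzero while a client is waiting for a reply */
    long   reply;   /* value finally "sent" to the client */
};

/* One stage: account for the finished chunk, then start the next
 * transfer, or reply to the client if everything has been moved. */
static void step(struct job *j)
{
    j->done += j->chunk;
    j->chunk = 0;
    if (j->done < j->size) {
        size_t left = j->size - j->done;
        j->chunk = left < MAX_CHUNK ? left : MAX_CHUNK;
        /* A real driver would program the DMA hardware here. */
    } else {
        j->reply = (long)j->done;   /* the deferred reply */
        j->active = 0;
    }
}

/* Request handler: record the job, start the first transfer, and
 * tell the caller that no reply should be sent yet. */
int request_read(struct job *j, size_t size)
{
    assert(!j->active && size > 0);
    j->size = size;
    j->done = 0;
    j->chunk = 0;
    j->reply = 0;
    j->active = 1;
    step(j);
    return DEFER_REPLY;
}

/* Interrupt handler: each completed transfer advances the job. */
void on_interrupt(struct job *j)
{
    if (j->active)
        step(j);
}
```

With these assumed sizes, a 2500 byte request takes three transfers (1024, 1024, and 452 bytes), and the client only gets its answer after the third interrupt.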
Timeouts have to be handled differently for the two cases, of course. During the polled operation, the driver keeps track of time itself, and aborts the task if time runs out. When using DMA, on the other hand, the clock task in the kernel is asked to send a message when the timeout period is over. As we're mostly waiting for interrupt messages during this time, we may receive the timeout message before the transfer is complete, and can then cancel the rest of it, reset the DMA controller, and send a response to the client reporting the outcome.
My modified MINIX 3.4 is on github. The driver proper is in the file b004.c, and should hopefully not be too hard to read. In the following, I'll try to clarify some of the less obvious bits.
First, though, do note that the driver operates under a few simplifying constraints: communication with the Transputers takes place over a single, bi-directional link; this communication is, in practice, half duplex; and data transfer is always initiated by the host. The link will always be in one of three states: idle, sending data to the board, or receiving data from the board. This means we can keep global state that says where we're at, and we can assume that we'll have only a single client using the board through the driver at any one time.
I've also simplified things by hardcoding the above mentioned choices for I/O address, IRQ, and DRQ. They make sense as they are, and I don't expect to have more than one daughterboard in my machine. Modifying the driver to handle more than one board by making these parameters configurable would be cool, though. If I do this, I'll probably encode other configurations in the minor device number.
There is code in the driver for the MINIX 3 driver live update functionality. This is completely untested, because it requires cross compiling the system under Linux, using Clang, and I don't have a setup for that.
Notes on individual functions:
The initialization code allocates buffer space for both polled I/O and DMA. The allocation of the DMA buffer does a special dance: ideally, it allocates a 64 kiB block that's aligned to a 64 kiB boundary (because DMA can't cross these boundaries), but it is willing to accept smaller blocks, successively trying for halved sizes, down to a single kiB. At each step, it'll prefer a properly aligned block, but will, alternatively, take a larger, non-aligned block, out of which it can use half. It checks whether the first half of the allocated block lies entirely between the same two 64 kiB boundaries, and, if so, uses that. Otherwise, it finds the boundary that's inside the allocated block, and places its buffer there.
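The placement arithmetic can be illustrated with a small, self-contained sketch. It assumes that the non-aligned block is exactly twice the wanted buffer size and that physical addresses fit in 32 bits; the function names fits_in_page() and place_buffer() are made up here and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_BOUNDARY 0x10000u   /* 64 kiB */

/* Nonzero if the range [addr, addr + len) stays within a single
 * 64 kiB page, i.e. does not cross a DMA boundary. */
int fits_in_page(uint32_t addr, uint32_t len)
{
    assert(len > 0 && len <= DMA_BOUNDARY);
    return (addr / DMA_BOUNDARY) == ((addr + len - 1) / DMA_BOUNDARY);
}

/* Given an allocated block of 2 * want bytes starting at 'block'
 * (an assumption of this sketch), return the start address of a
 * want-byte buffer inside it that does not cross a boundary. */
uint32_t place_buffer(uint32_t block, uint32_t want)
{
    assert(want > 0 && want <= DMA_BOUNDARY);
    if (fits_in_page(block, want))
        return block;            /* the first half is usable as-is */
    /* Otherwise a boundary lies inside the first half, so a buffer
     * starting at that boundary still ends inside the block. */
    return (block / DMA_BOUNDARY + 1) * DMA_BOUNDARY;
}
```

For example, a block at 0x2F000 with an 8 kiB buffer wanted would cross the boundary at 0x30000, so the buffer is placed at 0x30000 itself; the 16 kiB block ends at 0x33000, leaving room for it.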
Only after the DMA buffer is allocated, does it probe for the presence of the board. The reason for this is that the probe will attempt to initiate a DMA transfer, this being the only certain way to find out whether the board is a plain B004 compatible one, or a B008. Later B008 boards can be observed to be DMA capable just from their status information, but the early revisions simply have to be recognized by seeing if DMA will work.
Note how the probe uses simple byte I/O to check for the presence of a B004 compatible device. Device found, it then verifies that it can enable interrupts properly, before disabling them, and flagging the device as available for use. Finally, it fires off a DMA write of one byte. If the board is a B008, this'll work, and generate an interrupt that will be received by b004_intr(), which in turn will enable DMA use in the driver.
If DMA is available, and the read request is for more than a few bytes, b004_read() will copy the information about the transfer, and call dma_read() to handle it. Only one DMA transfer can be ongoing at any one time, so there is a single data structure, named dma, which holds the information. Before calling dma_read(), b004_read() will request an alarm from the system, so a hanging DMA transfer can be aborted after the requested timeout.
If DMA is not available, or if the read request is very small, b004_read() will do polled I/O, a single byte at a time, using the B004 data and status registers. In this case, it will keep an eye on the time itself, aborting the polling loop if the timeout is reached.
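A polled read with a deadline might look roughly like the sketch below. The hardware access is hidden behind hypothetical function pointers (struct link_ops) so the loop can run anywhere; the fake device exists only for demonstration. None of these names come from b004.c, and this sketch always applies the timeout rather than treating it as optional.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hardware hooks standing in for register accesses. */
struct link_ops {
    int           (*data_ready)(void *dev);  /* nonzero: a byte waits */
    unsigned char (*read_byte)(void *dev);   /* fetch that byte */
    long          (*now)(void *dev);         /* current time, in ticks */
};

/* Read up to 'count' bytes, one at a time. Gives up once 'timeout'
 * ticks have passed with data still missing. Returns bytes read. */
size_t polled_read(const struct link_ops *ops, void *dev,
                   unsigned char *buf, size_t count, long timeout)
{
    long deadline = ops->now(dev) + timeout;
    size_t n = 0;

    assert(buf != NULL);
    while (n < count) {
        if (ops->data_ready(dev))
            buf[n++] = ops->read_byte(dev);
        else if (ops->now(dev) >= deadline)
            break;               /* timed out with a partial result */
    }
    return n;
}

/* Fake device for demonstration: serves 'avail' bytes from 'data',
 * and its clock advances by one tick every time it is consulted. */
struct fake_dev {
    const unsigned char *data;
    size_t avail, pos;
    long clock;
};

static int fake_ready(void *d)
{
    struct fake_dev *f = d;
    return f->pos < f->avail;
}

static unsigned char fake_read(void *d)
{
    struct fake_dev *f = d;
    return f->data[f->pos++];
}

static long fake_now(void *d)
{
    struct fake_dev *f = d;
    return f->clock++;
}

const struct link_ops fake_ops = { fake_ready, fake_read, fake_now };
```

Returning the partial count lets the caller tell a complete transfer from one that was cut short by the timeout.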
The write handler is very similar to b004_read(), except for the direction of the I/O operations.
In the dma_read() step function, dma.size is the requested number of bytes to read, dma.done the number of bytes read so far, and dma.chunk the number of bytes currently being transferred. Note that it is called once from b004_read(), before any DMA transfer has been initiated (so dma.chunk is 0), and then from b004_intr() each time a requested DMA operation is completed.
The four main if() blocks of the function perform these tasks:
The dma_write() step function is very similar to dma_read(), except for the direction of the I/O operations.
The initiation of a DMA transfer has four parts. First of all, the host DMA controller is initialized, preparing it to let the B008 board perform the transfer. Next, the B008 has its correct interrupt enable bits set: either for input or output, depending on direction of transfer, and, in either case, the one for DMA. With this in place, the IRQ can be enabled (the board is now actively de-asserting the interrupt), and, finally, the B008 DMA controller is told to perform the appropriate operation. It will generate an interrupt when it is done.
Interrupts are only generated for completed DMA operations. The handler has to acknowledge them in order for the board to be able to generate further interrupts for later requests, so that's done first of all. Next, we have to recognize the interrupt that signals that the experimental DMA transfer started by b004_probe() has completed, and note that we have a B008, and DMA may thus be used.
Once the probe has been completed, further interrupts will be caused by actual DMA transfers, so the dma_read() and dma_write() step functions will be called from the interrupt handler as appropriate. Note how these calls are followed by a check for a completed read or write operation (dma.endpt == 0), and the system alarm for the operation timeout is then cancelled.
I ported the most important supporting applications to MINIX, and they may be found in minix/commands/inmos. The implementation of the INMOS standard API mentioned above is in the file b004link.c in that directory.
iserver is the standard INMOS host server for supporting Transputer applications, and while other versions exist, what's supplied here is a simple port of the official INMOS version of it.
The other two utilities, ispy and mtest, are tools for observing and testing the Transputer network on the board. Additionally, ispy can modify the C004 configuration, changing the link network between the TRAMs. These utilities are the ones written by Andy Rabagliati of INMOS, later modified by Jeremy Thorp and Øyvind Teig.
The utilities are reputed to work well, so any bugs in this MINIX port are probably mine.
The complete set of patches to add all this to MINIX 3.4.0rc6 may be downloaded from this web site. The patch kit is quite comprehensive, including the changes to Makefiles, system header files, and set lists, that are needed to get the driver and tools properly built and installed along with the rest of the system.