I closed the last blog article with the remark that testing dasm3.py was hairy. The reason is that I tested dasm3.py against DEBUG.EXE, itself a messy and inconsistent program that adds its own irritations to the infamously contrived 8086 architecture. dasm3.py is now a Google Code project and can be found at http://code.google.com/p/dasm3/.
The scope and ambition of dasm3.py are modest, but I believe that I lived up to an important vision that Tunes celebrity François-René (“Fare”) Rideau expressed here. An important quote:
Literate Programming, and D.E. Knuth’s attempts with WEB and C/WEB (see this interview of D.E. Knuth http://www.clbooks.com/nbb/knuth.html) are actually ways to pass more information about programs. To pass information that programming languages themselves don’t/can’t/can’t-efficiently pass, through well-organized human-readable documentation. This is A GOOD THING, because there will ALWAYS be things that humans can (already) express that machines cannot express (yet). But this is NOT THE PANACEA, because there ARE things that the machines ACTUALLY COULD express with high-level languages, that pure literate programming over low-level languages require the human to not only to write, but to check, when a computer is much better suited to check them. Knuth completely ignores the meta-capabilities of computers.
[Emphasis in the original]
The operative verbs here are “express” and “check”. Fare’s vision is that better programming languages
- can express and pass more meaning than current programming languages, thereby making their inner workings more transparent
- facilitate proof of their correct functioning by formal methods
dasm3.py is written in the conventional programming language Python, so I had to provide
- several wiki-pages for explaining how the program works, in order to instruct interested parties in text (as Fare suggests with “literate programming”)
- an extra computer program to verify dasm3.py’s output
In other words, since Fare’s vision of a sufficiently futuristic programming language is not here (“yet”), I had to go that extra mile and make a wiki- and test-prosthesis for dasm3.py. dasm3.py is not a literate program, but this is a superficial formality. Michael Heyeck, dasm3.py’s original author, did a good job in providing the literature for his program. I supplemented what I was missing when building on his work, in the hope that future programmers interested in dasm3.py don’t have to duplicate my head-scratching, experiments and research. For demonstrating dasm3.py’s correctness, I had to resort to old-fashioned test programs and explain how the tests work and why I believe they are convincing. (As for the lack of “literacy” of Michael Heyeck’s and my programming: I believe that literate programming is extremely hard for most cases; a wiki is much more fun and easy and almost as good. I will defend this thesis in a future blog article.)
The desired (but subtle) connection between literate programming and correctness did not escape Chris Lee, author of Literate Programming — Propaganda and Tools:
From a purist standpoint, a program could be considered a publishable-quality document that argues mathematically for its own correctness. A different approach is that a program could be a document that teaches programming to the reader through its own example. A more casual approach to literate programming would be that a program should be documented at least well enough that someone could maintain the code properly and make informed changes in a reasonable amount of time without direct help from the author.
What could Fare’s future look like?
In contrast to some other 8086 disassemblers on the internet, dasm3.py is largely data-driven. Most of dasm3.py’s reflection of the 8086’s instruction set is in the opcode sheet. It could be even more data-driven: extra columns and values in the opcode table would make much of the dispatching, the if-elif ladders and many of the sets of mnemonics and opcodes obsolete. This would expand the scope of a conventional opcode sheet, but it would work out nicely and make the actual code more compact. Note that, like all “data formats”, an opcode table is an elementary form of a domain-specific language.
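To make the idea of a more data-driven opcode table concrete, here is a minimal sketch. It is not taken from dasm3.py: the table layout, the names OPCODE_TABLE, OPERAND_READERS, decode_one and disassemble, and the choice of a handful of one-byte 8086 opcodes are all my own assumptions for illustration. The point is only that the operand “kinds” live in the table as data, so the decoder needs no per-instruction if-elif ladder.

```python
import struct

# Register names indexed by the low three bits of the opcode byte.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

# Hypothetical opcode sheet: (match value, mask, mnemonic, operand kinds).
# Only a tiny subset of the 8086 instruction set, chosen for illustration.
OPCODE_TABLE = [
    (0x90, 0xFF, "NOP", ()),
    (0xC3, 0xFF, "RET", ()),
    (0xCD, 0xFF, "INT", ("imm8",)),
    (0x40, 0xF8, "INC", ("r16",)),
    (0x50, 0xF8, "PUSH", ("r16",)),
    (0xB0, 0xF8, "MOV", ("r8", "imm8")),
    (0xB8, 0xF8, "MOV", ("r16", "imm16")),
]

# One reader per operand kind. Each takes (code, opcode, pos) and returns
# (operand text, new position), so new kinds are added here, not in the decoder.
OPERAND_READERS = {
    "r8": lambda code, opcode, pos: (REG8[opcode & 7], pos),
    "r16": lambda code, opcode, pos: (REG16[opcode & 7], pos),
    "imm8": lambda code, opcode, pos: ("%02X" % code[pos], pos + 1),
    "imm16": lambda code, opcode, pos: (
        "%04X" % struct.unpack_from("<H", code, pos)[0], pos + 2),
}


def decode_one(code, offset=0):
    """Decode one instruction; return (text, offset of next instruction)."""
    opcode = code[offset]
    for value, mask, mnemonic, kinds in OPCODE_TABLE:
        if opcode & mask == value:
            break
    else:
        raise ValueError("opcode %02X is not in this sketch's table" % opcode)
    pos = offset + 1
    operands = []
    for kind in kinds:
        text, pos = OPERAND_READERS[kind](code, opcode, pos)
        operands.append(text)
    if operands:
        return mnemonic + " " + ",".join(operands), pos
    return mnemonic, pos


def disassemble(code):
    """Decode a whole byte string into a list of instruction texts."""
    lines, offset = [], 0
    while offset < len(code):
        text, offset = decode_one(code, offset)
        lines.append(text)
    return lines
```

Calling disassemble(bytes([0xB4, 0x09, 0xCD, 0x21])) on this sketch yields the two lines MOV AH,09 and INT 21. Supporting another instruction of the same shape means adding one row to the table; the decoding loop itself stays untouched.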
The smart authors of the New Jersey Machine-Code Toolkit have built a spiritual empire on this idea and developed a generic framework for implementing all sorts of assemblers and disassemblers for arbitrary CPUs. What would delight Fare is the newer experimental version programmed in ML, a language that grew out of theorem proving. I have only a vague understanding of how this magic works; I only skimmed the papers, and that material is way out of my league. The overview of the project is exciting though, and I take it as evidence that reflecting instruction sets one way or the other is an interesting field and a productive fruit fly for testing and developing ideas on domain-specific languages. By extension, it can even show the way to the sufficiently “high-level” languages that Fare had in mind.
Have you ever invented a domain-specific language? Do you like the 8086 instruction set architecture? Have you ever written a literate program?