Date: Tue, 01 Aug 89 15:16:17 +0100 From: dcw@ukc.ac.uk To: rde@ukc.ac.uk Subject: Superoptimizer I just came across a copy of the news item about the "superoptimizer" that we were talking about recently. David From eagle!ukc!mcvax!uunet!husc6!mailrus!ames!ncar!oddjob!mimsy!chris Mon Jul 4 09:45:25 BST 1988 Article 4992 of comp.arch: Path: eagle!ukc!mcvax!uunet!husc6!mailrus!ames!ncar!oddjob!mimsy!chris >From: chris@mimsy.UUCP (Chris Torek) Newsgroups: comp.arch Subject: ASPLOS: `Superoptimiser' Message-ID: <12296@mimsy.UUCP> Date: 3 Jul 88 05:14:51 GMT Reply-To: chris@mimsy.umd.edu (Chris Torek) Organization: University of Maryland, Dept. of Computer Sci. Lines: 200 Now that I have it in front of me: Second International Converence on Architectural Support for Programming Languages and Operating Systems (ASPLOS II) October 5-8, 1987 Palo Alto, California Computer Society Order Number 805 Library of Congress Number 87-80438 IEEE Catalog Number 87CH2440-6 ISBN 0-8186-0805-6 SAN 264-620X ACM Order Number 556870 ISBN 0-89791-238-1 (no, I have no idea why there are two ISBNs) The paper itself is (pp. 122--126): Superoptimizer -- A Look at the Smallest Program Henry Massalin Department of Computer Science Columbia University New York, NY 10027 ABSTRACT Given an instruction set, the superoptimizer finds the shortest program to compute a function. Startling programs have been generated, many of them engaging in convoluted bit-fiddling bearing little resemblance to the source programs which defined the functions. The key idea in the superoptimizer is a probabilistic test that makes exhaustive searches practical for programs of useful size. The search space is defined by the processor's instruction set, which may include the whole set, but is typically restricted to a subset. By constraining the instructions and observing the effect on the output program, one can gain insight into the design of instruction sets. In addition, superoptimized programs may be used by peephole optimizers to improve the quality of generated code, or by assembly language programmers to improve manually written code. [Here it seems appropriate to attach the ACM copyright: Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. (c) 1987 ACM 0-89791-238-1/87/1000-0122 $00.75 `$00.75'? :-) ] I think it is important to note section 4.1, Current Limitations: Even with the probabilistic test, the exhaustive search still grows exponentially with the number of instructions in the generated program. The current version of superoptimizer has generated programs 12 instructions long in several hours running time on a 16MHz 68020 computer. Therefore, the superoptimizer has limited usefulness as a code generator for a compiler. Here are the 68020 routines for the various functions (I use the MIT/Sun mnemonics below; I like them better): signum: | x in d0: addl d0,d0 subxl d1,d1 negxl d0 addxl d1,d1 | signum(x) in d1 abs: | x in d0: movl d0,d1 addl d1,d1 subxl d1,d1 eorl d1,d0 subl d1,d0 | abs(x) in d0 max: | d0=X, d1=Y subl d1,d0 subxl d2,d2 orl d2,d0 addxl d1,d0 | max(X,Y) in d0 min: | d0=X, d1=Y subl d1,d0 subxl d2,d2 andl d2,d0 addl d1,d0 | min(X,Y) in d0 minmax: | d0=X, d1=Y subl d1,d0 subxl d2,d2 andl d0,d2 eorl d2,d0 addl d1,d0 addl d2,d1 | d0=max(X,Y); d1=min(X,Y) logicals: | X in d0 negl d0 subxl d0,d0 | d0 = X==0 ? 0 : -1 negl d0 | d0 = X==0 ? 0 : 1 | X in d0 negl d0 subxl d0,d0 notl d0 | d0 = X==0 ? -1 : 0 | X in d0 negl d0 subxl d0,d0 addql #1,d0 | d0 = X==0 ? 1 : 0 8 digit BCD to binary (done in three steps by Superoptimizer): | 8 digit BCD number in d0 movl d0,d1 andl #0xf0f0f0f0,d1 lsrl #3,d1 subl d1,d0 subl d1,d0 subl d1,d0 movl d0,d1 andl #0xff00ff00,d1 lsrl #1,d1 subl d1,d0 lsrl #2,d1 subl d1,d0 lsrl #3,d1 addl d1,d0 movl d0,d1 swap d1 mulu #0xd8f0,d1 subl d1,d0 | binary equivalent in d0 Multiplication by constants: | d0 *= 29 movl d0,d1 lsll #4,d0 subl d1,d0 addl d0,d0 subl d1,d0 | d0 *= 39 movl d0,d1 lsll #2,d0 addl d1,d0 lsll #3,d0 subl d1,d0 | d0 *= 156 movl d0,d1 lsll #2,d1 addl d1,d0 lsll #5,d0 subl d1,d0 | d0 *= 625 movl d0,d1 lsll #2,d0 addl d1,d0 lsll #3,d0 subl d1,d0 lsll #4,d0 addl d1,d0 and an interesting note on division: A general divide by constant that works for all 32-bit arguments is too long to realize any time gain over the divide instruction, and is certainly not shorter. Additionally, there doesn't seem to be any nifty arithmetic-logical operations [sic] that simplify the process. The generated programs just multiply by the reciprocal of the constant. Since we do an exhaustive search, this negative result can be seen as a confirmation of the inherent high cost of divisions for the instruction sets [68020 and 80?86] considered.