(page 1) ocelot is a ruby compiler I've been trying to write. or, more properly speaking, it's a ruby-to-c translator. my goal is a fairly efficient implementation of ruby.
(page 2) so let's look at an example of an efficient language: c. c is efficient because the compiler knows what the types of things are. here's a fibonacci calculator. when it comes time to emit the instructions for the minus operators, the compiler can look at the types of the operands, see that they are both integers, and thus use the integer version of the subtract instruction rather than the float or char subtract. likewise with the plus instruction: the operands are what's returned from fib, an int, so the plus instruction is the int version. furthermore, when emitting the call to fib, the compiler can see where calls to fib will go. even if c were polymorphic -- if multiple fibs were allowed -- the compiler would still know where all the targets of a call to fib could end up.
(page 3) here's a ruby version of the same algorithm. ruby is several orders of magnitude slower than c when running this code. the major difference between the two versions -- other than syntactical changes -- is that ruby has no type declarations, and thus it's very hard for a potential compiler to get a grip on what things might be at runtime. in this code, we'd normally expect n to be a Fixnum, but it could be a String, an Array, anything. to support the full semantics of the language, an implementation has to allow all those types at runtime. that makes it very difficult to generate efficient callsites -- and callsites are everywhere in ruby code. so, since ruby is so dynamic and has no type declarations, writing a compiler seems pretty hopeless. the compiler can't get a grip on what the types of expressions are and can't emit efficient code for them. or, can it?
(page 4) now, the culture in ruby is very test-heavy; no other language community emphasises tests so much. here's an example test for the fib method. now, there's something very interesting about this test -- notice that the arguments to fib in the test are all integers -- Fixnums. likewise with the results from fib. so, maybe a compiler could actually run the unit tests of the code it's compiling, see what the types of the expressions are as they go flying past, and use them, pragmatically, as a kind of type declaration. I call this technique
(page 5) type induction. type induction is to be contrasted with ocaml-style type inference, in which a compiler sees what methods are being used on a specific variable and infers a declaration for that variable of the type (or types) which allow those methods. I should mention that I didn't invent this concept. to my knowledge, two other people independently came up with it. they are rich morin and josh susser. I'm just the one trying to implement it. I did invent the name type induction, tho.
(page 6) now, I've been using this controversial and religiously loaded term, 'type'. before i go any further I should try to more formally define type in terms of a ruby program. even tho this is a dangerous word, which has been the cause of flamewars in the past, I believe I can come up with definitions of type which are both useful for writing a compiler and something we can all agree on. so, what is a type?
(page 7) here are some wrong answers. some might say that ruby has no types. while it has no type declarations, values in ruby programs certainly do have types. a better answer might be that ruby classes are the types. this is closer.
but there's still something missing, because altho the birth type of an object is certainly its class, it's not the whole story. it neglects the effect of singleton classes; the type -- the behavior -- of an object can be changed at runtime. so, maybe these singleton classes are the types. this is the closest definition yet -- it's actually a correct definition of type -- but it's not very useful. the problem is that there are too many singleton classes. every object, if it has a singleton class at all, has a unique singleton class. in a program which uses singleton classes, there are potentially a very large or even infinite number of singleton classes. that's too many for a compiler to be able to analyze or allocate statically.
(page 8) so, here's a good definition of types: type is class + decorators. when an object is born, its type and class are the same thing. but over its lifetime, it might get decorated with various mixins to its current behavior. these decorators come in 3 forms: extension by a module, singleton classes, and singleton methods. the birth class of an object, plus the list of decorators that have been applied to it over its lifetime, together define its type.
(page 9) now, here's another definition of type, perhaps even a little better than the last one. type is the object's set of name to method body mappings. that is, type is a hash from names to method implementations. this is the most exact definition; it captures the essence of what a type is. it also results in fewer types overall than the previous definition. that is to say, there may be multiple paths thru the graph of decorators to a single unique set of method name to method body mappings.
(page 10) now, type induction is a powerful technique, but it does have problems. one is the issue of coverage.
(page 11) here's the fibonacci code I showed before.
(page 12) but what if I had written the tests like this instead? as you can see, i'm now testing fewer inputs to fib -- only 0 and 1, in fact. as a result only the first line of fib will ever be exercised. since the 2nd line is never run, the type inducer will never get the chance to see what the types of its expressions are, and so no information about them can be gathered. so, to type induce a program you must first have good code coverage.
(page 13) but there are other kinds of coverage. imagine we have this situation. there are 3 animal classes, each of which has a call method. there's also a zoo class, which is a collection of animals. when all the animals in the zoo make their call at once, we call that a cacophony. and there's also a test for this cacophony method. but there's a problem with this test: I forgot to put a bird into the zoo I tested with. as a result, the type inducer will not see that the variable animal could have a type of bird inside the cacophony method. that will cause the compiler to generate incorrect or suboptimal code for that callsite. this is a coverage problem, but code coverage isn't the issue; the test covers all the expressions in the cacophony method. the problem is that some expressions can have types which aren't covered by the unit test. there's a gap in what i call type coverage.
(page 14) another problem with type induction is mocks. mocks are basically fake objects, with fake types. any mock objects in your tests will pollute the information obtained by the type inducer and cause it to believe some of your expressions can have types that aren't possible at runtime.
even worse, since the actual type that could be present at that point in your program may never be exercised, use of mocks can lead (again) to a lack of proper type coverage.
(page 15) (skip if time > 9:45) don't use mocks. in general, I dislike mocks because they divorce your tests from reality; use of a mock means you're testing your code in artificial conditions which don't correspond to how the code will be used in the real world. in a few circumstances mocks may be justified in order to be able to test code which interacts with something external, like a server or a piece of hardware. but even then, you should be sure to write another test which tests against the actual, real type that could occur at that point in the program.
(page 16) (skip if time > 9:45) ok, we've talked about type induction and some of its problems; I'll be covering some solutions I've come up with for those problems in a bit, but first let's explore some of the c code generated by the compiler. keep in mind that the examples I am giving here are pseudocode; reality is usually more complicated. so what happens, for instance, when the compiler encounters a callsite?
(page 17) here's a callsite snipped from one of the examples I showed before. what should that look like in c?
(page 18) this is one possibility. this is the way that c++ compilers handle calls to polymorphic functions. each object has a hidden field in it; c++ compilers use it to point at the vtable, but i'm calling the field klass here, which is also the name MRI uses internally. hanging off of that klass field is a table of pointers to the functions actually in use by that particular object. now, this is an ok way to do it; c++ gets a lot of mileage out of this method, and it's certainly not known as a slow language. but there's also another way to represent polymorphism.
(page 19) instead of jumping thru a pointer, we could switch on the klass field, and when it has a type known to the compiler, just call straight to the implementation of the current method being called for that type. some of you are no doubt looking at this and saying, "whaaaaa? you want to take my nice clean little ruby callsites and turn them into that big ugly switch statement? why?" well, there are some good reasons to prefer this way. by avoiding a call thru a pointer, you're also avoiding the pipeline stall that such indirect calls usually cause. a direct call, even one preceded by several branches, can be predicted by the microprocessor better, and thus is usually faster.
(page 20) (and flip back/forth) another advantage is that this form enables inlining. if the compiler knows exactly what the target or targets of a specific call are, then it's fairly obvious how to inline the corresponding method bodies right into the appropriate branch of the switch statement. whereas with the vtable-style polymorphic call (go back 2) it's not clear how the compiler could inline at all, without turning the callsite into the switch form. (go forward 2)
(page 21) in short, there are many benefits to having the target or targets of callsites be as obvious as possible. i should mention that i stole this idea from a paper that i read about smalleiffel, the open source eiffel compiler. (go back 1) now, there's actually something interesting going on in the default case of this switch statement. the default will be entered when a type is seen at runtime that the compiler didn't know about. this indicates a gap in type coverage of the unit tests.
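to make that concrete, here's roughly the shape such a callsite could take, using the animal.call send from the zoo example. this is just a sketch, not the compiler's actual output: Lion_call, Hippo_call, KLASS_ID, the KLASS_* ids and warn_type_coverage_gap are all made-up names, while VALUE, ID, rb_intern and rb_funcall are the real MRI ones. notice there's no case for the bird, because the test never put a bird in the zoo.

    #include "ruby.h"

    /* compiled method bodies and helpers the compiler would emit elsewhere */
    VALUE Lion_call(VALUE self);
    VALUE Hippo_call(VALUE self);
    int KLASS_ID(VALUE obj);            /* small integer id of obj's current type */
    void warn_type_coverage_gap(const char *site);
    enum { KLASS_LION = 1, KLASS_HIPPO = 2 };

    VALUE callsite_animal_call(VALUE animal)
    {
        switch (KLASS_ID(animal)) {
        case KLASS_LION:
            return Lion_call(animal);   /* direct call: predictable, inlinable */
        case KLASS_HIPPO:
            return Hippo_call(animal);
        default:
            /* a type the tests never showed us -- a gap in type coverage.
               warn about it, then fall back to the interpreter's generic
               dispatch, which is slow but always correct. */
            warn_type_coverage_gap("Zoo#cacophony: animal.call");
            return rb_funcall(animal, rb_intern("call"), 0);
        }
    }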
one thing that i could do is raise an exception or abend or something like that at this point. but this creates a problem: if there's a gap in your type coverage, all of a sudden the compiler will be generating code which has a bug in it... a bug which wasn't present when the program was interpreted. it's much, much better for the compiler to handle this case correctly, which it can do by falling back to the interpreter. this rb_funcall is the interpreter's internal function for handling callsites; it's every bit as slow as a call in the interpreter usually is. but it does work. the other thing happening in this default clause is that a warning is printed. remember, hitting the default indicates a gap in type coverage, and that is something the programmer ought to know about; gaps in type coverage should be fixed by writing more tests. also, the compiler can collect such warnings emitted by previous runs of the program and use them as further type hints on subsequent recompiles, resulting in those previously slow cases being made faster.
(go forward 2 -- page 22) (skip this section if >=10:00) now, let's try to talk as briefly as possible about object representation.
(page 23) here's an example of a ruby class. as you can clearly see, it has 3 instance variables: foo, bar, and baz. how is this represented in c?
(page 24) here's more or less what the interpreter does. all objects are represented by an instance of this RObject struct, which has 3 fields: a set of flags, a klass, which we talked about before, and a pointer to a hash table holding the instance variables. this is ok, but it means that every instance variable reference requires a lookup in a hash table. is there a better way?
(page 25) since we can clearly see what the instance variables of this class are, why not allocate them directly in the struct generated for this class and turn all those hash lookups into array lookups, which are faster and use less space? the ivar table is still present for compatibility with the interpreter's way of representing objects and for storing any instance variables which are accessed dynamically. i believe that ruby 1.9 already does something like this.
(page 26) there's a similar process when it comes to representing stackframes.
(page 27) for the most part, methods are readily analyzed for the set of local variables used in them.
(page 28) so instead of just using a hash table to store all the locals, we can create a structure which stores them, with a fallback hash table for locals that are referenced dynamically.
(page 29) now, other than dynamic polymorphic method calls, let's talk about some of the other issues that a ruby compiler runs into.
(page 30) ruby has these singleton methods
(page 31) and singleton classes
(page 32) and modules you can extend your objects by. all 3 present the same problem: they cause an object to change type -- to change behavior, to change its set of method name to method body mappings -- at runtime.
(page 33) how is this to be accomplished in c? c is completely static; c types don't even really exist at runtime.
(page 34) but remember that the type of a ruby object is stored in that klass field in the object header... there's no reason that field can't be changed as needed.
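so, roughly -- and with made-up struct and function names, not MRI's real layout -- extending an object or giving it a singleton method can come down to repointing that one field:

    struct type;                        /* a set of name -> method body mappings */

    struct object {
        unsigned long flags;
        struct type  *klass;            /* the object's current type */
        /* ...instance variables... */
    };

    /* hypothetical runtime helper: returns the type equal to `base` plus one
       more decorator (a module, a singleton method, ...), either one the
       compiler laid out statically or one built and cached the first time
       that combination is seen. */
    struct type *type_with_decorator(struct type *base, struct type *decorator);

    /* what `obj.extend(SomeModule)` or `def obj.some_method` boils down to:
       none of the object's data moves; it changes type by changing where
       klass points. */
    void object_extend(struct object *obj, struct type *module)
    {
        obj->klass = type_with_decorator(obj->klass, module);
    }

this stays manageable because of the class + decorators definition from earlier: the compiler only has to deal with the decorator combinations that actually occur, rather than one unique singleton class per object.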
(page 35) (pause)
(page 36) all right, so what about method_missing? that's another difficult dynamic feature that makes life hard for compilers. the issue here is that method_missing makes it really hard to follow the flow of control. in a program which uses method_missing, there's no way to tell if some innocent-seeming method call is actually going to end up calling method_missing. I should add that if not for method_missing, we probably would have had a compiler 5 years ago, based on ocaml-style type inference. method_missing inhibits that. but it is useful, and used a lot in all kinds of neat ways.
(page 37) so let's go back to this example method call I had before. say it could also dispatch to a delegate at this point. once the compiler figures out that a call to method_missing is possible, it just generates a callsite for the appropriate version of method_missing at that point. you do have to have coverage in your tests for the Delegate type at every callsite which can result in a method_missing, so the type inducer can know which callsites might be method_missing callsites -- but really, type induction handles this problem beautifully.
(page 38) and then there's eval. or evil, as some like to pronounce it. this one has got to be impossible, right? after all, eval is the essence of what an interpreter does; it takes a string at runtime and interprets it, right then and there.
(page 39) but it turns out that most calls to eval are actually static, in that the input passed to them is fixed and can only have one value... or at most a narrow range of a few values. discovering what that value is from static analysis of the program is not necessarily easy, but we can cheat and use the same trick as with type induction; that is, we run the unit tests and see what strings are passed to eval from where, and then the compiler assumes that those eval callsites will have the same argument(s) at runtime. in other words, eval is converted into something like this:
(page 40) check to see if the input was the expected one, and if so, inline the code directly at that point. notice that there is no eval call left here. there are also no quotes around the code being eval'd. now, the semantically correct thing to do would be to fall back to regular eval if the input is not the expected one.
(page 41) however, I feel that it's actually better to abend outright at this point. if eval has an unexpected input, it may indicate a fairly serious gap in eval coverage. on the other hand, it might also be a sign that user input is able to influence the argument to an eval, which is usually considered A Bad Thing. every eval call is a potential code injection attack, and having a tool which eliminates the possibility of such attacks altogether is probably a good idea. and expecting arguments to eval to be completely covered by your unit tests is not so extreme. this magic way of predicting the arguments to eval is something I call
(page 42) eval prescience.
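concretely, the check-and-inline shape from a couple of slides back might come out something like this in c. this is a sketch with made-up names -- the 42 is just an invented callsite id, and eval_site_42_body stands for the compiled, inlined code -- while VALUE, StringValuePtr, rb_fatal and Qnil are the real ruby.h ones. suppose the only string the tests ever passed to this particular eval was "@width * @height":

    #include "ruby.h"
    #include <string.h>

    VALUE eval_site_42_body(VALUE self);    /* the compiled, inlined version of "@width * @height" */

    VALUE eval_site_42(VALUE self, VALUE str)
    {
        if (strcmp(StringValuePtr(str), "@width * @height") == 0)
            return eval_site_42_body(self); /* no eval, no parsing -- just code */

        /* an input the tests never produced: either a gap in eval coverage or
           a sign that outside input is reaching an eval.  abend loudly rather
           than run code we never saw; falling back to the real eval here would
           be the semantically correct alternative. */
        rb_fatal("eval prescience: unexpected argument to eval at site 42");
        return Qnil;                        /* not reached */
    }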
(page 43) overall, the process of using the compiler works something like this: type induction and eval prescience are used to narrow down the actual range of dynamic behavior of your programs. the compiler uses the information gathered from those stages to produce a static version of your program. but the information it has is only as complete as your tests, and most tests, even for ruby code, are not really complete enough. so at runtime the program emits log statements telling you where there are gaps in test coverage. you should use those to make your tests more complete, which makes more information available the next time the compiler is run.
(page 44) now, ruby is a dynamic language, and its full range of behavior cannot be captured in a static implementation. there are a few things that cannot ever be really fully compiled, altho it may be possible to compile to a program that falls back to the interpreter in those few cases.
(page 45) so, here's one case. does anyone recognize this program? this is basically a very simple version of irb. it's what lisp calls a REPL, or read-eval-print loop. now, no matter how much you test this program, you'll never capture all the possible inputs to that eval. so you'll never really be able to compile irb. on the other hand, does anyone really need irb to be faster than it is now? this kind of use of eval is pretty rare, tho it does occur.
(page 46) there's also a dynamic typing equivalent of the dynamic eval case. in this situation, i've got 20 different modules which are being mixed into an object in random order. that's up to 20! different types -- about 2.4 quintillion of them. no amount of testing will enumerate all those types, nor will there ever be enough memory to allocate them all statically. so, this program is also incompilable. but i believe this case to be even rarer than dynamic eval; in fact, i've never seen or heard of code which uses this capability.
(page 47) that's all I have today. thank you very much, everyone. and in the time remaining, i'd like to open the floor up to questions.