Tagging numbers in different languages

With a polymorphic inline cache, the 3 or 4 most common types of receivers at a call-site get optimised. The last important part of this story before we get to tagging itself is the cost of fetching the type and its method dictionary. Every time you dereference a pointer you’re going to end up stalling your CPU instruction stream while it copies data from memory into a register.

In Smalltalk, classes are our types (sort of; there’s also more information on a class about the classification of its memory layout, called indexedType, which is usually #objects but is sometimes #bytes or other exotic things). If we know we want to send #printString to an object called foo, first we get the class from foo, then look up #printString in its method dictionary, then we do a function call like in any other language. This approach, as you might imagine, costs a lot of time when you’re running.

This isn’t a problem in practice because Smalltalk virtual machines use a technique called ‘polymorphic inline cache’. It’s definitely worth breaking that mouthful down before people accuse us of talking like a functional programmer! The first word, polymorphic, refers to the nature of our dispatching. #printString, in particular, is implemented on many different classes. When a function is implemented on many types and you can safely call it across those types without having to first know the type, it’s called polymorphic dispatch. Inline refers to the nature of the cache, in that it’s typically placed right next to the function call itself. In a Just-In-Time compiler, we leave space before the actual call with NOP operations. Later, we fill them in with class id tests and jump instructions to the function that would be found if we did the full lookup.
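To make the lookup and the cache concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `SmalltalkClass`, `Obj`, `CallSite` and `full_lookup` are hypothetical names, a list of (class, function) pairs stands in for the class id tests and jumps that a real JIT would patch over the NOPs, and superclass lookup is left out entirely.

```python
class SmalltalkClass:
    """Hypothetical stand-in for a class: a name plus a method dictionary."""
    def __init__(self, name, methods):
        self.name = name
        self.methods = methods  # selector -> function


class Obj:
    """Every object knows its class at runtime."""
    def __init__(self, cls, value):
        self.cls = cls
        self.value = value


def full_lookup(receiver, selector):
    # Slow path: fetch the class, then search its method dictionary.
    return receiver.cls.methods[selector]


class CallSite:
    """One send site with a small polymorphic inline cache."""
    MAX_ENTRIES = 4  # assumption: a handful of receiver types per call-site

    def __init__(self, selector):
        self.selector = selector
        self.cache = []   # (class, function) pairs, tested in order
        self.misses = 0

    def send(self, receiver):
        # Fast path: class identity tests, one per cached receiver type.
        for cls, fn in self.cache:
            if receiver.cls is cls:
                return fn(receiver)
        # Miss: do the full lookup and fill a free slot if one remains.
        self.misses += 1
        fn = full_lookup(receiver, self.selector)
        if len(self.cache) < self.MAX_ENTRIES:
            self.cache.append((receiver.cls, fn))
        return fn(receiver)


Integer = SmalltalkClass("Integer", {"printString": lambda o: str(o.value)})
String = SmalltalkClass("String", {"printString": lambda o: "'" + o.value + "'"})

site = CallSite("printString")
results = [
    site.send(Obj(Integer, 42)),   # miss, caches Integer
    site.send(Obj(String, "hi")),  # miss, caches String
    site.send(Obj(Integer, 7)),    # hit, no method dictionary lookup
]
print(results, site.misses)
```

Note that this sketch caches the first receiver classes it sees rather than measuring which are most common; in practice those tend to coincide, since most call-sites only ever see a few receiver types.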

What a dynamic language does know is the type at runtime, all the time. This is called strong typing and dynamic typing combined. It’s strong in that the piece of data has a fixed type and won’t ever pretend to be what it is not (unlike in languages with type casting), but it’s dynamic in that the type is determined at runtime instead of at compile time (which would be static or manifest typing).

Dynamic languages aren’t the only languages interested in runtime type information though. Any language with tagged unions / additive types / disjoint unions, or in other words algebraic types, is also going to need some type information at runtime. Pattern matching allows you to cast a generic piece of data back into a specific form. It can only do this if it can check the type at runtime. That’s one reason why additive types are also known as tagged unions: unlike a C union, the type of what was stored inside the union is kept.

In Smalltalk and many other dynamic languages, there is also the concept of dynamic dispatch, which pertains to how you know what function to call at runtime instead of compile time. In Smalltalk we have a method dictionary for every class and we know the class of every object at runtime.
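The difference between an untagged union and a tagged one can be sketched in a few lines of Python. This is an illustration rather than how any particular language implements it: the `INT`, `FLOAT` and `TEXT` tags and the `make` and `describe` helpers are hypothetical, and `struct` is used only to imitate reading the same bytes as two different C types.

```python
import struct

# Untagged, C-union style: 8 raw bytes with no record of what was stored.
raw = struct.pack("<d", 1.5)            # written as a double
as_float = struct.unpack("<d", raw)[0]  # read back as a double: 1.5
as_int = struct.unpack("<q", raw)[0]    # same bytes read as a 64-bit int: nonsense

# Tagged union: the tag travels with the payload, so it can be checked at runtime.
INT, FLOAT, TEXT = "int", "float", "text"  # hypothetical tags


def make(tag, payload):
    return (tag, payload)


def describe(value):
    # A hand-rolled pattern match: inspect the tag, then treat the payload
    # as the specific form the tag promises.
    tag, payload = value
    if tag == INT:
        return f"integer {payload}"
    if tag == FLOAT:
        return f"float {payload:.1f}"
    if tag == TEXT:
        return f"text of length {len(payload)}"
    raise ValueError(f"unknown tag: {tag!r}")


print(describe(make(INT, 3)))
print(describe(make(FLOAT, 1.5)))
print(describe(make(TEXT, "hello")))
```

Nothing in the raw bytes says whether a double or an integer was written, which is exactly the information the tag preserves.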

Let’s get back to it with something pretty heavyweight: pointer tagging. In some programming languages, the resolution of what to do with a piece of data depends on a type that is only known at runtime. Your typical C/C++ style language knows all types at compile time (most of the time) and therefore knows what functions can be performed with a piece of data as an argument and how it is formatted in memory.

Dynamic languages tend not to know the type of an object at compile time, or if they do know it, they don’t know it for long.