Archive for January 13, 2017

LDef Intermediate Language

January 13, 2017 Leave a comment

The LDEF bytecode engine is some time away from completion, but the IL source format that the assembler reads and turns into bytecode is getting there. At the moment there are only a few tidbits left to explore, like interfaces and generics, but those will be added.

It’s a real brain teaser because – some of the stuff that makes up a compileris not really related to code. When you think about generics you often make the mistake of thinking this is a code feature, like inheritance or virtual methods; it’s something that the code-emitter has to deal with or runtime engine to take height for. But generics is actually implemented higher up. It exists between the parser and code-emitter.

Interfaces is another mirage or technological illusion. When you work with classes and interfaces you get a sense that it’s a solid thing, you know – you write a class and create objects and people imagine these objects as independent, material objects in a virtual space. I tend to think of instances as molecules.

But objects is ultimately an illusion. Im not going to cover the intricate details of object orientation here, but OOP is actually about separating the data (fields) from the code acting on that data (members). So the only objects that really exist in the memory of a computer when you create instances, are buffers representing the fields of your class – combined with references and a list mapping the entrypoints for the members that makes up that instance (VMT). Compilers actually mimic object orientation by adding a secret parameter to all your methods, namely the “self” parameter. This “self” is a pointer to a record that contains all the information pertaining to that instance.


Which is really cool because then you can create as many instances as you like – and they all use the same code. Which ofcourse object orientation is all about. But there is no such thing as an independent instance floating around computer memory. That is an optical illusion of sorts.

LDEF virtual machine

Virtual machines get’s to have all the fun. I mean, writing a compiler that emits real machine code is in itself not that hard, but generating a scheme that makes object orientation work and that keeps track of everything is. The x86 cpu architecture may be powerful but it’s an absolute bitch to work with. It has few registers (compared to ARM and PPC), several instruction sets and is known to be a somewhat unfriendly place. This is the reason that compiler makers tend to stick to a golden mean of instructions. Compilers like C++ builder and Delphi could actually generate faster and more efficient code if they knew exactly what cpu and architecture you used. But since that’s not how PC’s work and there are some 20 different cpu models on the market at any given time – with huge differences between AMD and Intel, it makes sense to use a safe collection of instructions.

LDEF is a bytecode runtime engine. And one of the perks of bytecode is that it’s very much abstracted from the whole issue of real-life assembly. Most programmers think that – if you make bytecodes then you dont have to think about low level stuff, which is cheating. That may be the case for other languages and engines, but not LDEF. In fact the whole point of writing a new engine is because I wanted a bytecode format that was closer to machine code. This is very important because at some point myself or someone else will write a JIT compiler for this, and if the runtime is to high-level or abstract, that is going to be very hard.

LDEF testbed

LDEF testbed

The LDEF instruction-set represents a golden mean of the average processor instruction set. I have taken instructions that most processors have, typical stuff like add, subtract, divide, modulus, multiply (and so on). Testing is where I have given LDEF some advantages, for instance when testing a list of parameters for conditions. Instead of keeping that as two instructions (test, branch [condition]) I have isolated that in a single instruction.

Let’s look at the instruction set so far:

  • move
  • blit – move memory block
  • valloc – allocate variable
  • vfree – free variable
  • malloc – allocate memory
  • mfree – release memory
  • add
  • sub
  • mul
  • muldiv
  • div
  • mod
  • moddiv
  • lsr – logical bit-shift right
  • lsl – logical bit-shift left
  • cmp – compare
  • tst -test
  • bne – branch not equal
  • beq – branch equal
  • ble – branch less
  • bgt – branch greater
  • jsr – jump sub routine
  • jmp – jump absolute
  • push – push to stack
  • pop – pop from stack
  • IFnb – test and branch if not equal
  • IFtb – test and branch if equal
  • throw – cause exception
  • syscall – invoke engine spesific, delphi code

One of the benefits of a virtual machine, is that you can do some magic with variables. In a real machinecode compiler, variable allocation and doing read / write operations can be very complex. But in a virtual machine you thankfully get to do something about that.

So LDEF allows you to move data seamlessly between variables and registers, which really is a massive time saver. The code standard also supports two levels of resources, global and local. The latter meaning class in this case. So there is ample room for high-level languages to store their data and implement classical features (like “resourcestring” in Delphi).

You also have support for constants, these are stored separately in the bytecode and loaded into a lookup-table associated with a class. Constants are different from resources in that they are designed to be quickly referenced. Constants has more infrastructure in the form of lookup-tables – because they are meant to be used with the instructions. Resources are more like resource-strings in Delphi, they are stored in their own place in the bytecode and are loaded on demand. Constants are not. They are loaded into memory and are instantly available.

Having said that, the bytecode compiler also supports in-place data. Meaning that you can chose to write constant data where they are used. So instead of having to load a string constant (for example) before working with it, you can compile the string directly into the move instruction (or any other instruction).So this is actually perfectly valid:

move r0, "Hello world";

Other virtual machines, like the Java engine, force you to do this in two steps, which is slower:

lda r0, cost_id; // reference const by id
ldc r1, (r0); // get actual value into r1

You can also assign variables directly, you dont need to load the effective address first. So there is no extract cost involved in moving data between a register and variable, variable and variable, or resource / cost to a variable. This makes life much easier:

move V[$00F6], "String assignment";
move V[$01F0], 19875.32;
move V[$196], V[$197];


Another thing that is beneficial is that LDEF has 32 registers to work with (you can actually change that, so if you need 64 that’s no problem). How you use these is up to you, but it gives a lot of room for OOP schemes. For instance, if a language has a maximum limit of 6 parameters per method – you actually dont need to pass values on the stack at all (like Java does) but you can map parameters directly to registers.

Delphi, C++ builder and other native solutions tend to use stack-schemes to store information. So a pointer to some list is stored as the fourth item on the stack, and some other entry on the third (and so on). Which is perfectly fine and very effective on a real CPU (you really have no choice on a real x86). In LDEF you can now use spesific registers instead, which is a lot easier. What scheme you chose to use if naturally up to you – but at least the option is there.

Here are the registers LDEF presently supports

  • R0 – R31: general purpose registers
  • SP: stack pointer
  • PC: program control register
  • BC: branch control register
  • ES: exception state register


If you are wondering why a virtual machine would support so much similar stuff, constants and resources, in place data or not, this all have to do with optimalization.

For example, if your compiler decides to collect all assignment values as constants, the codesize of your program might be smaller; it all depends on what your code looks like. You will naturally consolidate identical strings, integer values and so on. So even if you have a program that writes “Hello world!” 10.000 times to the console, only one “Hello world!” will be stored in the bytecode file. So constants gives you smaller code, but at a cost of speed – since constants needs a lookup whenever they are used.

Now constants here are not “cost” declarations. Constants on this level is actually the values you write in plain-text in your program, stuff like this:

FMyText := 'Welcome to my program';
FCount := 100;

Both “welcome to my program” and “100” are constants. These are values that will never change, and compilers usually have to deal with these values in one of two ways. Both of which I have explained above (in place or not).

Source format

The LDEF intermediate format looks pretty much like C++. C is fast and easier to parse than other languages and it made sense to pick that.But this similarity is paper thin, in that only the constructs of C++ is used. Methods only constains the LDEF assembly language code, and variables are isolated in Alloc sections. The virtual machine takes care of the OOP layer for you so you dont have to worry about that.

Here is an example to give you a feel for it:

#include <stdio>;
#include <oop>;

struct CustomType
  uint32 ctMagic;
  uint32 ctSize;
  uint8  ctData[4096];

  dc.s #welcome, "welcome to LDef";

class TBaseObject: object
  /* class-fields */
  alloc {
    uint8 temp;
    uint32 counter;
    CustomType handle;

  /* Parser now handles register mapping */
  public void main(r0 as count, r1 as text) {
    enter { }
    leave { }

    alloc {
      /* method variables */
      uint32 _longs;
      uint32 _singles;
    /* calc longs */
    move r1, count;
    mod r1, 8;
    move r2, count;
    move _longs, r1;

    /* calc singles */
    move r3, r1;
    mul  r3, 8;
    sub  r2, r3;
    move _singles, r2

  /* test multi push to stack */
  private void _cleanup() {
     push [r0, r1, r2];

Keep in mind that the format is not 100% finished yet, there are still a few tidbits that needs to be worked out. But the general system is firmly in place.

One of the cool things is that I get to add a few missing features from my favorite language, Delphi (object pascal). If you look at the first method (main), you will notice that it has two extra sections: enter and leave.

These sections can contain code that execute directly before and after the body of the method. In pascal it would look something like this:

procedure TMyClass.SomeProc;
    writeln('about to execute someproc');

    writeln('Someproc finished');

The purpose of these, especially the before() section, is to make sure the parameters are valid. So before is used to check that the parameters are within a legal range. After is naturally used to ensure that the result (for functions) is valid.

It also opens up for some interesting logging and debugging aspects.

More to come

This is turning into a long rant, but hopefully you found it interesting. I’ll keep you posted as the engine progress. There is still some way to go, but we are getting there. Once LDEF is implemented the fun starts, thats when I’ll code high-level parsers that targets the engine. First stop is naturally and object pascal compiler 🙂