Delphi bytecode compiler

January 10, 2017 Leave a comment Go to comments

This is a pet project I have been playing with on/off for a couple of years now. Since I’m very busy with Smart Mobile Studio these days I havent really had time to write much about this in a while. Well, today I needed a break for JavaScript – and what is more fun than relaxing with a cup of coco and bytecodes?

The LDEF bytecode standard

Do you remember Quartex Pascal? It was an alternative pascal compiler I wrote a couple of years back. The plan was initially to get Smart Mobile Studio into that codebase, and essentially kill 3 birds with one stone. I already have parsers for various languages that I have written: Visual Basic .NET, Typescript, C# and C++. Adding support for Smart Pascal to that library of parsers would naturally give us a huge advantage.

In essence you would just pick what language you wanted to write your Smart apps in, then use the same classes and RTL across the board. You could also mix and match, perhaps write some in typescript (which is handy when you face a class that is difficult to “pascalify” via the import tool).

LDEF test environment

LDEF test environment

But parsing code and building an AST (abstract symbol tree, this is a hierarchy of objects the parser builds from your source-code. Basically your program in object form) is only half the job. People often start coding languages and script engines without really thinking it through (I have done that myself a few times. It’s a great way to learn the ropes, but you waste a lot of time), after a while they give up because it dawns on them that the hard part is still ahead: namely to generate executable code or at least run the code straight off the AST. There is a reason components like dwScript have taken years to polish and perfect (same can be said of PAX script which is probably one of the most powerful engines available to Delphi developers).

The idea behind Quartex Pascal was naturally that you could have more than one language, but they would all build the same AST and all of them emit the same bytecode. Not exactly a novel idea, this is how most scripting engines work – and also what Java and .Net have been doing since day one.

But the hard part was still waiting for me, namely to generate bytecode. This may sound easy but it’s not just a matter of dumping out a longword with the index of a method. You really need to think it through because if you make it to high-level, it will cripple your language. If you make it to low-level – the code will execute slow and the amount of bytecodes generated can become huge compared to normal assembly-code.

And no, I’m not making it compatible with CIL or Java, this is a pure object pascal solution for object pascal developers (and C++ builder naturally).

LDEF source language

I decided to do something a bit novel. Rather than just creating some classes to help you generate bytecode, I decided to create a language around the assembly language. Since C++ is easy to parse and looks that way because of its close connection to assembler, I decided to use C++ as the basic dialect.

So instead of you emitting bytecodes, your program only needs to emit LDEF source-code. Then you can just call the assembler program (command line) and it will parse, compile and turn it into assemblies for you.

Here is an example snippet that compiles without problem:

struct CustomType
{
  uint32 ctMagic;
  uint32 ctSize;
  uint8  ctData[4096];
}

class TBaseObject: object
{
  /* class-fields */
  alloc {
    uint8 temp;
    uint32 counter;
  }

  /* Parser now handles register mapping */
  public void main(r0 as count, r1 as text) {
    alloc {
      /* method variables */
      uint32 _longs;
      uint32 _singles;
    }
    move r1, count;
    jsr @_cleanup;
    push [text];
  }

  /* test multi push to stack */
  private void _cleanup() {
     push [r0, r1, r2];
  }
}

The LDEF virtual machine takes care of things like instance management, VMT (virtual method table), inheritance and everything else. All you have to do is to generate the LDEF assembly code that goes into the methods – and voila, the assembler will parse, compile and emit the whole shabam as assemblies (the -L option in the command-line assembler controls how you want to link the assemblies, so you can emit a single file).

The runtime engine is written in vanilla object pascal. It uses generics and lookup tables to achieve pretty good performance, and there is ample room for a JIT engine in the future. What was important for me was that i had a solution that was portable, require little maintenance, with an instruction set that could easily be converted to actual machine-code, LLVM or another high level language (like C++ that can piggyback on GCC regardless of platform).

LDEF instructions

The instruction set is a good mean of the most common instructions that real processors have. Conceptually the VM is inspired by the Motorola 68020 chip, in combination with the good old Acord Risc cpu. Some of the features are:

  • 32bit (4 byte) opcode size
  • 32 registers
  • 1024 byte cache (can be adjusted)
  • Stack
  • Built in variable management
  • Built in const and resource management
  • Move data seamlessly between registers and variables
  • Support for records (struct) and class-types
  • Support for source-maps (both from assembler to high-level and reverse)
  • Component based, languages and emitters can be changed

There are also a few neat language features that I have been missing from Delphi, like code criteria. Basically it allows you to define code that should be executed before the body of a procedure, and directly after. This allows you to check that parameter values are within range or valid before the procedure is allowed to execute.

Here is the pascal version:

function TMyClass.CalcValues(a,b,c: integer): integer;
begin
  before
    if (a<1) or (b<1) or (c<1) then
      Fail('Invalid values, %classname.%methodname never executed');
  end;

  result := a + b div c;

  after
    if result <23 then
      Fail('%classname.%methodname failed, input below lowest threshold error');
  end;
end;

I decided to add support for this in LDEF itself, including the C++ style intermediate language. Here it looks like this:

  /* Enter + Exit is directly supported */
  public void main(r0 as count, r1 as text) {
    enter {
    }

    leave {
    }

    alloc {
      /* method variables */
      uint32 _longs;
      uint32 _singles;
    }
    move r1, count;
    jsr @_cleanup;
    push [text];
  }

Why Borland/Embarcadero didn’t add this when they gave the compiler a full overhaul and support for generics – is beyond me. C++ have had this for many, many years now. C# have supported it since version 4, and Java I am usure about – but its been there for many years, thats for sure.

Attributes and RTTI

Attributes will be added but support for user-defined attributes will only appear later. Right now the only attributes I have plans for controls how memory is allocated for variables, if registers should be pushed to the stack on entry automatically, if the VMT should be flattened and the inheritance-chain reduced to a single table. More attributes will no doubt appear as I move forward, but right now I’ll stick to the basics.

RTTI is fairly good and presently linked into every assembly. You cant really omit that much from a bytecode system. To reduce the use of pure variables I introduced register mapping to make it easier for people to use the registers for as much as possible (much faster than variables):

  public void main(r0 as count, r1 as text) {
  }

You may have noticed the strange parameters on this method? Thats because it’s not parameters, but definitions that link registers to names (register mapping). That way you can write code that uses the original names of variables or parameters, and avoid allocating variables for everything. It has no effect except making it easier to write code.

LDEF, why should I care?

You should care because, with a bit of work we should be able to compile fairly modern Delphi source-code to portable bytecodes. The runtime or engine that executes these bytecodes can be compiled using freepascal, which means you can execute the program anywhere FPC is supported. So more or less every operative system on the planet.

You should care because you can now write programs for Ultibo, the pascal operative system for Raspberry PI. It can only execute 1 real program (your embedded program), but since you can now run bytecodes, you will be able to run as many programs as you like. Just run each bytecode program in a thread or play it safe and call next() on interval from a TTimer.

You should care because once LDEF is finished, being able to transcode from object pascal to virtually any language will be a lot easier. JavaScript? C++? Python? Take your pick. The point here is standards and a library that is written to last decades rather than years. So my code is very clean and no silly pointer tricks just to get a few extra cycles. Stability, portability and maintainance are the values here.

You should care because in the future, components like HexLicense will implement its security code in LDEF using a randomized instruction-set, making it very, very hard for hackers to break your product.

Advertisements
  1. January 10, 2017 at 6:20 am

    Sign me up for a BETA 🙂

  1. No trackbacks yet.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: