Archive
Delphi bytecode compiler
This is a pet project I have been playing with on/off for a couple of years now. Since I’m very busy with Smart Mobile Studio these days I havent really had time to write much about this in a while. Well, today I needed a break for JavaScript – and what is more fun than relaxing with a cup of coco and bytecodes?
The LDEF bytecode standard
Do you remember Quartex Pascal? It was an alternative pascal compiler I wrote a couple of years back. The plan was initially to get Smart Mobile Studio into that codebase, and essentially kill 3 birds with one stone. I already have parsers for various languages that I have written: Visual Basic .NET, Typescript, C# and C++. Adding support for Smart Pascal to that library of parsers would naturally give us a huge advantage.
In essence you would just pick what language you wanted to write your Smart apps in, then use the same classes and RTL across the board. You could also mix and match, perhaps write some in typescript (which is handy when you face a class that is difficult to “pascalify” via the import tool).

LDEF test environment
But parsing code and building an AST (abstract symbol tree, this is a hierarchy of objects the parser builds from your source-code. Basically your program in object form) is only half the job. People often start coding languages and script engines without really thinking it through (I have done that myself a few times. It’s a great way to learn the ropes, but you waste a lot of time), after a while they give up because it dawns on them that the hard part is still ahead: namely to generate executable code or at least run the code straight off the AST. There is a reason components like dwScript have taken years to polish and perfect (same can be said of PAX script which is probably one of the most powerful engines available to Delphi developers).
The idea behind Quartex Pascal was naturally that you could have more than one language, but they would all build the same AST and all of them emit the same bytecode. Not exactly a novel idea, this is how most scripting engines work – and also what Java and .Net have been doing since day one.
But the hard part was still waiting for me, namely to generate bytecode. This may sound easy but it’s not just a matter of dumping out a longword with the index of a method. You really need to think it through because if you make it to high-level, it will cripple your language. If you make it to low-level – the code will execute slow and the amount of bytecodes generated can become huge compared to normal assembly-code.
And no, I’m not making it compatible with CIL or Java, this is a pure object pascal solution for object pascal developers (and C++ builder naturally).
LDEF source language
I decided to do something a bit novel. Rather than just creating some classes to help you generate bytecode, I decided to create a language around the assembly language. Since C++ is easy to parse and looks that way because of its close connection to assembler, I decided to use C++ as the basic dialect.
So instead of you emitting bytecodes, your program only needs to emit LDEF source-code. Then you can just call the assembler program (command line) and it will parse, compile and turn it into assemblies for you.
Here is an example snippet that compiles without problem:
struct CustomType { uint32 ctMagic; uint32 ctSize; uint8 ctData[4096]; } class TBaseObject: object { /* class-fields */ alloc { uint8 temp; uint32 counter; } /* Parser now handles register mapping */ public void main(r0 as count, r1 as text) { alloc { /* method variables */ uint32 _longs; uint32 _singles; } move r1, count; jsr @_cleanup; push [text]; } /* test multi push to stack */ private void _cleanup() { push [r0, r1, r2]; } }
The LDEF virtual machine takes care of things like instance management, VMT (virtual method table), inheritance and everything else. All you have to do is to generate the LDEF assembly code that goes into the methods – and voila, the assembler will parse, compile and emit the whole shabam as assemblies (the -L option in the command-line assembler controls how you want to link the assemblies, so you can emit a single file).
The runtime engine is written in vanilla object pascal. It uses generics and lookup tables to achieve pretty good performance, and there is ample room for a JIT engine in the future. What was important for me was that i had a solution that was portable, require little maintenance, with an instruction set that could easily be converted to actual machine-code, LLVM or another high level language (like C++ that can piggyback on GCC regardless of platform).
LDEF instructions
The instruction set is a good mean of the most common instructions that real processors have. Conceptually the VM is inspired by the Motorola 68020 chip, in combination with the good old Acord Risc cpu. Some of the features are:
- 32bit (4 byte) opcode size
- 32 registers
- 1024 byte cache (can be adjusted)
- Stack
- Built in variable management
- Built in const and resource management
- Move data seamlessly between registers and variables
- Support for records (struct) and class-types
- Support for source-maps (both from assembler to high-level and reverse)
- Component based, languages and emitters can be changed
There are also a few neat language features that I have been missing from Delphi, like code criteria. Basically it allows you to define code that should be executed before the body of a procedure, and directly after. This allows you to check that parameter values are within range or valid before the procedure is allowed to execute.
Here is the pascal version:
function TMyClass.CalcValues(a,b,c: integer): integer; begin before if (a<1) or (b<1) or (c<1) then Fail('Invalid values, %classname.%methodname never executed'); end; result := a + b div c; after if result <23 then Fail('%classname.%methodname failed, input below lowest threshold error'); end; end;
I decided to add support for this in LDEF itself, including the C++ style intermediate language. Here it looks like this:
/* Enter + Exit is directly supported */ public void main(r0 as count, r1 as text) { enter { } leave { } alloc { /* method variables */ uint32 _longs; uint32 _singles; } move r1, count; jsr @_cleanup; push [text]; }
Why Borland/Embarcadero didn’t add this when they gave the compiler a full overhaul and support for generics – is beyond me. C++ have had this for many, many years now. C# have supported it since version 4, and Java I am usure about – but its been there for many years, thats for sure.
Attributes and RTTI
Attributes will be added but support for user-defined attributes will only appear later. Right now the only attributes I have plans for controls how memory is allocated for variables, if registers should be pushed to the stack on entry automatically, if the VMT should be flattened and the inheritance-chain reduced to a single table. More attributes will no doubt appear as I move forward, but right now I’ll stick to the basics.
RTTI is fairly good and presently linked into every assembly. You cant really omit that much from a bytecode system. To reduce the use of pure variables I introduced register mapping to make it easier for people to use the registers for as much as possible (much faster than variables):
public void main(r0 as count, r1 as text) { }
You may have noticed the strange parameters on this method? Thats because it’s not parameters, but definitions that link registers to names (register mapping). That way you can write code that uses the original names of variables or parameters, and avoid allocating variables for everything. It has no effect except making it easier to write code.
LDEF, why should I care?
You should care because, with a bit of work we should be able to compile fairly modern Delphi source-code to portable bytecodes. The runtime or engine that executes these bytecodes can be compiled using freepascal, which means you can execute the program anywhere FPC is supported. So more or less every operative system on the planet.
You should care because you can now write programs for Ultibo, the pascal operative system for Raspberry PI. It can only execute 1 real program (your embedded program), but since you can now run bytecodes, you will be able to run as many programs as you like. Just run each bytecode program in a thread or play it safe and call next() on interval from a TTimer.
You should care because once LDEF is finished, being able to transcode from object pascal to virtually any language will be a lot easier. JavaScript? C++? Python? Take your pick. The point here is standards and a library that is written to last decades rather than years. So my code is very clean and no silly pointer tricks just to get a few extra cycles. Stability, portability and maintainance are the values here.
You should care because in the future, components like HexLicense will implement its security code in LDEF using a randomized instruction-set, making it very, very hard for hackers to break your product.
Recent
The vatican vault
- January 2022
- October 2021
- March 2021
- November 2020
- September 2020
- July 2020
- June 2020
- April 2020
- March 2020
- February 2020
- January 2020
- November 2019
- October 2019
- September 2019
- August 2019
- July 2019
- June 2019
- May 2019
- April 2019
- March 2019
- February 2019
- January 2019
- December 2018
- November 2018
- October 2018
- September 2018
- August 2018
- July 2018
- June 2018
- May 2018
- April 2018
- March 2018
- February 2018
- January 2018
- December 2017
- November 2017
- October 2017
- August 2017
- July 2017
- June 2017
- May 2017
- April 2017
- March 2017
- February 2017
- January 2017
- December 2016
- November 2016
- October 2016
- September 2016
- August 2016
- July 2016
- June 2016
- May 2016
- April 2016
- March 2016
- January 2016
- December 2015
- November 2015
- October 2015
- September 2015
- August 2015
- June 2015
- May 2015
- April 2015
- March 2015
- February 2015
- January 2015
- December 2014
- November 2014
- October 2014
- September 2014
- August 2014
- July 2014
- June 2014
- May 2014
- April 2014
- March 2014
- February 2014
- January 2014
- December 2013
- November 2013
- October 2013
- September 2013
- August 2013
- July 2013
- June 2013
- May 2013
- February 2013
- August 2012
- June 2012
- May 2012
- April 2012
You must be logged in to post a comment.