
Smart Pascal assembler, it’s a reality

January 31, 2018

After all these years of activity I guess it is no secret that I am a bit over-active at times. I am usually happiest when I work on 2-3 things at the same time. I also do plenty of research to test theories and explore various technologies. So there is never a dull moment, and this project has been no exception.

Bytecode based compilation

For the past 7 years I have worked closely with compiler tech of various types and complexity on a daily basis. Script engines like DWScript, PAXScript, PascalScript, C# script, JavaScript (the list continues): all of these have been used in projects, either in-house or for customers, and each serves a particular purpose.

Now while they are all fantastic engines and deliver fantastic results, I have had this “itch” to create something new. Something that approaches the problem of interpreting, compiling and running code from a more low-level angle. One that is more standardized and not just a result of the inventor’s whim or particular style. In my view that results in a system that won’t need years of updates and maintenance. I am a strong believer in simplicity, meaning that most of the time a simple ad-hoc solution is the best.

It was this belief that gave birth to Smart Mobile Studio to begin with. Instead of spending a year writing a classical parser, tokenizer, AST and code emitter, we forked DWScript and used it to perform the tokenizing for us. We were also lucky to catch the interest of Eric (the maintainer) and the rest is history. Smart Mobile Studio was born and made with off-the-shelf parts, not boring, grey studies by men in lab coats.

The bytecode project started around the summer of 2017. I had thought about it for a while but this is when I finally took the time to sit down and pen my ideas for a portable virtual machine and bytecode based instruction set. A system that could be easily implemented in any language, from Basic to C/C++, without demanding the almost ridiculous system specs and know-how of Java or the Microsoft CLR.

I labeled the system LDEF, short for “language definition format”. I have written a couple of articles on the subject here on my blog, but I did not yet have enough finished to demo my ideas.

Time is always a commodity, and like everyone else the majority of my time is invested in my day job, working on Smart Mobile Studio. The rest is divided between my family, social obligations, working out and hobbies. Hence progress has been slow and sporadic.

But I finally have a working prototype, so the LDEF parser, assembler, disassembler and runtime are no longer theory but a functional virtual machine.

Power in simplicity

Without much fanfare I have finally reached the stage where I can demonstrate my ideas. It took a long time to get to this point, because before you can even think of designing a language or carve out a bytecode-format, you have to solve quite a few fundamental concepts. These must be in place before you even entertain the idea of starting on the virtual machine – or the project will simply end up as useless spaghetti that nobody understands or wants to work with.

  • Text parsing techniques must be researched properly
  • Virtual machine design must be worked out
  • A well designed instruction-set must be architected
  • Platform criteria must be met

Text parsing sounds easy. It’s one of those topics where people reply “oh yeah, that’s easy” on auto pilot. But when you really dig into this subject you realize it’s anything but easy. At least if you want a parser that is fast, trustworthy and, more importantly, can be ported to other dialects and languages with relative ease (Delphi, FreePascal, C#, C/C++ are obvious targets). The ideas have to mature, quite frankly.

One of my most central criteria when writing this system has been: no pointers in the core system. How people choose to implement their version of LDEF for other languages is up to them (Delphi and FPC included), but the original prototype should be as clean and down to earth as possible.

Besides, languages like C# are not too keen on pointers anyway. You can use them but you have to mark your assemblies as “unsafe”. And why bother when var and const parameters offer you a safe and portable alternative? Smart Mobile Studio (or Smart Pascal, the dialect we use) doesn’t use pointers either; we compile to JavaScript after all, where references are the name of the game. So avoiding pointers is more than central; it’s fundamental.

We want the system to be easy to port to any language, even Basic for that matter. And once the VM is ported, LDEF compiled libraries and assemblies can be loaded and used straight away.

The virtual CPU and its aggregates

The virtual machine architecture is the hard part. That’s where the true challenge resides. All the other stuff, be it source parsing, expression management, building a model (AST), data types, generating jump tables or emitting bytecodes; all those tasks are trivial compared to the CPU and its aggregates.

The design and architecture of the CPU (or “runtime” or “virtual machine”, since it consists of many parts) affects everything. It especially shapes the CPU instructions (what they do and how). But as mentioned, the CPU is just one of many parts that make up the virtual machine. What about variable handling? How should variables be allocated, addressed and dealt with? The way the VM deals with this directly determines how the bytecode operates and how much code you need to initialize, populate and dispose of a variable.

Then you have more interesting questions like: how should the VM distinguish between global and local variable identities? We want the assembly code to be uniform like real machine code, we don’t want “special” instructions for global variables, and a whole different set of instructions for local variables. LDEF allows you to pass registers, variables, constants and a special register (DC) for data control as you wish. You are not bound to using registers only for math for instance.

I opted for an old trick from the Commodore days, namely “bit shift marking”. Local variables have the first (lowest) bit in their ID set, while global variables have that bit zeroed. This allows us to distinguish between global and local variables extremely fast: one bit test gives the scope, and one shift gives the actual index.

Here is a simple example that better demonstrates the technique. The Id parameter is the variable id read directly from the bytecode:

function TExample.GetVarId(const Id: integer;
  var IsGlobal: boolean): integer; inline;
begin
  // Bit 0 is the marker: set = local, clear = global
  IsGlobal := (Id and 1) = 0;
  // Shift the marker bit away to recover the variable index
  result := Id shr 1;
end;
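To make the marking scheme concrete, here is a minimal round-trip sketch. It assumes the marker lives in the lowest bit, as described above. The MakeVarId helper is hypothetical (it is not part of the LDEF prototype) and only shows what an assembler would have to do so that the decoder can undo it. It is written as a stand-alone Free Pascal program rather than Smart Pascal.

```pascal
program VarIdDemo;
{$mode objfpc}

// Hypothetical encoder: pack an index and a scope flag into one ID.
// Assumption: bit 0 set = local, bit 0 clear = global.
function MakeVarId(const Index: integer; const IsGlobal: boolean): integer;
begin
  result := Index shl 1;        // make room for the marker bit
  if not IsGlobal then
    result := result or 1;      // mark as local
end;

// Decoder: test the marker bit, then shift it away.
function GetVarId(const Id: integer; out IsGlobal: boolean): integer;
begin
  IsGlobal := (Id and 1) = 0;
  result := Id shr 1;
end;

var
  Id, Index: integer;
  IsGlobal: boolean;
begin
  Id := MakeVarId(21, false);         // local variable number 21
  Index := GetVarId(Id, IsGlobal);    // recovers 21 and IsGlobal = false
  writeln(Id, ' ', Index, ' ', IsGlobal);
end.
```

The price of the trick is one bit of the ID range; in return the runtime never needs a lookup table to tell the two scopes apart.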

This is just one of a hundred details you need to mentally work out before you even attempt the big one: namely how to deal with OOP and inheritance.

So far we have only talked about low-level bytecodes (ILASM as it’s called under the .net regime). In both Java and .net, object orientation is intrinsic to the VM. The runtime engine “knows” about objects; it knows about classes and methods and expects the bytecode files to be neatly organized class structures.

LDEF “might” go that way; but honestly I find it more tempting to implement OOP in ASM itself. So instead of the runtime having intrinsic knowledge of OOP, a high level compiler will have to emit a scheme for OOP instead. I still need to think about and research what is best regarding this topic.

Pictures or it didn’t happen

The prototype is now 97% complete. And it will be uploaded so that people can play around with it. The whole system is implemented in Smart Pascal first (a Delphi and FreePascal version will follow) which means the whole system runs in your browser.

Like you would expect from any ordinary assembler program (MASM, NASM, Gnu ASM, IAR [ARM] among others) the system consists of 4 parts:

  • Parser
  • Assembler
  • Disassembler
  • Runtime

So you can write source code directly in the browser, compile / assemble it – and then execute it on the spot. Then you can disassemble it and look at the results in-depth.

[Image: assembler]

The virtual cpu

The virtual CPU sports a fairly common set of instructions. Unlike Java and .net, the CPU has 16 data-aware registers (meaning the registers adopt the type of the value you assign to them, a bit like “variant” in Delphi and C++ builder). Variables allocated using the alloc instruction can be used just like a register. All the instructions accept both registers and variables as params, as well as defined constants, inline constants and strings.

  • R[0] .. R[15] ~ Data aware work registers
  • V[x] ~ Allocated variable
  • DC ~ Data control register
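The idea of data-aware registers and uniform operands can be sketched with Pascal variants. This is only an illustration under my own assumptions: the names TOperand, ReadOperand, RegOp, VarOp and ConstOp are hypothetical, the number of variable slots is arbitrary, and the DC register is left out. It targets Free Pascal, where the Variants unit supplies variant support.

```pascal
program RegisterDemo;
{$mode objfpc}
uses Variants;

type
  // Hypothetical operand kinds, for illustration only
  TOperandKind = (okRegister, okVariable, okConstant);

  TOperand = record
    Kind: TOperandKind;
    Index: integer;   // register or variable slot (unused for constants)
    Value: variant;   // inline constant (unused otherwise)
  end;

var
  R: array[0..15] of variant;  // 16 data-aware registers
  V: array[0..7] of variant;   // variable slots; the size is arbitrary here

function RegOp(const Index: integer): TOperand;
begin
  result.Kind := okRegister;
  result.Index := Index;
end;

function VarOp(const Index: integer): TOperand;
begin
  result.Kind := okVariable;
  result.Index := Index;
end;

function ConstOp(const AValue: variant): TOperand;
begin
  result.Kind := okConstant;
  result.Index := -1;
  result.Value := AValue;
end;

// One read path for every operand kind, so an instruction does not
// care whether it was handed a register, a variable or a constant.
function ReadOperand(const Op: TOperand): variant;
begin
  case Op.Kind of
    okRegister: result := R[Op.Index];
    okVariable: result := V[Op.Index];
  else
    result := Op.Value;
  end;
end;

begin
  R[0] := 10;      // this register now holds an integer
  V[2] := 2.5;     // this variable holds a float
  R[1] := ReadOperand(RegOp(0)) + ReadOperand(VarOp(2));
  R[2] := ReadOperand(ConstOp('LDEF'));   // and this one a string
  writeln(VarIsNumeric(R[1]), ' ', VarIsStr(R[2]));
end.
```

Because every operand goes through the same read function, an instruction such as add needs only one implementation regardless of where its source lives.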

The following instructions are presently supported:

  • alloc [id, datatype]
    Allocate temporary variable
  • vfree [id]
    Release previously allocated variable
  • load [target, source]
    Move data from source to target
  • push [source]
    Push data from a register or variable onto the stack
  • pop [target]
    Pop a value from the stack into a register or variable
  • add [target, source]
    Add value of source to target
  • sub [target, source]
    Subtract source from target
  • mul [target, factor]
    Multiply target by factor
  • div [target, factor]
    Divide target by factor
  • mod [target, factor]
    Modulo, store the remainder of target divided by factor in target
  • lsl [target, factor]
    Logical shift left, shift bits to the left by factor
  • lsr [target, factor]
    Logical shift right, shift bits to the right by factor
  • btst [target, bit]
    Test bit in target
  • bset [target, bit]
    Set bit in target
  • bclr [target, bit]
    Clear bit in target
  • and [target, source]
    And target with source
  • or [target, source]
    OR target with source
  • not [target]
    NOT value in target
  • xor [target, source]
    XOR target with source
  • cmp  [target, source]
    Compare value in target with source
  • noop
    No operation, used mostly for byte alignment
  • jsr [label]
    Jump to sub-routine
  • bne [label]
    Branch not equal, conditional jump based on a compare
  • beq [label]
    Branch equal, conditional jump based on a compare
  • rts
    Return from a JSR call
  • sys [id]
    Call a standard library function

The virtual cpu can support instructions with any number of parameters, but the most common is either one or two.
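To show how a handful of these instructions fit together at runtime, here is a deliberately tiny fetch-and-execute loop. It is a sketch under my own assumptions, not the LDEF runtime: registers are plain integers instead of data-aware values, instructions are records rather than encoded bytecode, labels are plain instruction indexes, and the names TInstr, Instr and Run are hypothetical. Only load, add, sub, cmp and bne are handled.

```pascal
program MiniVm;
{$mode objfpc}

type
  TOpcode = (opLoad, opAdd, opSub, opCmp, opBne);

  // Simplified instruction: the target is always a register, the source
  // is a register (SrcIsReg = true) or an inline constant. For opBne the
  // Source field holds the index of the instruction to jump to.
  TInstr = record
    Op: TOpcode;
    Target: integer;
    SrcIsReg: boolean;
    Source: integer;
  end;

function Instr(Op: TOpcode; Target: integer; SrcIsReg: boolean;
  Source: integer): TInstr;
begin
  result.Op := Op;
  result.Target := Target;
  result.SrcIsReg := SrcIsReg;
  result.Source := Source;
end;

// Runs the program until the program counter leaves it; returns R[0].
function Run(const Prog: array of TInstr): integer;
var
  R: array[0..15] of integer;
  PC, Src, i: integer;
  Equal: boolean;     // outcome of the most recent cmp
  Ins: TInstr;
begin
  for i := 0 to 15 do
    R[i] := 0;
  Equal := false;
  PC := 0;
  while PC <= High(Prog) do
  begin
    Ins := Prog[PC];
    if Ins.SrcIsReg then
      Src := R[Ins.Source]
    else
      Src := Ins.Source;
    case Ins.Op of
      opLoad: R[Ins.Target] := Src;
      opAdd:  R[Ins.Target] := R[Ins.Target] + Src;
      opSub:  R[Ins.Target] := R[Ins.Target] - Src;
      opCmp:  Equal := R[Ins.Target] = Src;
      opBne:
        if not Equal then
        begin
          PC := Ins.Source;   // branch taken: jump, skip the increment
          continue;
        end;
    end;
    inc(PC);
  end;
  result := R[0];
end;

var
  Prog: array[0..5] of TInstr;
begin
  Prog[0] := Instr(opLoad, 1, false, 5);  // load R1, 5
  Prog[1] := Instr(opLoad, 0, false, 0);  // load R0, 0
  Prog[2] := Instr(opAdd, 0, true, 1);    // add  R0, R1
  Prog[3] := Instr(opSub, 1, false, 1);   // sub  R1, 1
  Prog[4] := Instr(opCmp, 1, false, 0);   // cmp  R1, 0
  Prog[5] := Instr(opBne, 0, false, 2);   // bne  back to the add
  writeln(Run(Prog));                     // sums 5+4+3+2+1
end.
```

The loop counts R1 down from 5 to 0 while accumulating it in R0, so the branch is taken four times before execution falls off the end of the program.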

I will document more as the prototype becomes available.

  1. February 5, 2018 at 12:48 pm

    Any plans on Virtual CPU -> LLVM (including BitCode) ?
    Having LLVM backend output (+BitCode) would allow Linux ARM, AppleTV, AppleWatch, Windows/Mac native EXE’s..! Yes, Javascript/NodeJS is pretty cool and fast.
    But giving BitCode would allow Pascal/Smart code to run on Apple Watch/TV and other LLVM backends. (which aren’t currently available from FreePascal or Delphi).

    • March 14, 2018 at 11:23 pm

      We started on that a few years ago. The initial pre-project concluded that it would require at least 1 year of work to implement the full object pascal language. So we dropped it. You also need to factor in that the RTL also needs a clean rewrite, so suddenly we are looking at 3-4 years of work. I think llvm is a job for fpc/lazarus to be honest. But yes, the thought has crossed my mind.
