Archive for January, 2018

Smart Pascal assembler, it’s a reality

January 31, 2018

After all these years of activity I guess it is no secret that I am a bit over-active at times. I am usually happiest when I work on 2-3 things at the same time. I also do plenty of research to test theories and explore various technologies. So there is never a dull moment – and this project has been no exception.

Bytecode based compilation

For the past 7 years I have worked closely with compiler tech of various types and complexity on a daily basis. Script engines like DWScript, PAXScript, PascalScript, C# script, JavaScript (the list continues) – all of these have been used in projects either in-house or for customers; and each serves a particular purpose.

Now while they are all fantastic engines and deliver fantastic results – I have had this “itch” to create something new. Something that approaches the problem of interpreting, compiling and running code from a more low-level angle. One that is more standardized and not just a result of the inventor’s whim or particular style. Which in my view results in a system that won’t need years of updates and maintenance. I am a strong believer in simplicity, meaning that most of the time – a simple ad-hoc solution is the best.

It was this belief that gave birth to Smart Mobile Studio to begin with. Instead of spending a year writing a classical parser, tokenizer, AST and code emitter – we forked DWScript and used it to perform the tokenizing for us. We were also lucky to catch the interest of Eric (the maintainer) and the rest is history. Smart Mobile Studio was born and made with off-the-shelf parts; not boring, grey studies by men in lab coats.

The bytecode project started around the summer of 2017. I had thought about it for a while but this is when I finally took the time to sit down and pen my ideas for a portable virtual machine and bytecode based instruction set. A system that could be easily implemented in any language, from Basic to C/C++, without demanding the almost ridiculous system specs and know-how of Java or the Microsoft CLR.

I labeled the system LDef, short for “language definition format”; I have written a couple of articles on the subject here on my blog, but I did not yet have enough finished to demo my ideas.

Time is always a commodity, and like everyone else the majority of my time is invested in my day job, working on Smart Mobile Studio. The rest is divided between my family, social obligations, working out and hobbies. Hence progress has been slow and sporadic.

But I finally have a working prototype, so the LDEF parser, assembler, disassembler and runtime are no longer a theory but a functional virtual machine.

Power in simplicity

Without much fanfare I have finally reached the stage where I can demonstrate my ideas. It took a long time to get to this point, because before you can even think of designing a language or carving out a bytecode format, you have to work out quite a few fundamental concepts. These must be in place before you even entertain the idea of starting on the virtual machine – or the project will simply end up as useless spaghetti that nobody understands or wants to work with.

  • Text parsing techniques must be researched properly
  • Virtual machine design must be worked out
  • A well designed instruction-set must be architected
  • Platform criteria must be met

Text parsing sounds easy. It’s one of those topics where people reply “oh yeah, that’s easy” on auto pilot. But when you really dig into this subject you realize it’s anything but easy. At least if you want a parser that is fast, trustworthy – and more importantly: that can be ported to other dialects and languages with relative ease (Delphi, FreePascal, C#, C/C++ are obvious targets). The ideas have to mature, quite frankly.

One of my most central criteria when writing this system has been: no pointers in the core system. How people choose to implement their version of LDEF for other languages is up to them (Delphi and FPC included), but the original prototype should be as clean and down to earth as possible.

Besides, languages like C# are not too keen on pointers anyway. You can use them but you have to mark your assemblies as “unsafe”. And why bother when var and const parameters offer you a safe and portable alternative? Smart Mobile Studio (or Smart Pascal, the dialect we use) doesn’t use pointers either; we compile to JavaScript after all, where references are the name of the game. So avoiding pointers is more than central; it’s fundamental.

We want the system to be easy to port to any language, even Basic for that matter. And once the VM is ported, LDEF compiled libraries and assemblies can be loaded and used straight away.

The virtual CPU and its aggregates

The virtual machine architecture is the hard part. That’s where the true challenge resides. All the other stuff, be it source parsing, expression management, building a model (AST), data types, generating jump tables, emitting bytecodes; all those tasks are trivial compared to the CPU and its aggregates.

The design and architecture of the CPU (or “runtime” or “virtual machine” since it consists of many parts) affects everything. It especially shapes the CPU instructions (what they do and how). But as mentioned, the CPU is just one of many parts that make up the virtual machine. What about variable handling? How should variables be allocated, addressed and dealt with? The way the VM deals with this will directly reflect how the bytecode operates and how much code you need to initialize, populate and dispose of a variable.

Then you have more interesting questions like: how should the VM distinguish between global and local variable identities? We want the assembly code to be uniform like real machine code, we don’t want “special” instructions for global variables, and a whole different set of instructions for local variables. LDEF allows you to pass registers, variables, constants and a special register (DC) for data control as you wish. You are not bound to using registers only for math for instance.

I opted for an old trick from the Commodore days, namely “bit shift marking”. Local variables have the first (lowest) bit in their ID set, while global variables have the first bit zeroed. The remaining bits hold the actual variable number. This allows us to distinguish between global and local variables extremely fast.

Here is a simple example that better demonstrates the technique. The Id parameter is the variable id read directly from the bytecode:

function TExample.GetVarId(const Id: integer;
  var IsGlobal: boolean): integer; inline;
begin
  // Bit 0 is the scope marker: set for local, cleared for global
  IsGlobal := (Id and 1) = 0;
  // The remaining bits hold the actual variable id
  result := Id shr 1;
end;
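To make the marking scheme concrete, here is a minimal sketch of both directions: packing a scope flag into an id and unpacking it again. It assumes the flag lives in the lowest bit (1 for local, 0 for global) and that the remaining bits hold the variable index. EncodeVarId and DecodeVarId are hypothetical names used for illustration only; they are not part of LDEF.

```pascal
program VarIdDemo;
{$mode objfpc}{$H+}

// Pack a variable index and its scope into one id.
// Assumption: bit 0 is the scope flag (1 = local, 0 = global).
function EncodeVarId(const Index: Integer; const IsLocal: Boolean): Integer;
begin
  Result := Index shl 1;
  if IsLocal then
    Result := Result or 1;
end;

// Unpack an id: report the scope and return the variable index.
function DecodeVarId(const Id: Integer; out IsGlobal: Boolean): Integer;
begin
  IsGlobal := (Id and 1) = 0;
  Result := Id shr 1;
end;

var
  Id, Index: Integer;
  IsGlobal: Boolean;
begin
  Id := EncodeVarId(5, True);          // local variable number 5
  Index := DecodeVarId(Id, IsGlobal);
  WriteLn('id=', Id, ' index=', Index, ' global=', IsGlobal);
end.
```

The cost of the trick is one bit of id range; in return the scope check is a single AND with no lookup table.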

This is just one of a hundred details you need to mentally work out before you even attempt the big one: namely how to deal with OOP and inheritance.

So far we have only talked about low-level bytecodes (ILASM as it’s called under the .NET regime). In both Java and .NET, object orientation is intrinsic to the VM. The runtime engine “knows” about objects, it knows about classes and methods and expects the bytecode files to be neatly organized class structures.

LDEF “might” go that way; but honestly I find it more tempting to implement OOP in ASM itself. So instead of the runtime having intrinsic knowledge of OOP, a high level compiler will have to emit a scheme for OOP instead. I still need to think about and research what is best regarding this topic.

Pictures or it didn’t happen

The prototype is now 97% complete. And it will be uploaded so that people can play around with it. The whole system is implemented in Smart Pascal first (a Delphi and FreePascal version will follow) which means the whole system runs in your browser.

Like you would expect from any ordinary x86 assembler program (MASM, NASM, Gnu ASM, IAR [ARM] with others) the system consists of 4 parts:

  • Parser
  • Assembler
  • Disassembler
  • Runtime

So you can write source code directly in the browser, compile / assemble it – and then execute it on the spot. Then you can disassemble it and look at the results in-depth.

The virtual cpu

The virtual CPU sports a fairly common set of instructions. Unlike Java and .NET, the CPU has 16 data-aware registers (meaning the registers adopt the type of the value you assign to them, a bit like “variant” in Delphi and C++Builder). Variables allocated using the alloc() instruction can be used just like a register; all the instructions support both registers and variables as params – as well as defined constants, inline constants and strings.

  • R[0] .. R[15] ~ Data aware work registers
  • V[x] ~ Allocated variable
  • DC ~ Data control register

The following instructions are presently supported:

  • alloc [id, datatype]
    Allocate temporary variable
  • vfree [id]
    Release previously allocated variable
  • load [target, source]
    Move data from source to target
  • push [source]
    Push data from a register or variable onto the stack
  • pop [target]
    Pop a value from the stack into a register or variable
  • add [target, source]
    Add value of source to target
  • sub [target, source]
    Subtract source from target
  • mul [target, factor]
    Multiply target by factor
  • div [target, factor]
    Divide target by factor
  • mod [target, factor]
    Modulo: store the remainder of target divided by factor
  • lsl [target, factor]
    Logical shift left, shift bits to the left by factor
  • lsr [target, factor]
    Logical shift right, shift bits to the right by factor
  • btst [target, bit]
    Test bit in target
  • bset [target, bit]
    Set bit in target
  • bclr [target, bit]
    Clear bit in target
  • and [target, source]
    And target with source
  • or [target, source]
    OR target with source
  • not [target]
    NOT value in target
  • xor [target]
    XOR value in target
  • cmp [target, source]
    Compare value in target with source
  • noop
    No operation, used mostly for byte alignment
  • jsr [label]
    Jump sub-routine
  • bne [label]
    Branch not equal, conditional jump based on a compare
  • beq [label]
    Branch equal, conditional jump based on a compare
  • rts
    Return from a JSR call
  • sys [id]
    Call a standard library function

The virtual cpu can support instructions with any number of parameters, but the most common is either one or two.
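To give a feel for how a runtime might execute a handful of these instructions, here is a minimal sketch of a fetch-and-dispatch loop. It is not the LDEF runtime: it assumes integer-only registers (the real ones are data-aware), it skips variables and the stack entirely, and the TOpCode names, the TInstruction record and the Instr helper are all hypothetical.

```pascal
program TinyVmDemo;
{$mode objfpc}{$H+}

type
  // A small subset of the instruction list above (names are assumptions)
  TOpCode = (opLoadConst, opAdd, opSub, opCmpConst, opBne);

  // Hypothetical in-memory form of one decoded instruction
  TInstruction = record
    Op: TOpCode;
    A: Integer;   // target register, or jump address for opBne
    B: Integer;   // source register or inline constant
  end;

function Instr(const Op: TOpCode; const A, B: Integer): TInstruction;
begin
  Result.Op := Op;
  Result.A := A;
  Result.B := B;
end;

// Run a program and return the final value of R[0]
function Run(const Code: array of TInstruction): Integer;
var
  R: array[0..15] of Integer;  // simplification: integer-only registers
  PC, i: Integer;
  Equal: Boolean;              // result of the most recent compare
begin
  for i := Low(R) to High(R) do
    R[i] := 0;
  Equal := False;
  PC := 0;
  while PC <= High(Code) do
  begin
    case Code[PC].Op of
      opLoadConst: R[Code[PC].A] := Code[PC].B;
      opAdd:       R[Code[PC].A] := R[Code[PC].A] + R[Code[PC].B];
      opSub:       R[Code[PC].A] := R[Code[PC].A] - R[Code[PC].B];
      opCmpConst:  Equal := R[Code[PC].A] = Code[PC].B;
      opBne:
        if not Equal then
        begin
          PC := Code[PC].A;   // jump: skip the normal increment
          Continue;
        end;
    end;
    Inc(PC);
  end;
  Result := R[0];
end;

var
  Code: array[0..6] of TInstruction;
begin
  // Sum 5 + 4 + 3 + 2 + 1 into R0, counting R1 down to zero
  Code[0] := Instr(opLoadConst, 0, 0);  // load R0, 0
  Code[1] := Instr(opLoadConst, 1, 5);  // load R1, 5
  Code[2] := Instr(opLoadConst, 2, 1);  // load R2, 1
  Code[3] := Instr(opAdd, 0, 1);        // add  R0, R1
  Code[4] := Instr(opSub, 1, 2);        // sub  R1, R2
  Code[5] := Instr(opCmpConst, 1, 0);   // cmp  R1, 0
  Code[6] := Instr(opBne, 3, 0);        // bne  back to instruction 3
  WriteLn('R0 = ', Run(Code));
end.
```

The compare instruction only records a flag; the branch that follows decides what to do with it. That split is what lets cmp, bne and beq stay small and uniform.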

I will document more as the prototype becomes available.

TextCraft 1.2 for Smart Pascal

January 26, 2018

TextCraft is a fast, generic object-pascal text parsing framework. It provides you with the classes you need to write fast text parsers that build effective data models.

The TextCraft framework was recently moved up to version 1.2 and has been ported from Delphi to both FreePascal and Smart Pascal (the dialect used by Smart Mobile Studio). This is probably the only parsing framework that spans 3 compilers.

Smart Pascal coders can download the framework unit here. This can be placed in their $Install/Library folder (where $Install is where Smart’s library and RTL folder is installed): BitBucket TextCraft Repository

Buffer, parser, model

TextCraft divides the job of parsing into 4 separate objects, each of them representing a concept familiar to people writing compilers. These are: buffer, parser, model and context. If you are parsing a programming language the “model” would be what people call the AST (short for “Abstract Syntax Tree”). This AST is later fed to the code generator, turning it into an executable program (Smart Pascal compiles to JavaScript so there really is no limit to the transformation, just level of complexity).

Note: TextCraft is not a compiler for any particular language, it is a generic text parsing framework that is language-agnostic. Meaning that it makes it easy for you to write parsers with it. We recently used it to parse command-line parameters for FreePascal, so it doesn’t have to be about languages.

The buffer

The buffer has one of the most demanding jobs in the framework. In other frameworks the buffer is often just a memory allocation with a simple read method; but in TextCraft the buffer is responsible for a lot more. It has to expose functions that make text recognition simple and effective; it has to keep track of column and row position as you move through the buffer content – and much, much more. So in TextCraft the buffer is where text methodology is implemented in full.
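As an illustration of what “keeping track of column and row” involves, here is a minimal sketch of such a buffer. It is not TextCraft’s actual class: TTextBuffer and its AtEnd, Current, Next and Compare methods are hypothetical names chosen for this example, and the real framework offers far more.

```pascal
program BufferDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical minimal buffer; TextCraft's real class is richer
  TTextBuffer = class
  private
    FText: string;
    FOffset: Integer;  // 1-based index of the current character
    FRow: Integer;
    FCol: Integer;
  public
    constructor Create(const AText: string);
    function AtEnd: Boolean;
    function Current: Char;
    procedure Next;
    function Compare(const Value: string): Boolean;
    property Row: Integer read FRow;
    property Col: Integer read FCol;
  end;

constructor TTextBuffer.Create(const AText: string);
begin
  inherited Create;
  FText := AText;
  FOffset := 1;
  FRow := 1;
  FCol := 1;
end;

function TTextBuffer.AtEnd: Boolean;
begin
  Result := FOffset > Length(FText);
end;

function TTextBuffer.Current: Char;
begin
  if AtEnd then
    Result := #0
  else
    Result := FText[FOffset];
end;

// Advance one character, keeping row and column in sync
procedure TTextBuffer.Next;
begin
  if AtEnd then
    Exit;
  if FText[FOffset] = #10 then
  begin
    Inc(FRow);
    FCol := 1;
  end
  else
    Inc(FCol);
  Inc(FOffset);
end;

// True when the text at the current position starts with Value
function TTextBuffer.Compare(const Value: string): Boolean;
begin
  Result := Copy(FText, FOffset, Length(Value)) = Value;
end;

var
  Buffer: TTextBuffer;
begin
  Buffer := TTextBuffer.Create('unit Demo;' + #10 + 'begin');
  try
    while not Buffer.AtEnd and not Buffer.Compare('begin') do
      Buffer.Next;
    WriteLn('found "begin" at row ', Buffer.Row, ', col ', Buffer.Col);
  finally
    Buffer.Free;
  end;
end.
```

Because the position bookkeeping lives in Next, every parser built on top of the buffer gets accurate row and column information for error messages without any extra work.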

The parser

As mentioned, the parser is responsible for using the buffer’s methods to recognize and make sense of a text. As it makes its way through the buffer content, it creates model-objects that represent each element. Typical for a language would be structures (records), classes, enums, properties and so on. Each of these will be registered in the AST data model.

The Model

The model is a construct. It is made up of as many model-object instances as you need to express the text in symbolic form. It doesn’t matter if you are parsing a text document or source code, you would still have to define a model for it.

The model obviously reflects your needs. If you just need a superficial overview of the data then you create a simple model. If you need more elaborate information then you create that.

Note: When parsing a text document, a traditional organization would be to divide the model into: chapter, section, paragraph, line and individual words.

The Context

The context object is what links the parser to our model and buffer objects. By default the parser doesn’t know anything about the buffer or model. This helps us abstract away things that would otherwise turn our code into a haystack of references.

The way the context is used can be described like this:

When parsing complex data you often divide the job into multiple classes. Each class deals with one particular topic. For example: if parsing Delphi source code, you would write a class that parses records, a parser that handles classes, another that handles field declarations (and so on).

As a parser recognizes one of these objects, say a record, it will create a record model object to hold the information. It will then add that to the context by pushing it onto its reference stack.

The first thing a child parser does is to grab the model object from the reference stack. This way the child parsers will always know where to store their model information. It doesn’t matter how deep or recursive something gets: the stack approach, combined with passing the context object to the child parsers, will always make sure each parser “knows” where to store information.
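The push, grab and pop cycle described above can be sketched as follows. This is a simplified illustration, not TextCraft’s API: TModelObject, TParserContext, ParseRecord and ParseField are hypothetical names, and the “parsing” is faked so the example can focus on how the context routes results.

```pascal
program ContextDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical model node; a real model would use typed subclasses
  TModelObject = class
  public
    Name: string;
    Children: array of TModelObject;
    constructor Create(const AName: string);
    destructor Destroy; override;
    procedure Add(const Child: TModelObject);
  end;

  // Hypothetical context: a stack of "current" model objects
  TParserContext = class
  private
    FStack: array of TModelObject;
  public
    procedure Push(const Obj: TModelObject);
    procedure Pop;
    function Peek: TModelObject;
  end;

constructor TModelObject.Create(const AName: string);
begin
  inherited Create;
  Name := AName;
end;

destructor TModelObject.Destroy;
var
  i: Integer;
begin
  for i := 0 to High(Children) do
    Children[i].Free;
  inherited Destroy;
end;

procedure TModelObject.Add(const Child: TModelObject);
begin
  SetLength(Children, Length(Children) + 1);
  Children[High(Children)] := Child;
end;

procedure TParserContext.Push(const Obj: TModelObject);
begin
  SetLength(FStack, Length(FStack) + 1);
  FStack[High(FStack)] := Obj;
end;

procedure TParserContext.Pop;
begin
  SetLength(FStack, Length(FStack) - 1);
end;

function TParserContext.Peek: TModelObject;
begin
  Result := FStack[High(FStack)];
end;

// A child parser: the context tells it where to store its result
procedure ParseField(const Context: TParserContext; const FieldName: string);
begin
  Context.Peek.Add(TModelObject.Create(FieldName));
end;

// A parent parser: registers its model object, then delegates
procedure ParseRecord(const Context: TParserContext; const RecName: string);
var
  Rec: TModelObject;
begin
  Rec := TModelObject.Create(RecName);
  Context.Peek.Add(Rec);
  Context.Push(Rec);         // child parsers now store into Rec
  ParseField(Context, 'X');
  ParseField(Context, 'Y');
  Context.Pop;               // restore the previous target
end;

var
  Context: TParserContext;
  Root: TModelObject;
begin
  Context := TParserContext.Create;
  Root := TModelObject.Create('unit');
  try
    Context.Push(Root);
    ParseRecord(Context, 'TPoint');
    WriteLn(Root.Children[0].Name, ' has ',
      Length(Root.Children[0].Children), ' fields');
  finally
    Root.Free;
    Context.Free;
  end;
end.
```

Note that ParseField never receives a reference to the record it fills; the top of the stack is the only link between parent and child parsers.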

Why is this important?

This is important because it’s cost-effective in computing terms. The TextCraft framework allows you to create parsers that can chew through complex data without turning your project into spaghetti.

So no matter if you are parsing phone-numbers, zip codes or complex C++ source code, TextCraft will help you get the job done in a way that is easy to understand and maintain.

Smart Mobile Studio: more cmd tools

January 24, 2018

Being able to compile and work with projects from the command-line has been possible with Smart Mobile Studio almost since the beginning. But as projects grow, so does the need for more automation.

Toolbox

The IDE contains a few interesting features, like the “Data to picture” function. This takes a datafile (or any file) and places the raw bytes into a PNG picture as pixels. This is a great way of loading data that the browser would otherwise block or ignore.

People have asked if we could perhaps turn these into command-line tools as well. And I have finally gotten around to doing just that. So our toolbox now contains two more command-line tools besides the smsc compiler:

  • Superglue
  • DataToImage

Superglue

When you work with large JavaScript libraries they often consist of multiple files. This is great for JS developers and no different from how we use multiple unit-files to organize a project.

But it can be problematic when you deploy applications, because if the dependencies are heavy then your application will load slower. A typical example is ACE, the code editor we recently added to Smart. It’s a fantastic editor, but it consists of a monstrous number of files.

Superglue can import files based on a filter (like *.js) or a semi-colon delimited list. It will then merge these files together into a single file.

For example, let’s say you have 35 JavaScript files that make up a library. And let’s say you have downloaded and unpacked this to “C:\Temp” on your hard disk. To link all the JS files into a single file, you would type:

superglue -mode:filter -root:"C:\temp" -filter:"*.js" -sort -out:"C:\Smart\Libraries\MyLibrary\MyLibrary.js"

The above will enumerate all the files in “C:\Temp” and only keep those with a .JS file extension. It will sort the files since the -sort switch is set, and finally link all the files into a new, single file called MyLibrary.js (in another location).

So instead of shipping 35 files, which means 35 HTTP loads, we ship one file and load the data in ourselves when the application starts.

DataToImage

As the name implies this is the same function that you find in the IDE. It takes a raw data file (actually, any file) and injects the bytes as pixels in a new PNG file. Code for extracting the data again already exists in the RTL – but I will brush up again on this when we add these tools to our toolbox.

Using this is simplicity itself:

datatoimage -input:"mysqldb.sq3" -output:"c:\smart\projects\mymobileapp\res\defaultdata.png"

The above takes a default SQLite database and stores it inside a picture. In the application we load the picture in, extract the data, and then use that as our default data, which is later stored in the browser cache. This saves us having to execute a ton of SQL statements to establish a DB from scratch in memory.

Better parsing

These tools are very simple. They don’t take long to make, but they do need to be reliable. And they do need to be in place when you need them.

We actually ported over TextCraft, a parser we use both in Smart Mobile Studio and Delphi, so it would compile under Freepascal. There was a huge bug in the way Lazarus deals with parameters, so we ended up writing a fresh new command-line parser.

Future tools

We have a lot on our plate so I doubt we will focus on our toolbox much after these. They simplify library making and data injection for projects, and you can use a shell script to implement “make-files”, as most people do these days.

However, one tool that would be very handy is a “project to xmlhelp” or similar. A command-line program that will scan your Smart project and emit a full overview of your classes, methods and properties in traditional xml-help format.

But we will see when time allows — at least making libraries and merging in data will be easier from now on 🙂

Fixed Header in Smart Applications

January 3, 2018

Smart Mobile Studio gives you a lot of really cool visual controls to play with. One of them is a header control (also called a navigation panel by some) that traditionally shows and hides its buttons (back and next) in response to form navigation.

One question that many people have asked is: how can I make a header that remains fixed and doesn’t scroll with the forms? So no matter what form I navigate to, the header remains in place. Preferably easily accessed.

The Visual Application

Smart Visual Applications are more than just forms and buttons. The first thing that is created when you run a visual Smart Application, is naturally an instance of TApplication; this in turn creates a display control, and inside that again there is something called a “viewport”. Forms are always created inside the viewport.

If you are wondering why on earth we use two nested containers like this, that has to do with scrolling and keeping our controls isolated in one place. Forms are positioned horizontally inside the viewport. So whenever you are moving from Form1 to Form2, depending on the scroll-effect you have picked, the second form is lined up either before or after the current form. We then execute a CSS3 animation that smoothly scrolls the new form into view, or the previous form out of view – depending on how you look at it.

The display

The root display control, TW3Display, has one main job: to house the view control. It also contains code to lay out child controls vertically. Since there is typically only one control present, you don’t notice much of what TW3Display does.

The “trick” to a static header that remains unaffected by forms is simply to create the header control with “Application.Display” as the parent. That is all you have to do. You could also create it on Application.Display.View, but then it would cause problems with scrolling. My point in mentioning that is to underline how the RTL has no special rules for its structure. All visual entities that make up your Smart Pascal application follow the same laws and are subject to the same rules as TW3Button or TW3Label might be.

Creating controls that don’t attach to a form

The vertical layout that TW3Display does automatically is very simple. It sorts the child elements based on their Y position and places them directly after each other. This means that all you have to do is create the header and then make sure you give it a negative Y position, and it will always remain fixed on top of the Viewport and its forms.
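The sorting rule is easy to picture with a small stand-alone sketch. This is not the RTL’s implementation: TChild, MakeChild and LayoutVertical are hypothetical names, and the sizes are made up. It only demonstrates why a negative Y value guarantees first place.

```pascal
program LayoutDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for a child control
  TChild = record
    Name: string;
    Top: Integer;
    Height: Integer;
  end;
  TChildArray = array of TChild;

function MakeChild(const AName: string; const ATop, AHeight: Integer): TChild;
begin
  Result.Name := AName;
  Result.Top := ATop;
  Result.Height := AHeight;
end;

// Sort children by their Top value, then stack them from y = 0 downwards
procedure LayoutVertical(var Items: TChildArray);
var
  i, j, y: Integer;
  Temp: TChild;
begin
  // Insertion sort: stable, and fine for a handful of controls
  for i := 1 to High(Items) do
  begin
    Temp := Items[i];
    j := i - 1;
    while (j >= 0) and (Items[j].Top > Temp.Top) do
    begin
      Items[j + 1] := Items[j];
      Dec(j);
    end;
    Items[j + 1] := Temp;
  end;
  // Place each child directly after the previous one
  y := 0;
  for i := 0 to High(Items) do
  begin
    Items[i].Top := y;
    Inc(y, Items[i].Height);
  end;
end;

var
  Items: TChildArray;
  i: Integer;
begin
  SetLength(Items, 2);
  Items[0] := MakeChild('viewport', 0, 400);
  Items[1] := MakeChild('header', -10, 46);  // created last, negative Y
  LayoutVertical(Items);
  for i := 0 to High(Items) do
    WriteLn(Items[i].Name, ' at y=', Items[i].Top);
end.
```

Even though the header is created after the viewport, its negative Y sorts it to the front, so the viewport ends up directly below it.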

TW3Application has a virtual method called ApplicationStarting() that is perfect for what we want to achieve. As the name says, this method fires when the application is starting, which makes it the right place to create controls that don’t attach to a form. It also has an accompanying ApplicationClosing() method where we can release the control.

So let’s start by creating our control. Each visual application has a “unit1” that is created automatically. This contains your application object. While TApplication is a bit anonymous under Delphi or Lazarus, under Smart it serves a more central role. It’s the place you expose global values that should be usable throughout the entire program.

unit Unit1;

interface

uses
  Pseudo.CreateForms, // auto-generated unit that creates forms during startup
  System.Types, SmartCL.System, SmartCL.Components, SmartCL.Forms,
  SmartCL.Application,
  SmartCL.Controls.Header,
  Form1;

type

  TApplication  = class(TW3CustomApplication)
  private
    FHeader:  TW3HeaderControl;
  protected
    procedure ApplicationStarting; override;
    procedure ApplicationClosing; override;
  public
    property  Header: TW3HeaderControl read FHeader;
  end;

implementation

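procedure TApplication.ApplicationStarting;
begin
  inherited;
  // Parent the header to the display, not to a form or the viewport
  FHeader := TW3HeaderControl.Create(Display);
  // The negative Y position sorts the header above the viewport
  FHeader.SetBounds(0, -10, 100, 46);
end;

procedure TApplication.ApplicationClosing;
begin
  // Release the header before the application shuts down
  FHeader.Free;
  inherited;
end;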

end.

Let’s compile and see what we got so far!

As expected we now have a header outside the form region.

Global access

SmartCL, which is the namespace (a collection of units organized under one name) where all visual, DOM based classes live, has a global function for getting the Application object. This is simply Application() and you have probably used it many times.

What is not so well-known is that Application() returns the instance typed as the stock TW3CustomApplication class. In other words, if you inspect the returned reference you will find none of the properties you have defined in TApplication. This is because the RTL is written without any knowledge of your TApplication class. So in order to access your actual application object, you need to typecast; like I do here:

procedure TForm1.InitializeObject;
begin
  inherited;
  {$I 'Form1:impl'}
  var app := TApplication(Application);
  app.Header.Title.Caption := 'This is my header';
end;

Let’s have a look at the result (note: I added a label as well, just so you don’t think you missed something):

Now this approach works fine for many types of objects. I tend to isolate my database instance there, static header, global storage — all of it can be neatly exposed via TApplication. Fast, simple and efficient.

The final step

The initial state for the static header should be that both buttons are hidden by default. So when you start the application it just shows a title, nothing more.

When you click something that causes navigation to form2 (or some other second form), the back-button should become visible once form2 has scrolled into view.

When the user clicks the back-button, the opposite should happen. The back button should be disabled while you navigate back to form1, then completely hidden once you have arrived.

I don’t think I need to demonstrate this. Obviously, if you have forms that lead to more forms – then you probably want to add a “navigation stack” to the application object – an array that holds the previously visited forms.

Then whenever someone hits the “back button” you just pop the previous form off the stack, and navigate to it.
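For those who want a starting point anyway, here is a minimal sketch of such a navigation stack. It is written as a plain console program so it stands on its own: form names stand in for real form references, and TNavigationStack, NavigateTo and NavigateBack are hypothetical names, not RTL functions. In a real application the stack would live on TApplication and the two procedures would call the RTL’s form navigation.

```pascal
program NavStackDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical helper; a real version would hold form references
  TNavigationStack = class
  private
    FItems: array of string;
  public
    procedure Push(const FormName: string);
    function Pop: string;
    function IsEmpty: Boolean;
  end;

procedure TNavigationStack.Push(const FormName: string);
begin
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := FormName;
end;

function TNavigationStack.Pop: string;
begin
  if IsEmpty then
    Exit('');
  Result := FItems[High(FItems)];
  SetLength(FItems, Length(FItems) - 1);
end;

function TNavigationStack.IsEmpty: Boolean;
begin
  Result := Length(FItems) = 0;
end;

var
  History: TNavigationStack;
  Current: string;

// Move forward: remember where we came from
procedure NavigateTo(const FormName: string);
begin
  History.Push(Current);
  Current := FormName;
end;

// Move back: return to the most recently visited form, if any
procedure NavigateBack;
begin
  if not History.IsEmpty then
    Current := History.Pop;
end;

begin
  History := TNavigationStack.Create;
  try
    Current := 'Form1';
    NavigateTo('Form2');
    NavigateTo('Form3');
    NavigateBack;
    WriteLn('now on ', Current, ', back button visible: ', not History.IsEmpty);
    NavigateBack;
    WriteLn('now on ', Current, ', back button visible: ', not History.IsEmpty);
  finally
    History.Free;
  end;
end.
```

The back button’s visibility falls out of the stack for free: show it whenever the stack is not empty, hide it when it is.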

Well, hope it helps!