Delphi AST, XML and weekend experiments

April 29, 2019

One of the benefits of the Delphi IDE is that it’s a very rich ecosystem that component writers and technology partners can tap into for their own products. I know that writing your own components is not something everyone enjoys, but knowing that you can in fact write tools that extend the IDE using just Delphi or C++Builder opens the door to some interesting possibilities.

Ye old compiler bible

Delphi has a long tradition of “IDE enhancement” software and elaborate third-party tools that automate tasks or deliver some benefit right in the environment. RemObjects SDK is probably the best example of how flexible the IDE truly is: it integrates a complete service designer that generates source-code for you, updates the code when you change something, and even generates service manifests.

There are also other tools that show off the flexibility of the IDE, ranging from code migration to advanced code refactoring and optimization.

It was in that last category, code refactoring, that a third-party open-source library received a lot of well-deserved attention a couple of years back: a package called DelphiAST. It is a low-level syntax parser that reads Delphi source-code, applies fundamental syntax checks, and transforms the code into XML. A wet dream for anyone interested in writing advanced tooling that operates directly at the source-code level.

Delphi AST

As mentioned above, DelphiAST is a parser. Its job is very simple: parse the code, perform language-level syntax checking, and convert each aspect of the code into a valid XML element. We are not talking about stuffing source-code into a CDATA section here, but rather breaking each statement into separate tags (begin, end, if, procedure, param) so you can apply filtering, transformations and everything else XML has to offer.
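
To make the idea concrete, here is a minimal Python sketch of the kind of filtering this enables. The XML below is a hypothetical, heavily simplified stand-in. DelphiAST’s real output uses its own tag and attribute names, so treat `METHOD`, `PARAMETER`, `IF` and friends as assumptions. Once the code is a tree, questions like “which methods take which parameter types?” or “which methods contain an if?” become short queries:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML in the spirit of a parser like DelphiAST.
# The real DelphiAST schema uses its own tag and attribute names.
SOURCE_XML = """
<UNIT name="Demo">
  <METHOD kind="procedure" name="Greet">
    <PARAMETERS>
      <PARAMETER name="Name" type="string"/>
    </PARAMETERS>
    <STATEMENTS>
      <IF condition="Name = ''">
        <CALL name="Exit"/>
      </IF>
      <CALL name="WriteLn"/>
    </STATEMENTS>
  </METHOD>
  <METHOD kind="function" name="Add">
    <PARAMETERS>
      <PARAMETER name="A" type="Integer"/>
      <PARAMETER name="B" type="Integer"/>
    </PARAMETERS>
    <STATEMENTS/>
  </METHOD>
</UNIT>
"""

def method_signatures(xml_text):
    """Return (kind, name, [parameter types]) for every METHOD element."""
    root = ET.fromstring(xml_text)
    result = []
    for method in root.iter("METHOD"):
        params = [p.get("type") for p in method.iterfind("PARAMETERS/PARAMETER")]
        result.append((method.get("kind"), method.get("name"), params))
    return result

def methods_containing(xml_text, tag):
    """Names of methods whose subtree contains at least one `tag` element."""
    root = ET.fromstring(xml_text)
    return [m.get("name") for m in root.iter("METHOD")
            if m.find(f".//{tag}") is not None]

print(method_signatures(SOURCE_XML))
# [('procedure', 'Greet', ['string']), ('function', 'Add', ['Integer', 'Integer'])]
print(methods_containing(SOURCE_XML, "IF"))  # ['Greet']
```

The same queries work unchanged however large the unit is, which is the appeal of having statements as separate elements rather than opaque text.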

Back when Roman first started on DelphiAST, I got to thinking: could we take this idea further and apply XML transformations to produce something more interesting? Would it actually be possible to approach the notion of compiling from a whole new angle? Perhaps convert between languages in a more effective way?

The short answer is: yes, everything is possible. But as always there are caveats and obstacles to overcome.

First of all, DelphiAST, despite its name, doesn’t give you a fully resolved abstract syntax tree (AST). It generates a syntax tree that is very suitable as a starting point, but it does not resolve symbols. Everything in a programming language that can be referenced, like a method, a class, a global variable, a local variable or a parameter, is called a “symbol”. And before you can even think about processing the code, a fast and reliable symbol table must be in place.
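
A symbol table does not need to be elaborate to be useful. The Python sketch below shows one common design: nested scopes with outward lookup. It is not DelphiAST’s or any particular compiler’s implementation, and the `Symbol` and `Scope` names are made up for illustration. Lookups are case-insensitive because Pascal identifiers are:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Symbol:
    name: str
    kind: str            # e.g. "class", "method", "variable", "parameter"
    type_name: str = ""

@dataclass
class Scope:
    """One level of a (hypothetical) symbol table: a unit, a class, a method body."""
    name: str
    parent: Optional["Scope"] = None
    symbols: dict = field(default_factory=dict)

    def declare(self, sym: Symbol) -> None:
        key = sym.name.lower()   # Pascal identifiers are case-insensitive
        if key in self.symbols:
            raise NameError(f"duplicate identifier '{sym.name}' in {self.name}")
        self.symbols[key] = sym

    def resolve(self, name: str) -> Optional[Symbol]:
        # Walk outwards through enclosing scopes; the nearest declaration wins.
        scope = self
        while scope is not None:
            sym = scope.symbols.get(name.lower())
            if sym is not None:
                return sym
            scope = scope.parent
        return None

unit = Scope("Demo")
unit.declare(Symbol("Counter", "variable", "Integer"))
unit.declare(Symbol("Greet", "method"))

body = Scope("Greet", parent=unit)
body.declare(Symbol("Name", "parameter", "string"))
body.declare(Symbol("Counter", "variable", "string"))  # shadows the global

print(body.resolve("counter").type_name)  # string  (local declaration wins)
print(unit.resolve("Counter").type_name)  # Integer
print(body.resolve("Greet").kind)         # method  (found in the enclosing unit)
print(body.resolve("Missing"))            # None
```

Resolving a name always walks from the innermost scope outwards, which is what gives local variables priority over globals of the same name.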

Who cares?

Before I continue, you might be wondering why anyone would reinvent the wheel here. Why research compilers in 2019, when the world is abundant with compilers for a multitude of languages?

Because the world of computing is about to be hit by a tsunami, that’s why.

Quartex Pascal

Over the next 8–10 years the world of computing will be turned on its head. NVIDIA and roughly 100 tech companies have invested in open-source CPU designs, making it very clear that playing by Intel’s rules and bleeding royalties will no longer be tolerated. IBM has woken up from its “patent-induced slumber” and is set to push its P9 (POWER9) CPU architecture, targeting both the high-end server and embedded markets (see my article on PPC from last year). At the same time, Microsoft and Apple have both signaled that they are moving to ARM (an estimate of 5 years is probably reasonable). Laptop betas are said to be rolling already, with a commercial version expected in Q3 this year (I think it won’t arrive before Christmas, but who knows).

Intel has remained somewhat silent about any long-term plans, but everyone who keeps an eye on hardware knows they are working like mad on next-gen FPGA technology, which has the potential to disrupt the whole industry. Work is also being done to bridge FPGA coding with traditional code, though there is no way of predicting the outcome of that.

Oh, and AMD is eating into Intel’s market share at a steady rate, so we are in for a fight to the death.

The rise of C/C++

Those who keep tabs on languages have no doubt noticed the recent spike in C/C++ popularity. The cause of this is that developers are safeguarding themselves against the storm to come. C as a language might not be the most beautiful out there, but truth be told, its tooling requires the least amount of work to target a new platform. When a new architecture is released, C/C++ is always the first language available. You won’t see C#, Flutter or Rust shipping with the latest and greatest; it’s always GCC or Clang.

Note: GCC is not just C, it’s actually a family of compilers for several languages (C++, Fortran, Ada and Go among them), so ironically, a language like Fortran hits a new platform at the same time.

Those who have followed my blog for the past 10 years should be well aware of my experiments: from compiling to JavaScript and generating bytecodes to, right now, moving the whole development paradigm to the browser. Hopefully my readers also recognize why this is important.

But to help you understand why I am so passionate about my compiler experiments, let’s do a little thought experiment:

Rethinking tooling

Let’s say we take Delphi, implement a bytecode format and streamline the RTL to be platform agnostic. What would the consequences of that be?

Well, first of all the compilation process would be split in two. The traditional front-end would still be there, but it would generate bytecodes rather than machine code. Turning those bytecodes into something executable would be isolated in a completely separate process; a process that, just like the Delphi IDE’s infrastructure, could be opened up to component writers and technology partners. This in turn would give the community a high degree of safety, since the community itself could approach new targets without waiting for Embarcadero.
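
Here is a rough illustration of that split, assuming a made-up stack-based instruction set. The opcode names are hypothetical and not part of any Embarcadero design. The front-end’s only job is to produce the `PROGRAM` list; the back-end shown here simply interprets it, and nothing in it knows anything about Pascal syntax:

```python
# Hypothetical, minimal stack bytecode. A front-end would emit this from
# parsed Pascal; back-ends consume it without knowing anything about Pascal.
PUSH, LOAD, STORE, ADD, MUL, PRINT = "PUSH", "LOAD", "STORE", "ADD", "MUL", "PRINT"

# Bytecode for:  x := 2 + 3; WriteLn(x * 4);
PROGRAM = [
    (PUSH, 2), (PUSH, 3), (ADD, None), (STORE, "x"),
    (LOAD, "x"), (PUSH, 4), (MUL, None), (PRINT, None),
]

def run(program, out):
    """Back-end #1: interpret the bytecode directly. Printed values go to `out`."""
    stack, variables = [], {}
    for op, arg in program:
        if op == PUSH:
            stack.append(arg)
        elif op == LOAD:
            stack.append(variables[arg])
        elif op == STORE:
            variables[arg] = stack.pop()
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            out.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode {op}")
    return variables

output = []
final_vars = run(PROGRAM, output)
print(output)      # [20]
print(final_vars)  # {'x': 5}
```

A third party targeting a new CPU would write another consumer of the same list (a JIT, a native code generator, a transpiler) without touching the front-end.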

What’s more, such an architecture would not be limited to machine code. There is no law that says “you must convert bytecodes to machine code”. Since C/C++ is the foundation that modern operating systems rest on, generating C/C++ source-code that existing compilers can build is a perfectly valid strategy.
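
To show that a back-end can emit source-code instead of machine code, here is a Python sketch that translates a hypothetical stack bytecode into C. It evaluates the operand stack at translation time, holding C expression strings instead of values. It assumes straight-line code where a variable is not reassigned while its old value is still on the stack; real control flow would need more work:

```python
# Hypothetical stack bytecode for:  x := 2 + 3; WriteLn(x * 4);
PROGRAM = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None), ("STORE", "x"),
    ("LOAD", "x"), ("PUSH", 4), ("MUL", None), ("PRINT", None),
]

def emit_c(program):
    """Back-end #2: translate stack bytecode into C source text.

    The operand stack is simulated at translation time and holds C
    expression strings, so the generated C needs no runtime stack.
    Assumes straight-line code (no jumps) and no reassignment of a
    variable while its previous value is still pending on the stack.
    """
    stack, declared, body = [], set(), []
    for op, arg in program:
        if op == "PUSH":
            stack.append(str(arg))
        elif op == "LOAD":
            stack.append(arg)
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            symbol = "+" if op == "ADD" else "*"
            stack.append(f"({a} {symbol} {b})")
        elif op == "STORE":
            prefix = "" if arg in declared else "long "   # declare on first store
            declared.add(arg)
            body.append(f"    {prefix}{arg} = {stack.pop()};")
        elif op == "PRINT":
            body.append(f'    printf("%ld\\n", (long){stack.pop()});')
        else:
            raise ValueError(f"unknown opcode {op}")
    lines = ["#include <stdio.h>", "", "int main(void) {", *body, "    return 0;", "}"]
    return "\n".join(lines)

print(emit_c(PROGRAM))
```

For the program above this prints a small C file whose `main` contains `long x = (2 + 3);` followed by a `printf` of `(x * 4)`, which GCC or Clang can then build for whatever target they support.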

There is also another factor to include in all of this, and that is Linux. Borland was correct in their assessment of Linux (the Kylix project), but they failed miserably with regard to timing. They also gravely underestimated Linux users’ sense of quality, depending on Wine (a Windows compatibility layer) to even function. And they underestimated Free Pascal and Lazarus, because Linux is something FPC does exceptionally well. Competing financially against free products won’t work unless you bring outstanding abilities to the table, and Linux has development tools that rival Visual Studio in quality, yet cost nothing.

But no matter how financially tricky Linux might be, we have reached the point where Linux is becoming mainstream. Ten years ago I had to set up my own Linux machine; no local retailers shipped a Linux box. Today I can walk into two major chains and pick up dedicated Linux machines. Ubuntu in particular is well established and delivers long-term support (LTS) releases.

So for me personally, compiler tech has never been more important. Even more important is that the tooling is universal and unbound by any specific API or CPU instruction set. FireMonkey is absolutely a step in the right direction, but I think it’s a disaster to focus on native UIs beyond a system-level binding, because replicating the same level of support and functionality for ARM, P9, RISC-V and whatever monstrosity Intel comes up with through FPGA will take forever.

Transformation based conversion

We have wandered far off topic now, so let’s bring it back to this weekend’s experiment.

In short, using XML transformations to convert code does work, but the right tooling has to be there to make it viable. I implemented a poor-man’s symbol table, just collecting classes, types and methods, and yes, it works just fine. What worries me a bit, though, is the XML parser. Microsoft has put a lot of money into XML file handling at the enterprise level. When working with massive XML files (read: gigabytes) you really can’t afford to load the file into conventional RAM and then traverse the XML character by character, old-school style. Microsoft’s tooling relies on memory mapping so that you can process gigabytes as if they were megabytes, but sadly I have not found anything similar for Linux, Unix or Android, and that abruptly ends the fascination for me.
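
A poor-man’s symbol table can be built without ever loading the whole file as one tree. Below is a Python sketch using the standard library’s `xml.etree.ElementTree.iterparse`, which streams the document and lets you discard elements once they are finished. This is incremental parsing, not the memory-mapped approach described above, and the tag names are again hypothetical:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified tags; the real DelphiAST output uses its own schema.
XML = b"""<UNIT name="Shapes">
  <TYPEDECL name="TShape" kind="class">
    <METHOD name="Area" kind="function"/>
    <METHOD name="Draw" kind="procedure"/>
  </TYPEDECL>
  <TYPEDECL name="TColor" kind="enum"/>
  <METHOD name="Main" kind="procedure"/>
</UNIT>"""

def collect_symbols(stream):
    """Map qualified names of types and methods to their kind.

    iterparse reports elements while the input is being read. Attributes
    are already available at the "start" event, so each symbol is recorded
    there; at the "end" event the element is cleared, releasing its
    attributes and children instead of keeping the whole document alive.
    """
    symbols, path = {}, []
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            path.append(elem.get("name", ""))
            if elem.tag in ("TYPEDECL", "METHOD"):
                qualified = ".".join(part for part in path if part)
                symbols[qualified] = elem.get("kind")
        else:
            path.pop()
            elem.clear()
    return symbols

symbols = collect_symbols(io.BytesIO(XML))
print(symbols["Shapes.TShape.Area"])  # function
print(symbols["Shapes.TColor"])       # enum
```

In real use you would pass an open file (or a file name) instead of `io.BytesIO`; the loop stays the same.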

The only place where I see XML transformations being useful for processing source-code is when converting to another language at the source level.
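
For reference, this is roughly what a source-level conversion looks like once the code is in XML form. A hand-written tree walk in Python stands in for an XSLT stylesheet here, and the XML schema is hypothetical. Expressions are stored as plain strings that are assumed to already be valid C; a real converter would also have to translate Pascal operators such as `=`, `<>` and `and`:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for:
#   function Max(A, B: Integer): Integer;
#   begin if A > B then Result := A else Result := B; end;
PASCAL_XML = """
<METHOD kind="function" name="Max" returns="Integer">
  <PARAMETER name="A" type="Integer"/>
  <PARAMETER name="B" type="Integer"/>
  <IF cond="A &gt; B">
    <THEN><ASSIGN target="Result" value="A"/></THEN>
    <ELSE><ASSIGN target="Result" value="B"/></ELSE>
  </IF>
</METHOD>
"""

# Assumed mapping from Pascal type names to C types (illustrative only).
TYPE_MAP = {"Integer": "int", "Boolean": "int", "Double": "double"}

def emit_statements(parent, indent):
    """Emit C statements for the statement children of `parent`."""
    pad = "    " * indent
    lines = []
    for node in parent:
        if node.tag == "ASSIGN":
            lines.append(f"{pad}{node.get('target')} = {node.get('value')};")
        elif node.tag == "IF":
            lines.append(f"{pad}if ({node.get('cond')}) {{")
            lines += emit_statements(node.find("THEN"), indent + 1)
            else_branch = node.find("ELSE")
            if else_branch is not None:
                lines.append(f"{pad}}} else {{")
                lines += emit_statements(else_branch, indent + 1)
            lines.append(f"{pad}}}")
        # PARAMETER is handled by the signature; other tags are ignored here.
    return lines

def emit_function(xml_text):
    """Convert one hypothetical METHOD element into a C function."""
    fn = ET.fromstring(xml_text)
    ret = TYPE_MAP[fn.get("returns")]
    params = ", ".join(f"{TYPE_MAP[p.get('type')]} {p.get('name')}"
                       for p in fn.findall("PARAMETER"))
    lines = [f"{ret} {fn.get('name')}({params}) {{", f"    {ret} Result;"]
    lines += emit_statements(fn, 1)
    lines += ["    return Result;", "}"]
    return "\n".join(lines)

print(emit_function(PASCAL_XML))
```

Pascal’s implicit `Result` variable becomes an explicit local that is returned at the end, which is the kind of language-specific detail a source-to-source converter has to handle.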

So the idea, although technically sound, offers zero benefits over the traditional process. I am, however, very interested in using DelphiAST to analyze and convert Delphi code directly from the IDE. But that will have to be an experiment for 2020; I’m booked 24/7 with Quartex Media Desktop right now.

But it was great fun playing around with DelphiAST! I loved how clean and neat the codebase has become. So if you need to work with source-code, DelphiAST is just the ticket!

Edit: You don’t have to emit the code as XML. DelphiAST is perfectly happy to act as a clean parser, just saying.