Archive

Posts Tagged ‘N++’

LDef and bytecodes

July 14, 2017 Leave a comment

LDef, short for Language Definition format, is a standard I have been formulating for a couple of years. I have taken my experience with writing various compilers and parsers, and also my experience of writing RTL’s and combined it all into a standard.

programming-languages-for-iot-e1467856370607LDef is a way for anyone to create their own programming language. Just like popular libraries and packages deals with the low-level stuff, like Gr32 which is an excellent graphics library — LDef deals with the hard stuff and leaves you with the pleasant job of defining what the language should look like.

The idea is to make a language construction kit if you like, where the underlying engine is flexible enough to express the languages we know and love today – and also powerful enough to express new ideas. For example: let’s say you want to create an awesome new game system (just as an example, it applies to any system that can be automated). You have the means and skill to create the actual engine – but how are you going to market it? You will be up against monoliths like Unity and simple “click and play” engines like ClickTeam Fusion, Game Maker and the likes.

Well, the only way to make good games is hard work. There is no two ways about it. You can fake your way only so far – so at the end of the day you want to give your users something solid.

In our example of publishing a game-engine, I think that you would stand a much better chance of attracting users if you hooked that engine up to a language. A language that is easy to use, easy to learn and with commands that are both specific and non-specific to your engine.

There are some flavours of Basic that has produced knock-out games for decades, like BlitzBasic. That language alone has produced hundreds of titles for both PC, XBox and even Nintendo. So it’s insanely fast and not a pushover.

And here is the cool part about LDEF: namely that it makes it easy for you to design your own languages. You can use one of the pre-defined languages, like object pascal or visual basic if that is what you like – but ultimately the fun begins when you start to experiment with new ideas and language features. And it’s fun when you get to that point, because all the nitty gritty is handled. You get to focus on the superficial stuff like syntax and high level functions. So you can shave off quite a bit of development time and make coding fun again!

The paradox of faster bytecodes

Bytecodes used to be to slow for anything substantial. On 16-bit machines bytecodes were used in maybe one language (that I know of) and that was the ‘E’ compiler. The E language was maybe 30 years ahead of its time and is probably the only language I can think of that fits cloud programming like hand in glove. But it was also an excellent system automation language (scripting) and really turned some heads back in the late 80s and early 90s. REXX was only recently added to OS X, some 28 years after the Amiga line of computers introduced it to the general public.

ldef_bytecodes

Bytecode dump of a program compiled with the node.js version of the compiler

In modern times bytecodes have resurfaced through Java and the .NET framework which for some reason caused a stir in the whole development community. I honestly never bought into the hype, but I am old enough to remember the whole story – so I’m probably not the Microsoft demographic anyways. Java hyped their virtual machine opcodes to the point of exhaustion. People really are stupid. Man did they do a number on CEO’s and heads of R&D around the world.

Anyways, end of the story was that Intel and AMD went with it and did some optimizations that could help bytecodes run faster. The stack was optimized with Java, because let’s face it – it is the proverbial assault on the hardware. And the cache was expanded on command from the emper.. eh, Microsoft. Also (if I remember correctly) the “jump to pointer” and various branch instructions were made to execute faster. I remember reading about this in Dr. Dobbs Journal and Microsoft Developer Magazine; granted it was a few years ago. What was interesting is the symbiotic relationship that exists between Intel and Microsoft, I really didn’t know just how closely knit these guys were.

Either way, bytecodes in 2017 is capable of a lot more than they ever were on 16-bit and early 32-bit systems. A cpu like Intel i5 or i7 will chew through bytecodes like a warm knife on butter. It depends on how you orchestrate the opcodes and how much work you delegate to the various instructions.

Modeled instructions

Bytecodes are cool but they have to be modeled right, or its all going to end up as a bloated, slow and limited system. You don’t want to be to low-level, otherwise what is the point of bytecodes? Bytecodes should be a part of a bigger picture, one that could some day be modeled using FPGA’s for instance.

The LDef format is very flexible. Each instruction is ultimately a single 32-bit longword (4 bytes) where each byte holds key information about the command, what data is forward in the cache and how it should be read.

The byte organization is:

  • 0 – Actual opcode
  • 1 – Instruction layout

Depending on the instruction layout, the next two bytes can hold different values. The instruction layout is a simple value that defines how the data for the instruction is passed.

  • Constant to register
  • Variable to register
  • Register to register
  • Register to variable
  • Register to stack
  • Stack to register
  • Variable to variable
  • Constant to variable
  • Stack to variable
  • Program counter (PC) to register
  • Register to Program counter
  • ED (exception data) to register
  • Register to exception-data

As you can probably work out from the information here, this list hints to some archetectual features. Variables are first class citizens in LDef, they are allocated, managed and released using instructions. Constants can be either automatically handled and references by id (a resource chunk is linked to the class binary) or “in place” and compiled directly into the assembly as part of the instruction. For example

load R[0], "this is a test"

This line of code will take the constant “this is a test” and move it into register #0. You can choose to have the text-data stored as a proper resource which is appended to the compiled bytecode (all classes and modules have a resource chunk) or just compile “as is” and have the data read directly. The first option is faster and something you can adjust with compiler optimization options. The second option is easier to work with when you debug since you can see the data directly as a part of the debug memory dump.

And last but not least there are registers. 32 of them in number (so for the low-level coders out there you should have few limitations with regards to register mapping). All operations (like divide, multiply etc) operate on registers only. So to multiply two variables they first have to be moved into registers and the multiplication is executed there – then you can move the result to a variable afterwards.

ldef_asm

LDef assembly code. Simple but extremely effective

The reason registers is used in my runtime system – is because you will not be able to model a FPGA with high-level concepts like “variables” should someone every try to implement this as hardware. Things like registers however is very easy to model and how actual processors work. You move things from memory into a cpu register, perform an action, and then move the result back into memory.

This is where Java made a terrible mistake. They move all data onto the stack and then call the operation. This simplifies execution of instructions since there is never any registers to keep track of, but it just murders stack-space and renders Java useless on mobile devices. The reason Google threw out classical Java (e.g Java as bytecodes) is due to this fact (and more). After the first android devices came out they quickly switched to a native compiler – because Java was too slow, to power-hungry and required too much memory (especially stack space) to function properly. Battery life was close to useless and the only way to save Java was to go native. Which is laughable because the entire point of Java was mobility, “compile once run everywhere” — yeah well, that didn’t turn out to well did it ūüėÄ

Dot net improved on this by adding a “load resource” type instruction, where each method will load in the constant data by number – and they are loaded into pre-defined slots (the variables you have used naturally). Then you can execute operations in typical “A + B to C” style (actually all of that is omitted since the compiler already knows both A, B and C). This is much more stack friendly and places performance penalty on the common language runtime (CLR).

Sadly Microsoft’s platform, like everything Microsoft does, requires a pretty large infrastructure. It’s not simple, elegant and fast – it’s more monolithic, massive and resource hungry. You don’t see .net being the first thing ported to a new platform. You typically see GCC followed by Freepascal.

LDef takes the bytecode architecture one step further. On assembly level you reference data using identifiers just like .net, and each instruction is naturally executed by the runtime-engine – but data handling is kept within the virtual realm. You are expected to use the registers as temporary holding slots for your information. And no operations are ever done directly on a variable.

The benefit of this is:

  • Better payload balancing
  • Easier to JIT since the architecture is closer to real assembly
  • Retains important aspects of how real hardware works (with FPGA in mind)

So there are good reasons for the standard, all of them very good.

C like intermediate language

With assembler so clearly defined you would expect ¬†assembly to be the way you work. In essence that what you do, but since OOP is built into the system and there are structures you are expected to populate — structures that would be tedious to do in raw unbridled assembler, I have opted for a C++ inspired intermediate language.

ldef_app

The LDEF assembler kitchen sink

You would half expect me to implement pascal, but truth be told pascal parsing is more complex than C parsing, and C allows you to recycle parsers more easily, so dealing with sub structures and nested regions is less maintainance and easier to write code for.

So there is no spesific reason why I picked C++ as a intermediate language. I would prefer pascal but I also think it would cause a lot of confusion since object pascal will be the prime citizen of LDef languages. My other language, N++ also used curley brackets so I’m honestly not strict about what syntax people prefer.

Intermediate language features supported are:

  • Class declarations
  • Struct declarations
  • Parameter to register mapping
  • Before mehod code (enter)
  • After method code (leave)
  • Alloc section for class fields
  • Alloc section for method variables

The before and after code for methods is very handy. They allow you to define code that should execute before the actual procedure. On a higher level when designing a new language, this is where you would implement custom allocation, parameter testing etc.

So if you call this method:

function testcode() {
    enter {
      writeln("this is called before the method entry");
    }
    leave { 
      writeln("this is called after the method exits");
    }
  writeln("this is the method body");
}

Results in the following output:

this is called before the method entry
this is the method body
this is called after the method exits

 

When you work with designing your language, you eventually.

Truly portable

Now I have no aspirations in going into competition with neither Oracle, Microsoft or anyone in between. Like most geeks I do things I find interesting and enjoy working within a field of computing that is stimulating and personally rewarding.

Programming languages is an area where things havent really changed that much since the golden 80s. Sure we have gotten a ton of fancy new software, and the way people use languages has changed – but at the end of the day the languages we use havent really changed that much.

JavaScript is probably the only language that came out of the blue and took the world by storm, but that is due to the central role the browser holds for the internet. I sincerely doubt JavaScript would even have made a dent in the market otherwise.

LDef is the type of toolkit that can change all this. It’s not just another language, and it’s not just another bytecode engine. A lot of thought has gone into its architecture, not just notions of “how can we do this or that”, but big ideas about the future of computing and how IOT will sculpt the market within 5-8 years. And the changes will be permanent and irrevocable.

Being able to define new languages will be utmost important in the decade ahead. We don’t even know the landscape yet but we can extrapolate some ideas based on where technology is going. All of it in broad strokes of course, but still – there are some fundamental facts about computers that the timeless and havent aged a day. It’s like mathematics, the Pythagorean theorem may be 2500 years old but it’s just as valid today as it was back then. Principles never die.

I took the example of a game engine at the start of this article. That might have been a poor choice for some, but hopefully the general reader got the message: the nature of control requires articulation. Regardless if you are coding an invoice system or a game engine, factors like time, portability and ease of use will be just as valid.

There is also automation to keep your eye on. While most of it is just media hype at this point, there will be some form of AI automation. The media always exaggerates things, so I think we can safely disregard a walking, self-aware Terminator type robot replacing you at work. In my view you can disregard as much as 80% of what the media talks about (regardless of topic). But some industries will see wast improvement from automation. The oil and gas sector are the most obvious. A the moment security is as good as humans can make them – which means it is flawed and something goes wrong every day around the globe. But smart pumping stations and clever pressure measurements and handling will make a huge difference for the people who work with oil. And safer oil pipelines means lives saved and better environmental control.

The question is, how do we describe programs 20 years from now? Is our current tools up for the reality of IOT and billions of connected devices? Do we even have a language that runs equally well as a 1000 instance server-cluster as it would as a stand alone program on your desktop? When you start to look into parallel computing and multi-cluster data processing farms – languages like C# and C++ makes little sense. Node.js is close, very close, but dealing with all the callbacks and odd limitations of JavaScript is tedious (which is why we created Smart Pascal to begin with).

The future needs new things. And for that to happen we first need tools to create them. Which is where my passion is.

Node, native and beyond

When people create compilers and programming languages they often do so for a reason. It could be that their own tools are lacking (which was my initial motivation), or that they have thought of a better way to achieve something; the reasons can be many. In Microsofts case it was revenge and spite, since they were unsuccessful in stealing Java away from Sun Microsystems (Oracle now owns Java).

LDEF

LDef binaries are fairly straight forward. The less fluff the better

Point is, you implement your idea using the language you know – on the platform you normally use. So for me that is object pascal on windows. I’m writing object pascal because while the native compiler and runtime is written in Delphi – it is made to compile under Freepascal for Linux and OS X.

But the primary work is done in Smart Pascal and compiled to JavaScript for node.js. So the native part is actually a back-port from Smart. And there is a good reason I’m doing it this way.

First of all I wanted a runtime and compiler system that would require very little to run. Node.js has grown fat in features over the past couple of years – but out of the box node.js is fast, portable and available almost anywhere these days. You can write some damn fast and scalable cloud servers with node (and with fast i mean FAST, as in handling thousands of online gamers all playing complex first person worlds) and you can also write some stable and rock solid system services.

Node is turning into a jack of all trades, capable of scaling and clustering way beyond what native software can do. Netflix actually re-wrote their entire service stack using node back in 2014. The old C++ and ASP approach was not able to handle the payload. And every time they did a small change it took 45 minutes to compile and get a binary to test. So yeah, node.js makes so much more sense when you start looking a big-data!

So I wanted to write LDef in a way that made it portable and easy to implement. Regardless of platform, language and features. Out of the box JavaScript is pretty naked stuff and the most advanced high-level feature LDef uses is buffers to deal with memory. everything else is forced to be simple and straight forward. No huge architecture or global system services, just a small and fast runtime and your binaries. And that’s all you need to run your compiled applications.

Ultimately, LDef will be written in LDef itself and compile itself. Needing only a small executable stub to be ported to a new platform. Most of mono C# for Linux is written in C# itself – again making it super easy to move mono between distros and operating systems. You can’t do that with the Visual Studio, at least not until Microsoft wants you to. Neither would you expect that from Apple XCode. Just saying.

The only way to achieve the same portability that mono, freepascal and C/C++ has to offer, is naturally to design the system as such from the beginning. Keep it simple, avoid (operatingsystem) globalization at all cost, and never-ever use platform bound APIs except in the runtime. Be Posix but for everything!

Current state of standard and licensing

The standard is currently being documented and a lot of work has been done in this department already. But it’s a huge project to document since it covers not only LDEF as a high-level toolkit, but stretches from the compiler to the source-code it is designed to compile to the very binary output. The standard documentation is close to a book at this stage, but that’s the way it has to be to ensure every part is understood correctly.

But the question most people have is often “how are you licensing this?”.

Well, I really want LDEF to be a free standard. However, to protect it against hijacking and abuse – a license must be obtained for financial entities (as in companies) using the LDEF toolkit and standard in commercial products.

I think the way Unreal software handles their open-source business is a great example of how things should be done. They never charge the little guy or the Indie developer – until they are successful enough to afford it. So once sales hits a defined sum, you are expected to pay a small percentage in royalties. Which is only fair since Unreal engine is central to the software to begin with.

So LDef is open source, free to use for all types of projects (with an obligation to pay a 3% royalty for commercial products that exceeds $4999 in revenue). Emphasis is on open source development. As long as the financial obligations by companies and developers using LDEF to create successful products is respected, only creativity sets the limit.

If you use LDEF to create a successful product where you make 50.000 NKR (roughly USD 5000) you are legally bound to pay 3% of your product revenue monthly for the duration of the product. Which is extremely little (3% of $5000 is $150 which is a lot less than you would pay for a Delphi license, the latter costing between upwards of USD 3000).

 

Ghost of xmas past, or better known as the folder where old projects and abandoned ideas end up

December 8, 2016 Leave a comment

I think everyone has a folder where they stuff old units, test projects and well – all those ideas that seemed awesome at the time, but you either discovered that it was crap, not enough time, or just forgot all about it. It happens. What can I say.

Yeah, I have one of those as well. And I also have those duplicate folders. You know, that time when you decided to make backups for everything Рbut then you got a new PC and now you dont have a clue which version to keep? And when you finally sit down with beyondcompare to get this sorted, you have managed to add to both copies.

Well, needless to say I had a day I just went apeshit about a year ago and sorted everything. And whatever did not fit into my neat new code-folder(s) was mercilessly stuffed into the dark, obscure folder known only as “stuff” or “misc”.

My first basic compiler, awwwww how cute

Well its not all rubbish! I did find a special version of my kitchen-sink research IDE, the IDE I use when trying out new code for DWScript, PaxCompiler and FreePascal. I think it has 4 compilers in it or something, yet its under 30 megabytes in size! Pretty neat ūüôā

Visual Basic to Smart Pascal, awww.... just want to snuggle it :)

Visual Basic to Smart Pascal, awww…. just want to snuggle it ūüôā

It also features a unified file-system! You can mount packages, FTP folders, network paths or local folders – the IDE could not care less, it is completely abstracted from the actual filesystem and relates only to symbols and identifiers. The actual storage is dealt with by the filesource classes.

Package, ftp, socket, local folder - the IDE could not care less

Package, ftp, socket, local folder – the IDE could not care less

The Basic dialect here is essentially classical Visual Basic. The way you would write classes and deal with instances before dot net came along and forced all languages to look the same. I kinda like old visual basic, it had soul. It was completely useless except for scripting, and Delphi made it look like a joke – but as far as basic dialects go, it’s probably one of the cleanest and easiest to read.

DWScript, QTX Pascal (a custom fork of DWScript with a lot of cool features, some that Eric has added afterwards), Basic - and even freepascal! Thats the mother of all kitchen sinks!

DWScript, QTX Pascal (a custom fork of DWScript with a lot of cool features, some that Eric has added afterwards), Basic – and even freepascal! Thats the mother of all kitchen sinks!

The loss of QTX Pascal is a bit sad. I spent a couple of months doing my own fork of DWScript. Most of the features I added have now been included (although in another form) by Eric. But the codegen produced faster javascript. One of the things I spent some time on was optimization. Like getting rid of “variable = 0” type initialization if an assignment followed. I also added BeforeDestruction() and AfterConstruction() calls. This made an RTL a lot easier to write, but also more overhead. I was about to do conditional testing (so these would only be called if you actually used them) when I had to stop and work on Smart Mobile Studio again.

N++, exotic and functional!

This was one of my favorite language research projects. And dare I say, it is way ahead of it’s time. The idea is simple: with the future of computing being distributed, cloud based and powered by a multitude of computing modules from the US to South Africa¬†— what would a programming language look like if built for 100% asyncronous, distributed execution?

Consider for instance, the execution of a procedure. Simple right? Well, in a truly distributed system – that procedure could execute anywhere. The whole point here is to utilize the combined power of each module (pc); which means a tasks could execute on different computers. A procedure could also be split up by the runtime environment and once again, be executed all over the place – with the dispatcher keeping track of when it’s done and whatever the result was (if any).

Only a spesific and generated on-demand “runtime context” would follow the task. This would contain things like variables it needs to execute, public symbols, basicaly just enough for the code to run successfully. This also includes resource forks (a term i used for accessing resources regardless of where they may be on the network).

The language design was actually done using Smart Mobile Studio (you can read more about the initial ideas here), and the first test modules ran on node.js (so no fancy graphics available sadly).

But it was incredibly fun to play with! But also time consuming to design a language for an execution model that havent really been invented yet. You can read more about some of my ideas for the execution model here.

I dont even think this execution model is out of the MIT labs yet — but it will once quantum compute models become commercially available (click here for a Berkley University¬†published introduction to the basic computational models used in quantum information theory).

An N++ procedure (actually a microservice that consumed a WSDL service and calls it) looks like this:

program("service_test") {

  handshake {

    input {
      /* Consume WSDL Web-Service Endpoint */
      service1 @ service[URI:"http://www.test.com/SOAP/myService/WSDL/",
                 serviceType:SoapService];
    }

    output {
        myProcess @ process("self");
        stdio @ pipe("stdout");
      }
  }

  /* Execute RPC call */
  execute (stdio,service1)  {
    stdio:writeln("Calling webservice");
    execute (*)  {
      var int32 result = 0;
      set result = service1:getUserId("quartex","secret");
      stdio:writelnF("UserID on server={0}", result);
    } fail (e) {
      stdio.writelnF("Calling soap service failed: {0}",e);
      proceed;
    }
  }

  /* Exit code for process */
  set myProcess:exitCode = 0;
}

RML, Reduced markup language parser and runtime

Dont you just hate it when you lose the source-file for a programming language you made in a day, so now you dont have a clue what the syntax really was? I know, I know, but I honestly dont have the syntax or examples. It does with a disk ages ago.

RML was written to simplify generating HTML documents. There are a few things that you can optimize when it comes to HTML. First of all you know that the browser will read the document from top and downwards. This might not mean much at first glance, but it actually means that a language doesnt need the abillity to call emitter-code that has already executed (above). Think of it as being able to omit calls to procedure behind you in the source. That would be a mess for a language like delphi – but it doesnt impact a language that execute linear without any forms of jumps!

I seem to remember it looked something like this:

createobject(TRMLObject)
{
   writestr("testing");
   if(a=12)
   {
       readstr("mer testing");
   }
}
createobject(TRMLHeading)
{
   settitle("testing second object");
   if(a=12)
   {
       writestr("heading object successfully done");
   }
}
createobject(TRMLObject)
{
   writestr("testing third object");
   if(a=12)
   {
       readstr("mer testing from da third object");
   }
}

The idea is ofcourse that you have some pre-defined objects, a bit like custom-controls in Smart Pascal, and you create and populate these in a top-down fashion.
Once defined, generating quite advanced HTML documents can be done extremely fast. Much faster than ASP, DWS or php since no jumps are involved.

Here is how you compile and run a small debug session (drop a memo and button on a form first):

  var
    FContext: TRMLParseContext;
    FText:  String;
  begin
    FText:= 'createobject(TRMLObject)'
    +       '{'
    +       '   writestr("testing");'
    +       '   if(a=12)'
    +       '   {'
    +       '       readstr("mer testing");'
    +       '   }'
    +       '}'
    +       'createobject(TRMLHeading)'
    +       '{'
    +       '   settitle("testing second object");'
    +       '   if(a=12)'
    +       '   {'
    +       '       writestr("heading object successfully done");'
    +       '   }'
    +       '}'
    +       'createobject(TRMLObject)'
    +       '{'
    +       '   writestr("testing third object");'
    +       '   if(a=12)'
    +       '   {'
    +       '       readstr("mer testing from da third object");'
    +       '   }'
    +       '}';

    // double the code, just to get some payload
    FText := FText + FText;

    memo1.text:='';

    If RMLAllocContext(FContext,FText,[TRMLObject,TRMLHeading]) then
    Begin
      try
        if RMLCompile(FContext) then
        Begin
          caption:='Compilation was a success';
          If RMLExecute(FContext) then
          Begin
            Caption:='Executed successfully';
            memo1.text:=FContext.pcOutput;
          end;
        end else
        caption:=Format('compilation failed [%s]',[FContext.pcError]);
      finally
        RMLDisposeContext(FContext);
      end;
    end;

And here is the full code for the parser and runtime (still a few tidbits to work on, but its childs play). If you can make sense of it, knock yourself out ūüôā

  unit rml;

  interface

  uses sysutils, classes, dialogs;

  type

  TRMLObject  = Class;

  TRMLClass     = Class of TRMLObject;
  TRMLOperator  = (opLess,opMore,opEquals,opNotEquals);
  TRMLDataType  = (daString,daNumber,daBoolean,daCustom);
  TRMLEntryType = (etAssignment,etMethod,etRepeat,etCreate,etIf);
  TRMLObjProc   = Function (Const EntryData:Pointer):Boolean of Object;

  (* represents a single condition in an IF statement *)
  PRMLCondition = ^TRMLCondition;
  TRMLCondition = Packed Record
    coSource:   String;
    coTarget:   String;
    coOperator: TRMLOperator;
  End;

  (* represents a parsed code declaration *)
  PRMLEntryDeclaration = ^TRMLEntryDeclaration;
  TRMLEntryDeclaration = Packed Record
    cxToken:      String;
    cxCondition:  String;
  End;

  (* represents a compiled parameter *)
  PRMLParameter = ^TRMLParameter;
  TRMLParameter = Packed Record
    prType:   TRMLDataType;
    prValue:  Pointer;
    prSize:   Integer;
  End;

  (* represents a compiled code entry *)
  PRMLEntryData = ^TRMLEntryData;
  TRMLEntryData = Packed Record
    edtype:         TRMLEntryType;
    edDeclaration:  TRMLEntryDeclaration;
    edObject:       TRMLObject;
    edParent:       PRMLEntryData;
    edMethod:       TRMLObjProc;
    edContext:      Pointer;
    edConditions:   Array of PRMLCondition;
    edParameters:   Array of PRMLParameter;
    edSubEntries:   Array of PRMLEntryData;
  End;

  PRMLParseContext = ^TRMLParseContext;
  TRMLParseContext = Packed Record
    pcSignature:  Integer;
    pcCol:        Integer;
    pcRow:        Integer;
    pcPos:        Integer;
    pcLen:        Integer;
    pcData:       String;
    pcError:      String;
    pcOutput:     String;
    pcRoot:       TRMLEntryData;
    pcClasses:    Array of TRMLClass;
  End;

  TRMLReadProc  = Function  (Var OutData;var Bytes:Integer):Boolean of Object;
  TRMLWriteProc = Function  (Var InData;Const Bytes:Integer):Boolean of Object;
  TRMLProcEntry = Function  (Const Entry:PRMLEntryData):Boolean of Object;

  PRMLObjectIndex = ^TRMLObjectIndex;
  TRMLObjectIndex = Packed Record
    oiMethods:    Array of record
                    omName:   String;
                    omSyntax: Array of TRMLDataType;
                    omEntry:  TRMLProcEntry;
                  end;
    oiProperties: Array of record
                    oiName:   String;
                    oiRead:   TRMLReadProc;
                    oiWrite:  TRMLWriteProc;
                    oiType:   TRMLDataType;
                  end;
  End;

  TRMLObject = Class(TObject)
  Private
    FIndexData: TRMLObjectIndex;
    FIndexPTR:  PRMLObjectIndex;
  Private
    Function    DoWriteStr(Const Entry:PRMLEntryData):Boolean;
    Function    DoReadStr(Const Entry:PRMLEntryData):Boolean;
  protected
    Procedure   Output(Const Context:PRMLParseContext;Const Value:String);
    Procedure   RegisterProperty(Const Name:String;Const DataType:TRMLDataType;
                Const _Read:TRMLReadProc;Const _Write:TRMLWriteProc);
    Procedure   RegisterMethod(Const Name:String;
                Const Syntax: Array of TRMLDataType;
                Const Entry: TRMLProcEntry);
  Public
    Property    ObjectIndex:PRMLObjectIndex read FIndexPTR;
    Constructor Create;virtual;
  End;

  TRMLHeading = Class(TRMLObject)
  Private
    FTitle:     String;
    Function    DoSetTitle(Const Entry:PRMLEntryData):Boolean;
    Function    DoReadTitle(Var OutData;var Bytes:Integer):Boolean;
    Function    DoWriteTitle(Var InData;Const Bytes:Integer):Boolean;
  Public
    property    Title:String read FTitle write FTitle;
    Constructor Create;override;
  End;

  Function  RMLAllocContext(var Context:TRMLParseContext;
            Const Source:String;Const ClsBase:Array of TRMLClass):Boolean;
  Function  RMLDisposeContext(Var Context:TRMLParseContext):Boolean;
  Function  RMLCompile(Var Context:TRMLParseContext):Boolean;
  Function  RMLExecute(Const Context:TRMLParseContext):Boolean;
  Function  RMLParseEntry(Value:String;
            var Declaration:TRMLEntryDeclaration;var Error:String):Boolean;

  implementation

  //###########################################################################
  // TRMLHeading
  //###########################################################################

  Constructor TRMLHeading.Create;
  Begin
    inherited;
    RegisterMethod('settitle',[daString],DoSetTitle);
    RegisterProperty('title',daString,DoReadTitle,DoWriteTitle);
  end;

  Function TRMLHeading.DoSetTitle(Const Entry:PRMLEntryData):Boolean;
  var
    FTemp:  String;
  Begin
    result:=length(Entry^.edParameters)>0;
    If result and (Entry^.edParameters[0].prType=daString) then
    DoWriteTitle(Entry^.edParameters[0].prValue^,
    Entry^.edParameters[0].prSize);
  end;

  Function TRMLHeading.DoReadTitle(Var OutData;var Bytes:Integer):Boolean;
  Begin
    Bytes:=Length(FTitle);
    If Bytes>0 then
    move(FTitle[1],outData,Bytes);
  end;

  Function TRMLHeading.DoWriteTitle(Var InData;Const Bytes:Integer):Boolean;
  Begin
    SetLength(FTitle,bytes);
    if Bytes>0 then
    move(inData,FTitle[1],Bytes);
  end;

  //###########################################################################
  // TRMLObject
  //###########################################################################

  Constructor TRMLObject.Create;
  begin
    inherited;
    FIndexPTR:=@FIndexData;
    RegisterMethod('writestr',[daString],DoWriteStr);
    RegisterMethod('readstr',[daString],DoReadStr);
  end;

  Procedure TRMLObject.Output(Const Context:PRMLParseContext;
            Const Value:String);
  Begin
    Context^.pcOutput:=Context^.pcOutput + Value;
  end;

  Function TRMLObject.DoReadStr(Const Entry:PRMLEntryData):Boolean;
  var
    FText:  String;
  Begin
    result:=True;
    With Entry^ do
    Begin
      FText:='Token: ' + edDeclaration.cxToken + #13#10;
      FText:=FText + 'Condition: ' + edDeclaration.cxCondition + #13#10;
      FText:=FText + #13#10;
      Output(edContext,FText);
    end;
  end;

  Function TRMLObject.DoWriteStr(Const Entry:PRMLEntryData):Boolean;
  var
    FText:  String;
  Begin
    result:=True;
    With Entry^ do
    Begin
      FText:='Token: ' + edDeclaration.cxToken + #13#10;
      FText:=FText + 'Condition: ' + edDeclaration.cxCondition + #13#10;
      FText:=FText + #13#10;
      Output(edContext,FText);
    end;
  end;

  Procedure   TRMLObject.RegisterProperty(Const Name:String;
              Const DataType:TRMLDataType;
              Const _Read:TRMLReadProc;Const _Write:TRMLWriteProc);
  var
    FCount: Integer;
  Begin
    //FCount:=high(FIndexData.oiProperties) - Low(FIndexData.oiProperties) + 1;
    FCount:=Length(FIndexData.oiProperties);
    SetLength(FIndexData.oiProperties,FCount+1);
    FIndexData.oiProperties[FCount].oiName:=Name;
    FIndexData.oiProperties[FCount].oiRead:=_Read;
    FIndexData.oiProperties[FCount].oiWrite:=_Write;
    FIndexData.oiProperties[FCount].oiType:=DataType;
  end;

  Procedure   TRMLObject.RegisterMethod(Const Name:String;
              Const Syntax: Array of TRMLDataType;
              Const Entry: TRMLProcEntry);
  var
    FCount: Integer;
    FTemp:  Integer;
  Begin
    //FCount:=high(FIndexData.oiMethods) - Low(FIndexData.oiMethods) + 1;
    FCount:=Length(FIndexData.oiMethods);
    SetLength(FIndexData.oiMethods,FCount+1);
    FIndexData.oiMethods[FCount].omName:=Name;
    FIndexData.oiMethods[FCount].omEntry:=Entry;

    //FTemp:=high(Syntax) - Low(Syntax) + 1;
    FTemp:=Length(Syntax);
    If FTemp>0 then
    Begin
      SetLength(FIndexData.oiMethods[FCount].omSyntax,FTemp);
      for FTemp:=low(syntax) to high(syntax) do
      FIndexData.oiMethods[FCount].omSyntax[FTemp]:=Syntax[FTemp];
    end;
  end;

  //###########################################################################
  // RML util methods
  //###########################################################################

  Function RMLContainChars(Const Value:String;const Chars:String):Boolean;
  var
    x:  Integer;
  Begin
    result:=True;
    for x:=1 to length(chars) do
    Begin
      if pos(chars[x],Value)<1 then
      Begin
        result:=False;
        Break;
      end;
    end;
  end;

  Function  RMLScanFor(const Value:String;Const Target:CHAR;
            Const Breakers:String;var Len:Integer):Boolean;
  var
    xpos: Integer;
  Begin
    result:=False;
    Len:=-1;
    xpos:=1;
    while xpos<=Length(Value) do     Begin       If Value[xpos]=Target then       Begin         Len:=xpos-1;         Result:=True;         Break;       end else       Begin         if pos(Value[xpos],Breakers)>0 then
        Break;
      end;
      inc(xpos);
    end;
  end;

  Function RMLisNumber(Const Value:String):Boolean;
  const
    CHARSET = '0123456789';
  var
    x:  Integer;
  Begin
    Result:=True;
    for x:=1 to length(Value) do
    Begin
      if pos(Value[x],CHARSET)<1 then
      Begin
        result:=False;
        Break;
      end;
    end;
  end;

  Function RMLIsBoolean(Const Value:String):Boolean;
  var
    FTemp:  String;
  Begin
    FTemp:=lowercase(trim(Value));
    result:=(FTemp='false') or (FTemp='true');
  end;

  Function RMLisString(Const Value:String):Boolean;
  var
    x:      Integer;
    FLeft:  Integer;
  Begin
    result:=False;
    FLeft:=0;
    (* check left side *)
    for x:=1 to length(Value) do
    Begin
      if Value[x]='"' then
      Begin
        FLeft:=x;
        Result:=True;
        Break;
      end else
      if Value[x]<>#32 then
      Break;
    end;
    (* check right side *)
    If result then
    Begin
      for x:=Length(Value) downto 1 do
      Begin
        if Value[x]='"' then
        Begin
          If x>FLeft then
          Break else
          Begin
            Result:=False;
            Break;
          end;
        end else
        if Value[x]<>#32 then
        Break;
      end;
    end;
  end;

  Function  RMLParseEntry(Value:String;
            var Declaration:TRMLEntryDeclaration;
            var Error:String):Boolean;
  var
    xpos: Integer;
  Begin
    fillchar(Declaration,SizeOf(Declaration),0);
    Result:=RMLContainChars(value,'()');
    if Result then
    Begin
      Result:=RMLScanFor(value,'(',')',xpos);
      if result then
      Begin
        Declaration.cxToken:=trim(copy(value,1,xpos));
        delete(value,1,xpos+1);
        Result:=RMLScanFor(value,')','(',xpos);
        if result then
        Begin
          Value:=TrimRight(Value);
          Result:=xpos=(length(value)-1);
          if result then
          Declaration.cxCondition:=trim(Copy(Value,1,xpos));
        end;
      end;
    end;
    If not Result then
    Error:='Invalid entry <' + value + '>';
  end;

  Function  RMLAllocContext(var Context:TRMLParseContext;
            Const Source:String;Const ClsBase:Array of TRMLClass):Boolean;
  var
    FCount: Integer;
  Begin
    If Context.pcSignature=SizeOf(Context) then
    RMLDisposeContext(Context);

    fillchar(Context,SizeOf(Context),#0);
    Context.pcSignature:=SizeOf(Context);
    Context.pcLen:=Length(Source);
    Context.pcData:=Source;

    FCount:=High(clsBase) - low(clsBase)+1;
    If FCount>0 then
    Begin
      SetLength(Context.pcClasses,FCount);
      for FCount:=Low(clsBase) to high(clsBase) do
      Context.pcClasses[FCount]:=clsBase[FCount];
    end;

    result:=True;
  end;

  Procedure RMLDisposeEntryData(Const Data:PRMLEntryData);
  var
    FTemp:  Integer;
    x:      Integer;
  Begin
    (* dispose of condition data *)
    FTemp:=length(Data^.edConditions);
    While FTemp>0 do
    Begin
      Dispose(Data^.edConditions[FTemp-1]);
      dec(FTemp);
    end;
    SetLength(Data^.edConditions,0);

    (* dispose of parameter data *)
    FTemp:=length(Data^.edParameters);
    While FTemp>0 do
    Begin
      If length(Data^.edParameters)>0 then
      Begin
        for x:=Low(Data^.edParameters) to high(Data^.edParameters) do
        If Data^.edParameters[x]^.prSize>0 then
        FreeMem(Data^.edParameters[x]^.prValue);
      end;
      Dispose(Data^.edParameters[FTemp-1]);
      dec(FTemp);
    end;
    SetLength(Data^.edParameters,0);

    (* dispose of sub entries *)
    //Ftemp:=High(Data^.edSubEntries)-Low(Data^.edSubEntries)+1;
    FTemp:=Length(Data^.edSubEntries);
    While FTemp>0 do
    Begin
      RMLDisposeEntryData(Data^.edSubEntries[FTemp-1]);
      Dec(FTemp);
    end;
    SetLength(Data^.edSubEntries,0);

    (* dispose of script object *)
    If  not (Data^.edtype in [etIf,etRepeat,etMethod])
    and (Data^.edObject<>NIL) then
    Data^.edObject.free;

    (* dispose of entry *)
    Dispose(Data);
  end;

  Function  RMLDisposeContext(Var Context:TRMLParseContext):Boolean;
  var
    FCount: Integer;
  Begin
    Result:=Context.pcSignature=SizeOf(Context);
    If Result then
    Begin
      Context.pcSignature:=0;
      Context.pcData:='';
      Context.pcError:='';
      FCount:=Length(Context.pcRoot.edSubEntries);
      //FCount:=High(Context.pcRoot.edSubEntries)
      //- Low(Context.pcRoot.edSubEntries) + 1;
      While FCount>0 do
      Begin
        RMLDisposeEntryData(Context.pcRoot.edSubEntries[FCount-1]);
        dec(FCount);
      end;
      SetLength(Context.pcRoot.edSubEntries,0);
    end;
  end;

  //###########################################################################
  // RML core methods
  //###########################################################################

  Function  RMLImplements(Const MethodName:String;
            Const obj:TRMLObject):Boolean;
  var
    FTable: PRMLObjectIndex;
    FCount: Integer;
  Begin
    Result:=Obj<>NIL;
    If result then
    Begin
      (* get object inex *)
      FTable:=Obj.ObjectIndex;
      //FCount:=High(FTable^.oiMethods) - low(FTable^.oiMethods) + 1;
      FCount:=Length(FTable^.oiMethods);
      Result:=FCount>0;
      If Result then
      Begin
        for FCount:=low(FTable^.oiMethods) to high(FTable^.oiMethods) do
        Begin
          Result:=FTable^.oiMethods[FCount].omName=MethodName;
          If Result then
          Break;
        end;
      end;
    end;
  end;

  Function  RMLGetMethodEntry(Const MethodName:String;
            Const obj:TRMLObject;var outEntry:TRMLObjProc):Boolean;
  var
    FTable: PRMLObjectIndex;
    FCount: Integer;
  Begin
    Result:=Obj<>NIL;
    If result then
    Begin
      (* get object inex *)
      FTable:=Obj.ObjectIndex;
      //FCount:=High(FTable^.oiMethods) - low(FTable^.oiMethods) + 1;
      FCount:=Length(FTable^.oiMethods);
      Result:=FCount>0;
      If Result then
      Begin
        for FCount:=low(FTable^.oiMethods) to high(FTable^.oiMethods) do
        Begin
          Result:=FTable^.oiMethods[FCount].omName=MethodName;
          If Result then
          Begin
            outEntry:=TRMLObjProc(FTable^.oiMethods[FCount].omEntry);
            Break;
          end;
        end;
      end;
    end;
  end;

  Function  RMLCreateObject(var Context:TRMLParseContext;
            Const Objname:String;var outObject:TRMLObject;
            var Error:String):Boolean;
  var
    FCount: Integer;
  Begin
    FCount:=High(Context.pcClasses) - Low(Context.pcClasses) + 1;
    Result:=FCount>0;
    if Result then
    Begin
      For FCount:=Low(Context.pcClasses) to high(Context.pcClasses) do
      Begin
        Result:=lowercase(Context.pcClasses[FCount].ClassName)=lowercase(objName);
        If result then
        Begin
          outObject:=Context.pcClasses[FCount].Create;
          Break;
        end;
      end;
    end;
    If not Result then
    Error:='Unknown class <' + Objname + '>';
  end;

  Function  RMLAddEntry(var Context:TRMLParseContext;
            Var Declaration:TRMLEntryDeclaration;
            Root:PRMLEntryData;var NewEntry:PRMLEntryData;
            var Error:String):Boolean;
  var
    FCount: Integer;
    x:      Integer;
    FTemp:  String;
    FLen:   Integer;
    FPar:   PRMLParameter;
  Begin
    Result:=Root<>NIL;
    If result then
    Begin
      (* create new entry *)
      new(NewEntry);

      (* Reset entry record *)
      NewEntry^.edType:=etAssignment;
      NewEntry^.edObject:=NIL;
      NewEntry^.edMethod:=NIL;
      SetLength(NewEntry^.edConditions,0);
      SetLength(NewEntry^.edParameters,0);
      SetLength(NewEntry^.edSubEntries,0);

      (* Set basic values *)
      NewEntry^.edParent:=Root;
      NewEntry^.edContext:=@Context;
      newEntry^.edDeclaration:=Declaration;

      (* insert entry into parent *)
      FCount:=Length(Root^.edSubEntries);
      SetLength(Root^.edSubEntries,FCount+1);
      Root^.edSubEntries[FCount]:=NewEntry;

      (* tokenize *)
      If declaration.cxToken='createobject' then
      Begin
        NewEntry^.edtype:=etCreate;
        Result:=RMLCreateObject
          (
          Context,declaration.cxCondition,
          NewEntry^.edObject,Error
          );
      end else

      if declaration.cxToken='if' then
      Begin
        NewEntry^.edtype:=etIF;
        NewEntry^.edObject:=NewEntry^.edParent.edObject;
      end else

      if declaration.cxToken='repeat' then
      NewEntry^.edtype:=etRepeat else

      Begin
        (* method call? Make sure entry object supports this *)
        Result:=NewEntry^.edParent.edObject<>NIL;
        If Result then
        Begin
          (* check if object supports the method name *)
          Result:=RMLImplements(declaration.cxToken,NewEntry^.edParent.edObject);
          If Result then
          Begin
            (* Query object for method entry *)
            Result:=RMLGetMethodEntry
              (
              declaration.cxToken,
              NewEntry^.edParent.edObject,
              NewEntry^.edMethod
              );

            If result then
            Begin
              NewEntry^.edtype:=etMethod;
              NewEntry^.edObject:=NewEntry^.edParent.edObject;

              (* now parse the parameter conditions *)
              x:=0;
              While x<Length(declaration.cxCondition) do               Begin                 inc(x);                 If (declaration.cxCondition[x]=',')                 or (x=Length(declaration.cxCondition)) then                 Begin                   If x=Length(declaration.cxCondition) then                   FTemp:=FTemp + declaration.cxCondition[x];                   FTemp:=trim(FTemp);                   If length(FTemp)>0 then
                  Begin
                    (* create a new parameter *)
                    FLen:=length(NewEntry^.edParameters);
                    setlength(NewEntry^.edParameters,FLen+1);

                    New(FPar);

                    If RMLIsString(FTemp) then
                    FPar^.prType:=daString else
                    if RMLIsNumber(FTemp) then
                    FPar^.prType:=daNumber else
                    if RMLIsBoolean(FTemp) then
                    FPar^.prType:=daBoolean else
                    FPar^.prType:=daCustom;

                    Case FPar^.prType of
                    daString:
                      Begin
                        Delete(FTemp,1,1);
                        Delete(FTemp,length(FTemp),1);
                        FPar^.prSize:=Length(FTemp);
                        FPar^.prValue:=AllocMem(FPar^.prSize);
                        move(FTemp[1],FPar^.prValue^,FPar^.prSize);
                      end;
                    daNumber:
                      Begin
                        FPar^.prSize:=SizeOf(Integer);
                        FPar^.prValue:=AllocMem(FPar^.prSize);
                        PInteger(FPar^.prValue)^:=StrToInt(FTemp);
                      end;
                    daBoolean:
                      Begin
                      end;
                    daCustom:
                      Begin
                      end;
                    end;

                    NewEntry^.edParameters[FLen]:=FPar;
                    FTemp:='';
                  end else
                  Begin
                    //Invalid parameter error
                  end;
                end else
                FTemp:=FTemp + declaration.cxCondition[x]
              end;

              {
              Validate parameter datatypes here!
              If  (Length(NewEntry^.edParameters)>0) then
              Begin
                for x:=Low(NewEntry^.edParameters) to
                high(NewEntry^.edParameters) do
                Begin
                  newEntry^.edObject.
                end;
              end;  }

            end;

          end else
          Begin
            // property assignment test here
          end;
        end;
      end;

      (* Failed to tokenize? *)
      If not Result then
      Begin
        (* dispose of entry data *)
        Dispose(NewEntry);
        NewEntry:=NIL;
        SetLength(Root^.edSubEntries,FCount);
        Context.pcError:=Format('Invalid token "%s"',[declaration.cxToken]);
      end;

    end else
    Error:='AddEntry failed, root can not be NIL';
  end;

  Function  RMLParseObject(Var Context:TRMLParseContext;
            Const Root:PRMLEntryData):Boolean;
  var
    FChar:        Char;
    FTemp:        String;
    FDeclaration: TRMLEntryDeclaration;
    FNewEntry:    PRMLEntryData;
  Begin
    Result:=Context.pcSignature=SizeOf(Context);
    If result then
    Begin
      (* update cursor *)
      inc(Context.pcPos);
      inc(Context.pcCol);

      while Result and (Context.pcPos<Context.pcLen) do
      Begin
        FChar:=Context.pcData[Context.pcPos];
        Case FCHAR of
        #13:
          Begin
            inc(Context.pcRow);
            inc(Context.pcPos);
            Context.pcCol:=0;
            Continue;
          end;
        ';':
          Begin
            Result:=RMLParseEntry(trim(FTemp),FDeclaration,Context.pcError);
            If result then
            Result:=RMLAddEntry
              (
                Context,
                FDeclaration,
                root,
                FNewEntry,
                Context.pcError
              );

            If Result then
            Begin
              inc(Context.pcPos);
              inc(Context.pcCol);

              If FNewEntry^.edtype=etIF then
              Result:=RMLParseObject(Context,FNewEntry);
            end;
            FTemp:='';
          end;
        '{':
          Begin
            Result:=RMLParseEntry(FTemp,FDeclaration,Context.pcError);
            If Result then
            Begin
              Result:=FDeclaration.cxToken='if';
              If result then
              Begin
                Result:=RMLAddEntry
                  (
                  Context,
                  FDeclaration,
                  Root,
                  FNewEntry,
                  Context.pcError
                  );
                If Result then
                Result:=RMLParseObject(Context,FNewEntry);
              end;
            end;
            FTemp:='';
          end;
        '}':
          Begin
            inc(Context.pcCol);
            inc(Context.pcPos);
            Break;
          end;
        else
          Begin
            FTemp:=FTemp + FChar;
            inc(Context.pcCol);
            inc(Context.pcPos);
          end;
        end;
      end;
    end;
  end;

  Function RMLExecute(Const Context:TRMLParseContext):Boolean;

    Function RunEntry(Const Item:PRMLEntryData):Boolean;
    var
      FSubCount:  Integer;
      x:          Integer;
    Begin
      result:=True;

      Case Item^.edtype of
      etCreate,
      etAssignment:
        Begin

          FSubCount:=Length(Item^.edSubEntries);
          for x:=1 to FSubCount do
          Begin
            result:=RunEntry(Item^.edSubEntries[x-1]);
            If not result then
            break;
          end;

        end;
      etMethod:
        Begin
          {result:=RMLGetMethodEntry(Item^.edDeclaration.cxToken,
          Item^.edObject,FEntry);
          If result then
          result:=FEntry(Item); }
          Result:=TRMLProcEntry(Item^.edMethod)(Item);
        end;
      etRepeat:
        Begin
          //FSubCount:=Length(Item^.edSubEntries);
        end;
      etIf:
        Begin
          FSubCount:=Length(Item^.edSubEntries);
          for x:=1 to FSubCount do
          RunEntry(Item^.edSubEntries[x-1]);
        end;
      end;
    End;

  Begin
    Result:=Context.pcSignature=SizeOf(Context);
    If result then
    Begin
      result:=length(Context.pcError)<1;
      If result then
      result:=RunEntry(@Context.pcRoot);
    end;
  end;

  Function RMLCompile(Var Context:TRMLParseContext):Boolean;
  var
    FChar:        Char;
    FTemp:        String;
    FDeclaration: TRMLEntryDeclaration;
    FNewEntry:    PRMLEntryData;
  Begin
    Result:=Context.pcSignature=SizeOf(Context);
    If result then
    Begin
      Context.pcCol:=0;
      Context.pcRow:=0;
      Context.pcPos:=1;
      Context.pcError:='';

      while Result and (Context.pcPos<Context.pcLen) do
      Begin
        FChar:=Context.pcData[Context.pcPos];
        Case FCHAR of
        #13:
          Begin
            inc(Context.pcRow);
            inc(Context.pcPos);
            Context.pcCol:=0;
            Continue;
          end;
        '{':
          Begin
            Result:=RMLParseEntry(FTemp,FDeclaration,Context.pcError);
            If Result then
            Begin
              Result:=FDeclaration.cxToken='createobject';
              If result then
              Begin
                Result:=RMLAddEntry
                  (
                  Context,
                  FDeclaration,
                  @Context.pcRoot,
                  FNewEntry,
                  Context.pcError
                  );
                If Result then
                Result:=RMLParseObject(Context,FNewEntry);
              end;
            end;
            FTemp:='';
          end;
        '}':
          Begin
          end;
        else
          Begin
            FTemp:=FTemp + FChar;
            inc(Context.pcCol);
            inc(Context.pcPos);
          end;
        end;
      end;
    end;
  end;

  end.

N++ context parser, part 2

December 29, 2014 Leave a comment
nodeJS rocks!

nodeJS rocks!

Right. In my last article on parsing I introduced a basic buffer class and skeleton parser. In this article I introduce a working 2 level parser, meaning recursive, which implements both the program() and execute() code blocks. Being recursive means that you can now nest several execute() sections inside each other. But at this stage that’s pretty much all I’ve had time to implement — so consider it a rough sketch.

Support

At the moment we have a program parser, which naturally parses a program block which looks like this:

program("name") {
}

Registered for the program parser, is a single sub-parser for the “execute” keyword, this is the class TQTXExecuteParser class. This allows you to have execute() sections within a program block, as such:

program("name") {
  execute (scope) {
    /* Code goes heres */
  }
}

To avoid recursive stack errors (stack overflow), the execute class does a trick. Namely that it registers support for itself (execute keyword) at parse-time, meaning that an execute block will support execute child blocks, like this:

program("name") {
  execute (scope) {
    /* Sub execute block */
    execute (scope) {
      /* your code here */
    }
  }
}

Well, basically this is the base system in place. Adding new keywords and features is now a matter of creating sub parser classes and registering them to either the execute class or the program parser class. The benefit of such a simple system is that it’s fast, easy to understand and use – and easy to maintain. Concrete rules for parsing can be defined, and a lot of code re-cycled between the base-class (TQTXCustomParser) and it’s decendants.
With a bit of work you will have a fine-tuned, fast and efficient language parser.

Supporting other languages

As you probably guessed it’s not that hard to add support for other languages. I mean once you take the time to implement the basic form of a language unit or program-file. But it’s always a time consuming task, where even the tiniest detail matters. If you miss a single language rule (like say, not skippig a comma after it’s been consumed) the error will spread into the next parser and so on – until an exception occurs.

But, once you get the hang of chewing through a source-buffer, supporting enough parsers and keywords (and for Delphi also states, like the “Inerface” keyword state and the “implementation” keyword state) you can pretty much mix and match all you want.

Well, without going to far into the code — here is the updated source. You may want to drop memo control on your form and add the following code:

procedure TForm1.W3Button1Click(Sender: TObject);
var
  mContext: TQTXContext;
  mParser:  TQTXLPPParser;
begin
  mContext:=TQTXContext.Create;
  try
    mParser:=TQTXLPPParser.Create(mContext);
    try
      mContext.Buffer.LoadFromString(w3memo1.text);

      try
        mParser.Parse;
      except
        on e: exception do
        writeln(e.message);
      end;

    finally
      mParser.free;
    end;
  finally
    mContext.free;
  end;
end;

And set the content of the memo to:

procedure TForm1.InitializeForm;
begin
  inherited;
  w3memo1.Text:=#'program("test") {
          execute (*) {
            /* Code here */
          }
        }';
end;

Ok — here is the updated parser code:

 

unit qtxparser;

interface

uses
  SmartCL.System;


const
  TOK_ROUNDOPEN   = 1;  //  "("
  TOK_ROUNDCLOSE  = 2;  //  ")"
  TOK_CUROPEN     = 3;  //  "{"
  TOK_CURCLOSE    = 4;  //  "}"
  TOK_SPACE       = 5;  //  " "

  TOK_ADD         = 6;  //  "+"
  TOK_SUBTRACT    = 7;  //  "-"
  TOK_DIVIDE      = 8;  //  "/"
  TOK_POWER       = 9;  //  "^"

  TOK_COMMA       =10;  //  ","
  TOK_COLON       =11;  //  ";"

  Numeric_Operators: Array [0..3] of Integer =
    (TOK_ADD,TOK_SUBTRACT,TOK_DIVIDE,TOK_POWER);

type

  ENPPException = Class(EW3Exception);

  TLPPApplicationObject = Class;

  TQTXBuffer = Class(TObject)
  private
    FData:    String;
    FIndex:   Integer;
    FLineNr:  Integer;
  protected
    function  getCurrent:String;virtual;
  public

    Property  LineNr:Integer read FLineNr;

    Property  Current:String read getCurrent;
    Function  Back:Boolean;
    function  Next:Boolean;
    function  BOF:Boolean;
    function  EOF:Boolean;

    procedure SkipJunk;

    function  ReadTo(Const aChars:array of string;
              var text:String):Boolean;overload;
    function  ReadTo(aValue:String):Boolean;overload;
    function  ReadTo(aValue:String;var inner:String):Boolean;overload;

    function  ReadWord(var Text:String):Boolean;

    function  ReadToEOL(var text:String):Boolean;

    function  PeekAhead(aCount:Integer;var Text:String):Boolean;
    function  Compare(aText:String):Boolean;

    Property  TextFromCurrent:String read ( Copy(FData,FIndex,length(FData)-FIndex) );

    procedure First;

    procedure LoadFromString(Text:String);

    procedure Clear;
  End;

  TQTXContext = Class(TObject)
  Private
    FBuffer:    TQTXBuffer;
  public
    Property    Buffer:TQTXBuffer read FBuffer;
    Property    ApplicationObject:TLPPApplicationObject;
    Constructor Create;virtual;
    Destructor  Destroy;Override;
  end;

  TQTXCustomParser      = Class;
  TQTXCustomParserClass = Class of TQTXCustomParser;

  TQTXParserInfo = Class(TObject)
  public
    property  Instance: TQTXCustomParser;
    Property  Keyword:String;
  end;

  TQTXCustomParser = Class(TObject)
  private
    FContext:   TQTXContext;
    FParsers:   Array of TQTXParserInfo;
  protected
    function    getParserIndexFor(keyword:String;
                var aIndex:Integer):Boolean;
    function    getParserInstanceFor(keyword:String;
                var aParser:TQTXCustomParser):Boolean;
    procedure   RegisterParser(keyword:String;
                const aParser:TQTXCustomParser);
  protected
    procedure   ParseAsExecuteBlock;virtual;
  protected
    Procedure   ClearParserInstances;virtual;
  public
    Property    Context:TQTXContext read FContext;
    procedure   Parse;virtual;
    Constructor Create(aContext:TQTXContext);virtual;
    Destructor  Destroy;Override;
  end;

  (* Parser for the keyword "program" *)
  TQTXProgramParser = Class(TQTXCustomParser)
  protected
    procedure   ParseProgramName;
    procedure   Parse;override;
  public
    Constructor Create(aContext:TQTXContext);override;
  end;

  TQTXExecuteParser = class(TQTXCustomParser)
  protected
    procedure   ParseScope;
    procedure   Parse;override;
  public
    Constructor Create(aContext:TQTXContext);override;
  end;

  TQTXLPPParser = Class(TQTXCustomParser)
  protected
    procedure   Parse;override;
  public
    Constructor Create(aContext:TQTXContext);override;
  end;

  //##########################################################################
  // In-memory program model classes
  //##########################################################################

  TLPPObject = Class(Tobject)
  end;

  TLPPModule = Class(TLPPObject)
  public
    Property  Name:String;
  end;
  TLPPModuleList = Array of TLPPModule;

  TLPPApplicationObject = Class(TLPPObject)
  private
    FModules: TLPPModuleList;
  public
    Property    ApplicationName:String;
    Property    Modules:TLPPModuleList read FModules;
    function    IndexOfModule(aModuleName:String):Integer;
    Procedure   Clear;
    Constructor Create;virtual;
    Destructor  Destroy;Override;
  end;

implementation

var
  TOK_SYM: Array[TOK_ROUNDOPEN..TOK_COLON] of string =
  ('(',')','{','}',' ','+','-','/','^',',',';');

//###########################################################################
// TLPPApplicationObject
//###########################################################################

Constructor TLPPApplicationObject.Create;
begin
  inherited Create;
end;

Destructor TLPPApplicationObject.Destroy;
Begin
  clear;
  inherited Destroy;
end;

function TLPPApplicationObject.IndexOfModule(aModuleName:String):Integer;
var
  x:  Integer;
Begin
  result:=-1;
  aModulename:=trim(lowercase(aModuleName));
  if length(aModuleName)>0 then
  begin
    for x:=0 to FModules.Count-1 do
    Begin
      if lowercase(FModules[x].Name)=aModuleName then
      begin
        result:=x;
        break;
      end;
    end;
  end;
end;

Procedure TLPPApplicationObject.Clear;
begin
  ApplicationName:='';
  while FModules.Count>0 do
  begin
    FModules[0].free;
    FModules[0]:=NIL;
    FModules.Delete(0,1);
  end;
end;

//###########################################################################
// TQTXLPPParser
//###########################################################################

Constructor TQTXLPPParser.Create(aContext:TQTXContext);
begin
  inherited Create(aContext);
  (* Register the "program" keyword-parser *)
  RegisterParser('program',TQTXProgramParser.Create(context));
end;

procedure TQTXLPPParser.Parse;
var
  mCache: String;
  mSubParser: TQTXCustomParser;
Begin

  repeat
    case Context.Buffer.Current of
    ' ':  Begin
            self.Context.Buffer.SkipJunk;
          end;
    '/':  Begin
            if context.buffer.Compare('//') then
            begin
              self.context.buffer.SkipJunk;
            end;
          end;
    '(':  Begin
            if Context.Buffer.Compare('(*') then
            Context.buffer.SkipJunk else
            Begin

              if getParserInstanceFor(mCache,mSubParser) then
              begin

                (* Invoke sub parser *)
                mSubParser.Parse;

                (* SKip trailing junk if any *)
                Context.Buffer.SkipJunk;

                (* clear cache *)
                mCache:='';

              end else
              raise ENPPException.CreateFmt
              ('Syntax error [%s]',[mCache]);

            end;
          end;
    else
      Begin
        mCache += Context.Buffer.Current;
      end;
    end;

    if not Context.buffer.next then
    break;
  until Context.buffer.EOF;
end;

//###########################################################################
// TQTXExecuteParser
//###########################################################################

Constructor TQTXExecuteParser.Create(aContext:TQTXContext);
Begin
  inherited Create(AContext);
end;

procedure TQTXExecuteParser.Parse;
var
  mDummy: Integer;
Begin
  if not self.getParserIndexFor('execute',mDummy) then
  Begin
    self.RegisterParser('execute',TQTXExecuteParser.Create(Context));
  end;


  (* Expects: Scope param list (), with a bare minumum of (STAR) *)
  //writeln(context.buffer.TextFromCurrent);
  writeln('Enters execute parser');
  context.buffer.SkipJunk;

  (* Check for "(" *)
  if Context.Buffer.Current=TOK_SYM[TOK_ROUNDOPEN] then
  Begin
    // Skip the "(" char
    Context.Buffer.Next;
    ParseScope;
  end;

  if context.buffer.ReadTo('{') then
  Begin
    Context.buffer.next;

    (* Parse content as an execute block *)
    self.ParseAsExecuteBlock;


  end else
  Raise Exception.Create('Syntax error, expected code-block entry error');


  writeln('leaves execute parser');
end;

procedure TQTXExecuteParser.ParseScope;
var
  mScope: String;
Begin
  (* Keep reading until we find ")", or break on others *)
  if context.buffer.ReadTo([')',';','{','}'],mScope) then
  Begin
    (* Valid stop? or one of the breakers? *)
    if context.buffer.Current=')' then
    Begin
      writeln('Defined scope for this execute block is:' + mScope);

      //skip ")" char
      context.buffer.next;

    end else
    Raise Exception.Create('Syntax error, expected scope terminator error');
  end;
end;

//###########################################################################
// TQTXProgramParser
//###########################################################################

Constructor TQTXProgramParser.Create(aContext:TQTXContext);
begin
  inherited Create(aContext);
  RegisterParser('execute',TQTXExecuteParser.Create(context));
end;

procedure  TQTXProgramParser.ParseProgramName;
var
  mTemp:  String;
Begin
  (* Expects:  ("NAME")
     As in Name of program *)
  mtemp:='';

  if Context.Buffer.Current='(' then
  begin
    Context.buffer.next;
    if Context.buffer.current='"' then
    begin
      Context.Buffer.next;

      if Context.buffer.readTo('"',mTemp) then
      begin
        Context.ApplicationObject.ApplicationName:=mTemp;

        Context.Buffer.next;

        Context.Buffer.SkipJunk;
        if Context.buffer.Current=TOK_SYM[TOK_ROUNDCLOSE] then
        Begin
          Context.Buffer.next;


        end else
        Raise Exception.Create('Syntax error, expected ")" param error');
      end else
      Raise Exception.Create('Syntax error, expected string termination error');

      //showmessage(mName);
    end else
    Raise Exception.Create('Syntax error, expected string identifier');
  end else
  Raise Exception.Create('Syntax error, expected parameter');

end;

procedure TQTXProgramParser.Parse;
Begin
  writeln(ClassName);
  context.buffer.SkipJunk;

  (* Check for "(" *)
  if Context.Buffer.Current=TOK_SYM[TOK_ROUNDOPEN] then
  Begin
    ParseProgramName;
    writeln('Program name:' + Context.ApplicationObject.ApplicationName);

    context.Buffer.SkipJunk;
    if context.Buffer.Current=TOK_SYM[TOK_CUROPEN] then
    Begin
      context.buffer.next;

      repeat
        ParseAsExecuteBlock;

        Context.buffer.SkipJunk;
        if context.buffer.Current=TOK_SYM[TOK_CURCLOSE] then
        break;
      until Context.Buffer.EOF;

    end else
    raise exception.create('Syntax error, expected program-block entry');
  end;

  writeln("Exiting TQTXProgramParer");

end;

//###########################################################################
// TQTXCustomParser
//###########################################################################

Constructor TQTXCustomParser.Create(aContext:TQTXContext);
Begin
  inherited Create;
  FContext:=aContext;

end;

Procedure TQTXCustomParser.ClearParserInstances;
Begin
  (* Release sub-parser instances if any *)
  while FParsers.Count>0 do
  Begin
    if FParsers[0].Instance<>NIL then
    Begin
      FParsers[0].Instance.free;
      FParsers[0].Instance:=NIL;
    end;
    FParsers[0].free;
    FParsers.Delete(0,1);
  end;
end;

Destructor TQTXCustomParser.destroy;
begin
  ClearParserInstances;
  inherited;
end;

function TQTXCustomParser.getParserIndexFor(keyword:String;
         var aIndex:Integer):Boolean;
var
  x:  Integer;
begin
  result:=False;
  aIndex:=-1;
  keyword:=lowercase(trim(keyword));
  if keyword.length>0 then
  Begin
    //if FParsers.Count>0 then
    //begin
      for x:=0 to FParsers.count-1 do
      Begin
        if sameText(FParsers[x].Keyword,keyword) then
        begin
          result:=true;
          aIndex:=x;
          break;
        end;
      end;
    //end;
  end;
end;

function TQTXCustomParser.getParserInstanceFor(keyword:String;
         var aParser:TQTXCustomParser):Boolean;
var
  x:  Integer;
begin
  result:=False;
  aParser:=NIL;
  keyword:=lowercase(trim(keyword));
  if keyword.length>0 then
  Begin
    //if FParsers.Count>0 then
    //begin
      for x:=0 to FParsers.count-1 do
      Begin
        if sameText(FParsers[x].Keyword,keyword) then
        begin
          result:=true;
          aParser:=FParsers[x].Instance;
          break;
        end;
      end;
    //end;
  end;
end;

procedure TQTXCustomParser.RegisterParser(keyword:String;
          const aParser:TQTXCustomParser);
var
  mIndex: Integer;
  mInfo:  TQTXParserInfo;
Begin
  keyword:=lowercase(trim(keyword));
  if keyword.length>0 then
  Begin
    if aParser<>NIL then
    begin
      if not getParserIndexFor(keyword,mIndex) then
      Begin
        (* register sub parser *)
        mInfo:=TQTXParserInfo.Create;
        mInfo.Instance:=aParser;
        mInfo.Keyword:=keyword;
        FParsers.Add(mInfo);
      end else
      raise ENPPException.createFmt
      ('Parser for keyword [%s] already registered',[keyword]);
    end else
    raise ENPPException.CreateFmt
    ('Parser class for keyword [%s] was NIL error',[keyword]);
  end;
end;

procedure TQTXCustomParser.Parse;
begin
  raise ENPPException.CreateFmt
  ('No parser implemented for class %s',[classname]);
end;

(* About:
   This method continues the parsing process as if it's parsing from
   within an execute() code block.
   This means that all the rules of  normal 'Execute( * ) "
   applies. After Program() this is the default parsing mode, which
   makes it the fallback of more or less all parse-objects which allows
   ordinary code (e.g "case/switch" sections and more) *)

procedure TQTXCustomParser.ParseAsExecuteBlock;
var
  mCache: String;
  mTemp:  String;
  mParser: TQTXCustomParser;
Begin

  while not Context.Buffer.EOF do
  Begin
     case Context.Buffer.Current of
     ' ',
     #9:  begin
            //Context.buffer.SkipJunk;
          end;
     '(': If context.Buffer.Compare('(*')
          and not Context.buffer.compare('(*)') then
          Begin
            Context.buffer.next;
            context.buffer.next;
            Context.Buffer.ReadTo('*)');
            Context.Buffer.Next;
            Context.Buffer.Next;
          end else
          Begin
            if mCache.length>0 then
            Begin
              writeln('Command is: "' + mCache + '"');

              if getParserInstanceFor(lowercase(mCache),mParser) then
              Begin
                writeln('Parser ' + mParser.classname + ' is registered for keyword ' + mCache);
                try
                  mParser.Parse;
                except
                  on e: exception do
                  raise Exception.Create(e.message);
                end;
              end else
              Raise Exception.create('Syntax error, unknown token:' + mCache);

              mCache :='';

              exit;
            end else
            Raise Exception.Create('Syntax error, unsuspected character');
          end;
     '/': If Context.buffer.compare('//') then
          begin
            Context.buffer.ReadToEOL(mTemp);
          end else
          if Context.buffer.compare('/*') then
          Begin
            Context.buffer.next;
            Context.buffer.Next;
            Context.buffer.ReadTo('*/');
          end;
     '{': Begin
            if mCache.length>0 then
            Begin
              if getParserInstanceFor(lowercase(mCache),mParser) then
              Begin
                showmessage('over here!');

              end else
              Raise Exception.create('Syntax error, unknown token:' + mCache);

            end else
            Raise Exception.Create('Syntax error, unexpected character');
          end;
     '}': Begin
          end;
     else
        Begin
          mCache += Context.buffer.Current;
        end;

     end;

    if not context.buffer.next then
    break;
  end;

  writeln('Exiting CodeBlock parser at:' + mCache);

end;

//###########################################################################
// TQTXContext
//###########################################################################

Constructor TQTXContext.Create;
begin
  inherited Create;
  FBuffer:=TQTXBuffer.create;
  ApplicationObject:=TLPPApplicationObject.Create;
end;

Destructor TQTXContext.destroy;
Begin
  FBuffer.free;
  ApplicationObject.free;
  inherited;
end;

//###########################################################################
// TQTXBuffer
//###########################################################################


procedure TQTXBuffer.First;
Begin
  If FData.length>0 then
  begin
    FLineNr:=1;
    FIndex:=0;
  end;
end;

procedure TQTXBuffer.SkipJunk;
var
  mTemp:  String;
begin
  repeat
    case Current of
    ' ',
    #9  :  Begin
            (* We treat TAB as space *)
            //if not next then
            //break;

          end;
    '/':  Begin
            (* Skip C style remark *)
            if Compare('/*') then
            begin
              if readTo('*/') then
              Begin
                Next;
                Next;
              end else
              raise ENPPException.Create('Comment not closed error');
            end else
            Begin
              (* Skip Pascal style remark *)
              if Compare('//') then
              begin
                if ReadToEOL(mTemp) then
                next else
                raise ENPPException.Create('Expected end of line error');
              end;
            end;
          end;
    '(':  Begin
            (* Skip pascal style remark *)
            if compare('(*')
            and not compare('(*)') then
            Begin
              if readTo('*)') then
              begin
                Next;
                Next;
              end else
              raise ENPPException.Create('Comment not closed error');
            end else
            break;
          end;
    #13:
        begin
          //
        end;
    else
      begin
        Break;
      end;
    end;
    next;
  until EOF;
end;

Procedure TQTXBuffer.Clear;
begin
  FData:='';
  FLineNr:=1;
  FIndex:=0;
end;

procedure TQTXBuffer.LoadFromString(Text:String);
Begin
  FLineNr:=1;
  FData:=trim(text);
  FData:=StrReplace(FData,#10,#13);
  FData:=StrReplace(FData,#13#13,#13);
  FIndex:=1;
  if length(FData)<1 then
  FIndex:=-1;
end;

function TQTXBuffer.getCurrent:String;
Begin
  result:=FData[FIndex];
end;

function TQTXBuffer.ReadWord(var Text:String):Boolean;
begin
  result:=False;
  Text:='';
  if not EOF then
  begin
    repeat
      if (current in ['A'..'Z','a'..'z','0'..'9']) then
      Text += current else
      break;
    until not next;
    result:=length(Text)>0;
  end;
end;

function TQTXBuffer.Compare(aText:String):Boolean;
var
  mData:  String;
Begin
  result:=PeekAhead(length(aText),mData)
  and SameText(lowercase(mData),lowercase(aText));
end;

function TQTXBuffer.PeekAhead(aCount:Integer;var Text:String):Boolean;
var
  mPos: Integer;
Begin
  result:=False;
  Text:='';

  if not EOF then
  Begin
    mPos:=FIndex;
    try
      while aCount>0 do
      begin
        Text+=Current;
        if not Next then
        break;
        dec(aCount);
      end;
      result:=length(text)>0;
    finally
      FIndex:=mPos;
    end;
  end;
end;

function TQTXBuffer.ReadToEOL(var text:String):Boolean;
Begin
  result:=ReadTo([#13,#10],text);
end;

function TQTXBuffer.ReadTo(aValue:String;var inner:String):Boolean;
var
  mText:  String;
begin
  inner:='';
  result:=False;

  aValue:=lowercase(aValue);
  if aValue.length>0 then
  begin

    repeat
      if PeekAhead(aValue.length,mText) then
      begin
        mText:=lowercase(mText);
        result:=SameText(mText,aValue);
        if result then
        break else
        inner += Current;
      end else
      inner += Current;
      Next;
    until EOF;

  end;
end;

function TQTXBuffer.readTo(aValue:String):Boolean;
var
  mText:  String;
begin
  result:=False;
  aValue:=lowercase(aValue);
  if aValue.length>0 then
  begin

    repeat
      if PeekAhead(aValue.length,mText) then
      begin
        mText:=lowercase(mText);
        result:=SameText(mText,aValue);
        if result then
        break;
      end;
      if not Next then
      break;
    until EOF;

    (* repeat
      if PeekAhead(aValue.length,mText)
      and SameText(lowercase(mtext),aValue) then
      Begin
        result:=true;
        break;
      end else
      if not next then
      break;
    until EOF; *)

  end else
  Raise ENPPException.Create
  ('ReadTo() failed, invalid target value error');
end;

function TQTXBuffer.ReadTo(Const aChars:Array of string;
         var text:String):Boolean;
var
  x:  Integer;
Begin
  result:=False;
  text:='';
  if aChars.Length>0 then
  begin
    for x:=FIndex to FData.length do
    Begin
      if (Current in aChars) then
      Begin
        result:=true;
        break;
      end else
      text+=Current;

      if not Next then
      break;
    end;
  end;
end;

Function TQTXBuffer.Back:Boolean;
begin
  result:=FIndex>1;
  if result then
  dec(FIndex);
end;

function TQTXBuffer.Next:Boolean;
begin
  Result:=FIndex<fdata.length; if="" result="" then="" begin="" inc(findex);="" (current="" in="" [#13,#10])="" inc(flinenr);="" end;="" function="" tqtxbuffer.bof:boolean;="" result:="FIndex=1;" tqtxbuffer.eof:boolean;="">=FData.Length;
end;


end.

N++ execution model

December 24, 2014 Leave a comment

When creating programming languages or translation-modules, it’s very important to allow the thought to grow and mature before you jump in and build the final product.

program("service_test") {

  handshake {

    input {
      /* Consume WSDL Web-Service Endpoint */
      service1 @ service[URI:"http://www.test.com/SOAP/myService/WSDL/",
                 serviceType:SoapService];
    }

    output {
        myProcess @ process("self");
        stdio @ pipe("stdout");
      }
  }

  /* Execute RPC call */
  execute (stdio,service1)  {
    stdio:writeln("Calling webservice");
    execute (*)  {
      var int32 result = 0;
      set result = service1:getUserId("quartex","secret");
      stdio:writelnF("UserID on server={0}", result);
    } fail (e) {
      stdio.writelnF("Calling soap service failed: {0}",e);
      proceed;
    }
  }

  /* Exit code for process */
  set myProcess:exitCode = 0;
}

The smallest detail can become either a strength or a weakness for the language. For instance, in the past couple of days I have realized the following:

First: Using “:” instead of “.” as¬†class or method accessor¬†saves time when parsing. Traditionally the¬†dot character is used when accessing a member of a class or object, for example: “myobject.mymethod(<params>);” is how you would call a procedure in nearly all languages. By altering this to “myobject:mymethod(<params>);” we are able to reserve the dot character for other things.

Second: Originally the combination of => and <= (arrow heads) was used to denote direction of data-flow inside the input() and output() blocks (handshake section). This turned out to be a bad idea because web-services and pure objects does not follow the concept of “consumer” or “producer”. A web-service contains both functions and procedures, and as such using data-flow arrowheads would require two registrations rather than one. Replacing the arrow-head symbolism with “@” (meaning “at”) makes more sense, since services and channels are also bi-directional.

The “@” at character is also used to denote “memory address of” (same as object-pascal) and “reference to”. For instance when passing an anonymous-procedure as a parameter you must prefix the in-place execute-block with “@”. See notes on anonymous procedures further down.

Third: To date the execute() program block allows you to define the variables visible to the block and shared between the module and the code inside the block. This has been poorly understood by commenters on this blog. The reason the execute() exists at all, as opposed to just typing the code in it’s place, is because a code block can execute in 4 different modes:

  • Blocking
  • Async
  • Threaded
  • Anonymous

An execute block is by definition anonymous since you cant apply a name to it as you would a method. However the data-scope (the variables visible to the block, or “known” by the block) can also be used when the block is threaded. This is why you have to define shared variables — or simply write use (*) as a parameter – which means “all”.

Linear execution

linear execution means that the block executes in the same thread as the main program. This means you actually dont have to define parameters, since all variables and objects of the program will be available to the block.

Execute (*) {
  stdio:writeln("linear execution");
}

Asyncronous execution

Async execution means that your code is executed in the same thread as the main program, but execution is done through interrupts, so that the program continues immediately and doesnt wait for the block to finish.

Execute async (stdio) {
  stdio:writeln("async execution");
}

Threaded execution

Threaded execution executes the block in it’s own process. Objects and variables you wish to share with the thread must be passed as parameters of the execute block. Note:under JavaScript this involves web-workers. nodeJS however supports real threads which is used by the code emitter.

Execute thread[Priority:Idle] (stdio,mySharedObject1,MySharedObject2)  {

  var msghandler = null;
  var myProcess = null;

  set msgHandler = open PIPE("messages");
  set myProcess = process("self");

  repeat(-1) {
    msghandler:waitfor(1000) {
      stdio:writeln("message recieved!");
    } fail (e) {
      criteria {
        myProcess:terminated = false;
      } execute (*) {
        /* Thread terminated, break free of wait loop */
        break;
      } else {
        /* thread not terminated, continue waiting
           for a message */
        continue;
      }
    }
  }

} finished {
  /* Code to execute when thread finishes */
}

The above code executes a code-block as a thread. The thread creates a pipe and waits for a message on the pipe (typically sent from the main program – but can also be triggered by another N++ application (or any application capable of creating and sending data over a pipe).

Anonymous procedures

Much like javascript N++ supports in-place anonymous procedures. Like other languages the declaration must have a compatible interface (parameters and their types):

Execute (*) {
  someclass:someproc( @execute(*){
      stdlib:writeLn("this execute block is anonymous");
      }, 100);
}

Hopefully this clears up the concept of execute-blocks and it’s parameters. The parameter list for the execute block simply defines what should be known for that block. This makes sense when you realize that the code-block can be executed as a completely separate thread.

Symbols so far

  • “:” Accessor, as in “object:method” or “interface:method”
  • “@” Reference to, address of, as in “myService @ process(‘name’);”
  • “<type | interface>” typecast as in, <ISomeInterface>SomeInstance:InterfaceMethod(Params);
  • “async” execute model, runs execute-block in async mode
  • “thread” execute model, runs execute-block as a separate thread
  • “[ attribute¬†]” accessor for attributes, arrays and dictionary elements. As in “object-factory[attribute-name:value, ..];”
    Also used as array accessors, as in “string name = Array[index]”
    Also used as dictionary accessors, as in object “temp = Dictionary[KEY];”

Attributes

N++ attributes are identical to what in other languages are called class-properties or class-methods, in that no instance needs to be constructed in order to change them. Attributes are typically used to initialize a class before you construct instances from it (an instance or “object” is what is created from a class, a class remains a blueprint or schema describing what you want to construct. This is classical OOP terminology).

* Note: The [ ] accessors are the same as those used for array access.

Under object pascal you would for instance write:

 type
    TMyClass = class
    strict private
    class var
       FRed: Integer;
       FGreen: Integer;
       FBlue: Integer;
     public
       class property Red: Integer read FRed write FRed;
       class property Green: Integer read FGreen write FGreen;
       class property Blue: Integer read FBlue write FBlue;
    end;

You would then initialize these values before you create an object from the class:

procedure TForm1.makeClass;
var
  mInstance: TMyClass;
Begin
  (* Initialize class level values *)
  TMyClass.red:=120;
  TMyClass.green:=19;
  TMyClass.blue:=22;

  (* construct instance from class.
     the values assigned to the class are mirrored on the instance,
     so the "red" property now contains 120 in our instance *)
  mInstance:=TMyClass.Create;
end;

Under N++ such special properties are called attributes and are not decorative like C# and C++ attributes. The word attribute and property (of something) means essentially the same thing; but under N++ we use them to differienciate between class and instance values. Attributes operate on the level of class (global), while properties operate on the level of instance (local).

Setting attributes

Attributes can be set directly or indirectly depending on the language construct. When obtaining service objects, which by nature are external processed without any constructors (external running programs, also running on other servers remotely) attributes function more or less as constructors:

    input {
      /* Consume WSDL Web-Service Endpoint */
      service1 @ service[URI:"http://www.test.com/SOAP/myService/WSDL/",
                 serviceType:SoapService];
    }

In the above example we set the attributes for the service we are going to use. The service can then be used through the identifier (service1). The “service” keyword is actually an object factory, which creates the invocation layer.

  /* create thread manually */
  var Thread myThread = new Thread
      [Priority:highest,name:"this is my thread"](*);

  /* set thread execution block */
  set @myThread:execute (*) {
  }

N++ accessing services

December 24, 2014 Leave a comment

Sometimes you come up with solutions which are a thing of pure beauty and elegance. Im falling in love with the simplicity of my N++ programming language, it’s robust boundaries which¬†creates a situation where errors are quickly identified and dealt with.

Below is the syntax for using a web-service. Notice that setting up an WSDL endpoint is no more difficult than setting up a stdout pipe (command-line output).

program("service_test") {

  handshake {

    input { 
      /* Consume WSDL Web-Service Endpoint */
      service1 @ service("http://www.test.com/SOAP/myService/WSDL/", serviceType:SoapService);
    }

    output {
        myProcess @ process("self");
        stdio @ pipe("stdout");
      }
  }

  /* Execute RPC call */
  execute (stdio,service1)  {
    stdio:writeln("Calling webservice");
    execute (*)  {
      var int32 result = 0;
      set result = service1:getUserId("quartex","secret");
      stdio:writelnF("UserID on server={0}", result);
    } fail (e) {
      stdio.writelnF("Calling soap service failed: {0}",e);
      proceed;
    }
  }

  /* Exit code for process */
  set myProcess:exitCode = 0;
}

For those of you interested in creating programming languages, I hope this wet’s your appetite to learn N++. At the moment the proof-of-concept is being written in Smart Pascal for nodeJS, but a native version running on Linux, Unix, Windows and OS X is the final product.

N++ is free to use. It is not open-source but created as a non-profit language which will be managed through an organization (more or less identical to php, perl and python).

To date, it’s the only language which is fully service oriented from the ground up.

Introduction to N++, a process oriented programming language

December 19, 2014 Leave a comment

For some time now I have been working on implementing various programming languages; writing parsers, sub parsers and dictionaries. To date these languages have been well-known languages, like pascal, visual basic and (to some extent) a subset of C#.

Programming languages are typically born out of two schools of thought: pure necessity, or just good old fun. I remember reading about the E programming language when I was a teenager for instance, which is a language that looks quite impressive – but serves little purpose. To be blunt: it doesnt give you anything in particular in terms of actual tools or advantages over, say, any other language out there.

And then there are languages which are just a complete waste of time, like brainfuck and it’s derived mathematical madness. If your idea of fun is jumping butt-naked¬†into an inverted number generator (read: PI) – then by all means, brainfuck is your language. But for the rest of us — perhaps something more tasteful and useful is in order..

Presenting N++

N++ is a language I am designing at the moment (N stands for Nandi, Shiva’s ox), or adding partial support for in the Quartex IDE. While I cant cover everything in a single blog post (and certantly not at this hour), I can present some concepts you may find interesting.

N is a language designed to collect, process and distribute data in large quantities; hence the strong power of the oxen Nandi. It belongs on the server, although there is no reason why you cant use it from inside a desktop application or as a service.

Let’s start with the classical hello world:

program ("hello_world") {
  handshake {
    input  { void; }
    output { void; }
  }

  execute(*) {
    writeln("hello world");
  }
}

Let’s look at the program in broad terms. Since most of you are programmers you probably have some idea what is going on in the above code. The [pre]program()[/pre] block defines that this textfile is indeed the program file (as opposed to a module). All programs can only have a single program file, but as many modules as you like.

Next comes the handshake. Now this is a very handy (pun intended) subject. What it does is define the input the program can expect (or demand) as well as the output. In the above example we dont expect anything and we dont deliver anything, so both input and output is set to void.

The heart of this little program is within the execute block, where we use the method “writeln” to output the text “hello world”.

If you are wondering what the * character means inside the brackets, that section is called the data-view. In essence the * character means “what is known to the parent object”. In this case the parent is the program itself, which means that all global variables and objects should be regarded as known (or inherited into the scope).

Feedback loop

While hello-world is always fun, let’s do something a bit more complex. Let’s write a program that accepts X number of text arguments from the command-line, and then prints it back out again:

program ("feedback") {
  handshake {
    input {
      string arguments[];
    }
    output { void; }
  }

  criteria (*) {
    arguments <> null;
    arguments.length > 0;
  } execute {
    process (arguments,item) {
      writeln(item);
    }
  }

}

As you can see this example defined a typed handshake, where it expects to receive an string array as input. Since the program or module doesn’t produce anything, the output is again set to void (meaning “nothing”).

The criteria() section is probably the closest thing you will get to an IF statement in N++, what it does is validate X number of conditions, if they result in true the execute block is executed. You can also have an optional “fail” block to deal with scenarios where the criteria are not met.

The process() method is N++ variation of a FOR/NEXT loop. What it does is process the content of an object one-by-one and apply whatever code is inside it’s block on the data. As you can see from the parameters the arguments array from the input is the data we want to process, and as our second parameter we have a variable representing the current item.

Chainable criteria

One of the cool aspect of N++ is that you can chain these code blocks in various patterns. For instance you can chain process() and criteria() as such:

criteria {
  /* do not execute unless list has items */
  list.length>0;
} process (list,item) {
  /* do something to each item in the list */
} fail (e) {
  /* Criteria not met */
  writeln(e);
}

Or, apply criteria per item:


process (list,item) {
  writeln(item);
} criteria {
  /* Skip all items which does not begin with "jo" */
  item.firstname == "jo*";
}

Dynamic structures

While looping through data is fun, it’s not really something new. But being able to shape known data from various sources into new structures is extremely handy, especially for web services. Again it’s not a novelty – but here it’s a fundamental aspect of the language itself:

module ("data_export") {
  handshake {
    input  { void; }
    output { object[]; }
  }

  define dbConn as database("myDB@localhost:8090;user:admin;password:adminpassword");

  execute(dbconn) {
    collect ["users,"info"] from dbConn as data;
    set output = build(data,smAscending) {
      uid:      users.UID;
      username: users.userHash;
      password: users.passHash;
      fullName: info.fullName;
    }
  }
}

The above introduces¬†a couple of new concepts. First there is the “define” keyword, which defined a data-source. In this case we associate “dbconn” as a database which we can obtain information from. By default N++ regards all types of data as either a single object, an array of object or a dictionary of object. You may also use an array as a dictionary – at which point the array object is transformed into a dictionary.

The collect() method does what the name implies, namely to collect data from a data-source, in this case we grab all the records from two tables (users and info) and stuff those into a named container (data).

And finally a magic method, namely the “build” function. In this case we create a completely new structure which will be sorted ascending — and then we map fields from the data-source(s) into the new structure.

When emitting JSON what this module returns is:

[
{
"uid": "1239094",
"userName": "AF965BF1274CCE1",
"passWord": "C009A6487BC120E",
"fullName": "Jon-Lennart Aasenden"
},
{
"uid": "987351",
"userName": "BF982737CD00A92",
"passWord": "AF965BF1274CCE1",
"fullName": "John Calvin"
},
{
"uid": "670941",
"userName": "09F9CA1BF982737",
"passWord": "4CCE1986CDE00AF",
"fullName": "Dave Jones"
}
]

The general idea here ofcourse, is that threading and multi-processor programming should be equally simple. If you have 10 million records and try to join them into a new structure like above, chances are it will take forever before the method returns. In fact, it may even break your database. But for ad-hoc amounts of data, threading it makes sense:


  background-execute ("user-export",tpIdle, => (*) {
      /* Anonymous procedure callback handler */
      signal.send([*].process.owner,MSG_NOTIFICATION,"$DB-EXPORT-READY");
      } ) {
    collect ["users,"info"] from dbConn as data;
    set output = build(data,smAscending) {
      uid:      users.UID;
      username: users.userHash;
      password: users.passHash;
      fullName: info.fullName;
    }
  }
}

In short: when the data export is done and the data is delivered as the module output, a signal is sent to the owner process, of type “notification” with a string message saying that the export is complete.

Expanding the idea

The idea behind N++ is that a language designed exclusively to process data, working with stores (which in this case can be any linear flow of data-items, such as a database table, a folder with files, a text-list or any other vertical media) will be able to take shortcuts traditional languages cannot.

As of writing, N++ is barely out of the idea stage, with only the bare-bones parser and AST implemented, but already we are able to cut corners that are, in computing terms, quite costly.

Take the “IF” statement for instance. In any procedure involving more than 2 variables, you end up writing criteria testing at the beginning of the procedure:

function TMyService.DoSomething(a,b,c:Float):Float;
Begin
  if (a>1.0)
  and ( (b<1.0) and (b>0.0) )
  and (c>0.0) then
  begin
    result:=A * B - C;
  end else
  Begin
   //raise exception
  end;
end;

Which can be more cleanly expressed through n++ direct approach:

method doSomething(demand a,b,c) {
  criteria {
    a > 1.0;
    b < 1.0;
    b > 0.0;
    c > 0.0;
  } execute {
    output float(a * b - c);
  } fail (e) {
    /* express error */
  }
} breach {
  /* parameter contract not met */
}

As you can see from the above code, N++ have some new concepts such as contract based interfaces. The interface of a procedure defined the syntax and expected input (as it does in all languages). This must not be confused with COM interfaces, as interface in this context is refering to header (as in C’s .h file). Object pascal doesn’t need a separate header file like C, since the syntax provides an interface section at the top of a source-file.

N++ doesnt even need that, and it can even (much like javascript) allow calling methods and modules without correct parameters. This is called a breach-of-contract and can be caught much like any error.

Contract based programming is not just limited to data-mapping, but is also used in parallell programming and cluster programming where invocation of a method can be done across the domain. All invocations are actually sent as signals within the cluster and workload can be distributed across several computers (nodes).

While that is not something I have even begun sculpting, putting it into the foundation of the language is important.

Well, I’m off to bed — it’s been a long day.