N++ parser written in Smart Pascal (JavaScript)

December 21, 2014

Since there seem to be doubts (oh ye of little faith) about the power of Smart Pascal in the marketplace, I figured: what better way to introduce a new programming language than by using it to write another completely new programming language (N++) 🙂 That means N++ targets nodeJS and the browser exclusively, and it’s written 100% in Smart Pascal.

I think that is some kind of record, and N++ may well be the first programming language written in Smart Pascal and compiled to JavaScript.

Either way, let’s start at the beginning.

The source buffer

Everything starts with a source buffer. A good compiler is built from several parts, and most parsers and compilers have the following modules:

  • Buffer class
  • Tokenizer / Lexer
  • Parser
  • Code generator (“codegen” or “emitter”)
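
As a rough illustration of how these four modules hand work to each other, here is a minimal sketch in TypeScript (chosen because the end result is JavaScript anyway). Every name in it (`SourceToken`, `TreeNode`, `tokenize`, `parse`, `emit`, `compile`) is made up for this sketch, and the rules are toys: it only groups words by braces. It is not the N++ implementation.

```typescript
// Hypothetical four-stage pipeline: buffer -> tokenizer -> parser -> codegen.
type SourceToken = { text: string; line: number };
type TreeNode = { kind: string; children: TreeNode[] };

// Buffer + tokenizer: walk the source once, splitting words from punctuation.
function tokenize(source: string): SourceToken[] {
  const tokens: SourceToken[] = [];
  let line = 1;
  let word = "";
  const flush = (): void => {
    if (word.length > 0) tokens.push({ text: word, line });
    word = "";
  };
  for (let i = 0; i < source.length; i++) {
    const ch = source.charAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      word += ch;
      continue;
    }
    flush();
    if (ch === "\n") line++;
    else if (ch.trim().length > 0) tokens.push({ text: ch, line });
  }
  flush();
  return tokens;
}

// Parser: turn "{" ... "}" nesting into a tree (toy rule, not the N++ grammar).
function parse(tokens: SourceToken[]): TreeNode {
  const root: TreeNode = { kind: "root", children: [] };
  const parents: TreeNode[] = [];
  let current = root;
  for (const tok of tokens) {
    if (tok.text === "{") {
      const block: TreeNode = { kind: "block", children: [] };
      current.children.push(block);
      parents.push(current);
      current = block;
    } else if (tok.text === "}") {
      const parent = parents.pop();
      if (parent === undefined) throw new Error(`Unexpected } on line ${tok.line}`);
      current = parent;
    } else {
      current.children.push({ kind: tok.text, children: [] });
    }
  }
  if (parents.length > 0) throw new Error("Missing }");
  return root;
}

// Code generator: here it only prints the tree as an indented outline.
function emit(node: TreeNode, depth: number = 0): string {
  const own = depth === 0 ? "" : "  ".repeat(depth - 1) + node.kind + "\n";
  return own + node.children.map((c) => emit(c, depth + 1)).join("");
}

// The whole pipeline is just the three calls chained together.
function compile(source: string): string {
  return emit(parse(tokenize(source)));
}
```

Each stage only sees the output of the one before it, which is what keeps the buffer free of language rules.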

The buffer has a single job: to provide methods that chew through the source code as fast as possible. But there is a rule, namely that accuracy and readability should never be sacrificed for speed. So in this “brief introduction” I have written a buffer class which implements only the basics. It goes a little something like this:

 

type

  TQTXBuffer = Class(TObject)
  private
    FData:    String;
    FIndex:   Integer;
    FLineNr:  Integer;
  protected
    function  getCurrent:String;virtual;
  public

    Property  LineNr:Integer read FLineNr;

    Property  Current:String read getCurrent;
    Function  Back:Boolean;
    function  Next:Boolean;
    function  BOF:Boolean;
    function  EOF:Boolean;

    function  ReadTo(Const aChars:array of string;
              var text:String):Boolean;

    function  ReadWord(var Text:String):Boolean;

    function  ReadToEOL(var text:String):Boolean;

    function  PeekAhead(aCount:Integer;var Text:String):Boolean;
    function  Compare(aText:String):Boolean;

    procedure LoadFromString(Text:String);

    procedure Clear;
  End;

//###########################################################################
// TQTXBuffer
//###########################################################################

Procedure TQTXBuffer.Clear;
begin
  FData:='';
  FLineNr:=1;
  FIndex:=0;
end;

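procedure TQTXBuffer.LoadFromString(Text:String);
Begin
  FLineNr:=1;
  FData:=trim(text);
  // Normalize CRLF and lone CR to a single LF. Replacing the pair first
  // keeps blank lines intact, so LineNr stays accurate.
  FData:=StrReplace(FData,#13#10,#10);
  FData:=StrReplace(FData,#13,#10);
  FIndex:=1;
  if length(FData)<1 then
  FIndex:=0; // same "no data" position that Clear uses, so EOF is true
end;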

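function TQTXBuffer.getCurrent:String;
Begin
  // Guard against an empty buffer or an out-of-range position
  if (FIndex>=1) and (FIndex<=length(FData)) then
  result:=FData[FIndex] else
  result:='';
end;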

function TQTXBuffer.ReadWord(var Text:String):Boolean;
begin
  result:=False;
  Text:='';
  if not EOF then
  begin
    repeat
      if (current in ['A'..'Z','a'..'z','0'..'9']) then
      Text += current else
      break;
    until not next;
    result:=length(Text)>0;
  end;
end;

function TQTXBuffer.Compare(aText:String):Boolean;
var
  mData:  String;
Begin
  // SameText is already case-insensitive, so no lowercase() is needed
  result:=PeekAhead(length(aText),mData)
  and SameText(mData,aText);
end;

function TQTXBuffer.PeekAhead(aCount:Integer;var Text:String):Boolean;
Begin
  // Copy up to aCount characters starting at the current position.
  // The read position is left untouched, so this is a true look-ahead.
  Text:='';
  result:=False;
  if (aCount>0) and (FIndex>=1) then
  Begin
    Text:=copy(FData,FIndex,aCount);
    result:=length(Text)>0;
  end;
end;

function TQTXBuffer.ReadToEOL(var text:String):Boolean;
Begin
  result:=ReadTo([#13,#10],text);
end;

function TQTXBuffer.ReadTo(Const aChars:Array of string;
         var text:String):Boolean;
Begin
  result:=False;
  text:='';
  if (aChars.Length>0) and (FIndex>=1) then
  begin
    repeat
      // Stop on the first delimiter; it is left as the current character
      if (Current in aChars) then
      Begin
        result:=true;
        break;
      end;
      text+=Current;
    until not Next;
  end;
end;

Function TQTXBuffer.Back:Boolean;
begin
  result:=FIndex>1;
  if result then
  dec(FIndex);
end;

function TQTXBuffer.Next:Boolean;
begin
  Result:=FIndex<FData.Length;
  if result then
  Begin
    inc(FIndex);
    if (Current in [#13,#10]) then
    inc(FLineNr);
  end;
end;

function TQTXBuffer.BOF:Boolean;
begin
  result:=FIndex=1;
end;

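function TQTXBuffer.EOF:Boolean;
begin
  // Note: this is already true while positioned ON the last character,
  // so loops that must visit every character should test Next instead.
  result:=FIndex>=FData.Length;
end;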

Using the buffer class

The buffer class allows you to move horizontally through a source file, meaning that whatever file you load into the buffer is regarded as one long string. No matter what the formatting may be, that’s the reality of writing a parser.

Here is a small example that shows how the buffer works:

procedure TForm1.W3Button1Click(Sender: TObject);
var
  mBuffer:  TQTXBuffer;
begin
  mBuffer:=TQTXBuffer.Create;
  try
    // Set source into buffer
    mBuffer.loadFromString(
      #"program(test) {
        criteria (*) {
          test > 0;
        }
      }");

    // traverse the buffer char by char; Next returns false once the
    // last character has been visited, so no character is skipped
    repeat
      writeln('-->' + mBuffer.Current );
    until not mBuffer.Next;
  finally
    mBuffer.free;
  end;
end;

Of course, those are just baby steps when it comes to parsing. You have to build the language rules into the parser class (not the buffer class). For instance, N++ expects the first word in a program to be “program”, followed by a name enclosed in “(” and “)” brackets, followed by a body enclosed in “{” and “}”.

Here is a simple N++ program:

program("hello world") {
  handshake {
    input  { void; }
    output { void; }
  }

  execute(*) {
    writeln("hello world");
  }
}
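
The header rule described above (the word “program”, a name in brackets, then an opening brace) can be sketched in a few lines. This is TypeScript rather than Smart Pascal, and `parseProgramHeader`, `ProgramHeader`, `skipSpace` and `expect` are names invented for the sketch. It assumes the keyword is matched case-insensitively, like the buffer's Compare method, and that the name may be quoted or bare, as in the two samples in this post.

```typescript
// Hypothetical header check for:  program ( name ) {
type ProgramHeader = { name: string; bodyStart: number };

function parseProgramHeader(source: string): ProgramHeader {
  let pos = 0;

  // Advance past spaces, tabs and line breaks.
  const skipSpace = (): void => {
    while (pos < source.length && source.charAt(pos).trim() === "") pos++;
  };

  // Require the given text at the current position, or report where it failed.
  const expect = (text: string): void => {
    skipSpace();
    const found = source.slice(pos, pos + text.length);
    if (found.toLowerCase() !== text) {
      throw new Error(`Expected "${text}" at offset ${pos}`);
    }
    pos += text.length;
  };

  expect("program");
  expect("(");
  const start = pos;
  // Toy rule: the name runs up to the first ")" (so it cannot contain one).
  while (pos < source.length && source.charAt(pos) !== ")") pos++;
  const name = source.slice(start, pos).trim().replace(/^"|"$/g, "");
  expect(")");
  expect("{");
  return { name, bodyStart: pos };
}

const header = parseProgramHeader('program("hello world") {\n  handshake {');
// header.name is "hello world"; header.bodyStart points just past the "{"
```

A real parser would go on to read the handshake and execute sections from `bodyStart`, but the “expect this, then that” shape stays the same.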

Parsing this is very, very simple, as is creating the abstract syntax tree. N++ will be a great automation language, one which you can easily place on top of other technology. Take animations for instance, or tweening. Wouldn’t it be nice to have a language you could write effects in? One which is easier than the mess that is JavaScript?

Well, by implementing a language module in JavaScript for N++, you can use N++ to control animation, effects, tweening or whatever you fancy. My personal favorite is databases and data management for nodeJS, but that’s me 🙂

The parser

In my next post we will look at the parser class and also add a lexer, which makes it “sane” to parse large structures and programs.

About N++: what is it?

N++ is a language designed to deal with big data, and I mean “BIG” data: terabytes of records.

It’s a RISC-type language, meaning that it has a reduced instruction set, and it’s designed to get the most work done with the least typing.

The benefits of N++ are:

  • Data sculpting (creating new structures by joining old structures)
  • IO is based on mapping
  • Easy to use, easy to learn, easy to adapt to underlying processes
  • Runs on nodeJS, is designed for nodeJS and compiles to JavaScript

What is mapping?

In short, mapping allows you to pre-define the IO channels that your N++ program should use. So instead of creating classes for streams, pipes and whatever, N++ simplifies this through a collection of mappings called a “handshake”. For instance, to run the “hello world” example above, you need to include stdout in your handshake under the output section, like this:

handshake {
  output {
    stdout => system.io.stdout;
  }
}

A mapping is a shortcut. Instead of having to write system.io.stdout.writeln() every single time, we create an alias called “stdout” locally (read: visible to our code) that we can use instead.

The handshake also serves as a means for the compiler to know precisely what your code uses, and what channels should be reserved.

The input handshake works the same way, but with a reversed arrowhead: => means “data from left into right”, while <= means “data from right into left”.
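
To make the alias idea concrete, here is a small TypeScript sketch of how a compiler might store handshake mappings and expand an alias back into its full name. The types and functions (`Mapping`, `parseMapping`, `resolveAlias`) are hypothetical, and treating a mapping as simple text substitution is an assumption made for the sketch, not a description of how N++ does it.

```typescript
// Hypothetical model of a handshake: alias -> fully qualified channel.
type Direction = "output" | "input";
type Mapping = { alias: string; target: string; direction: Direction };

// Parses one line such as "stdout => system.io.stdout;"
// or "params <= application.params;".
function parseMapping(line: string): Mapping {
  const match = /^\s*(\w+)\s*(=>|<=)\s*([\w.]+)\s*;\s*$/.exec(line);
  if (match === null) throw new Error(`Not a mapping: ${line}`);
  return {
    alias: match[1],
    target: match[3],
    direction: match[2] === "=>" ? "output" : "input",
  };
}

// Expands a leading alias: "stdout.writeln" -> "system.io.stdout.writeln".
// Names that do not start with a known alias are returned unchanged.
function resolveAlias(name: string, handshake: Mapping[]): string {
  const dot = name.indexOf(".");
  const head = dot < 0 ? name : name.slice(0, dot);
  const rest = dot < 0 ? "" : name.slice(dot);
  const found = handshake.find((m) => m.alias === head);
  return found ? found.target + rest : name;
}

const handshake: Mapping[] = [
  parseMapping("stdout => system.io.stdout;"),
  parseMapping("params <= application.params;"),
];
const expanded = resolveAlias("stdout.writeln", handshake);
// expanded is "system.io.stdout.writeln"
```

Because every channel is listed up front, the same table also tells the compiler which channels a program needs before any code runs.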

Other differences

Quite a few! For instance, the IF statement is very different: it's called "criteria" and looks like this:

program("test") {
  handshake {
    output {
      stdout => system.io.stdout;
    }
    input {
      params <= application.params;
    }
  }

  execute (*) {

    /* Check if the command-line params "test" and "beta" are true */
    criteria (*) {
      input["test"] == true;
      input["beta"] == true;
    } execute {
      stdout.writeln("test and beta params were passed!");
    } fail (e) {
      stdout.write("Something was wrong!:");
      stdout.write(e);
      stdout.write("\n");
    }
  }
}

In the above, the code inside the criteria { } section must evaluate to TRUE in order for the appended EXECUTE section to actually execute. Should the criteria fail then the "fail" section is executed instead.
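
For readers who think in JavaScript terms, the criteria/execute/fail flow can be modelled roughly as below. This is a TypeScript sketch under two assumptions that the post does not spell out: conditions are checked in order, and the fail section receives a description of the first condition that was not met. The names `Condition`, `criteria`, `cliParams` and `outputLog` are made up for the example.

```typescript
// Hypothetical model of criteria { ... } execute { ... } fail (e) { ... }.
type Condition = { label: string; test: () => boolean };

function criteria(
  conditions: Condition[],
  execute: () => void,
  fail: (e: string) => void
): void {
  for (const c of conditions) {
    // Assumption: the first condition that is not true ends the check
    // and is reported to the fail section.
    if (c.test() !== true) {
      fail(`criteria not met: ${c.label}`);
      return;
    }
  }
  // Every condition held, so the execute section runs.
  execute();
}

// Stand-ins for the command-line params and for stdout.
const cliParams: Record<string, boolean> = { test: true, beta: false };
const outputLog: string[] = [];

criteria(
  [
    { label: 'input["test"] == true', test: () => cliParams["test"] === true },
    { label: 'input["beta"] == true', test: () => cliParams["beta"] === true },
  ],
  () => outputLog.push("test and beta params were passed!"),
  (e) => outputLog.push("Something was wrong!:" + e)
);
// outputLog now holds a single entry, written by the fail section
```

With both params set to true, the same call would write the execute message instead.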

Oh, and the for/next stuff is gone as well:

program("test") {
  handshake {
    output {
      stdout => system.io.stdout;
    }
    input {
      params <= application.params;
    }
  }

  execute (*) {

    var string[] test = new string[10];
    var int x = 0;

    process(test, mItem) {
      mItem = format("this is string #{0}",x);
      x++;
    }

  }
}

The keyword "process" will process anything which has depth, from bottom to top (lower to higher). So it takes over the role of for/next, do/while and repeat/until.
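
In JavaScript terms, the example above behaves roughly like the TypeScript sketch below. It assumes that items are visited from the lowest index to the highest and that assigning to the item variable writes the value back into the collection; both are my reading of the sample, not a specification. The helper is called `processEach` because `process` is already a global in nodeJS, and `strings` and `counter` stand in for `test` and `x`.

```typescript
// Hypothetical model of process(collection, item) { ... }.
function processEach<T>(items: T[], body: (item: T, index: number) => T): void {
  for (let i = 0; i < items.length; i++) {
    // Whatever the body returns replaces the item, mimicking "mItem = ...".
    items[i] = body(items[i], i);
  }
}

// Same shape as the N++ sample: ten strings and a counter.
const strings: string[] = new Array<string>(10).fill("");
let counter = 0;

processEach(strings, () => {
  const item = `this is string #${counter}`;
  counter++;
  return item;
});
// strings[0] is "this is string #0" and strings[9] is "this is string #9"
```

One construct that walks whatever it is given is what lets a single keyword stand in for several kinds of loop.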

Anyway, loads of fun stuff if you like playing with programming languages.
I'll post the full code for N++ when I'm done.
