N++ parser written in Smart Pascal (JavaScript)
Since there seem to be doubts (oh ye of little faith) as to the power of Smart Pascal in the marketplace, I figured: what better way to introduce a new programming language than by writing a completely new programming language (N++) in Smart Pascal itself 🙂 That means N++ will be for nodeJS and the browser exclusively, and it’s written 100% in Smart Pascal.
I think that is some kind of record, and that N++ will probably be the first ever programming language written in JavaScript. Or Smart Pascal and compiled to JavaScript to be more precise.
Either way, let’s start at the beginning.
The source buffer
Everything starts with a source buffer. A good compiler is built from several parts, but most parsers and compilers share the following modules:
- Buffer class
- Tokenizer / Lexer
- Parser
- Code generator (“codegen” or “emitter”)
The buffer has a single job: to provide methods for chewing through the source code as fast as possible. But there is a rule, namely that accuracy and readability should never be sacrificed for speed. So in this “brief introduction” I have written a buffer class which implements only the basics. And it goes a little something like this:
type
  TQTXBuffer = Class(TObject)
  private
    FData:   String;
    FIndex:  Integer;
    FLineNr: Integer;
  protected
    function getCurrent:String;virtual;
  public
    Property LineNr:Integer read FLineNr;
    Property Current:String read getCurrent;
    Function Back:Boolean;
    function Next:Boolean;
    function BOF:Boolean;
    function EOF:Boolean;
    function ReadTo(Const aChars:array of string; var text:String):Boolean;
    function ReadWord(var Text:String):Boolean;
    function ReadToEOL(var text:String):Boolean;
    function PeekAhead(aCount:Integer;var Text:String):Boolean;
    function Compare(aText:String):Boolean;
    procedure LoadFromString(Text:String);
    procedure Clear;
  End;

//###########################################################################
// TQTXBuffer
//###########################################################################

Procedure TQTXBuffer.Clear;
begin
  FData:='';
  FLineNr:=1;
  FIndex:=0;
end;

procedure TQTXBuffer.LoadFromString(Text:String);
Begin
  FLineNr:=1;
  FData:=trim(text);
  // Normalize line endings to a single #13 (CRLF first, so blank lines survive)
  FData:=StrReplace(FData,#13#10,#13);
  FData:=StrReplace(FData,#10,#13);
  FIndex:=1;
  if length(FData)<1 then
    FIndex:=-1;
end;

function TQTXBuffer.getCurrent:String;
Begin
  // Return an empty string rather than reading outside the data
  if EOF then
    result:=''
  else
    result:=FData[FIndex];
end;

function TQTXBuffer.ReadWord(var Text:String):Boolean;
begin
  result:=False;
  Text:='';
  if not EOF then
  begin
    repeat
      if (current in ['A'..'Z','a'..'z','0'..'9']) then
        Text += current
      else
        break;
    until not next;
    result:=length(Text)>0;
  end;
end;

function TQTXBuffer.Compare(aText:String):Boolean;
var
  mData: String;
Begin
  result:=PeekAhead(length(aText),mData)
    and SameText(lowercase(mData),lowercase(aText));
end;

function TQTXBuffer.PeekAhead(aCount:Integer;var Text:String):Boolean;
Begin
  // Peek only: copy up to aCount characters without moving the read position
  Text:='';
  if (aCount>0) and not EOF then
    Text:=copy(FData,FIndex,aCount);
  result:=length(Text)>0;
end;

function TQTXBuffer.ReadToEOL(var text:String):Boolean;
Begin
  result:=ReadTo([#13,#10],text);
end;

function TQTXBuffer.ReadTo(Const aChars:Array of string; var text:String):Boolean;
Begin
  // Collect characters until one of aChars is found; stops on the delimiter
  result:=False;
  text:='';
  if (aChars.Length>0) and not EOF then
  begin
    repeat
      if (Current in aChars) then
      Begin
        result:=true;
        break;
      end;
      text+=Current;
    until not Next;
  end;
end;

Function TQTXBuffer.Back:Boolean;
begin
  result:=FIndex>1;
  if result then
    dec(FIndex);
end;

function TQTXBuffer.Next:Boolean;
begin
  // Step forward; the position one past the last character counts as EOF
  if not EOF then
    inc(FIndex);
  Result:=not EOF;
  if Result and (Current in [#13,#10]) then
    inc(FLineNr);
end;

function TQTXBuffer.BOF:Boolean;
begin
  result:=FIndex=1;
end;

function TQTXBuffer.EOF:Boolean;
begin
  // True before the first or past the last character (covers an empty buffer)
  result:=(FIndex<1) or (FIndex>FData.Length);
end;
Using the buffer class
The buffer class allows you to move horizontally through a source file, meaning that whatever file you load into the buffer is regarded as one long string. No matter what the formatting may be, that’s the reality of writing a parser.
Here is a small example that can help you get an understanding of how the buffer works:
procedure TForm1.W3Button1Click(Sender: TObject);
var
  mBuffer: TQTXBuffer;
begin
  mBuffer:=TQTXBuffer.Create;
  try
    // Set source into buffer
    mBuffer.loadFromString(
      #"program(test) { criteria (*) { test > 0; } }");

    // traverse through the buffer char by char
    repeat
      writeln('-->' + mBuffer.Current );
      mBuffer.Next;
    until mBuffer.EOF;
  finally
    mBuffer.free;
  end;
end;
Of course, that’s just baby steps when it comes to parsing. You have to build the language rules into the parser class (not the buffer class). For instance, N++ expects the first word in a program to be “program”, followed by a name enclosed in “(” and “)” brackets, followed by a structural segment delimited by “{” and “}”.
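The header rule just described can be sketched as a small check. This is an illustrative TypeScript sketch rather than the N++ parser itself (the code in this post is Smart Pascal); `ProgramHeader` and `parseProgramHeader` are hypothetical names, and the sketch assumes the program name may be either quoted or bare, since both forms appear in the samples in this post.

```typescript
// Hypothetical result type: the program name plus the raw text between the outer braces.
interface ProgramHeader {
  name: string;
  body: string;
}

// Checks the rule described above: the word "program", a name in "(" and ")",
// then a "{" ... "}" segment. Returns null when the source does not match.
function parseProgramHeader(source: string): ProgramHeader | null {
  const text = source.trim();
  // keyword, "(" name ")", then the opening "{" (whitespace allowed in between)
  const head = /^program\s*\(\s*("([^"]*)"|[A-Za-z0-9_]+)\s*\)\s*\{/.exec(text);
  if (head === null || text.charAt(text.length - 1) !== "}") {
    return null;
  }
  // Group 2 is set for a quoted name; otherwise group 1 holds the bare name.
  const programName = head[2] ?? head[1] ?? "";
  // Everything between the opening "{" and the final "}" is the body.
  const body = text.slice(head[0].length, text.length - 1).trim();
  return { name: programName, body: body };
}
```

A real parser would walk the buffer character by character and report line numbers on failure; the regular expression here is only a compact way to show the rule.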
Here is a simple N++ program:
program("hello world") {
  handshake {
    input { void; }
    output { void; }
  }
  execute(*) {
    writeln("hello world");
  }
}
Parsing this is very, very simple, as is creating the abstract syntax tree. N++ will be a great automation language, one which you can easily place on top of other technology. Take animations for instance, or tweening. Wouldn’t it be nice to have a language you could write effects in? One which is easier than the mess that is JavaScript?
Well by implementing a language module in JavaScript for N++, you can use N++ to control animation, effects, tweening or whatever you fancy. My personal favorite is databases and data management for nodeJS, but that’s me 🙂
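To make the idea of a tree concrete, here is a rough TypeScript sketch (again, not the actual N++ parser) that turns the brace structure of a program such as the hello-world sample into nested nodes. `SectionNode` and `parseSections` are hypothetical names. The sketch assumes that everything before a “{” is a section header, that statements end in “;”, and that braces and semicolons never appear inside string literals.

```typescript
// Hypothetical node type: a named section with statements and child sections.
interface SectionNode {
  header: string;          // text before "{", e.g. 'program("hello world")'
  statements: string[];    // ";"-terminated statements directly inside the section
  children: SectionNode[]; // nested "{ ... }" sections
}

// Parses nested "header { ... }" sections into a tree.
// Assumption: braces and semicolons never occur inside string literals.
function parseSections(source: string): SectionNode {
  const root: SectionNode = { header: "", statements: [], children: [] };
  const stack: SectionNode[] = [root];
  let pending = ""; // text collected since the last "{", "}" or ";"

  for (let i = 0; i < source.length; i++) {
    const ch = source.charAt(i);
    const top = stack[stack.length - 1];
    if (ch === "{") {
      // The pending text is the header of a new child section.
      const node: SectionNode = { header: pending.trim(), statements: [], children: [] };
      top.children.push(node);
      stack.push(node);
      pending = "";
    } else if (ch === "}") {
      if (stack.length === 1) {
        throw new Error("Unexpected '}' at offset " + i);
      }
      stack.pop();
      pending = "";
    } else if (ch === ";") {
      top.statements.push(pending.trim());
      pending = "";
    } else {
      pending += ch;
    }
  }
  if (stack.length !== 1) {
    throw new Error("Missing '}' at end of source");
  }
  return root;
}
```

Fed the hello-world program, the root node ends up with a single child whose header is `program("hello world")`, which in turn holds the `handshake` and `execute(*)` sections.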
The parser
In my next post we will look at the parser class and also add a lexer, which makes it “sane” to parse large structures and programs.
About N++: what is it?
N++ is a language designed to deal with big data, and I mean “BIG” data, terabytes of records.
It’s a RISC type language, meaning that it has a reduced instruction set, and it’s designed to get the most amount of work done with the least amount of typing.
The benefits of N++ are:
- Data sculpting (creating new structures by joining old structures)
- IO is based on mapping
- Easy to use, easy to learn, easy to adapt to underlying processes
- Runs off nodeJS, designed for nodeJS and is written in JavaScript
What is mapping?
In short, mapping allows you to pre-define the IO channels that your N++ program should use. So instead of creating classes for streams, pipes and whatever, N++ simplifies this through a collection of mappings, called a “handshake”. For instance, if you plan on writing the “hello world” example above, you need to include stdout in your handshake under the output section, like this:
handshake {
  output { stdout => system.io.stdout; }
}
A mapping is a shortcut. Instead of having to write system.io.stdout.writeln() every single time, we create an alias called “stdout” locally (read: visible to our code) that we can use instead.
The handshake also serves as a means for the compiler to know precisely what your code uses, and what channels should be reserved.
The input handshake works the same way, but with the arrowhead reversed: => means “data flows from left into right”, while <= means “data flows from right into left”.
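One way to picture a handshake is as a small alias table. The TypeScript sketch below is illustrative only and is not taken from the N++ source: `Mapping`, `parseMapping` and `resolveAlias` are hypothetical names, and the "out"/"in" direction labels are an assumption based on the two mapping samples in this post.

```typescript
// Hypothetical record for one handshake mapping.
interface Mapping {
  alias: string;           // local name, e.g. "stdout"
  target: string;          // full path, e.g. "system.io.stdout"
  direction: "out" | "in"; // "=>" data goes to the target, "<=" data comes from it
}

// Parses a single mapping such as "stdout => system.io.stdout;".
// Returns null if the line is not a mapping.
function parseMapping(line: string): Mapping | null {
  const m = /^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=>|<=)\s*([A-Za-z_][A-Za-z0-9_.]*)\s*;?\s*$/.exec(line);
  if (m === null) {
    return null;
  }
  return { alias: m[1], target: m[3], direction: m[2] === "=>" ? "out" : "in" };
}

// Expands a call made through an alias into the full path.
function resolveAlias(mappings: Mapping[], call: string): string {
  const dot = call.indexOf(".");
  const head = dot < 0 ? call : call.slice(0, dot);
  for (let i = 0; i < mappings.length; i++) {
    if (mappings[i].alias === head) {
      return mappings[i].target + call.slice(head.length);
    }
  }
  return call; // not an alias: leave unchanged
}
```

With the `stdout` mapping loaded, `resolveAlias` turns `stdout.writeln` into `system.io.stdout.writeln`, which is the shortcut the handshake is meant to provide.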
Other differences
Quite a few! For instance, the IF statement is very different: it's called "criteria" and looks like this:
program("test") {
  handshake {
    output { stdout => system.io.stdout; }
    input { params <= application.params; }
  }
  execute (*) {
    /* Check if the command-line params "test" and "beta" are true */
    criteria (*) {
      input["test"] == true;
      input["beta"] == true;
    }
    execute {
      stdout.writeln("test and beta params were passed!");
    }
    fail (e) {
      stdout.write("Something was wrong!:");
      stdout.write(e);
      stdout.write("\n");
    }
  }
}
In the above, the code inside the criteria { } section must evaluate to TRUE in order for the appended EXECUTE section to actually execute. Should the criteria fail then the "fail" section is executed instead.
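The control flow of criteria/execute/fail can be approximated in a few lines. This TypeScript sketch is an assumption about the semantics, not the N++ implementation: `Condition` and `runCriteria` are hypothetical names, conditions are checked in order and checking stops at the first false one, and the value handed to the fail handler is a plain string invented for illustration (the post does not say what `e` contains).

```typescript
// A condition is any function that returns true or false.
type Condition = () => boolean;

// Runs onExecute only if every condition holds; otherwise runs onFail with a
// short description of the first condition that was false.
// Returns true when onExecute ran.
function runCriteria(
  conditions: Condition[],
  onExecute: () => void,
  onFail: (reason: string) => void
): boolean {
  for (let i = 0; i < conditions.length; i++) {
    if (!conditions[i]()) {
      onFail("condition " + i + " was false");
      return false;
    }
  }
  onExecute();
  return true;
}

// Usage mirroring the sample: "test" was passed, "beta" was not.
const cliParams: { [key: string]: boolean } = { test: true, beta: false };
const messages: string[] = [];
runCriteria(
  [() => cliParams["test"] === true, () => cliParams["beta"] === true],
  () => { messages.push("test and beta params were passed!"); },
  (reason) => { messages.push("Something was wrong!: " + reason); }
);
// messages now holds one entry: "Something was wrong!: condition 1 was false"
```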
Oh, and the for/next stuff is gone as well:
program("test") {
  handshake {
    output { stdout => system.io.stdout; }
    input { params <= application.params; }
  }
  execute (*) {
    var string[] test = new string[10];
    var int x = 0;
    process(test, mItem) {
      mItem = format("this is string #{0}",x);
      x++;
    }
  }
}
The keyword "process" will process anything which has depth, from bottom to top (lower to higher). So it takes over the role of for/next, do/while and repeat/until.
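For readers coming from JavaScript, the behaviour of "process" in the sample can be approximated like this. It is a TypeScript sketch under assumptions, not N++ itself: `processEach` is a hypothetical helper, and writing the body's return value back into the array is my reading of the `mItem = ...` assignment in the sample.

```typescript
// Visits every element from the lowest index to the highest and stores
// whatever the body returns back into the same slot.
function processEach<T>(items: T[], body: (item: T, index: number) => T): void {
  for (let i = 0; i < items.length; i++) {
    items[i] = body(items[i], i);
  }
}

// Mirrors the N++ sample: ten strings, filled while counting with a separate variable.
const strings: string[] = [];
for (let i = 0; i < 10; i++) {
  strings.push("");
}
let counter = 0;
processEach(strings, () => {
  const text = "this is string #" + counter;
  counter++;
  return text;
});
// strings[0] is "this is string #0" and strings[9] is "this is string #9"
```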
Anyways — loads of fun stuff if you like playing with programming languages.
I'll post the full code for N++ when I'm done.