Archive
N++ context parser, part 2
Right. In my last article on parsing I introduced a basic buffer class and skeleton parser. In this article I introduce a working 2 level parser, meaning recursive, which implements both the program() and execute() code blocks. Being recursive means that you can now nest several execute() sections inside each other. But at this stage that’s pretty much all I’ve had time to implement — so consider it a rough sketch.
Support
At the moment we have a program parser, which naturally parses a program block which looks like this:
program("name") { }
Registered for the program parser, is a single sub-parser for the “execute” keyword, this is the class TQTXExecuteParser class. This allows you to have execute() sections within a program block, as such:
program("name") { execute (scope) { /* Code goes heres */ } }
To avoid recursive stack errors (stack overflow), the execute class does a trick. Namely that it registers support for itself (execute keyword) at parse-time, meaning that an execute block will support execute child blocks, like this:
program("name") { execute (scope) { /* Sub execute block */ execute (scope) { /* your code here */ } } }
Well, basically this is the base system in place. Adding new keywords and features is now a matter of creating sub parser classes and registering them to either the execute class or the program parser class. The benefit of such a simple system is that it’s fast, easy to understand and use – and easy to maintain. Concrete rules for parsing can be defined, and a lot of code re-cycled between the base-class (TQTXCustomParser) and it’s decendants.
With a bit of work you will have a fine-tuned, fast and efficient language parser.
Supporting other languages
As you probably guessed it’s not that hard to add support for other languages. I mean once you take the time to implement the basic form of a language unit or program-file. But it’s always a time consuming task, where even the tiniest detail matters. If you miss a single language rule (like say, not skippig a comma after it’s been consumed) the error will spread into the next parser and so on – until an exception occurs.
But, once you get the hang of chewing through a source-buffer, supporting enough parsers and keywords (and for Delphi also states, like the “Inerface” keyword state and the “implementation” keyword state) you can pretty much mix and match all you want.
Well, without going to far into the code — here is the updated source. You may want to drop memo control on your form and add the following code:
procedure TForm1.W3Button1Click(Sender: TObject); var mContext: TQTXContext; mParser: TQTXLPPParser; begin mContext:=TQTXContext.Create; try mParser:=TQTXLPPParser.Create(mContext); try mContext.Buffer.LoadFromString(w3memo1.text); try mParser.Parse; except on e: exception do writeln(e.message); end; finally mParser.free; end; finally mContext.free; end; end;
And set the content of the memo to:
procedure TForm1.InitializeForm; begin inherited; w3memo1.Text:=#'program("test") { execute (*) { /* Code here */ } }'; end;
Ok — here is the updated parser code:
unit qtxparser; interface uses SmartCL.System; const TOK_ROUNDOPEN = 1; // "(" TOK_ROUNDCLOSE = 2; // ")" TOK_CUROPEN = 3; // "{" TOK_CURCLOSE = 4; // "}" TOK_SPACE = 5; // " " TOK_ADD = 6; // "+" TOK_SUBTRACT = 7; // "-" TOK_DIVIDE = 8; // "/" TOK_POWER = 9; // "^" TOK_COMMA =10; // "," TOK_COLON =11; // ";" Numeric_Operators: Array [0..3] of Integer = (TOK_ADD,TOK_SUBTRACT,TOK_DIVIDE,TOK_POWER); type ENPPException = Class(EW3Exception); TLPPApplicationObject = Class; TQTXBuffer = Class(TObject) private FData: String; FIndex: Integer; FLineNr: Integer; protected function getCurrent:String;virtual; public Property LineNr:Integer read FLineNr; Property Current:String read getCurrent; Function Back:Boolean; function Next:Boolean; function BOF:Boolean; function EOF:Boolean; procedure SkipJunk; function ReadTo(Const aChars:array of string; var text:String):Boolean;overload; function ReadTo(aValue:String):Boolean;overload; function ReadTo(aValue:String;var inner:String):Boolean;overload; function ReadWord(var Text:String):Boolean; function ReadToEOL(var text:String):Boolean; function PeekAhead(aCount:Integer;var Text:String):Boolean; function Compare(aText:String):Boolean; Property TextFromCurrent:String read ( Copy(FData,FIndex,length(FData)-FIndex) ); procedure First; procedure LoadFromString(Text:String); procedure Clear; End; TQTXContext = Class(TObject) Private FBuffer: TQTXBuffer; public Property Buffer:TQTXBuffer read FBuffer; Property ApplicationObject:TLPPApplicationObject; Constructor Create;virtual; Destructor Destroy;Override; end; TQTXCustomParser = Class; TQTXCustomParserClass = Class of TQTXCustomParser; TQTXParserInfo = Class(TObject) public property Instance: TQTXCustomParser; Property Keyword:String; end; TQTXCustomParser = Class(TObject) private FContext: TQTXContext; FParsers: Array of TQTXParserInfo; protected function getParserIndexFor(keyword:String; var aIndex:Integer):Boolean; function getParserInstanceFor(keyword:String; var aParser:TQTXCustomParser):Boolean; procedure RegisterParser(keyword:String; const aParser:TQTXCustomParser); protected procedure ParseAsExecuteBlock;virtual; protected Procedure ClearParserInstances;virtual; public Property Context:TQTXContext read FContext; procedure Parse;virtual; Constructor Create(aContext:TQTXContext);virtual; Destructor Destroy;Override; end; (* Parser for the keyword "program" *) TQTXProgramParser = Class(TQTXCustomParser) protected procedure ParseProgramName; procedure Parse;override; public Constructor Create(aContext:TQTXContext);override; end; TQTXExecuteParser = class(TQTXCustomParser) protected procedure ParseScope; procedure Parse;override; public Constructor Create(aContext:TQTXContext);override; end; TQTXLPPParser = Class(TQTXCustomParser) protected procedure Parse;override; public Constructor Create(aContext:TQTXContext);override; end; //########################################################################## // In-memory program model classes //########################################################################## TLPPObject = Class(Tobject) end; TLPPModule = Class(TLPPObject) public Property Name:String; end; TLPPModuleList = Array of TLPPModule; TLPPApplicationObject = Class(TLPPObject) private FModules: TLPPModuleList; public Property ApplicationName:String; Property Modules:TLPPModuleList read FModules; function IndexOfModule(aModuleName:String):Integer; Procedure Clear; Constructor Create;virtual; Destructor Destroy;Override; end; implementation var TOK_SYM: Array[TOK_ROUNDOPEN..TOK_COLON] of string = ('(',')','{','}',' ','+','-','/','^',',',';'); //########################################################################### // TLPPApplicationObject //########################################################################### Constructor TLPPApplicationObject.Create; begin inherited Create; end; Destructor TLPPApplicationObject.Destroy; Begin clear; inherited Destroy; end; function TLPPApplicationObject.IndexOfModule(aModuleName:String):Integer; var x: Integer; Begin result:=-1; aModulename:=trim(lowercase(aModuleName)); if length(aModuleName)>0 then begin for x:=0 to FModules.Count-1 do Begin if lowercase(FModules[x].Name)=aModuleName then begin result:=x; break; end; end; end; end; Procedure TLPPApplicationObject.Clear; begin ApplicationName:=''; while FModules.Count>0 do begin FModules[0].free; FModules[0]:=NIL; FModules.Delete(0,1); end; end; //########################################################################### // TQTXLPPParser //########################################################################### Constructor TQTXLPPParser.Create(aContext:TQTXContext); begin inherited Create(aContext); (* Register the "program" keyword-parser *) RegisterParser('program',TQTXProgramParser.Create(context)); end; procedure TQTXLPPParser.Parse; var mCache: String; mSubParser: TQTXCustomParser; Begin repeat case Context.Buffer.Current of ' ': Begin self.Context.Buffer.SkipJunk; end; '/': Begin if context.buffer.Compare('//') then begin self.context.buffer.SkipJunk; end; end; '(': Begin if Context.Buffer.Compare('(*') then Context.buffer.SkipJunk else Begin if getParserInstanceFor(mCache,mSubParser) then begin (* Invoke sub parser *) mSubParser.Parse; (* SKip trailing junk if any *) Context.Buffer.SkipJunk; (* clear cache *) mCache:=''; end else raise ENPPException.CreateFmt ('Syntax error [%s]',[mCache]); end; end; else Begin mCache += Context.Buffer.Current; end; end; if not Context.buffer.next then break; until Context.buffer.EOF; end; //########################################################################### // TQTXExecuteParser //########################################################################### Constructor TQTXExecuteParser.Create(aContext:TQTXContext); Begin inherited Create(AContext); end; procedure TQTXExecuteParser.Parse; var mDummy: Integer; Begin if not self.getParserIndexFor('execute',mDummy) then Begin self.RegisterParser('execute',TQTXExecuteParser.Create(Context)); end; (* Expects: Scope param list (), with a bare minumum of (STAR) *) //writeln(context.buffer.TextFromCurrent); writeln('Enters execute parser'); context.buffer.SkipJunk; (* Check for "(" *) if Context.Buffer.Current=TOK_SYM[TOK_ROUNDOPEN] then Begin // Skip the "(" char Context.Buffer.Next; ParseScope; end; if context.buffer.ReadTo('{') then Begin Context.buffer.next; (* Parse content as an execute block *) self.ParseAsExecuteBlock; end else Raise Exception.Create('Syntax error, expected code-block entry error'); writeln('leaves execute parser'); end; procedure TQTXExecuteParser.ParseScope; var mScope: String; Begin (* Keep reading until we find ")", or break on others *) if context.buffer.ReadTo([')',';','{','}'],mScope) then Begin (* Valid stop? or one of the breakers? *) if context.buffer.Current=')' then Begin writeln('Defined scope for this execute block is:' + mScope); //skip ")" char context.buffer.next; end else Raise Exception.Create('Syntax error, expected scope terminator error'); end; end; //########################################################################### // TQTXProgramParser //########################################################################### Constructor TQTXProgramParser.Create(aContext:TQTXContext); begin inherited Create(aContext); RegisterParser('execute',TQTXExecuteParser.Create(context)); end; procedure TQTXProgramParser.ParseProgramName; var mTemp: String; Begin (* Expects: ("NAME") As in Name of program *) mtemp:=''; if Context.Buffer.Current='(' then begin Context.buffer.next; if Context.buffer.current='"' then begin Context.Buffer.next; if Context.buffer.readTo('"',mTemp) then begin Context.ApplicationObject.ApplicationName:=mTemp; Context.Buffer.next; Context.Buffer.SkipJunk; if Context.buffer.Current=TOK_SYM[TOK_ROUNDCLOSE] then Begin Context.Buffer.next; end else Raise Exception.Create('Syntax error, expected ")" param error'); end else Raise Exception.Create('Syntax error, expected string termination error'); //showmessage(mName); end else Raise Exception.Create('Syntax error, expected string identifier'); end else Raise Exception.Create('Syntax error, expected parameter'); end; procedure TQTXProgramParser.Parse; Begin writeln(ClassName); context.buffer.SkipJunk; (* Check for "(" *) if Context.Buffer.Current=TOK_SYM[TOK_ROUNDOPEN] then Begin ParseProgramName; writeln('Program name:' + Context.ApplicationObject.ApplicationName); context.Buffer.SkipJunk; if context.Buffer.Current=TOK_SYM[TOK_CUROPEN] then Begin context.buffer.next; repeat ParseAsExecuteBlock; Context.buffer.SkipJunk; if context.buffer.Current=TOK_SYM[TOK_CURCLOSE] then break; until Context.Buffer.EOF; end else raise exception.create('Syntax error, expected program-block entry'); end; writeln("Exiting TQTXProgramParer"); end; //########################################################################### // TQTXCustomParser //########################################################################### Constructor TQTXCustomParser.Create(aContext:TQTXContext); Begin inherited Create; FContext:=aContext; end; Procedure TQTXCustomParser.ClearParserInstances; Begin (* Release sub-parser instances if any *) while FParsers.Count>0 do Begin if FParsers[0].Instance<>NIL then Begin FParsers[0].Instance.free; FParsers[0].Instance:=NIL; end; FParsers[0].free; FParsers.Delete(0,1); end; end; Destructor TQTXCustomParser.destroy; begin ClearParserInstances; inherited; end; function TQTXCustomParser.getParserIndexFor(keyword:String; var aIndex:Integer):Boolean; var x: Integer; begin result:=False; aIndex:=-1; keyword:=lowercase(trim(keyword)); if keyword.length>0 then Begin //if FParsers.Count>0 then //begin for x:=0 to FParsers.count-1 do Begin if sameText(FParsers[x].Keyword,keyword) then begin result:=true; aIndex:=x; break; end; end; //end; end; end; function TQTXCustomParser.getParserInstanceFor(keyword:String; var aParser:TQTXCustomParser):Boolean; var x: Integer; begin result:=False; aParser:=NIL; keyword:=lowercase(trim(keyword)); if keyword.length>0 then Begin //if FParsers.Count>0 then //begin for x:=0 to FParsers.count-1 do Begin if sameText(FParsers[x].Keyword,keyword) then begin result:=true; aParser:=FParsers[x].Instance; break; end; end; //end; end; end; procedure TQTXCustomParser.RegisterParser(keyword:String; const aParser:TQTXCustomParser); var mIndex: Integer; mInfo: TQTXParserInfo; Begin keyword:=lowercase(trim(keyword)); if keyword.length>0 then Begin if aParser<>NIL then begin if not getParserIndexFor(keyword,mIndex) then Begin (* register sub parser *) mInfo:=TQTXParserInfo.Create; mInfo.Instance:=aParser; mInfo.Keyword:=keyword; FParsers.Add(mInfo); end else raise ENPPException.createFmt ('Parser for keyword [%s] already registered',[keyword]); end else raise ENPPException.CreateFmt ('Parser class for keyword [%s] was NIL error',[keyword]); end; end; procedure TQTXCustomParser.Parse; begin raise ENPPException.CreateFmt ('No parser implemented for class %s',[classname]); end; (* About: This method continues the parsing process as if it's parsing from within an execute() code block. This means that all the rules of normal 'Execute( * ) " applies. After Program() this is the default parsing mode, which makes it the fallback of more or less all parse-objects which allows ordinary code (e.g "case/switch" sections and more) *) procedure TQTXCustomParser.ParseAsExecuteBlock; var mCache: String; mTemp: String; mParser: TQTXCustomParser; Begin while not Context.Buffer.EOF do Begin case Context.Buffer.Current of ' ', #9: begin //Context.buffer.SkipJunk; end; '(': If context.Buffer.Compare('(*') and not Context.buffer.compare('(*)') then Begin Context.buffer.next; context.buffer.next; Context.Buffer.ReadTo('*)'); Context.Buffer.Next; Context.Buffer.Next; end else Begin if mCache.length>0 then Begin writeln('Command is: "' + mCache + '"'); if getParserInstanceFor(lowercase(mCache),mParser) then Begin writeln('Parser ' + mParser.classname + ' is registered for keyword ' + mCache); try mParser.Parse; except on e: exception do raise Exception.Create(e.message); end; end else Raise Exception.create('Syntax error, unknown token:' + mCache); mCache :=''; exit; end else Raise Exception.Create('Syntax error, unsuspected character'); end; '/': If Context.buffer.compare('//') then begin Context.buffer.ReadToEOL(mTemp); end else if Context.buffer.compare('/*') then Begin Context.buffer.next; Context.buffer.Next; Context.buffer.ReadTo('*/'); end; '{': Begin if mCache.length>0 then Begin if getParserInstanceFor(lowercase(mCache),mParser) then Begin showmessage('over here!'); end else Raise Exception.create('Syntax error, unknown token:' + mCache); end else Raise Exception.Create('Syntax error, unexpected character'); end; '}': Begin end; else Begin mCache += Context.buffer.Current; end; end; if not context.buffer.next then break; end; writeln('Exiting CodeBlock parser at:' + mCache); end; //########################################################################### // TQTXContext //########################################################################### Constructor TQTXContext.Create; begin inherited Create; FBuffer:=TQTXBuffer.create; ApplicationObject:=TLPPApplicationObject.Create; end; Destructor TQTXContext.destroy; Begin FBuffer.free; ApplicationObject.free; inherited; end; //########################################################################### // TQTXBuffer //########################################################################### procedure TQTXBuffer.First; Begin If FData.length>0 then begin FLineNr:=1; FIndex:=0; end; end; procedure TQTXBuffer.SkipJunk; var mTemp: String; begin repeat case Current of ' ', #9 : Begin (* We treat TAB as space *) //if not next then //break; end; '/': Begin (* Skip C style remark *) if Compare('/*') then begin if readTo('*/') then Begin Next; Next; end else raise ENPPException.Create('Comment not closed error'); end else Begin (* Skip Pascal style remark *) if Compare('//') then begin if ReadToEOL(mTemp) then next else raise ENPPException.Create('Expected end of line error'); end; end; end; '(': Begin (* Skip pascal style remark *) if compare('(*') and not compare('(*)') then Begin if readTo('*)') then begin Next; Next; end else raise ENPPException.Create('Comment not closed error'); end else break; end; #13: begin // end; else begin Break; end; end; next; until EOF; end; Procedure TQTXBuffer.Clear; begin FData:=''; FLineNr:=1; FIndex:=0; end; procedure TQTXBuffer.LoadFromString(Text:String); Begin FLineNr:=1; FData:=trim(text); FData:=StrReplace(FData,#10,#13); FData:=StrReplace(FData,#13#13,#13); FIndex:=1; if length(FData)<1 then FIndex:=-1; end; function TQTXBuffer.getCurrent:String; Begin result:=FData[FIndex]; end; function TQTXBuffer.ReadWord(var Text:String):Boolean; begin result:=False; Text:=''; if not EOF then begin repeat if (current in ['A'..'Z','a'..'z','0'..'9']) then Text += current else break; until not next; result:=length(Text)>0; end; end; function TQTXBuffer.Compare(aText:String):Boolean; var mData: String; Begin result:=PeekAhead(length(aText),mData) and SameText(lowercase(mData),lowercase(aText)); end; function TQTXBuffer.PeekAhead(aCount:Integer;var Text:String):Boolean; var mPos: Integer; Begin result:=False; Text:=''; if not EOF then Begin mPos:=FIndex; try while aCount>0 do begin Text+=Current; if not Next then break; dec(aCount); end; result:=length(text)>0; finally FIndex:=mPos; end; end; end; function TQTXBuffer.ReadToEOL(var text:String):Boolean; Begin result:=ReadTo([#13,#10],text); end; function TQTXBuffer.ReadTo(aValue:String;var inner:String):Boolean; var mText: String; begin inner:=''; result:=False; aValue:=lowercase(aValue); if aValue.length>0 then begin repeat if PeekAhead(aValue.length,mText) then begin mText:=lowercase(mText); result:=SameText(mText,aValue); if result then break else inner += Current; end else inner += Current; Next; until EOF; end; end; function TQTXBuffer.readTo(aValue:String):Boolean; var mText: String; begin result:=False; aValue:=lowercase(aValue); if aValue.length>0 then begin repeat if PeekAhead(aValue.length,mText) then begin mText:=lowercase(mText); result:=SameText(mText,aValue); if result then break; end; if not Next then break; until EOF; (* repeat if PeekAhead(aValue.length,mText) and SameText(lowercase(mtext),aValue) then Begin result:=true; break; end else if not next then break; until EOF; *) end else Raise ENPPException.Create ('ReadTo() failed, invalid target value error'); end; function TQTXBuffer.ReadTo(Const aChars:Array of string; var text:String):Boolean; var x: Integer; Begin result:=False; text:=''; if aChars.Length>0 then begin for x:=FIndex to FData.length do Begin if (Current in aChars) then Begin result:=true; break; end else text+=Current; if not Next then break; end; end; end; Function TQTXBuffer.Back:Boolean; begin result:=FIndex>1; if result then dec(FIndex); end; function TQTXBuffer.Next:Boolean; begin Result:=FIndex<fdata.length; if="" result="" then="" begin="" inc(findex);="" (current="" in="" [#13,#10])="" inc(flinenr);="" end;="" function="" tqtxbuffer.bof:boolean;="" result:="FIndex=1;" tqtxbuffer.eof:boolean;="">=FData.Length; end; end.
You must be logged in to post a comment.