Home > N++, Object Pascal, OP4JS, Smart Mobile Studio > N++ context parser, part 2

N++ context parser, part 2

December 29, 2014 Leave a comment Go to comments
nodeJS rocks!

nodeJS rocks!

Right. In my last article on parsing I introduced a basic buffer class and skeleton parser. In this article I introduce a working 2 level parser, meaning recursive, which implements both the program() and execute() code blocks. Being recursive means that you can now nest several execute() sections inside each other. But at this stage that’s pretty much all I’ve had time to implement — so consider it a rough sketch.

Support

At the moment we have a program parser, which naturally parses a program block which looks like this:

program("name") {
}

Registered for the program parser, is a single sub-parser for the “execute” keyword, this is the class TQTXExecuteParser class. This allows you to have execute() sections within a program block, as such:

program("name") {
  execute (scope) {
    /* Code goes heres */
  }
}

To avoid recursive stack errors (stack overflow), the execute class does a trick. Namely that it registers support for itself (execute keyword) at parse-time, meaning that an execute block will support execute child blocks, like this:

program("name") {
  execute (scope) {
    /* Sub execute block */
    execute (scope) {
      /* your code here */
    }
  }
}

Well, basically this is the base system in place. Adding new keywords and features is now a matter of creating sub parser classes and registering them to either the execute class or the program parser class. The benefit of such a simple system is that it’s fast, easy to understand and use – and easy to maintain. Concrete rules for parsing can be defined, and a lot of code re-cycled between the base-class (TQTXCustomParser) and it’s decendants.
With a bit of work you will have a fine-tuned, fast and efficient language parser.

Supporting other languages

As you probably guessed it’s not that hard to add support for other languages. I mean once you take the time to implement the basic form of a language unit or program-file. But it’s always a time consuming task, where even the tiniest detail matters. If you miss a single language rule (like say, not skippig a comma after it’s been consumed) the error will spread into the next parser and so on – until an exception occurs.

But, once you get the hang of chewing through a source-buffer, supporting enough parsers and keywords (and for Delphi also states, like the “Inerface” keyword state and the “implementation” keyword state) you can pretty much mix and match all you want.

Well, without going to far into the code — here is the updated source. You may want to drop memo control on your form and add the following code:

procedure TForm1.W3Button1Click(Sender: TObject);
var
  mContext: TQTXContext;
  mParser:  TQTXLPPParser;
begin
  mContext:=TQTXContext.Create;
  try
    mParser:=TQTXLPPParser.Create(mContext);
    try
      mContext.Buffer.LoadFromString(w3memo1.text);

      try
        mParser.Parse;
      except
        on e: exception do
        writeln(e.message);
      end;

    finally
      mParser.free;
    end;
  finally
    mContext.free;
  end;
end;

And set the content of the memo to:

procedure TForm1.InitializeForm;
begin
  inherited;
  w3memo1.Text:=#'program("test") {
          execute (*) {
            /* Code here */
          }
        }';
end;

Ok — here is the updated parser code:

 

unit qtxparser;

interface

uses
  SmartCL.System;


const
  TOK_ROUNDOPEN   = 1;  //  "("
  TOK_ROUNDCLOSE  = 2;  //  ")"
  TOK_CUROPEN     = 3;  //  "{"
  TOK_CURCLOSE    = 4;  //  "}"
  TOK_SPACE       = 5;  //  " "

  TOK_ADD         = 6;  //  "+"
  TOK_SUBTRACT    = 7;  //  "-"
  TOK_DIVIDE      = 8;  //  "/"
  TOK_POWER       = 9;  //  "^"

  TOK_COMMA       =10;  //  ","
  TOK_COLON       =11;  //  ";"

  Numeric_Operators: Array [0..3] of Integer =
    (TOK_ADD,TOK_SUBTRACT,TOK_DIVIDE,TOK_POWER);

type

  ENPPException = Class(EW3Exception);

  TLPPApplicationObject = Class;

  TQTXBuffer = Class(TObject)
  private
    FData:    String;
    FIndex:   Integer;
    FLineNr:  Integer;
  protected
    function  getCurrent:String;virtual;
  public

    Property  LineNr:Integer read FLineNr;

    Property  Current:String read getCurrent;
    Function  Back:Boolean;
    function  Next:Boolean;
    function  BOF:Boolean;
    function  EOF:Boolean;

    procedure SkipJunk;

    function  ReadTo(Const aChars:array of string;
              var text:String):Boolean;overload;
    function  ReadTo(aValue:String):Boolean;overload;
    function  ReadTo(aValue:String;var inner:String):Boolean;overload;

    function  ReadWord(var Text:String):Boolean;

    function  ReadToEOL(var text:String):Boolean;

    function  PeekAhead(aCount:Integer;var Text:String):Boolean;
    function  Compare(aText:String):Boolean;

    Property  TextFromCurrent:String read ( Copy(FData,FIndex,length(FData)-FIndex) );

    procedure First;

    procedure LoadFromString(Text:String);

    procedure Clear;
  End;

  TQTXContext = Class(TObject)
  Private
    FBuffer:    TQTXBuffer;
  public
    Property    Buffer:TQTXBuffer read FBuffer;
    Property    ApplicationObject:TLPPApplicationObject;
    Constructor Create;virtual;
    Destructor  Destroy;Override;
  end;

  TQTXCustomParser      = Class;
  TQTXCustomParserClass = Class of TQTXCustomParser;

  TQTXParserInfo = Class(TObject)
  public
    property  Instance: TQTXCustomParser;
    Property  Keyword:String;
  end;

  TQTXCustomParser = Class(TObject)
  private
    FContext:   TQTXContext;
    FParsers:   Array of TQTXParserInfo;
  protected
    function    getParserIndexFor(keyword:String;
                var aIndex:Integer):Boolean;
    function    getParserInstanceFor(keyword:String;
                var aParser:TQTXCustomParser):Boolean;
    procedure   RegisterParser(keyword:String;
                const aParser:TQTXCustomParser);
  protected
    procedure   ParseAsExecuteBlock;virtual;
  protected
    Procedure   ClearParserInstances;virtual;
  public
    Property    Context:TQTXContext read FContext;
    procedure   Parse;virtual;
    Constructor Create(aContext:TQTXContext);virtual;
    Destructor  Destroy;Override;
  end;

  (* Parser for the keyword "program" *)
  TQTXProgramParser = Class(TQTXCustomParser)
  protected
    procedure   ParseProgramName;
    procedure   Parse;override;
  public
    Constructor Create(aContext:TQTXContext);override;
  end;

  TQTXExecuteParser = class(TQTXCustomParser)
  protected
    procedure   ParseScope;
    procedure   Parse;override;
  public
    Constructor Create(aContext:TQTXContext);override;
  end;

  TQTXLPPParser = Class(TQTXCustomParser)
  protected
    procedure   Parse;override;
  public
    Constructor Create(aContext:TQTXContext);override;
  end;

  //##########################################################################
  // In-memory program model classes
  //##########################################################################

  TLPPObject = Class(Tobject)
  end;

  TLPPModule = Class(TLPPObject)
  public
    Property  Name:String;
  end;
  TLPPModuleList = Array of TLPPModule;

  TLPPApplicationObject = Class(TLPPObject)
  private
    FModules: TLPPModuleList;
  public
    Property    ApplicationName:String;
    Property    Modules:TLPPModuleList read FModules;
    function    IndexOfModule(aModuleName:String):Integer;
    Procedure   Clear;
    Constructor Create;virtual;
    Destructor  Destroy;Override;
  end;

implementation

var
  TOK_SYM: Array[TOK_ROUNDOPEN..TOK_COLON] of string =
  ('(',')','{','}',' ','+','-','/','^',',',';');

//###########################################################################
// TLPPApplicationObject
//###########################################################################

Constructor TLPPApplicationObject.Create;
begin
  inherited Create;
end;

Destructor TLPPApplicationObject.Destroy;
Begin
  clear;
  inherited Destroy;
end;

function TLPPApplicationObject.IndexOfModule(aModuleName:String):Integer;
var
  x:  Integer;
Begin
  result:=-1;
  aModulename:=trim(lowercase(aModuleName));
  if length(aModuleName)>0 then
  begin
    for x:=0 to FModules.Count-1 do
    Begin
      if lowercase(FModules[x].Name)=aModuleName then
      begin
        result:=x;
        break;
      end;
    end;
  end;
end;

Procedure TLPPApplicationObject.Clear;
begin
  ApplicationName:='';
  while FModules.Count>0 do
  begin
    FModules[0].free;
    FModules[0]:=NIL;
    FModules.Delete(0,1);
  end;
end;

//###########################################################################
// TQTXLPPParser
//###########################################################################

Constructor TQTXLPPParser.Create(aContext:TQTXContext);
begin
  inherited Create(aContext);
  (* Register the "program" keyword-parser *)
  RegisterParser('program',TQTXProgramParser.Create(context));
end;

procedure TQTXLPPParser.Parse;
var
  mCache: String;
  mSubParser: TQTXCustomParser;
Begin

  repeat
    case Context.Buffer.Current of
    ' ':  Begin
            self.Context.Buffer.SkipJunk;
          end;
    '/':  Begin
            if context.buffer.Compare('//') then
            begin
              self.context.buffer.SkipJunk;
            end;
          end;
    '(':  Begin
            if Context.Buffer.Compare('(*') then
            Context.buffer.SkipJunk else
            Begin

              if getParserInstanceFor(mCache,mSubParser) then
              begin

                (* Invoke sub parser *)
                mSubParser.Parse;

                (* SKip trailing junk if any *)
                Context.Buffer.SkipJunk;

                (* clear cache *)
                mCache:='';

              end else
              raise ENPPException.CreateFmt
              ('Syntax error [%s]',[mCache]);

            end;
          end;
    else
      Begin
        mCache += Context.Buffer.Current;
      end;
    end;

    if not Context.buffer.next then
    break;
  until Context.buffer.EOF;
end;

//###########################################################################
// TQTXExecuteParser
//###########################################################################

Constructor TQTXExecuteParser.Create(aContext:TQTXContext);
Begin
  inherited Create(AContext);
end;

procedure TQTXExecuteParser.Parse;
var
  mDummy: Integer;
Begin
  if not self.getParserIndexFor('execute',mDummy) then
  Begin
    self.RegisterParser('execute',TQTXExecuteParser.Create(Context));
  end;


  (* Expects: Scope param list (), with a bare minumum of (STAR) *)
  //writeln(context.buffer.TextFromCurrent);
  writeln('Enters execute parser');
  context.buffer.SkipJunk;

  (* Check for "(" *)
  if Context.Buffer.Current=TOK_SYM[TOK_ROUNDOPEN] then
  Begin
    // Skip the "(" char
    Context.Buffer.Next;
    ParseScope;
  end;

  if context.buffer.ReadTo('{') then
  Begin
    Context.buffer.next;

    (* Parse content as an execute block *)
    self.ParseAsExecuteBlock;


  end else
  Raise Exception.Create('Syntax error, expected code-block entry error');


  writeln('leaves execute parser');
end;

procedure TQTXExecuteParser.ParseScope;
var
  mScope: String;
Begin
  (* Keep reading until we find ")", or break on others *)
  if context.buffer.ReadTo([')',';','{','}'],mScope) then
  Begin
    (* Valid stop? or one of the breakers? *)
    if context.buffer.Current=')' then
    Begin
      writeln('Defined scope for this execute block is:' + mScope);

      //skip ")" char
      context.buffer.next;

    end else
    Raise Exception.Create('Syntax error, expected scope terminator error');
  end;
end;

//###########################################################################
// TQTXProgramParser
//###########################################################################

Constructor TQTXProgramParser.Create(aContext:TQTXContext);
begin
  inherited Create(aContext);
  RegisterParser('execute',TQTXExecuteParser.Create(context));
end;

procedure  TQTXProgramParser.ParseProgramName;
var
  mTemp:  String;
Begin
  (* Expects:  ("NAME")
     As in Name of program *)
  mtemp:='';

  if Context.Buffer.Current='(' then
  begin
    Context.buffer.next;
    if Context.buffer.current='"' then
    begin
      Context.Buffer.next;

      if Context.buffer.readTo('"',mTemp) then
      begin
        Context.ApplicationObject.ApplicationName:=mTemp;

        Context.Buffer.next;

        Context.Buffer.SkipJunk;
        if Context.buffer.Current=TOK_SYM[TOK_ROUNDCLOSE] then
        Begin
          Context.Buffer.next;


        end else
        Raise Exception.Create('Syntax error, expected ")" param error');
      end else
      Raise Exception.Create('Syntax error, expected string termination error');

      //showmessage(mName);
    end else
    Raise Exception.Create('Syntax error, expected string identifier');
  end else
  Raise Exception.Create('Syntax error, expected parameter');

end;

procedure TQTXProgramParser.Parse;
Begin
  writeln(ClassName);
  context.buffer.SkipJunk;

  (* Check for "(" *)
  if Context.Buffer.Current=TOK_SYM[TOK_ROUNDOPEN] then
  Begin
    ParseProgramName;
    writeln('Program name:' + Context.ApplicationObject.ApplicationName);

    context.Buffer.SkipJunk;
    if context.Buffer.Current=TOK_SYM[TOK_CUROPEN] then
    Begin
      context.buffer.next;

      repeat
        ParseAsExecuteBlock;

        Context.buffer.SkipJunk;
        if context.buffer.Current=TOK_SYM[TOK_CURCLOSE] then
        break;
      until Context.Buffer.EOF;

    end else
    raise exception.create('Syntax error, expected program-block entry');
  end;

  writeln("Exiting TQTXProgramParer");

end;

//###########################################################################
// TQTXCustomParser
//###########################################################################

Constructor TQTXCustomParser.Create(aContext:TQTXContext);
Begin
  inherited Create;
  FContext:=aContext;

end;

Procedure TQTXCustomParser.ClearParserInstances;
Begin
  (* Release sub-parser instances if any *)
  while FParsers.Count>0 do
  Begin
    if FParsers[0].Instance<>NIL then
    Begin
      FParsers[0].Instance.free;
      FParsers[0].Instance:=NIL;
    end;
    FParsers[0].free;
    FParsers.Delete(0,1);
  end;
end;

Destructor TQTXCustomParser.destroy;
begin
  ClearParserInstances;
  inherited;
end;

function TQTXCustomParser.getParserIndexFor(keyword:String;
         var aIndex:Integer):Boolean;
var
  x:  Integer;
begin
  result:=False;
  aIndex:=-1;
  keyword:=lowercase(trim(keyword));
  if keyword.length>0 then
  Begin
    //if FParsers.Count>0 then
    //begin
      for x:=0 to FParsers.count-1 do
      Begin
        if sameText(FParsers[x].Keyword,keyword) then
        begin
          result:=true;
          aIndex:=x;
          break;
        end;
      end;
    //end;
  end;
end;

function TQTXCustomParser.getParserInstanceFor(keyword:String;
         var aParser:TQTXCustomParser):Boolean;
var
  x:  Integer;
begin
  result:=False;
  aParser:=NIL;
  keyword:=lowercase(trim(keyword));
  if keyword.length>0 then
  Begin
    //if FParsers.Count>0 then
    //begin
      for x:=0 to FParsers.count-1 do
      Begin
        if sameText(FParsers[x].Keyword,keyword) then
        begin
          result:=true;
          aParser:=FParsers[x].Instance;
          break;
        end;
      end;
    //end;
  end;
end;

procedure TQTXCustomParser.RegisterParser(keyword:String;
          const aParser:TQTXCustomParser);
var
  mIndex: Integer;
  mInfo:  TQTXParserInfo;
Begin
  keyword:=lowercase(trim(keyword));
  if keyword.length>0 then
  Begin
    if aParser<>NIL then
    begin
      if not getParserIndexFor(keyword,mIndex) then
      Begin
        (* register sub parser *)
        mInfo:=TQTXParserInfo.Create;
        mInfo.Instance:=aParser;
        mInfo.Keyword:=keyword;
        FParsers.Add(mInfo);
      end else
      raise ENPPException.createFmt
      ('Parser for keyword [%s] already registered',[keyword]);
    end else
    raise ENPPException.CreateFmt
    ('Parser class for keyword [%s] was NIL error',[keyword]);
  end;
end;

procedure TQTXCustomParser.Parse;
begin
  raise ENPPException.CreateFmt
  ('No parser implemented for class %s',[classname]);
end;

(* About:
   This method continues the parsing process as if it's parsing from
   within an execute() code block.
   This means that all the rules of  normal 'Execute( * ) "
   applies. After Program() this is the default parsing mode, which
   makes it the fallback of more or less all parse-objects which allows
   ordinary code (e.g "case/switch" sections and more) *)

procedure TQTXCustomParser.ParseAsExecuteBlock;
var
  mCache: String;
  mTemp:  String;
  mParser: TQTXCustomParser;
Begin

  while not Context.Buffer.EOF do
  Begin
     case Context.Buffer.Current of
     ' ',
     #9:  begin
            //Context.buffer.SkipJunk;
          end;
     '(': If context.Buffer.Compare('(*')
          and not Context.buffer.compare('(*)') then
          Begin
            Context.buffer.next;
            context.buffer.next;
            Context.Buffer.ReadTo('*)');
            Context.Buffer.Next;
            Context.Buffer.Next;
          end else
          Begin
            if mCache.length>0 then
            Begin
              writeln('Command is: "' + mCache + '"');

              if getParserInstanceFor(lowercase(mCache),mParser) then
              Begin
                writeln('Parser ' + mParser.classname + ' is registered for keyword ' + mCache);
                try
                  mParser.Parse;
                except
                  on e: exception do
                  raise Exception.Create(e.message);
                end;
              end else
              Raise Exception.create('Syntax error, unknown token:' + mCache);

              mCache :='';

              exit;
            end else
            Raise Exception.Create('Syntax error, unsuspected character');
          end;
     '/': If Context.buffer.compare('//') then
          begin
            Context.buffer.ReadToEOL(mTemp);
          end else
          if Context.buffer.compare('/*') then
          Begin
            Context.buffer.next;
            Context.buffer.Next;
            Context.buffer.ReadTo('*/');
          end;
     '{': Begin
            if mCache.length>0 then
            Begin
              if getParserInstanceFor(lowercase(mCache),mParser) then
              Begin
                showmessage('over here!');

              end else
              Raise Exception.create('Syntax error, unknown token:' + mCache);

            end else
            Raise Exception.Create('Syntax error, unexpected character');
          end;
     '}': Begin
          end;
     else
        Begin
          mCache += Context.buffer.Current;
        end;

     end;

    if not context.buffer.next then
    break;
  end;

  writeln('Exiting CodeBlock parser at:' + mCache);

end;

//###########################################################################
// TQTXContext
//###########################################################################

Constructor TQTXContext.Create;
begin
  inherited Create;
  FBuffer:=TQTXBuffer.create;
  ApplicationObject:=TLPPApplicationObject.Create;
end;

Destructor TQTXContext.destroy;
Begin
  FBuffer.free;
  ApplicationObject.free;
  inherited;
end;

//###########################################################################
// TQTXBuffer
//###########################################################################


procedure TQTXBuffer.First;
Begin
  If FData.length>0 then
  begin
    FLineNr:=1;
    FIndex:=0;
  end;
end;

procedure TQTXBuffer.SkipJunk;
var
  mTemp:  String;
begin
  repeat
    case Current of
    ' ',
    #9  :  Begin
            (* We treat TAB as space *)
            //if not next then
            //break;

          end;
    '/':  Begin
            (* Skip C style remark *)
            if Compare('/*') then
            begin
              if readTo('*/') then
              Begin
                Next;
                Next;
              end else
              raise ENPPException.Create('Comment not closed error');
            end else
            Begin
              (* Skip Pascal style remark *)
              if Compare('//') then
              begin
                if ReadToEOL(mTemp) then
                next else
                raise ENPPException.Create('Expected end of line error');
              end;
            end;
          end;
    '(':  Begin
            (* Skip pascal style remark *)
            if compare('(*')
            and not compare('(*)') then
            Begin
              if readTo('*)') then
              begin
                Next;
                Next;
              end else
              raise ENPPException.Create('Comment not closed error');
            end else
            break;
          end;
    #13:
        begin
          //
        end;
    else
      begin
        Break;
      end;
    end;
    next;
  until EOF;
end;

Procedure TQTXBuffer.Clear;
begin
  FData:='';
  FLineNr:=1;
  FIndex:=0;
end;

procedure TQTXBuffer.LoadFromString(Text:String);
Begin
  FLineNr:=1;
  FData:=trim(text);
  FData:=StrReplace(FData,#10,#13);
  FData:=StrReplace(FData,#13#13,#13);
  FIndex:=1;
  if length(FData)<1 then
  FIndex:=-1;
end;

function TQTXBuffer.getCurrent:String;
Begin
  result:=FData[FIndex];
end;

function TQTXBuffer.ReadWord(var Text:String):Boolean;
begin
  result:=False;
  Text:='';
  if not EOF then
  begin
    repeat
      if (current in ['A'..'Z','a'..'z','0'..'9']) then
      Text += current else
      break;
    until not next;
    result:=length(Text)>0;
  end;
end;

function TQTXBuffer.Compare(aText:String):Boolean;
var
  mData:  String;
Begin
  result:=PeekAhead(length(aText),mData)
  and SameText(lowercase(mData),lowercase(aText));
end;

function TQTXBuffer.PeekAhead(aCount:Integer;var Text:String):Boolean;
var
  mPos: Integer;
Begin
  result:=False;
  Text:='';

  if not EOF then
  Begin
    mPos:=FIndex;
    try
      while aCount>0 do
      begin
        Text+=Current;
        if not Next then
        break;
        dec(aCount);
      end;
      result:=length(text)>0;
    finally
      FIndex:=mPos;
    end;
  end;
end;

function TQTXBuffer.ReadToEOL(var text:String):Boolean;
Begin
  result:=ReadTo([#13,#10],text);
end;

function TQTXBuffer.ReadTo(aValue:String;var inner:String):Boolean;
var
  mText:  String;
begin
  inner:='';
  result:=False;

  aValue:=lowercase(aValue);
  if aValue.length>0 then
  begin

    repeat
      if PeekAhead(aValue.length,mText) then
      begin
        mText:=lowercase(mText);
        result:=SameText(mText,aValue);
        if result then
        break else
        inner += Current;
      end else
      inner += Current;
      Next;
    until EOF;

  end;
end;

function TQTXBuffer.readTo(aValue:String):Boolean;
var
  mText:  String;
begin
  result:=False;
  aValue:=lowercase(aValue);
  if aValue.length>0 then
  begin

    repeat
      if PeekAhead(aValue.length,mText) then
      begin
        mText:=lowercase(mText);
        result:=SameText(mText,aValue);
        if result then
        break;
      end;
      if not Next then
      break;
    until EOF;

    (* repeat
      if PeekAhead(aValue.length,mText)
      and SameText(lowercase(mtext),aValue) then
      Begin
        result:=true;
        break;
      end else
      if not next then
      break;
    until EOF; *)

  end else
  Raise ENPPException.Create
  ('ReadTo() failed, invalid target value error');
end;

function TQTXBuffer.ReadTo(Const aChars:Array of string;
         var text:String):Boolean;
var
  x:  Integer;
Begin
  result:=False;
  text:='';
  if aChars.Length>0 then
  begin
    for x:=FIndex to FData.length do
    Begin
      if (Current in aChars) then
      Begin
        result:=true;
        break;
      end else
      text+=Current;

      if not Next then
      break;
    end;
  end;
end;

Function TQTXBuffer.Back:Boolean;
begin
  result:=FIndex>1;
  if result then
  dec(FIndex);
end;

function TQTXBuffer.Next:Boolean;
begin
  Result:=FIndex<fdata.length; if="" result="" then="" begin="" inc(findex);="" (current="" in="" [#13,#10])="" inc(flinenr);="" end;="" function="" tqtxbuffer.bof:boolean;="" result:="FIndex=1;" tqtxbuffer.eof:boolean;="">=FData.Length;
end;


end.
Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: