TextCraft 1.2 for Smart Pascal

January 26, 2018

TextCraft is a fast, generic Object Pascal text parsing framework. It provides the classes you need to write fast text parsers that build effective data models.

The TextCraft framework was recently moved up to version 1.2 and has been ported from Delphi to both Freepascal and Smart Pascal (the dialect used by Smart Mobile Studio). This is probably the only parsing framework that spans three compilers.

Smart Pascal coders can download the framework unit from the BitBucket TextCraft Repository. Place it in your $Install/Library folder (where $Install is the folder in which Smart’s library and rtl folders are installed).

Buffer, parser, model, context

TextCraft divides the job of parsing into four separate objects, each representing a concept familiar to people writing compilers: buffer, parser, model and context. If you are parsing a programming language, the “model” would be what people call the AST (short for “Abstract Syntax Tree”). This AST is later fed to the code generator, which turns it into an executable program (Smart Pascal compiles to JavaScript, so there really is no limit to the transformation, just the level of complexity).

Note: TextCraft is not a compiler for any particular language; it is a generic, language-agnostic text parsing framework that makes it easy to write your own parsers. We recently used it to parse command-line parameters for Freepascal, so it doesn’t have to be about languages.

The buffer

The buffer has one of the most demanding jobs in the framework. In other frameworks the buffer is often just a memory allocation with a simple read method, but in TextCraft the buffer is responsible for a lot more. It has to expose functions that make text recognition simple and effective, and it has to keep track of column and row position as you move through the buffer content, among much else. So in TextCraft the buffer is where text methodology is implemented in full.
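
To make the idea concrete, here is a minimal sketch of such a buffer. It is written in Python purely as a neutral illustration; the class name TextBuffer and every method on it are hypothetical and are not TextCraft's actual API. It only shows the general shape: recognition helpers plus row and column tracking in one place.

```python
class TextBuffer:
    """Hypothetical buffer: holds text and tracks row/column as it is read."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.offset = 0
        self.row = 1   # 1-based line number of the next character
        self.col = 1   # 1-based column of the next character

    def eof(self) -> bool:
        return self.offset >= len(self.text)

    def peek(self, count: int = 1) -> str:
        """Return upcoming characters without consuming them."""
        return self.text[self.offset:self.offset + count]

    def next(self) -> str:
        """Consume one character and update the row/column position."""
        if self.eof():
            return ""
        ch = self.text[self.offset]
        self.offset += 1
        if ch == "\n":
            self.row += 1
            self.col = 1
        else:
            self.col += 1
        return ch

    def match(self, word: str) -> bool:
        """Consume `word` if the buffer continues with it."""
        if self.text.startswith(word, self.offset):
            for _ in word:
                self.next()
            return True
        return False

    def skip_whitespace(self) -> None:
        while not self.eof() and self.peek().isspace():
            self.next()

    def read_word(self) -> str:
        """Consume and return a run of letters, digits and underscores."""
        start = self.offset
        while not self.eof() and (self.peek().isalnum() or self.peek() == "_"):
            self.next()
        return self.text[start:self.offset]


buf = TextBuffer("type\n  TPoint = record")
assert buf.match("type")
buf.skip_whitespace()
name = buf.read_word()
print(name, buf.row, buf.col)
```

Because every character passes through next(), the position stays correct no matter which helper consumed the text, which is what lets a parser report errors by row and column.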

The parser

As mentioned, the parser is responsible for using the buffer’s methods to recognize and make sense of a text. As it makes its way through the buffer content, it creates model-objects that represent each element. For a programming language these would typically be structures (records), classes, enums, properties and so on. Each of these is registered in the AST data model.
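
As a rough illustration of that division of labour, the sketch below parses a Pascal-style enum declaration. It is Python rather than Object Pascal, and Buffer, EnumModel and parse_enum are made-up names for this example, not part of TextCraft. The parser never touches the raw string itself; it only calls the buffer's helpers and fills in a model object.

```python
from dataclasses import dataclass, field
from typing import List


class Buffer:
    """Minimal stand-in for a text buffer (hypothetical, not TextCraft's API)."""

    def __init__(self, text: str) -> None:
        self.text, self.pos = text, 0

    def skip_space(self) -> None:
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def read_word(self) -> str:
        """Skip whitespace, then consume a run of letters and digits."""
        self.skip_space()
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos].isalnum():
            self.pos += 1
        return self.text[start:self.pos]

    def accept(self, symbol: str) -> bool:
        """Consume `symbol` if it comes next; report whether it did."""
        self.skip_space()
        if self.text.startswith(symbol, self.pos):
            self.pos += len(symbol)
            return True
        return False


@dataclass
class EnumModel:
    """Model object describing one enum declaration."""
    name: str
    members: List[str] = field(default_factory=list)


def parse_enum(buf: Buffer) -> EnumModel:
    """Recognize `Name = (A, B, C);` and return a model object for it."""
    model = EnumModel(buf.read_word())
    if not (buf.accept("=") and buf.accept("(")):
        raise SyntaxError("expected '= (' after enum name")
    while True:
        model.members.append(buf.read_word())
        if not buf.accept(","):
            break
    if not (buf.accept(")") and buf.accept(";")):
        raise SyntaxError("expected ');' to close the enum")
    return model


color_enum = parse_enum(Buffer("TColor = (Red, Green, Blue);"))
print(color_enum)
```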

The Model

The model is a construct. It is made up of as many model-object instances as you need to express the text in symbolic form. It doesn’t matter if you are parsing a text document or source code, you would still have to define a model for it.

The model obviously reflects your needs. If you just need a superficial overview of the data then you create a simple model. If you need more elaborate information then you create that.

Note: When parsing a text document, a traditional organization would be to divide the model into: chapter, section, paragraph, line and individual words.
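
A document model along those lines could be sketched as follows. This is an illustrative Python sketch, not code from TextCraft; the class names and the build_paragraphs helper are assumptions made for the example, and individual words are simply stored as strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Line:
    words: List[str] = field(default_factory=list)


@dataclass
class Paragraph:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Section:
    title: str
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class Chapter:
    title: str
    sections: List[Section] = field(default_factory=list)


def build_paragraphs(text: str) -> List[Paragraph]:
    """Split text on blank lines into paragraphs, then lines, then words."""
    paragraphs = []
    for block in text.split("\n\n"):
        lines = [Line(raw.split()) for raw in block.splitlines() if raw.strip()]
        if lines:
            paragraphs.append(Paragraph(lines))
    return paragraphs


section = Section("The buffer", build_paragraphs("one two\nthree\n\nfour"))
chapter = Chapter("Overview", [section])
print(len(section.paragraphs), section.paragraphs[0].lines[0].words)
```

A shallower model, for example one that stops at paragraphs, would be just as valid if that is all the application needs.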

The Context

The context object is what links the parser to our model and buffer objects. By default the parser doesn’t know anything about the buffer or model. This helps us abstract away things that would otherwise turn our code into a haystack of references.

The way the context is used can be described like this:

When parsing complex data you often divide the job into multiple classes. Each class deals with one particular topic. For example: if parsing Delphi source code, you would write a class that parses records, a parser that handles classes, another that handles field declarations (and so on).

As a parser recognizes one of these constructs, say a record, it creates a record model object to hold the information. It then adds that object to the context by pushing it onto the context’s reference stack.

The first thing a child parser does is grab the model object from the top of the reference stack. This way the child parsers always know where to store their model information. No matter how deep or recursive the input gets, combining the stack with passing the context object to the child parsers ensures that each parser “knows” where to store information.
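
The stack mechanism described above can be sketched like this. It is a simplified Python illustration under stated assumptions: Context, parse_record and parse_field are hypothetical names, a pre-split token list stands in for the real buffer, and error handling is left out. None of it is taken from TextCraft's source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldModel:
    name: str
    type_name: str


@dataclass
class RecordModel:
    name: str
    fields: List[FieldModel] = field(default_factory=list)


@dataclass
class UnitModel:
    records: List[RecordModel] = field(default_factory=list)


class Context:
    """Links the parsers to the input and the model (hypothetical design)."""

    def __init__(self, source: str, model: UnitModel) -> None:
        # A token list stands in for the buffer to keep the sketch short.
        for symbol in (":", ";", "="):
            source = source.replace(symbol, f" {symbol} ")
        self.tokens = source.split()
        self.pos = 0
        self.model = model
        self.stack: list = [model]  # reference stack; top is the current target

    def next(self) -> str:
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""


def parse_field(ctx: Context) -> None:
    """Child parser: stores its result in the record on top of the stack."""
    record = ctx.stack[-1]
    name = ctx.next()
    ctx.next()  # skip ':'
    type_name = ctx.next()
    ctx.next()  # skip ';'
    record.fields.append(FieldModel(name, type_name))


def parse_record(ctx: Context) -> None:
    """Parses `Name = record <fields> end ;`, delegating fields to parse_field."""
    unit = ctx.stack[-1]
    record = RecordModel(ctx.next())
    ctx.next()  # skip '='
    ctx.next()  # skip 'record'
    unit.records.append(record)
    ctx.stack.append(record)   # children now store into this record
    while ctx.peek() != "end":
        parse_field(ctx)
    ctx.stack.pop()            # restore the parent as the current target
    ctx.next()  # skip 'end'
    ctx.next()  # skip ';'


ctx = Context("TPoint = record x: Integer; y: Integer; end;", UnitModel())
parse_record(ctx)
print(ctx.model.records[0])
```

Note that parse_field takes nothing but the context. It never needs a direct reference to the record or the unit, which is the decoupling the stack is meant to provide.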

Why is this important?

This is important because it’s cost-effective in computing terms. The TextCraft framework allows you to create parsers that can chew through complex data without turning your project into spaghetti.

So whether you are parsing phone numbers, zip codes or complex C++ source code, TextCraft will help you get the job done in a way that is easy to understand and maintain.
