Smart Pascal: Memory and pointers

Allocating and manipulating memory is an important part of any programming language. While there are many ways of achieving the same result, code that works directly on the data through pointers easily outperforms code that is restricted to high-level, fixed-datatype arrays. By using pointers you can copy, fill, move and otherwise operate on large chunks of memory in ways that would otherwise be slow and impractical.

Anyone who has coded their own game, or worked with large blobs in a dataset, knows this. I would even argue that you can’t write a high-speed game in a native environment without pointers (unless you are using one of those game makers, which is not real programming anyway), and the same can be said about database engines. Could you imagine writing a database engine without using pointers to move data around in memory? I would hope not.

I’m not saying that you can’t; I’m simply saying it would be a waste of time to avoid pointers when the alternative can be many times slower.

The web stack and memory

JavaScript is known for many things, but manual memory management is certainly not one of them. That doesn’t mean JavaScript lacks features for dealing with memory (those days are thankfully behind us); it simply means that your average JavaScript developer hasn’t explored this aspect to the same extent as a C or Object Pascal developer would. There is no shame in this: if JavaScript is all you know, then this is a field you are likely to be uncomfortable with.

But the JavaScript virtual machine and runtime environment in a modern browser is actually quite good at memory. A lot has happened to JavaScript these past few years, and regardless of what engine you use, be it SpiderMonkey under Firefox, V8 under Chrome (and Chromium-based browsers such as the current Microsoft Edge) or JavaScriptCore under Safari, they have all gained features that Object Pascal and C/C++ coders are familiar with.

So the fact is that a well-written JavaScript function can move data around at speeds comparable to your Object Pascal or C++ code. You just need to know the rules, and get things into a framework that is easier to work with. JavaScript may have the functionality, but like all things JavaScript it’s not pretty and takes some getting used to. Which is why the Smart RTL is so nice to have.

How does memory work under JSVM?

First a few words about the JSVM (the JavaScript virtual machine). The runtime and the environment are two different things. If you have ever used a script engine in your Delphi or Lazarus applications, like DWScript, PascalScript or Lua for that matter, you will have noticed that out of the box these engines have no functionality except the fundamental language. They can run functions and create class instances, but the engine itself has no concept of a file or a chunk of memory. Not unless you define those features and register the methods that deliver them.

So it is with the JavaScript runtime environment. It is initially empty. Engines like V8 or SpiderMonkey have, out of the box, very little functionality beyond the language itself. It is up to whoever uses the engine (in this case those that make browsers) to write and expose native objects to the engine. Objects that your code can access and consume.

Memory handling under Node.js, for instance, can be faster than in a browser, because Node has been optimized for server tasks rather than client tasks. There are also subtle differences between a Node.js buffer object and a browser buffer object. This is where the strength of the Smart RTL becomes apparent, because it shields you from these differences and gives you a unified API that deals with all of this for you. So no matter if you work with Node.js or Firefox, the same code will run identically on both JSVM configurations.

OK, let’s look at what JavaScript has to offer. As mentioned above, JavaScript is nearly empty by default, and it’s up to the browser or Node to expose a set of common objects you can access and use. For memory work these objects are first and foremost:

  • Buffers
  • Views
  • Blobs

PS: We can ignore the last one for now, since it’s not really interesting at this point.

Right, as you can probably guess, the buffer object is the actual data. Or more correctly, it is the container object for your data. It doesn’t have any read or write methods, and it exists only to keep track of an allocated chunk of memory. Think of it as a TMemoryStream without any read or write functionality.

But how do we access that memory? Well, this is where things get interesting, because JavaScript has a view object that you attach to your buffer. It’s a bit like creating a TStreamWriter or TStreamReader object under Object Pascal.

The interesting part is not that a view object exposes read and write methods, but that the view type defines a fixed datatype. A fixed datatype of your choosing.

Let’s say you want to access your buffer as an array of longwords; well, there is a view for that (and you can also create so-called “typed arrays” connected to a buffer). But you can also access the same buffer as, say, word (16-bit) integers if you wish. Since you can have multiple views attached to the same buffer, this opens up some interesting optimization techniques, not to mention spectacular error messages when you get it wrong (PS: stick to the RTL and it will take care of you).
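To make the buffer and view relationship concrete, here is a minimal sketch in TypeScript using the standard ArrayBuffer and typed-array objects. This is the raw JavaScript layer, not Smart RTL code, and the function and variable names are made up for illustration. Writing through one view is immediately visible through the others, because they all sit on the same bytes:

```typescript
// One 8-byte buffer with three views attached to the very same memory.
function sharedViewsDemo(): { words: Uint16Array; bytes: Uint8Array } {
  const segment = new ArrayBuffer(8);           // raw storage, no read/write methods
  const asLongwords = new Uint32Array(segment); // 2 elements of 32 bits
  const asWords = new Uint16Array(segment);     // 4 elements of 16 bits
  const asBytes = new Uint8Array(segment);      // 8 elements of 8 bits

  // Write through the 32-bit view...
  asLongwords[0] = 0xffffffff;

  // ...and the change shows up in the other views. Setting all bits keeps
  // the result independent of the platform's byte order.
  return { words: asWords, bytes: asBytes };
}
```

The first longword covers the first two words and the first four bytes, so those read back as fully set while the rest of the buffer stays zero.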

The rule is that unsigned datatypes (like uint8 for a “normal” byte) are prefixed with a “u”, while signed types have no prefix. In JavaScript the typed-array views are named accordingly:

  • Views that access a buffer as unsigned types are prefixed with “U”
    • Uint8Array
    • Uint16Array [and so on]
  • Signed views lack the prefix
    • Int8Array
    • Int16Array
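The difference between a signed and an unsigned view is easiest to see on a single byte. The sketch below uses JavaScript's built-in `Uint8Array` and `Int8Array` typed arrays (the function name is hypothetical) to read the same stored bits through both:

```typescript
// The same byte, interpreted by an unsigned and a signed view.
function signedVersusUnsigned(): { unsigned: Uint8Array; signed: Int8Array } {
  const segment = new ArrayBuffer(1);
  const unsignedView = new Uint8Array(segment); // "U" prefix: values 0..255
  const signedView = new Int8Array(segment);    // no prefix: values -128..127

  unsignedView[0] = 0xff; // all eight bits set

  return { unsigned: unsignedView, signed: signedView };
}
```

Nothing in the buffer changes between the two reads; the unsigned view reports 255 and the signed view reports -1 for the same bit pattern.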

Now, the Smart Pascal RTL will shield you from all the nitty-gritty, but it’s important to understand how memory and raw data are dealt with behind the scenes by the JSVM. And abstracting the reader and writer methods from the actual data makes sense. In fact, I implemented something similar myself earlier (my ByteRage Delphi library).

Once you get to know it, every piece falls into place.

Smart memory

The units we will be looking at are the following:

  • System.Types
  • System.Types.Convert
  • System.Memory
  • System.Memory.Allocation
  • System.Memory.Buffer

Start by opening up the System.Memory unit. Spend a few minutes getting an overview of the classes and functions there. Once you have had a peek, focus on the class TAddress.

TAddress is a class that wraps around a JavaScript untyped buffer. It can be compared to a pointer holding the address to the memory you allocated (but also more, like the size). Let’s have a look at it:

  TAddress  = partial class(TObject)
  private
    FOffset:    Integer;
    FBuffer:    TMemoryHandle;
  protected
    function    GetSize: integer;virtual;
  public
    Property    Entrypoint: integer read FOffset;
    Property    Segment: TMemoryHandle read FBuffer;
    Property    Size: integer read GetSize;
    function    Addr(const Index: integer): TAddress;
    Constructor Create(const Segment: TMemoryHandle;
                const Offset: integer); overload; virtual;
    Destructor  Destroy;Override;
  end;

As you can see, it’s very small and compact. It exposes a segment handle property, which is a reference to the memory object it represents. It also exposes size information, an entrypoint and a curious Addr() function.

The Addr() function is for getting a sub-address from within the memory buffer. Let’s say you want to write some data at the 1024th byte of your allocated buffer. Assuming offsets are zero-based, calling FMyAddress.Addr(1023) returns a new TAddress instance pointing at that particular place inside the buffer.

This is why we have an entrypoint property. It keeps track of the offset at which a TAddress starts. If you start at 1023, then all subsequent offsets will have the entrypoint added to them. They will be relative to this position. Think of it as the Position property in TStream if you like. And you can naturally have as many TAddress instances pointing to the same buffer as you like; hence we use offsets to speed things up (instead of creating a new view object for every single reference).
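The entrypoint idea can be sketched in a few lines of TypeScript. To be clear, this is not how TAddress is implemented in the RTL. `AddressSketch` and its members are hypothetical names, and the way size is computed here is an assumption. The point is only that a sub-address is the same buffer plus a larger starting offset, and that all instances can share one view:

```typescript
// Hypothetical stand-in for the idea behind TAddress: a buffer reference
// plus a starting offset (the entrypoint).
class AddressSketch {
  readonly segment: ArrayBuffer;
  readonly entrypoint: number;
  private readonly view: DataView; // one view, shared by all sub-addresses

  constructor(segment: ArrayBuffer, entrypoint: number = 0, view?: DataView) {
    this.segment = segment;
    this.entrypoint = entrypoint;
    this.view = view !== undefined ? view : new DataView(segment);
  }

  // Sub-address: same buffer, same view, offset relative to this entrypoint.
  addr(index: number): AddressSketch {
    return new AddressSketch(this.segment, this.entrypoint + index, this.view);
  }

  // Assumption: size is whatever remains from the entrypoint to the end.
  get size(): number {
    return this.segment.byteLength - this.entrypoint;
  }

  // Offsets passed in are relative; the entrypoint is added before access.
  writeByte(offset: number, value: number): void {
    this.view.setUint8(this.entrypoint + offset, value);
  }

  readByte(offset: number): number {
    return this.view.getUint8(this.entrypoint + offset);
  }
}

function addressDemo(): { entrypoint: number; size: number; absoluteByte: number } {
  const base = new AddressSketch(new ArrayBuffer(2048));
  const sub = base.addr(1023); // entrypoint is now 1023
  sub.writeByte(2, 0x7f);      // lands at absolute offset 1023 + 2 = 1025
  return { entrypoint: sub.entrypoint, size: sub.size, absoluteByte: base.readByte(1025) };
}
```

Reading offset 1025 through the base address returns the byte that was written at relative offset 2 through the sub-address, since both refer to the same memory.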

Let’s allocate some memory:

procedure TForm1.ReserveMemory;
var
  LData: TAddress;
begin
  LData := TMarshal.Allocmem(1024 * 1024);
end;

The above code would allocate 1 megabyte of data, and LData now points to it. We can now use the other methods of TMarshal to work with the buffer. Let’s have a look at the methods TMarshal has to offer:

  TMarshal = class static
  public
    class property  UnManaged: TUnManaged;
    class function  AllocMem(const Size:Integer): TAddress;
    class procedure FreeMem(Const Segment: TAddress);

    class procedure Move(const Source:TAddress;
                    const Target:TAddress;
                    const Size:Integer);overload;

    class procedure Move(const Source:TMemoryHandle;
                    const SourceStart:Integer;
                    const Target:TMemoryHandle;
                    const TargetStart:Integer;
                    const Size:Integer);overload;

    class procedure FillChar(const Target:TAddress;
                    const Size:Integer;
                    const Value:char);overload;

    class procedure FillChar(const Target:TAddress;
                    const Size:Integer;
                    const Value:Byte);overload;

    class procedure ReAllocMem(var Segment: TAddress;
                    const Size:Integer);

    class function  ReadMemory(const Segment: TAddress;
                    const Size:Integer):TByteArray;overload;

    class procedure WriteMemory(const Segment: TAddress;
                    const Data:TByteArray);

    class procedure Fill(Const Buffer: TMemoryHandle; Offset: Integer;
                    ByteLen: Integer;const Value: Byte);
  end;

As you can see, you have pretty much the same features as in Delphi or Free Pascal, albeit in a different form. Delphi, being a native language, doesn’t need such intermediate objects, but the above code is actually leaps and bounds easier to work with than anything JavaScript gives you out of the box.
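For comparison, this is roughly what filling and moving bytes looks like when done by hand on the JavaScript side. It is a sketch under assumptions: `fillBytes` and `moveBytes` are hypothetical helpers that mirror the parameter order of the Fill and Move declarations above, and they say nothing about how TMarshal is actually implemented. The calls used (`Uint8Array.prototype.fill` and `Uint8Array.prototype.set`) are standard:

```typescript
// Fill `byteLen` bytes of a buffer, starting at `offset`, with one value.
function fillBytes(segment: ArrayBuffer, offset: number, byteLen: number, value: number): void {
  new Uint8Array(segment, offset, byteLen).fill(value);
}

// Copy `size` bytes from one buffer position to another.
function moveBytes(
  source: ArrayBuffer, sourceStart: number,
  target: ArrayBuffer, targetStart: number,
  size: number
): void {
  const from = new Uint8Array(source, sourceStart, size);
  new Uint8Array(target, targetStart).set(from);
}

function marshalDemo(): Uint8Array {
  const source = new ArrayBuffer(8);
  const target = new ArrayBuffer(8);   // new buffers start zero-filled
  fillBytes(source, 0, 8, 0xaa);       // source becomes AA AA AA AA AA AA AA AA
  moveBytes(source, 0, target, 2, 4);  // 4 bytes land at target offsets 2..5
  return new Uint8Array(target);
}
```

Every operation needs a view to be created with the right offset and length first, which is the bookkeeping the RTL hides behind a single call.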

Let’s say you want to write a longword at the 1024th byte (offset 1023) of our allocated memory. You would then use the Addr() function we talked about earlier to simplify things:

procedure TForm1.ReserveMemory;
var
  LData: TAddress;
begin
  LData := TMarshal.Allocmem(1024 * 1024);
  TMarshal.WriteMemory(LData.Addr(1023), TDataType.Int32ToBytes($BABECAFE));
end;

Now you are probably wondering: why waste time on converting the $BABECAFE value to bytes? Well, you don’t have to, but it’s one way of doing it. You can of course create a uint32 view and write it directly like you would an array. But in order to make you more familiar with the RTL landscape, I figured it would be valuable to start with the byte conversion, which by the way saves you a lot of time. So have a peek at TDataType in System.Types.Convert.pas; the features there will impress you once you realize how little JavaScript gives you, and how much the RTL makes of it.
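Here is a sketch of both routes in plain TypeScript, to show what such a conversion has to do underneath. `uint32ToBytes` is a hypothetical helper, not the RTL's Int32ToBytes, and little-endian byte order is an assumption, since the byte order the RTL uses is not stated here. One detail worth knowing about the raw JavaScript layer is that a `Uint32Array` must start at an offset that is a multiple of 4, so at offset 1023 only the byte route or a `DataView` will work:

```typescript
// Convert an unsigned 32-bit value to 4 bytes (little-endian by assumption).
function uint32ToBytes(value: number): Uint8Array {
  const bytes = new Uint8Array(4);
  new DataView(bytes.buffer).setUint32(0, value, true);
  return bytes;
}

function writeLongwordDemo(): { bytes: Uint8Array; readBack: number; typedViewRejected: boolean } {
  const segment = new ArrayBuffer(2048);
  const offset = 1023;

  // Route 1: convert to bytes, then copy the bytes into place.
  const bytes = uint32ToBytes(0xbabecafe);
  new Uint8Array(segment).set(bytes, offset);

  // Route 2: access the value directly. DataView accepts any byte offset.
  const readBack = new DataView(segment).getUint32(offset, true);

  // A Uint32Array cannot start at 1023: the constructor throws a RangeError
  // because the offset is not a multiple of the element size.
  let typedViewRejected = false;
  try {
    new Uint32Array(segment, offset, 1);
  } catch (error) {
    typedViewRejected = error instanceof RangeError;
  }

  return { bytes, readBack, typedViewRejected };
}
```

With little-endian order the value $BABECAFE is stored as the bytes FE CA BE BA, and reading it back through the DataView with the same byte order returns the original value.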

Moving up to TBinaryData

Now, messing around with low-level byte buffers can be rewarding, but it’s not really the level you want to work on. You want the power of low-level stuff but with the infrastructure of high-level programming.

In the unit System.Memory.Buffer you will find a class that is written to cover all of your binary needs. It is optimized for reading and writing data fast, and also for writing various datatypes directly into the buffer without TAddress being a factor. So while you can enjoy the low-level functionality of AllocMem, you are actually expected to use TBinaryData. TMemoryStream is also fine, but TBinaryData will always be faster.

And it comes with methods for pretty much everything you could think of, including portability! Be it buffer to stream, buffer to arrays or the other way around, TBinaryData has a lot of cool features. With ZLib just added to the new RTL, compression and zip file support will be added to this class (through partial classes, so the methods will only appear in the class when you include the zlib unit):

  TBinaryData = partial class(TAllocation
      ,IBinaryDataReadAccess
      ,IBinaryDataWriteAccess
      ,IBinaryDataReadWriteAccess
      ,IBinaryDataBitAccess
      ,IBinaryDataImport
      ,IBinaryDataExport)
  private
    FDataView:  JDataView;
  protected
    function    GetByte(const Index: integer):Byte;
    procedure   SetByte(const Index: integer;const Value:Byte);
  protected
    function    GetBitCount: integer; virtual;
    function    GetBit(const bitIndex: integer): boolean;
    procedure   SetBit(const bitIndex: integer;const value: boolean);
  protected
    function    OffsetInRange(Offset: integer): boolean;
  protected
    procedure   HandleAllocated; override;
    procedure   HandleReleased; override;
  public
    property    BitCount:integer read GetBitCount;
    property    Bytes[const ByteIndex: integer]: byte read GetByte write SetByte; default;
    property    Bits[const BitIndex: integer]: boolean read GetBit write SetBit;

    function    Allocation: IAllocation;

    procedure   AppendBytes(const Bytes: TByteArray);virtual;
    procedure   AppendStr(const Text: string);virtual;
    procedure   AppendMemory(const Buffer: TBinaryData; const ReleaseBufferOnExit: boolean);virtual;
    procedure   AppendBuffer(const Raw: TMemoryHandle);overload;
    procedure   AppendFloat32(const Value: Float32);virtual;
    procedure   AppendFloat64(const Value: Float64);virtual;

    procedure   CopyFrom(const Buffer: TBinaryData; const Offset: integer; const ByteLen: integer);
    procedure   CopyFromMemory(const Raw: TMemoryHandle;
                Offset: integer; ByteLen: integer);

    function    CutBinaryData(Offset: integer;ByteLen: integer): TBinaryData;
    function    CutStream(const Offset: integer; const ByteLen: integer): TStream;
    function    CutTypedArray(Offset: integer; ByteLen: integer): TMemoryHandle;

    procedure   Write(const Offset: integer; const Data: TByteArray);Overload;
    procedure   WriteFloat32(const Offset: integer; const Data: Float32);
    procedure   WriteFloat64(const Offset: integer; const Data: Float64);

    procedure   Write(const Offset:integer;const Data:TBinaryData);overload;
    procedure   Write(const Offset:integer;const Data:TMemoryHandle);overload;
    procedure   Write(const Offset:integer;const Data:String);overload;
    procedure   Write(const Offset:integer;const Data:integer);overload;
    procedure   Write(const Offset:integer;const Data:Boolean);overload;

    function    ReadFloat32(Offset:integer):Float;virtual;
    function    ReadFloat64(Offset:integer):Float;virtual;
    function    ReadBool(Offset:integer):Boolean;Overload;
    function    ReadInt(Offset:integer):integer;overload;
    function    ReadStr(Offset:integer;ByteLen:integer):String;overload;
    function    ReadBytes(Offset:integer;ByteLen:integer):TByteArray;overload;

    function    Clone: TBinaryData;

    procedure   SaveToStream(const Stream: TStream);
    procedure   LoadFromStream(const Stream: TStream);

    procedure   FromBase64(FileData: string);
    function    ToBase64: string;
    function    ToString: string;
    function    ToTypedArray: TMemoryHandle;
    function    ToBytes: TByteArray;
    function    ToStream: TStream;
    function    ToHexDump(BytesPerRow: integer;
                Options: TBufferHexDumpOptions): string;

    {$IFDEF APPTYPE_VISUAL}
    procedure   LoadFromFile(const aFileURI:String; const OnReady:TNotifyEvent);
    {$ENDIF}

    Constructor Create(aHandle: TMemoryHandle); overload; virtual;
  end;

If you already have allocated memory, or perhaps your server has been given a handle to a buffer for a file, notice that the constructor is overloaded and can work with existing buffer objects. In such a case it will not release the memory even if you free the instance (since it doesn’t own it, only borrows it).

I took it one step further and added bit manipulation as well. So you can allocate a large buffer and process it on bit level if you need to. I actually used that a lot when doing some audio code. It is also very handy when keeping track of visible sprites in a game (“find me an idle sprite”, where each bit represents a sprite’s busy state). And it’s even an important feature when writing database engines.
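The “find me an idle sprite” case can be sketched like this in TypeScript. `BitBufferSketch` is a hypothetical class written for this example, not TBinaryData, and the bit numbering (bit 0 is the lowest bit of byte 0) is an assumption. It shows the usual byte-index and mask arithmetic behind a bit accessor, plus a search that skips bytes that are completely full:

```typescript
// Hypothetical bit-level wrapper over a byte buffer; not the RTL's TBinaryData.
class BitBufferSketch {
  private readonly view: DataView;

  constructor(byteLength: number) {
    this.view = new DataView(new ArrayBuffer(byteLength));
  }

  get bitCount(): number {
    return this.view.byteLength * 8;
  }

  getBit(bitIndex: number): boolean {
    const mask = 1 << (bitIndex & 7);                      // position inside the byte
    return (this.view.getUint8(bitIndex >> 3) & mask) !== 0; // bitIndex >> 3 = byte index
  }

  setBit(bitIndex: number, value: boolean): void {
    const byteIndex = bitIndex >> 3;
    const mask = 1 << (bitIndex & 7);
    const current = this.view.getUint8(byteIndex);
    this.view.setUint8(byteIndex, value ? current | mask : current & ~mask);
  }

  // Index of the first clear bit, or -1 when every bit is set.
  findFirstClear(): number {
    for (let byteIndex = 0; byteIndex < this.view.byteLength; byteIndex++) {
      const current = this.view.getUint8(byteIndex);
      if (current === 0xff) continue; // all eight slots busy, skip the whole byte
      for (let bit = 0; bit < 8; bit++) {
        if ((current & (1 << bit)) === 0) return byteIndex * 8 + bit;
      }
    }
    return -1;
  }
}

function spriteDemo(): number {
  const busy = new BitBufferSketch(4);                // room for 32 sprites
  for (let i = 0; i < 10; i++) busy.setBit(i, true);  // sprites 0..9 are busy
  busy.setBit(3, false);                              // sprite 3 goes idle again
  return busy.findFirstClear();
}
```

One bit per sprite means 32 busy flags fit in four bytes, and the search only inspects individual bits in bytes that still have a free slot.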

Well, this was a quick introduction to low-level memory stuff in Smart. There is a ton of features I haven’t even mentioned yet, like being able to turn data into a URI object that can be attached to anchor (A tag) objects and auto-clicked to force a “save as” dialog to appear. This is really neat if you have a web application that generates data or binary files the user can save.

Until next time!
