Smart codecs, a draft

October 18, 2016

When you hear the word codec, what comes to your mind? For me, I get flashbacks to the early 2000s, when I worked in Visual C++ and Visual Basic on a huge media project. ActiveX was a total cluster fu**, with a proverbial viper's nest of interfaces, vague explanations and more blue screens than I care to remember.

Working on the RTL for Smart Mobile Studio has forced me to learn a lot of topics that I would never have given a second thought, had I chosen not to go down the JavaScript rabbit hole. And I have now reached the point where codecs are starting to make more and more sense for the RTL. Let me explain why.

Codec is just short for “encode decode”, and a codec represents just an isolated ability to encode or decode a piece of data. If you have followed the evolution of Smart Mobile Studio and know your way around the RTL – you must have noticed that we have a fair bit of duplicate code hanging around. So far I have found two wildly different CRC implementations, a sexy and fast RC4 encryption routine, RLE compression, URL encoding, Base64 encoding, UTF8 encoding .. the list goes on.

All of these routines, no matter how small or large they may be, do the exact same thing: they encode or decode a piece of data. So you input one type of data and you get something else back.

Part of our RTL update is not just about adding cool new features, it’s also about order, cleaning up and getting the RTL into a form that is better suited for the future. I have already blogged about namespaces and the important role they play in the next version of Smart Mobile Studio. Well, I think it’s time to put our encoding/decoding capabilities into order as well.

Smart codecs

The first challenge that comes to mind when you sit down to create a codec framework is the data formats you have to deal with. I mean, what is the point of a codec class if it’s only capable of processing strings, right? And since Smart has all these cool mediums to play with, like streams, buffers and untyped arrays, we want each codec to support as many such input and output types as possible.

This is where things get complicated and the critique I dished out to Microsoft ages ago comes back to haunt me. Because I now see why their codec system had to be so elaborate.

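I mean, let’s say you have a URL codec. It exists to take strings and replace characters that are not safe to send as-is, for instance because they could be mangled or filtered out by the locale on another PC. All it does is replace those characters with a %charcode representation: a percent sign followed by the character code in hexadecimal, so a space becomes %20. That way the text can be re-mapped on the other side safely.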
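
To make that concrete, here is a minimal sketch of percent-encoding at byte level. It is written as a plain Free Pascal program (Delphi mode) rather than Smart Pascal, and PercentEncode is a hypothetical helper of my own naming, not a routine from the RTL. I am assuming the usual rule from RFC 3986: letters, digits and the four characters - _ . ~ pass through, everything else is escaped.

```pascal
program UrlEncodeSketch;
{$mode delphi}{$H+}

uses
  SysUtils;

// Hypothetical helper (not part of the Smart RTL).
// Treats the input as raw bytes and escapes every byte that is not an
// RFC 3986 "unreserved" character as %XX, where XX is the byte in hex.
function PercentEncode(const S: AnsiString): AnsiString;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    if S[i] in ['A'..'Z', 'a'..'z', '0'..'9', '-', '_', '.', '~'] then
      Result := Result + S[i]
    else
      Result := Result + '%' + IntToHex(Integer(Ord(S[i])), 2);
end;

begin
  // prints: This%20is%20a%20string%20ready%20for%20URL-encoding%21
  WriteLn(PercentEncode('This is a string ready for URL-encoding!'));
end.
```

Note that the routine only knows about strings. That limitation is exactly what the rest of this post is about.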

For a codec designed to process this type of data, you would have to define an input channel that can deal with text – as well as an output channel of the same type. But what if you want the output as a stream or memory buffer? Suddenly input and output gates become more complex, more fluid and data-dependent.

IBinaryTransport to the rescue

In the last update I did something magical. So magical in fact that I was oblivious to the fact that I had just solved the problem of data mitigation. It was a two-second mistake that turned out to be a stroke of genius (which I can’t take credit for, because I had no idea I was being a genius at the time).

IBinaryTransport is just a super simple interface that gives you info about how much data is available, a reading method and a writing method, and finally the position of the read/write cursor. It’s stupendously simple, yet solves so many problems:

  IBinaryTransport = Interface
    function  DataOffset: integer;
    function  DataGetSize: integer;
    function  DataRead(const Offset: integer; const ByteCount: integer): TByteArray;
    procedure DataWrite(const Offset: integer; const Bytes: TByteArray);
  end;

What I did was to add this interface to TStream, TAllocation and TBinaryData. I figured it would be handy when calling ToStream() and similar helper functions. But it turned out that with this interface in place, I could throw out TStreamReader, TStreamWriter, TBufferReader and TBufferWriter, and just write a single reader/writer class. TStreamReader actually doesn’t have a clue what a stream is, because it speaks to the target via IBinaryTransport.

Same with TReader and TWriter for TBinaryData (memory buffer). They don’t know anything about memory at all, they just know how to communicate with the target through IBinaryTransport.
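
As a rough illustration of why this works, here is a self-contained sketch in plain Free Pascal (Delphi mode). The interface has the same shape as the one above, with TBytes standing in for TByteArray. Everything else is an assumption made for the example: TMemoryTransport and CopyAll are hypothetical names, and the real TStream, TAllocation and TBinaryData implementations may well behave differently.

```pascal
program TransportSketch;
{$mode delphi}{$H+}

uses
  SysUtils;

type
  // Same shape as the interface in the post; TBytes replaces TByteArray.
  IBinaryTransport = interface
    function  DataOffset: Integer;
    function  DataGetSize: Integer;
    function  DataRead(const Offset: Integer; const ByteCount: Integer): TBytes;
    procedure DataWrite(const Offset: Integer; const Bytes: TBytes);
  end;

  // Hypothetical medium: a growable in-memory byte buffer.
  TMemoryTransport = class(TInterfacedObject, IBinaryTransport)
  private
    FData: TBytes;
    FOffset: Integer;
  public
    function  DataOffset: Integer;
    function  DataGetSize: Integer;
    function  DataRead(const Offset: Integer; const ByteCount: Integer): TBytes;
    procedure DataWrite(const Offset: Integer; const Bytes: TBytes);
  end;

function TMemoryTransport.DataOffset: Integer;
begin
  Result := FOffset;
end;

function TMemoryTransport.DataGetSize: Integer;
begin
  Result := Length(FData);
end;

function TMemoryTransport.DataRead(const Offset: Integer; const ByteCount: Integer): TBytes;
begin
  // Copy clamps to the data that is actually available.
  Result := Copy(FData, Offset, ByteCount);
  FOffset := Offset + Length(Result);
end;

procedure TMemoryTransport.DataWrite(const Offset: Integer; const Bytes: TBytes);
begin
  // Grow the buffer when the write extends past the current end.
  if Offset + Length(Bytes) > Length(FData) then
    SetLength(FData, Offset + Length(Bytes));
  if Length(Bytes) > 0 then
    Move(Bytes[0], FData[Offset], Length(Bytes));
  FOffset := Offset + Length(Bytes);
end;

// Medium-agnostic routine: it only ever sees the interface.
procedure CopyAll(const Source, Target: IBinaryTransport);
begin
  Target.DataWrite(Target.DataGetSize, Source.DataRead(0, Source.DataGetSize));
end;

var
  A, B: IBinaryTransport;
  Payload: TBytes;
begin
  A := TMemoryTransport.Create;
  B := TMemoryTransport.Create;
  SetLength(Payload, 3);
  Payload[0] := 1; Payload[1] := 2; Payload[2] := 3;
  A.DataWrite(0, Payload);
  CopyAll(A, B);
  Payload := B.DataRead(0, B.DataGetSize);
  // prints: 3 bytes, last = 3
  WriteLn(Length(Payload), ' bytes, last = ', Payload[High(Payload)]);
end.
```

CopyAll never asks what kind of object it is talking to. Swap either side for a stream or a socket wrapper that implements the same four methods and the routine stays untouched.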

So how does this affect codecs? It means we can throw out the older model completely. Instead of having to define a whole list of input/output channels, add support for dealing with each of them — all we have to do is make the input and output channels IBinaryTransport (!)

This reduces the basic codec class to this:

  TCustomCodec = class(TObject, ICodecBinding, ICodecProcess )
  private
    FBindings:  TCodecBindingList;
    FCodecInfo: TCodecInfo;
  protected
    /* IMPLEMENTS :: ICodecBinding */
    procedure RegisterBinding(const Binding: TCodecBinding);
    procedure UnRegisterBinding(const Binding: TCodecBinding);
  protected
    /* IMPLEMENTS :: ICodecProcess */
    procedure Encode(const Source: IBinaryTransport;
        const Target: IBinaryTransport); virtual; abstract;

    procedure Decode(const Source: IBinaryTransport;
        const Target: IBinaryTransport); virtual; abstract;
  protected
    function  MakeCodecInfo: TCodecInfo; virtual;
  public
    property Info: TCodecInfo read FCodecInfo;
    constructor Create; virtual;
    destructor Destroy; override;
  end;

And the work you have to do implementing a new codec to this:

  TURLCodec = class(TCustomCodec)
  protected
    (* IMPLEMENTS :: ICodecProcess *)
    procedure Encode(const Source: IBinaryTransport;
        const Target: IBinaryTransport); override;
    procedure Decode(const Source: IBinaryTransport;
        const Target: IBinaryTransport); override;
  protected
    function MakeCodecInfo: TCodecInfo; override;
  end;

Using a codec

A codec should never be used directly, but via proxy. This gives the framework a chance to create instances on demand, and also makes sure you don’t compile in codecs you never use. A few basic codec classes will be registered when your application starts – but these have always been compiled with your Smart application so there is little or no difference with regards to size or speed.

The benefits however are great: all codecs register with a central manager, and they also register their mime-type (the format they work on), making it possible for you to ask the codec-manager whether it supports a specific encoding/decoding mechanism.

And naturally, it gives us uniformity with regards to data processing.
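
To give a feel for the registration idea, here is a small sketch in plain Free Pascal (Delphi mode). None of this is the actual framework, which is still a draft: TSketchCodec, TUrlSketchCodec and TSketchCodecManager are hypothetical stand-ins, and the mime-type string is only a plausible placeholder. The point it tries to show is that the manager stores class references, so an instance is created only when somebody asks for one.

```pascal
program CodecRegistrySketch;
{$mode delphi}{$H+}

uses
  SysUtils;

type
  // Hypothetical stand-in for a codec base class.
  TSketchCodec = class
  public
    constructor Create; virtual;
    class function CodecName: string; virtual; abstract;
    class function MimeType: string; virtual; abstract;
  end;
  TSketchCodecClass = class of TSketchCodec;

  TUrlSketchCodec = class(TSketchCodec)
  public
    class function CodecName: string; override;
    class function MimeType: string; override;
  end;

  // Hypothetical manager: keeps class references, not instances.
  TSketchCodecManager = class
  private
    FClasses: array of TSketchCodecClass;
  public
    procedure RegisterCodec(const AClass: TSketchCodecClass);
    function  QueryByName(const AName: string;
      out AClass: TSketchCodecClass): Boolean;
  end;

constructor TSketchCodec.Create;
begin
  inherited Create;
end;

class function TUrlSketchCodec.CodecName: string;
begin
  Result := 'url';
end;

class function TUrlSketchCodec.MimeType: string;
begin
  // Placeholder value chosen for the example only.
  Result := 'application/x-www-form-urlencoded';
end;

procedure TSketchCodecManager.RegisterCodec(const AClass: TSketchCodecClass);
begin
  SetLength(FClasses, Length(FClasses) + 1);
  FClasses[High(FClasses)] := AClass;
end;

function TSketchCodecManager.QueryByName(const AName: string;
  out AClass: TSketchCodecClass): Boolean;
var
  i: Integer;
begin
  AClass := nil;
  Result := False;
  for i := 0 to High(FClasses) do
    if SameText(FClasses[i].CodecName, AName) then
    begin
      AClass := FClasses[i];
      Result := True;
      Exit;
    end;
end;

var
  Manager: TSketchCodecManager;
  Found: TSketchCodecClass;
  Codec: TSketchCodec;
begin
  Manager := TSketchCodecManager.Create;
  try
    Manager.RegisterCodec(TUrlSketchCodec);
    if Manager.QueryByName('URL', Found) then
    begin
      Codec := Found.Create; // the instance is created on demand
      try
        // prints: url -> application/x-www-form-urlencoded
        WriteLn(Codec.CodecName, ' -> ', Codec.MimeType);
      finally
        Codec.Free;
      end;
    end;
  finally
    Manager.Free;
  end;
end.
```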

Here is an example of using the codec-api to URL encode a string:

var
  LBinding: TCodecBinding;
  LList: TCodecList;
  LSource: TStream;
  LTarget: TStream;
  LReader: TReader;
  LWriter: TWriter;
begin

  if CodecManager.QueryByName('url', LList) then
  begin
    LSource := TMemoryStream.Create;
    try
      LTarget := TMemoryStream.Create;
      try

        LWriter := TWriter.Create(LSource as IBinaryTransport);
        try
          LWriter.Options := [woEmulateCursor, woFlexibleBoundaries];
          LWriter.write( TDataType.StringToBytes("This is a string ready for URL-encoding!") );
        finally
          LWriter.free;
        end;
        LSource.position := 0;

        LBinding := TCodecBinding.Create(LList[0]);
        try
          LBinding.Input := LSource as IBinaryTransport;
          LBinding.Output := LTarget as IBinaryTransport;
          LBinding.Encode();
        finally
          LBinding.free;
        end;

        LTarget.position := 0;
        LReader := TReader.Create(LTarget as IBinaryTransport);
        try
          var LBytes := LReader.Read(LTarget.Size);
          writeln( TDataType.bytesToString(LBytes) );
        finally
          LReader.free;
        end;

      finally
        LTarget.free;
      end;
    finally
      LSource.free;
    end;
  end;
end;

The cool part here is that the data can be anything. The source can be a stream, a buffer, a memory allocation, even a socket (!) Same with the target. As long as the target object implements IBinaryTransport the codecs will do their business regardless of medium.

The output is a URL-encoded string, processed at byte level

URL, Base64, UTF8, UTF16 .. and encryption ciphers naturally .. all these processes can now be represented under a single, efficient and small framework. And yes, there will naturally be top-level functions for these, same as before. I don’t expect people to write all of this every time they need to URL encode a string.

The point here is depth, and that Smart Mobile Studio applications can now do more and more of what you would expect from a native compiled system.

  1. October 18, 2016 at 8:52 am

    Shouldn’t Encoder and Decoder be different classes? Sometimes you only need an encoder, so why should you also implement the decoder and vice versa.

    • October 18, 2016 at 3:57 pm

      Well that depends. There is no clear benefit of separating encoding from decoding. In fact, when we start moving into encryption and compression – especially the LZH type compressors — it would actually be worse if we separate them. Both will have to initialize a truckload of lookup tables, all of them to the same data. So by separating them we basically fill the memory with 2 copies of the same tables.

      Also I think we should keep them lightweight. The abstraction allows us to separate them later if required. But that would only make sense if someone sat down and did an MPG encoder or something, and while that would work, I would urge people to call out from nodejs to the OS for heavy stuff like that.

  2. October 18, 2016 at 12:49 pm

    There seems to be a lot of potential here. Sadly, I couldn’t reproduce the project; could you publish it? Your example of using the codec-api to URL encode a string resembles a reset password scenario:

    In this case, I would use the built-in JavaScript method encodeURI to encode the “resetCode”:

    http://localhost:8080/root/MyServiceFree.ResetPassword?resetCode=TDsIInOXIfj%2ByvJ4XmSN7sCinLoS0R%2FAxBi3%2FoFnAz01knFQHAzO0QgYzPbOORSV5ix2FvU6Nqlga05MROIgRV%2FIsLo24wOVxx6zMMBFJ65EvGMz0OVxtjdIP2yr9B7WnsfRCPEl%2Fc%2FFbhy3BWBQBVlG6aHBkWRdGPn4HV8B0rF5%2FuBG5qSi3orVAiWZYX3Hix8SBLdGqioTHiQNuHKdLw%3D%3D

    BTW, just yesterday I created a Smart wrapper for AES encryption/decryption (symmetric cipher, 128/256-bit). See http://forums.smartmobilestudio.com/index.php?/topic/4188-encryption-options/

    The idea is encrypting a cipher/streams/buffers in SMS and decrypt them in the Delphi/FPC server.

  3. October 18, 2016 at 3:40 pm

    Hi buddy!
    Well, I’m sort of creating this now, so naturally it’s not yet out there; it’s a draft. I know it may be irritating not being able to reproduce it, but I figured I would share what I’m working on.
