Smart codecs, a draft
When you hear the word codec, what comes to mind? For me, it triggers flashbacks to the early 2000s, when I worked in Visual C++ and Visual Basic on a huge media project. ActiveX was a total cluster fu**, a proverbial vipers' nest of interfaces, vague explanations and more blue screens than I care to remember.
Codec is just short for “encode/decode”, and a codec represents nothing more than an isolated ability to encode or decode a piece of data. If you have followed the evolution of Smart Mobile Studio and know your way around the RTL, you must have noticed that we have a fair bit of duplicate code hanging around. So far I have found two wildly different CRC implementations, a sexy and fast RC4 encryption routine, RLE compression, URL encoding, Base64 encoding, UTF8 encoding... the list goes on.
All of these routines, no matter how small or large, do the exact same thing: they encode or decode a piece of data. You put one type of data in and you get something else back.
Our RTL update is not just about adding cool new features; it’s also about order, cleaning up and getting the RTL into a shape that is better suited for the future. I have already blogged about namespaces and the important role they play in the next version of Smart Mobile Studio. Well, I think it’s time to put our encoding/decoding capabilities in order as well.
The first challenge that comes to mind when you sit down to create a codec framework is the range of data formats you have to deal with. I mean, what is the point of a codec class if it’s only capable of processing strings, right? And since Smart has all these cool mediums to play with, like streams, buffers and untyped arrays, we want each codec to support as many input and output types as possible.
This is where things get complicated, and the critique I dished out to Microsoft ages ago comes back to haunt me, because I now see why their codec system had to be so elaborate.
Let’s say you have a URL codec. It exists to take strings and replace the characters that are not safe inside a URL (spaces, reserved characters such as & or /, and non-ASCII characters that another system may interpret differently) with a %XX representation, where XX is the hexadecimal value of the byte. That way the text can be decoded safely on the other side.
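To make the %XX idea concrete, here is a minimal percent-encoding sketch. It is written in TypeScript rather than Smart Pascal (Smart compiles to JavaScript anyway). It is not the RTL's URL codec, and it assumes the RFC 3986 rule that only the unreserved characters (A-Z, a-z, 0-9, `-`, `.`, `_`, `~`) pass through unchanged. It works on bytes, so non-ASCII text would first have to be turned into UTF-8 bytes.

```typescript
// Sketch only: RFC 3986-style percent-encoding over raw bytes.
// Unreserved bytes pass through; every other byte becomes "%" + two uppercase hex digits.

function isUnreserved(b: number): boolean {
  return (
    (b >= 0x41 && b <= 0x5a) || // A-Z
    (b >= 0x61 && b <= 0x7a) || // a-z
    (b >= 0x30 && b <= 0x39) || // 0-9
    b === 0x2d || b === 0x2e || b === 0x5f || b === 0x7e // - . _ ~
  );
}

function percentEncode(bytes: Uint8Array): string {
  const hex = "0123456789ABCDEF";
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    out += isUnreserved(b) ? String.fromCharCode(b) : "%" + hex[b >> 4] + hex[b & 0x0f];
  }
  return out;
}

// Assumes well-formed input: every "%" is followed by two hex digits.
function percentDecode(text: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "%") {
      out.push(parseInt(text.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits
    } else {
      out.push(text.charCodeAt(i));
    }
  }
  return new Uint8Array(out);
}

// Usage: the bytes of "a b/~" become "a%20b%2F~"
const encoded = percentEncode(new Uint8Array([0x61, 0x20, 0x62, 0x2f, 0x7e]));
console.log(encoded); // a%20b%2F~
const decoded = percentDecode(encoded);
```

Decoding simply reverses the mapping byte by byte, which is why the receiving side can restore the original text exactly.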
For a codec designed to process this type of data, you would have to define an input channel that can deal with text, as well as an output channel of the same type. But what if you want the output as a stream or a memory buffer? Suddenly the input and output channels become more complex, more fluid and data-dependent.
IBinaryTransport to the rescue
In the last update I did something magical. So magical, in fact, that I was oblivious to having just solved the problem of moving data between different mediums. It was a two-second mistake that turned out to be a stroke of genius (which I can’t take credit for, because I had no idea I was being a genius at the time).
IBinaryTransport is just a super simple interface that tells you how much data is available, gives you a read method and a write method, and finally reports the position of the read/write cursor. It’s stupendously simple, yet it solves so many problems:
IBinaryTransport = interface
  function  DataOffset: integer;   // position of the read/write cursor
  function  DataGetSize: integer;  // how much data is available
  function  DataRead(const Offset: integer; const ByteCount: integer): TByteArray;
  procedure DataWrite(const Offset: integer; const Bytes: TByteArray);
end;
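If you want to play with the idea outside Smart, here is a rough TypeScript mirror of the interface together with one in-memory implementation. The lower-case method names, the use of Uint8Array for TByteArray, and the choice to let reads and writes move the cursor are all assumptions made for this sketch. They do not describe how TStream, TAllocation or TBinaryData actually behave.

```typescript
// Sketch: a TypeScript mirror of IBinaryTransport (TByteArray modelled as Uint8Array).
interface IBinaryTransport {
  dataOffset(): number;  // position of the read/write cursor
  dataGetSize(): number; // number of bytes available
  dataRead(offset: number, byteCount: number): Uint8Array;
  dataWrite(offset: number, bytes: Uint8Array): void;
}

// Hypothetical in-memory implementation. Assumption: reads and writes move
// the cursor to the end of the bytes they touched; writes past the end grow the buffer.
class MemoryTransport implements IBinaryTransport {
  private buf = new Uint8Array(0);
  private cursor = 0;

  dataOffset(): number {
    return this.cursor;
  }

  dataGetSize(): number {
    return this.buf.length;
  }

  dataRead(offset: number, byteCount: number): Uint8Array {
    if (offset < 0 || byteCount < 0 || offset + byteCount > this.buf.length) {
      throw new RangeError("read outside the available data");
    }
    this.cursor = offset + byteCount;
    return this.buf.slice(offset, offset + byteCount); // a copy, not a view
  }

  dataWrite(offset: number, bytes: Uint8Array): void {
    const end = offset + bytes.length;
    if (end > this.buf.length) {
      const grown = new Uint8Array(end); // newly added bytes start as zero
      grown.set(this.buf);
      this.buf = grown;
    }
    this.buf.set(bytes, offset);
    this.cursor = end;
  }
}

// Usage
const mem = new MemoryTransport();
mem.dataWrite(0, new Uint8Array([1, 2, 3]));
console.log(mem.dataGetSize(), mem.dataOffset()); // 3 3
console.log(mem.dataRead(1, 2).join(","));         // 2,3
```

Any object that can answer these four calls can be handed to the same reader, writer or codec, whether it is backed by a stream, a buffer or a socket.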
What I did was add this interface to TStream, TAllocation and TBinaryData. I figured it would be handy when calling ToStream() and similar helper functions. But it turned out that with this interface in place, I could throw out TStreamReader, TStreamWriter, TBufferReader and TBufferWriter, and just write a single reader/writer class. That reader doesn’t actually have a clue what a stream is, because it speaks to the target via IBinaryTransport.
The same goes for TReader and TWriter when they work on TBinaryData (a memory buffer). They don’t know anything about memory at all; they just know how to communicate with the target through IBinaryTransport.
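The decoupling described above can be sketched in a few lines of TypeScript. One writer and one reader talk to two unrelated media, a plain number array standing in for a stream and a growable Uint8Array standing in for a memory buffer, purely through the interface. The class names, and the decision to keep the cursor in the reader/writer rather than in the medium, are illustrative assumptions. This is not the RTL's TReader/TWriter.

```typescript
interface IBinaryTransport {
  dataOffset(): number;
  dataGetSize(): number;
  dataRead(offset: number, byteCount: number): Uint8Array;
  dataWrite(offset: number, bytes: Uint8Array): void;
}

// Medium 1: a plain number[] (stand-in for a stream).
class ArrayTransport implements IBinaryTransport {
  private data: number[] = [];
  dataOffset(): number { return 0; } // this medium keeps no cursor of its own
  dataGetSize(): number { return this.data.length; }
  dataRead(offset: number, byteCount: number): Uint8Array {
    return new Uint8Array(this.data.slice(offset, offset + byteCount));
  }
  dataWrite(offset: number, bytes: Uint8Array): void {
    for (let i = 0; i < bytes.length; i++) this.data[offset + i] = bytes[i];
  }
}

// Medium 2: a growable Uint8Array (stand-in for a memory buffer).
class BufferTransport implements IBinaryTransport {
  private buf = new Uint8Array(0);
  dataOffset(): number { return 0; }
  dataGetSize(): number { return this.buf.length; }
  dataRead(offset: number, byteCount: number): Uint8Array {
    return this.buf.slice(offset, offset + byteCount);
  }
  dataWrite(offset: number, bytes: Uint8Array): void {
    if (offset + bytes.length > this.buf.length) {
      const grown = new Uint8Array(offset + bytes.length);
      grown.set(this.buf);
      this.buf = grown;
    }
    this.buf.set(bytes, offset);
  }
}

// One writer and one reader for every medium: they only see IBinaryTransport
// and track their own position, starting from the medium's reported offset.
class Writer {
  private target: IBinaryTransport;
  private pos: number;
  constructor(target: IBinaryTransport) {
    this.target = target;
    this.pos = target.dataOffset();
  }
  write(bytes: Uint8Array): void {
    this.target.dataWrite(this.pos, bytes);
    this.pos += bytes.length;
  }
}

class Reader {
  private source: IBinaryTransport;
  private pos: number;
  constructor(source: IBinaryTransport) {
    this.source = source;
    this.pos = source.dataOffset();
  }
  read(byteCount: number): Uint8Array {
    const bytes = this.source.dataRead(this.pos, byteCount);
    this.pos += bytes.length;
    return bytes;
  }
}

// Usage: the same code path works for both media.
function roundTrip(medium: IBinaryTransport): string {
  const w = new Writer(medium);
  w.write(new Uint8Array([1, 2]));
  w.write(new Uint8Array([3]));
  return new Reader(medium).read(medium.dataGetSize()).join(",");
}
console.log(roundTrip(new ArrayTransport()), roundTrip(new BufferTransport())); // 1,2,3 1,2,3
```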
So how does this affect codecs? It means we can throw out the older model completely. Instead of defining a whole list of input/output channels and adding support for each of them, all we have to do is make the input and output channels IBinaryTransport (!)
This reduces the basic codec class to this:
TCustomCodec = class(TObject, ICodecBinding, ICodecProcess)
private
  FBindings:  TCodecBindingList;
  FCodecInfo: TCodecInfo;
protected
  /* IMPLEMENTS :: ICodecBinding */
  procedure RegisterBinding(const Binding: TCodecBinding);
  procedure UnRegisterBinding(const Binding: TCodecBinding);
protected
  /* IMPLEMENTS :: ICodecProcess */
  procedure Encode(const Source: IBinaryTransport; const Target: IBinaryTransport); virtual; abstract;
  procedure Decode(const Source: IBinaryTransport; const Target: IBinaryTransport); virtual; abstract;
protected
  function MakeCodecInfo: TCodecInfo; virtual;
public
  property Info: TCodecInfo read FCodecInfo;
  constructor Create; virtual;
  destructor Destroy; override;
end;
And the work needed to implement a new codec to this:
TURLCodec = class(TCustomCodec)
protected
  (* IMPLEMENTS :: ICodecProcess *)
  procedure Encode(const Source: IBinaryTransport; const Target: IBinaryTransport); override;
  procedure Decode(const Source: IBinaryTransport; const Target: IBinaryTransport); override;
protected
  function MakeCodecInfo: TCodecInfo; override;
end;
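Here is a rough TypeScript analogue of the two declarations above. It shows how little a concrete codec has to do once both ends are IBinaryTransport. It is a sketch under several assumptions: the memory transport, the CodecInfo fields, the MIME-type string and the percent-encoding rule (RFC 3986 unreserved characters pass through) are all illustrative. The binding and registration side of TCustomCodec is left out.

```typescript
interface IBinaryTransport {
  dataOffset(): number;
  dataGetSize(): number;
  dataRead(offset: number, byteCount: number): Uint8Array;
  dataWrite(offset: number, bytes: Uint8Array): void;
}

// Minimal growable in-memory transport (keeps no cursor of its own).
class MemoryTransport implements IBinaryTransport {
  private buf = new Uint8Array(0);
  dataOffset(): number { return 0; }
  dataGetSize(): number { return this.buf.length; }
  dataRead(offset: number, byteCount: number): Uint8Array {
    return this.buf.slice(offset, offset + byteCount);
  }
  dataWrite(offset: number, bytes: Uint8Array): void {
    if (offset + bytes.length > this.buf.length) {
      const grown = new Uint8Array(offset + bytes.length);
      grown.set(this.buf);
      this.buf = grown;
    }
    this.buf.set(bytes, offset);
  }
}

interface CodecInfo {
  name: string;
  mimeType: string;
}

// Counterpart of TCustomCodec: subclasses only describe themselves and encode/decode.
abstract class CustomCodec {
  get info(): CodecInfo {
    return this.makeCodecInfo();
  }
  protected abstract makeCodecInfo(): CodecInfo;
  abstract encode(source: IBinaryTransport, target: IBinaryTransport): void;
  abstract decode(source: IBinaryTransport, target: IBinaryTransport): void;
}

// Counterpart of TURLCodec: reads the whole source and appends the result to the target.
class UrlCodec extends CustomCodec {
  protected makeCodecInfo(): CodecInfo {
    return { name: "url", mimeType: "text/x-url-encoded" }; // hypothetical MIME type
  }

  encode(source: IBinaryTransport, target: IBinaryTransport): void {
    const input = source.dataRead(0, source.dataGetSize());
    const hex = "0123456789ABCDEF";
    const out: number[] = [];
    for (let i = 0; i < input.length; i++) {
      const b = input[i];
      if (/[A-Za-z0-9._~-]/.test(String.fromCharCode(b))) {
        out.push(b); // RFC 3986 unreserved character: copy unchanged
      } else {
        out.push(0x25, hex.charCodeAt(b >> 4), hex.charCodeAt(b & 0x0f)); // "%XX"
      }
    }
    target.dataWrite(target.dataGetSize(), new Uint8Array(out));
  }

  decode(source: IBinaryTransport, target: IBinaryTransport): void {
    const input = source.dataRead(0, source.dataGetSize());
    const out: number[] = [];
    for (let i = 0; i < input.length; i++) {
      if (input[i] === 0x25) {
        // Assumes well-formed input: "%" is followed by two hex digits
        out.push(parseInt(String.fromCharCode(input[i + 1], input[i + 2]), 16));
        i += 2;
      } else {
        out.push(input[i]);
      }
    }
    target.dataWrite(target.dataGetSize(), new Uint8Array(out));
  }
}

// Helpers for the usage example (ASCII text only).
function asciiBytes(s: string): Uint8Array {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) bytes[i] = s.charCodeAt(i) & 0xff;
  return bytes;
}
function readText(t: IBinaryTransport): string {
  const bytes = t.dataRead(0, t.dataGetSize());
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

const src = new MemoryTransport();
src.dataWrite(0, asciiBytes("a b&c"));
const dst = new MemoryTransport();
new UrlCodec().encode(src, dst);
console.log(readText(dst)); // a%20b%26c
```

The concrete codec never asks what kind of object it is reading from or writing to. That is the whole point of typing both parameters as the transport interface.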
Using a codec
A codec should never be used directly, but via a proxy. This gives the framework a chance to create instances on demand, and also makes sure you don’t compile in codecs you never use. A few basic codec classes will be registered when your application starts, but these have always been compiled into your Smart application, so there is little or no difference with regard to size or speed.
The benefits, however, are great: all codecs register with a central manager along with their MIME type (the format they work on), making it possible to ask the codec manager whether it supports a specific encoding/decoding mechanism.
And naturally, it gives us uniformity with regard to data processing.
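A registry along these lines can be sketched in TypeScript. Codecs are registered as factories, so an instance is only created when someone asks for it. Each registration also carries a name and a MIME type that can be queried. Everything here is a hypothetical illustration of the idea, not the Smart RTL's codec manager: the CodecRegistration shape, method names such as queryByName and supportsMimeType, and returning a single instance instead of filling a TCodecList.

```typescript
interface IBinaryTransport {
  dataOffset(): number;
  dataGetSize(): number;
  dataRead(offset: number, byteCount: number): Uint8Array;
  dataWrite(offset: number, bytes: Uint8Array): void;
}

interface Codec {
  encode(source: IBinaryTransport, target: IBinaryTransport): void;
  decode(source: IBinaryTransport, target: IBinaryTransport): void;
}

interface CodecRegistration {
  name: string;        // e.g. "url"
  mimeType: string;    // the format the codec works on
  create: () => Codec; // factory: called on demand, not at registration time
}

class CodecManager {
  private registrations: CodecRegistration[] = [];

  register(reg: CodecRegistration): void {
    this.registrations.push(reg);
  }

  // Case-insensitive lookup; returns a fresh instance, or undefined if unknown.
  queryByName(name: string): Codec | undefined {
    const key = name.toLowerCase();
    for (let i = 0; i < this.registrations.length; i++) {
      const reg = this.registrations[i];
      if (reg.name.toLowerCase() === key) return reg.create();
    }
    return undefined;
  }

  // Answers "is there a codec for this format?" without creating anything.
  supportsMimeType(mimeType: string): boolean {
    const key = mimeType.toLowerCase();
    for (let i = 0; i < this.registrations.length; i++) {
      if (this.registrations[i].mimeType.toLowerCase() === key) return true;
    }
    return false;
  }
}

// Usage: a trivial pass-through codec registered under a made-up name.
let instancesCreated = 0;
class CopyCodec implements Codec {
  constructor() {
    instancesCreated++;
  }
  encode(source: IBinaryTransport, target: IBinaryTransport): void {
    target.dataWrite(target.dataGetSize(), source.dataRead(0, source.dataGetSize()));
  }
  decode(source: IBinaryTransport, target: IBinaryTransport): void {
    this.encode(source, target); // copying is its own inverse
  }
}

const manager = new CodecManager();
manager.register({ name: "copy", mimeType: "application/octet-stream", create: () => new CopyCodec() });
console.log(instancesCreated); // 0: registering does not instantiate
console.log(manager.supportsMimeType("application/octet-stream")); // true
console.log(manager.queryByName("COPY") !== undefined, instancesCreated); // true 1
```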
Here is an example of using the codec API to URL-encode a string:
var
  LBinding: TCodecBinding;
  LList:    TCodecList;
  LSource:  TStream;
  LTarget:  TStream;
  LReader:  TReader;
  LWriter:  TWriter;
begin
  if CodecManager.QueryByName('url', LList) then
  begin
    LSource := TMemoryStream.Create;
    try
      LTarget := TMemoryStream.Create;
      try
        // Write the plain text into the source stream as raw bytes
        LWriter := TWriter.Create(LSource as IBinaryTransport);
        try
          LWriter.Options := [woEmulateCursor, woFlexibleBoundaries];
          LWriter.Write( TDataType.StringToBytes("This is a string ready for URL-encoding!") );
        finally
          LWriter.Free;
        end;
        // Rewind the source so the codec reads from the first byte
        LSource.Position := 0;
        // Bind the source and target transports to the codec and encode
        LBinding := TCodecBinding.Create(LList);
        try
          LBinding.Input  := LSource as IBinaryTransport;
          LBinding.Output := LTarget as IBinaryTransport;
          LBinding.Encode();
        finally
          LBinding.Free;
        end;
        // Rewind the target and read the encoded bytes back as text
        LTarget.Position := 0;
        LReader := TReader.Create(LTarget as IBinaryTransport);
        try
          var LBytes := LReader.Read(LTarget.Size);
          WriteLn( TDataType.BytesToString(LBytes) );
        finally
          LReader.Free;
        end;
      finally
        LTarget.Free;
      end;
    finally
      LSource.Free;
    end;
  end;
end;
The cool part here is that the data can be anything. The source can be a stream, a buffer, a memory allocation, even a socket (!). Same with the target. As long as both the source and the target implement IBinaryTransport, the codecs will do their business regardless of medium.
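This medium independence can be illustrated with a small TypeScript sketch of a binding. The binding only holds a codec, an input and an output, and forwards encode/decode calls. The two media are a plain number array standing in for a buffer and an append-only chunk list standing in for a stream or socket, and the codec never learns which is which. The CodecBinding class, both transports and the hex demo codec are hypothetical, chosen only to keep the example short.

```typescript
interface IBinaryTransport {
  dataOffset(): number;
  dataGetSize(): number;
  dataRead(offset: number, byteCount: number): Uint8Array;
  dataWrite(offset: number, bytes: Uint8Array): void;
}

// Medium A: a plain number[] (stand-in for a memory buffer).
class ArrayTransport implements IBinaryTransport {
  private data: number[] = [];
  dataOffset(): number { return 0; }
  dataGetSize(): number { return this.data.length; }
  dataRead(offset: number, byteCount: number): Uint8Array {
    return new Uint8Array(this.data.slice(offset, offset + byteCount));
  }
  dataWrite(offset: number, bytes: Uint8Array): void {
    for (let i = 0; i < bytes.length; i++) this.data[offset + i] = bytes[i];
  }
}

// Medium B: an append-only list of chunks (stand-in for a stream or socket).
class ChunkTransport implements IBinaryTransport {
  private chunks: Uint8Array[] = [];
  private size = 0;
  dataOffset(): number { return 0; }
  dataGetSize(): number { return this.size; }
  dataRead(offset: number, byteCount: number): Uint8Array {
    const all = new Uint8Array(this.size); // reassemble the chunks, then slice
    let at = 0;
    for (let i = 0; i < this.chunks.length; i++) {
      all.set(this.chunks[i], at);
      at += this.chunks[i].length;
    }
    return all.slice(offset, offset + byteCount);
  }
  dataWrite(offset: number, bytes: Uint8Array): void {
    if (offset !== this.size) throw new Error("append-only medium");
    this.chunks.push(bytes.slice()); // keep a private copy
    this.size += bytes.length;
  }
}

interface Codec {
  encode(source: IBinaryTransport, target: IBinaryTransport): void;
  decode(source: IBinaryTransport, target: IBinaryTransport): void;
}

// Demo codec, not from the RTL: bytes <-> lowercase hex text.
class HexCodec implements Codec {
  encode(source: IBinaryTransport, target: IBinaryTransport): void {
    const input = source.dataRead(0, source.dataGetSize());
    const out = new Uint8Array(input.length * 2);
    for (let i = 0; i < input.length; i++) {
      out[2 * i] = "0123456789abcdef".charCodeAt(input[i] >> 4);
      out[2 * i + 1] = "0123456789abcdef".charCodeAt(input[i] & 0x0f);
    }
    target.dataWrite(target.dataGetSize(), out);
  }
  decode(source: IBinaryTransport, target: IBinaryTransport): void {
    const input = source.dataRead(0, source.dataGetSize());
    const out = new Uint8Array(input.length >> 1); // assumes an even number of hex digits
    for (let i = 0; i < out.length; i++) {
      out[i] = parseInt(String.fromCharCode(input[2 * i], input[2 * i + 1]), 16);
    }
    target.dataWrite(target.dataGetSize(), out);
  }
}

// Hypothetical counterpart of TCodecBinding: wires one codec to an input and an output.
class CodecBinding {
  input: IBinaryTransport | null = null;
  output: IBinaryTransport | null = null;
  private codec: Codec;
  constructor(codec: Codec) {
    this.codec = codec;
  }
  encode(): void {
    if (this.input === null || this.output === null) throw new Error("bind input and output first");
    this.codec.encode(this.input, this.output);
  }
  decode(): void {
    if (this.input === null || this.output === null) throw new Error("bind input and output first");
    this.codec.decode(this.input, this.output);
  }
}

// Usage: array -> chunks, then chunks -> array, through the same binding.
const source = new ArrayTransport();
source.dataWrite(0, new Uint8Array([0xca, 0xfe]));
const hexOut = new ChunkTransport();
const restored = new ArrayTransport();
const binding = new CodecBinding(new HexCodec());
binding.input = source;
binding.output = hexOut;
binding.encode(); // hexOut now holds the ASCII bytes of "cafe"
binding.input = hexOut;
binding.output = restored;
binding.decode(); // restored now holds 0xca, 0xfe again
```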
URL, Base64, UTF8, UTF16 and, naturally, encryption ciphers: all these processes can now be represented under a single, small and efficient framework. And yes, there will be top-level functions for these, same as before. I don’t expect people to write all of this every time they need to URL-encode a string.
The point here is depth, and that Smart Mobile Studio applications can now do more and more of what you would expect from a natively compiled system.