Building a Delphi Database engine, part two
In the first episode of this tutorial we looked at some of the fundamental ideas behind database storage. We solved the problem of storing arbitrary-length data by dividing the database file into conceptual parts; we discovered how we could daisy-chain these parts together to form a sequence; and we looked at how this elegantly lets us reclaim lost space and recycle it for new records. Last but not least we had a peek at the bit-buffer that helps us keep track of which blocks are taken, so we can easily grow and shrink the database on demand.
In this article we will get our hands dirty and put our theories into practice. The goal today is to examine the class that deals with these blocks, or parts. While we could perhaps get better performance by putting everything into a monolithic class, the topics are best kept separate while learning the ropes. So let’s favour OOP and class encapsulation for this one.
The DbLib framework
Prior to writing this tutorial I had to write the code you would need. It would be a terrible mistake to run a tutorial with just theories to show for it. Thankfully I have been coding for a number of years now, so I had most of the code in my archives. To make life easier for you I have unified the code into a little framework.
This doesn’t mean that you have to use the framework. The units I provide are there to give you something tangible to play with. I have left ample room for optimization and things that can be done differently on purpose.
I have set up a bitbucket git repository for this tutorial, so your first business of the day is to download or fork our repository:
https://bitbucket.org/cipher_diaz/dbproject/src/master/
The database file class
The first installment of this tutorial ended with a few words on the file header. This is the only static or fixed data segment in our file. And it must remain fixed because we need a safe space where we can store offsets to our most important sequences, like the root sequence.
The root sequence is, simply put, the data that describes the database, also known as metadata. So all those items I listed at the start of the previous article, things like table definitions, index definitions, the actual binary table data (et al), well, we have to keep track of these somewhere, right?
Well, that’s where the file header comes in. The header is the keeper of all secrets and is imperative for the whole engine.
The record list
Up until this point we have covered blocks, sequences, the file header, the bit-buffer that keeps track of available and reserved blocks — but what about the actual records?
When someone performs a high-level insert operation, the binary data that makes up the record is written as a sequence; that should be crystal clear by now. But having a ton of sequences stored in a file is pretty useless without a directory or list that remembers them. If we have 10,000 records (sequences) in a file, then we must also keep track of 10,000 offsets, right? Otherwise, how can we know where record number 10, 1500 or 9000 starts?
Conceptually, metadata is not actual data. Metadata is a description of data, like a table definition or index definition. The list that holds all the record offsets is real data. As such, I don’t want to store it together with the metadata but keep it separate. The bit-buffer that keeps track of block availability in the file is likewise “real” data, so I would like to keep that in a separate sequence too.
When we sit down and define our file-header record, which is a structure that is always at the beginning of the file (or stream), we end up with something like this:
- Unique file signature: longword
- Version minor, major, revision: longword (byte, byte, word)
- Database name: 256 bytes [holds utf8 encoded text]
- Encryption cipher: integer
- Compression id: integer
- root-sequence: longword
- record-list-sequence: longword
- bit-buffer-sequence: longword
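Translated into Delphi, the field list above could be sketched as a packed record. This is only an illustration with field names of my own choosing; the actual header record in the framework differs slightly (you will see it in the mainform unit later in this article). The `packed` keyword matters: it prevents the compiler from inserting alignment padding, so the in-memory layout matches the on-disk layout byte for byte.

```delphi
type
  TDbFileHeader = packed record
    fhSignature: longword;           // unique file signature, e.g. $CAFEBABE
    fhVersion: longword;             // minor (byte), major (byte), revision (word)
    fhName: array [0..255] of byte;  // database name, utf8 encoded text
    fhCipher: integer;               // encryption cipher id (0 = none)
    fhCompression: integer;          // compression id (0 = none)
    fhRootSequence: longword;        // part# where the root (metadata) sequence starts
    fhRecordList: longword;          // part# where the record-list sequence starts
    fhBitBuffer: longword;           // part# where the bit-buffer sequence starts
  end;
```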
If you are wondering about the encryption and compression fields, don’t overthink it. It’s just a place to store something that identifies whatever encryption or compression we have used. If time allows we will have a look at zlib and RC4, but even if we don’t it’s good to define these fields for future expansion.
The version longword is actually more important than you think. If the design of your database and header changes dramatically between versions, you want to check the version number to make absolutely sure you can even handle the file. I have placed this as the second field in the record, 4 bytes into the header, so that it can be read early. The moment you have more than one version of your engine you might want to write a routine that just reads the first 8 bytes of the file and check compatibility.
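Such a compatibility probe could look something like this. The routine and the signature constant are hypothetical (the framework does not ship this exact function); the point is that it touches only the first 8 bytes before committing to parsing the rest of the header:

```delphi
const
  CNT_SIGNATURE = $CAFEBABE;  // hypothetical signature value

// Returns true if the stream carries our signature; the caller can then
// inspect Version before deciding whether to parse the full header.
function ProbeHeader(const Stream: TStream; out Version: longword): boolean;
var
  LSignature: longword;
begin
  Stream.Position := 0;
  Stream.ReadBuffer(LSignature, SizeOf(LSignature));
  Stream.ReadBuffer(Version, SizeOf(Version));
  Result := LSignature = CNT_SIGNATURE;
end;
```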
What are those buffers?
Having forked the framework, you suddenly have quite a few interesting units. But you can also feel a bit lost if you don’t know what the buffer classes do, so I want to start with those first.
The buffer classes are alternatives to streams. Streams are excellent, but they can be quite slow if you are doing intense read-write operations. More importantly, streams lack two fundamental features for DB work, namely insert and remove. For example, let’s say you have a 100 megabyte file – and then you want to remove 1 megabyte from the middle of this file. It’s not a complex operation, but you still need to copy the trailing data backwards as quickly as possible before scaling the stream size. The same is true if you want to inject data into a large file. It’s not a huge operation, but it has to be 100% accurate and move data as fast as possible.
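To make the cost of a remove concrete, here is roughly what it has to do on a raw memory buffer. This is a simplified sketch of my own, not the framework’s actual implementation; the real buffer classes wrap this kind of logic with bounds checking and handle resizing themselves:

```delphi
// Remove ACount bytes at AOffset from a buffer holding ASize bytes.
// The trailing data is moved backwards over the hole, then the logical
// size shrinks; the caller is expected to truncate the storage afterwards.
procedure RemoveSpan(AData: PByte; var ASize: Int64;
  const AOffset, ACount: Int64);
var
  LTail: Int64;
begin
  LTail := ASize - (AOffset + ACount);  // bytes after the removed span
  if LTail > 0 then
    Move(AData[AOffset + ACount], AData[AOffset], LTail);
  ASize := ASize - ACount;
end;
```

An insert is the mirror image: grow the buffer first, move the tail forwards to open a gap, then copy the new data in.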
I could have just inherited from TStream, but I wanted to write classes that were faster, that had more options and that were easier to expand in the future. The result of those experiments was the TBuffer classes.
So mentally, just look at TDbBuffer, TDbBufferMemory and TDbBufferFile as streams on steroids. If you need to move data between a stream and a buffer, just create a TDbLibStreamAdapter instance and you can access the buffer like a normal TStream descendant.
Making a file of blocks
With enough theory behind us, let’s dig into the codebase and look at the class that deals with a file as blocks, or parts. Open up the unit dblib.partaccess.pas and you will find the following class:
type
  TDbLibPartAccess = class(TObject)
  private
    FBuffer: THexBuffer;
    FHeadSize: integer;
    FPartSize: integer;
  protected
    function GetPartCount: integer; inline;
  public
    property Buffer: THexBuffer read FBuffer;
    property ReservedHeaderSize: integer read FHeadSize;
    property PartSize: integer read FPartSize;
    property PartCount: integer read GetPartCount;

    procedure ReadPart(const PartIndex: Integer; var aData); overload;
    procedure ReadPart(const PartIndex: Integer; const Data: THexBuffer); overload;

    procedure WritePart(const PartIndex: Integer; const Data; const DataLength: Integer); overload;
    procedure WritePart(const PartIndex: Integer; const Data: THexBuffer); overload;

    procedure AppendPart(const Data; DataLength: Integer); overload;
    procedure AppendPart(const Data: THexBuffer); overload;

    function CalcPartsForData(const DataSize: Int64): integer; inline;
    function CalcOffsetForPart(const PartIndex: Integer): Int64; inline;

    constructor Create(const DataBuffer: THexBuffer;
      const ReservedHeaderSize: Integer; const PartSize: Integer); virtual;
  end;
As you can see, this class is pretty straightforward. You pass a buffer (either memory or file) via the constructor, together with the size of the file header. This helps the class avoid writing to the first section of the file by mistake. Whenever the method CalcOffsetForPart() is called, it adds the size of the header to the result, shielding the header from being overwritten.
The other methods should be self-explanatory; you have various overloads for writing a sequence part (block), appending parts to the database file, and reading them back. All these methods are offset-based, meaning you give them the part number and they calculate where that part is physically located inside the file.
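The offset calculation itself is trivial but worth spelling out. Here is a sketch of what CalcOffsetForPart() has to do; the framework’s actual body may differ in details, but the arithmetic is fixed by the file layout:

```delphi
function TDbLibPartAccess.CalcOffsetForPart(const PartIndex: Integer): Int64;
begin
  // Skip the reserved file header, then step over PartIndex whole parts.
  // The Int64 cast avoids 32-bit overflow on large files.
  Result := FHeadSize + (Int64(PartIndex) * FPartSize);
end;
```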
One important method is the CalcPartsForData() function. This is used before splitting a piece of data into a sequence. Let’s say you have 1 megabyte of data you want to store inside the database file; you first call this function and it calculates how many blocks you need.
Once you know how many blocks you need, the next step is to check the bit-buffer (that we introduced last time) to see whether the file has that many free blocks. If the file is full, then you either have to grow the file to fit the new data, or issue an error message.
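A sketch of that calculation: it is a straightforward ceiling division, rounding up so that a final partial block still claims a whole part. Again, the framework’s actual body may be written differently:

```delphi
function TDbLibPartAccess.CalcPartsForData(const DataSize: Int64): integer;
begin
  if DataSize <= 0 then
    Exit(0);
  // Ceiling division: with 4096-byte parts, 4096 bytes needs 1 part,
  // 4097 bytes needs 2 parts, and so on.
  Result := integer((DataSize + FPartSize - 1) div FPartSize);
end;
```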
See? It’s not that complex once you have something to build on!
Proof-reading test: making sure what we write is what we read
With the scaffolding in place, let’s write a small test to make absolutely sure that the buffer and class logistics check out. We are just going to do this on a normal form (this is the main project in the bitbucket repository), so you don’t have to type this code; just fork the code from the URL mentioned at the top of this article and run it.
Our test is simple:
- Define our header and part records; they don’t have to be accurate at this point
- Create a database file buffer (in memory) with size for header + 100 parts
- Create a TDblibPartAccess class, feed in the sizes as mentioned above
- Create a write buffer the same size as part/block record
- Fill that buffer with some content we can easily check
- Write the writebuffer content to all the parts in the file
- Create a read buffer
- Read back each part and compare content with the write buffer
If any data is written the wrong way or overlapping, what we read back will not match our write buffer. This is a very simple test to just make sure that we have IO fidelity.
OK, let’s write some code!
unit mainform;

interface

uses
  Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants,
  System.Classes, Vcl.Graphics, Vcl.Controls, Vcl.Forms, Vcl.Dialogs,
  Vcl.StdCtrls,
  dblib.common,
  dblib.buffer,
  dblib.partaccess,
  dblib.buffer.memory,
  dblib.buffer.disk,
  dblib.encoder,
  dblib.bitbuffer;

const
  CNT_PAGESIZE = (1024 * 10);

type
  TDbVersion = packed record
    bvMajor: byte;
    bvMinor: byte;
    bvRevision: word;
  end;

  TDbHeader = packed record
    dhSignature: longword;   // Signature: $CAFEBABE
    dhVersion: TDbVersion;   // Engine version info
    dhName: shortstring;     // Name of database
    dhMetadata: longword;    // Part# for metadata
  end;

  TDbPartData = packed record
    ddSignature: longword;
    ddRoot: longword;
    ddNext: longword;
    ddBytes: integer;
    ddData: packed array [0..CNT_PAGESIZE-1] of byte;
  end;

  TfrmMain = class(TForm)
    btnWriteReadTest: TButton;
    memoOut: TMemo;
    procedure btnWriteReadTestClick(Sender: TObject);
  private
    { Private declarations }
    FDbFile: TDbLibBufferMemory;
    FDbAccess: TDbLibPartAccess;
  public
    { Public declarations }
    constructor Create(AOwner: TComponent); override;
    destructor Destroy; override;
  end;

var
  frmMain: TfrmMain;

implementation

{$R *.dfm}

{ TfrmMain }

constructor TfrmMain.Create(AOwner: TComponent);
begin
  inherited;

  // Create our database file, in memory
  FDbFile := TDbLibBufferMemory.Create(nil);

  // Reserve size for our header and 100 free blocks
  FDbFile.Size := SizeOf(TDbHeader) + (SizeOf(TDbPartData) * 100);

  // Create our file-part access class, which accesses the file
  // as a "block" file. We pass in the size of the header + part
  FDbAccess := TDbLibPartAccess.Create(FDbFile, SizeOf(TDbHeader),
    SizeOf(TDbPartData));
end;

destructor TfrmMain.Destroy;
begin
  FDbAccess.Free;
  FDbFile.Free;
  inherited;
end;

procedure TfrmMain.btnWriteReadTestClick(Sender: TObject);
var
  LWriteBuffer: TDbLibBufferMemory;
  LReadBuffer: TDbLibBufferMemory;
  LMask: ansistring;
  x: integer;
begin
  memoOut.Lines.Clear();
  LMask := 'YES!';

  // create a temporary buffer
  LWriteBuffer := TDbLibBufferMemory.Create(nil);
  try
    // make it the same size as our file-part
    LWriteBuffer.Size := SizeOf(TDbPartData);

    // fill the buffer with our test-pattern
    LWriteBuffer.Fill(0, SizeOf(TDbPartData), LMask[1], Length(LMask));

    // Fill the dbfile by writing each part, using our
    // temporary buffer. This fills the file with our
    // little mask above
    for x := 0 to FDbAccess.PartCount - 1 do
      FDbAccess.WritePart(x, LWriteBuffer);

    LReadBuffer := TDbLibBufferMemory.Create(nil);
    try
      for x := 0 to FDbAccess.PartCount - 1 do
      begin
        FDbAccess.ReadPart(x, LReadBuffer);
        if LReadBuffer.ToString <> LWriteBuffer.ToString then
          memoOut.Lines.Add('Proof read part #' + x.ToString() + ' = failed')
        else
          memoOut.Lines.Add('Proof read part #' + x.ToString() + ' = success');
      end;
    finally
      LReadBuffer.Free;
    end;
  finally
    LWriteBuffer.Free;
  end;
end;

end.
The form has a button and a memo on it; when you click the button, the memo fills with one success line for every part that was proof-read.
Finally things are starting to become more interesting! We still have a way to go before we can start pumping records into this thing, but at least we have tangible code to play with.
In our next installment we will implement the sequence class, which takes the TDbLibPartAccess class and augments it with functionality to read and write sequences. We will also include the bit-buffer from our first article and watch as the silhouette of our database engine comes into view.
Again, this is not built for speed but for education.
Until next time.