Building a Delphi Database engine, part three
If you missed the first two episodes in this tutorial, click here for the first article and here for the second. Even if you feel databases are something you have already mastered, I urge you to read the articles from the beginning, because the database theory we explore here is not the high-level material taught at school. Having said that, it is not too complex, and with a bit of exploring you should find it easy to follow.
Update your git fork
In the previous episode you were asked to fork the project code from BitBucket. As we progress I will continue to add units and refactor code, so if you forked the code last time, you should fetch the latest revision before you continue.
Since Git is incremental and awesome, you can simply revert to earlier commits to follow the examples as we move along. I will mark each commit so you can see what is what and easily jump between examples.
Quick recap from our previous episode
Our last episode ended with a fidelity test on the storage medium. We implemented a class that operates on blocks (or parts) of a defined size, rather than single bytes like a TStream does. The idea is that the class creates, reads and writes the blocks that contain the data; this block size is also known as the database page size.
I also introduced the notion of buffer classes, which are similar to Delphi streams. The buffer classes are actually Delphi implementations of the buffers found in node.js, which adds two very important capabilities: the ability to inject data at any point, and the ability to remove data from any point. So if you want to inject 1 gigabyte into the middle of a file, that can be done in a single call. If you want to remove 1 gigabyte from somewhere in the file, that is likewise reduced to a single call.
Buffers are designed to give you easier access to the data they hold. Streams are great for normal programming tasks, but buffers are much faster and more ad-hoc, and thus better suited for database work.
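To make the benefit concrete, here is what removing a chunk from the middle of a plain TMemoryStream involves if you do it by hand with the standard RTL: shuffling the tail down and then truncating. This is only an illustrative sketch of the manual approach; the buffer classes from the previous episode reduce this kind of work to a single call.

uses
  System.Classes;

// Removes "Count" bytes at "Offset" from a TMemoryStream the manual way.
// Note how we have to move the tail ourselves before truncating the stream.
procedure StreamRemove(const Stream: TMemoryStream; Offset, Count: NativeInt);
var
  LSrc, LDst: PByte;
  LTail: NativeInt;
begin
  LTail := Stream.Size - (Offset + Count);
  if LTail > 0 then
  begin
    LDst := Stream.Memory;
    Inc(LDst, Offset);            // start of the span we are removing
    LSrc := Stream.Memory;
    Inc(LSrc, Offset + Count);    // first byte after the removed span
    Move(LSrc^, LDst^, LTail);    // shuffle the tail down over the removed span
  end;
  Stream.Size := Stream.Size - Count; // cut off the now duplicated tail
end;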
From blocks to sequences
In this article’s code the focus is on sequences and on putting together the essentials for our database class, which means the ability to store sequences and recover them later (record persistency).
A sequence is, like I explained in our previous episode, file-parts that are linked together through offsets.
Whenever you want to store data that exceeds the page size defined for a database, the only solution is to divide that data into parts and spread it over many blocks. We daisy-chain them together by storing, in each part, the block number of the next block in the sequence (a block or part is ultimately just a record we write to the file), terminated by the value -1 ($FFFFFFFF as an unsigned longword). So the last block in the chain has its “next” field set to -1, which tells the system that there is no more data to fetch for that sequence.
If you are unsure about what I mean, please read up on sequences in the previous two articles.
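To make the daisy-chaining concrete, here is a minimal sketch of how a payload larger than the page size could be split into chained parts. The record below is a simplified stand-in for the TDbPartData type shown later in this article; the constants, the consecutive block numbers and the use of the terminator value in the first part’s “previous” field are assumptions for illustration only, since the real engine allocates block numbers through its bit buffer.

uses
  System.SysUtils;

const
  CNT_PAGESIZE = 4096;              // assumed page size for this sketch
  CNT_END_OF_SEQUENCE = $FFFFFFFF;  // -1 as an unsigned longword

type
  // Simplified stand-in for the real part record
  TPartSketch = packed record
    ddRoot: longword;      // first block in the sequence
    ddPrevious: longword;  // previous block, or the terminator for the first part
    ddNext: longword;      // next block, or the terminator for the last part
    ddBytes: integer;      // payload bytes actually used in this part
    ddData: packed array [0 .. CNT_PAGESIZE - 1] of byte;
  end;

// Splits Data into a chain of parts starting at FirstBlock. For simplicity the
// parts are assumed to land in consecutive blocks; the caller would then write
// each record to its block on disk.
function SplitIntoSequence(const Data: TBytes; FirstBlock: longword): TArray<TPartSketch>;
var
  LCount, LIndex, LOffset, LChunk: integer;
begin
  LCount := (Length(Data) + CNT_PAGESIZE - 1) div CNT_PAGESIZE;
  SetLength(Result, LCount);
  LOffset := 0;
  for LIndex := 0 to LCount - 1 do
  begin
    LChunk := Length(Data) - LOffset;
    if LChunk > CNT_PAGESIZE then
      LChunk := CNT_PAGESIZE;

    Result[LIndex].ddRoot := FirstBlock;
    Result[LIndex].ddBytes := LChunk;
    Move(Data[LOffset], Result[LIndex].ddData[0], LChunk);

    // Daisy-chain the parts; the last one gets the terminator as "next"
    if LIndex > 0 then
      Result[LIndex].ddPrevious := FirstBlock + longword(LIndex) - 1
    else
      Result[LIndex].ddPrevious := CNT_END_OF_SEQUENCE;

    if LIndex < LCount - 1 then
      Result[LIndex].ddNext := FirstBlock + longword(LIndex) + 1
    else
      Result[LIndex].ddNext := CNT_END_OF_SEQUENCE;

    Inc(LOffset, LChunk);
  end;
end;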
Sequence persistency
What we are doing in this episode is to merge all the things we have talked about so far into a single class. The bit buffer is done, the block-file class is done, and now we need to make sure we can read and write full sequences.
This brings us to the problem of scaling. An empty database will not have any free blocks. So a part of the sequence reading and writing mechanism is the ability to expand the file on demand.
We also need some support functionality for mapping a sequence, otherwise we would have to store an awful lot of information for each record. We only really need to know the first block number in a sequence (because it carries a reference to the subsequent blocks used by the sequence). By mapping I mean a function that can walk a sequence and return an array of block numbers, so that we know where each part of the sequence has ended up.
This might not be immediately useful, but when we reach the point of compacting a database, it becomes vital that we can quickly map which blocks a sequence occupies, then organize them in a linear fashion so that the data can be read faster and excess empty space can be released.
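As a rough illustration of what such a mapping function could look like, here is a sketch that walks the chain. It reuses the TPartSketch record and terminator constant from the earlier sketch, assumes a callback for reading a single part by block number (in the real engine that job belongs to the part-access class), and assumes the result is simply an array of longwords; none of these details are confirmed by the actual units.

type
  // Assumed shape of a mapped sequence: one block number per part
  TSequenceSketch = TArray<longword>;
  // Assumed callback for fetching a single part record by block number
  TReadPartFunc = reference to function(const BlockNo: longword;
    out Part: TPartSketch): boolean;

// Walks the chain from StartBlock, collecting block numbers until the
// terminator is reached (or the chain turns out to be broken).
function MapSequenceSketch(const StartBlock: longword;
  const ReadPart: TReadPartFunc): TSequenceSketch;
var
  LCurrent: longword;
  LPart: TPartSketch;
begin
  SetLength(Result, 0);
  LCurrent := StartBlock;
  while LCurrent <> CNT_END_OF_SEQUENCE do
  begin
    SetLength(Result, Length(Result) + 1);
    Result[High(Result)] := LCurrent;
    if not ReadPart(LCurrent, LPart) then
      Break; // unreadable part: stop rather than loop forever
    LCurrent := LPart.ddNext;
  end;
end;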
Changes to the header and part records
Suddenly our metadata needs have grown considerably. We need to store the bit-buffer that holds the map of the whole database (every block is either used [1] or available [0]). We need to store a list of longwords that represent the records in the database. Please note that on this level, the database does not care which record belongs to which table. And last but not least we need a file header and a block record that can keep track of the initial values.
Let’s have a look at the records first:
// Custom data structures
TDbVersion = packed record
  bvMajor: byte;
  bvMinor: byte;
  bvRevision: word;
end;

TDbLibFileHeader = packed record
  dhSignature: longword;
  dhVersion: TDbVersion;
  dhName: shortstring;
  dhMetadata: longword;
  dhRecList: longword;
  dhBitBuffer: longword;
  class function Create: TDbLibFileHeader; static;
end;

TDbPartData = packed record
  ddSignature: longword;
  ddRoot: longword;
  ddPrevious: longword;
  ddNext: longword;
  ddBytes: integer;
  ddData: packed array [0 .. CNT_DATABASEFILE_PAGESIZE - 1] of byte;
  class function Create: TDbPartData; static;
end;
Notice that the part-data (block) has fields to keep track of the root (the first part in the sequence), the previous part in the sequence and the next part in the sequence. This gives us a lot of freedom when we want to build some tooling for the engine later.
The file header has also been expanded: it has fields that hold the first block of the metadata sequence, of the record list (the total list of sequences stored in the file) and of the bit buffer.
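To show how little magic there is in the header, here is a minimal sketch of reading it back from an opened file. It works because TDbLibFileHeader is a packed record with a fixed size, and it assumes the header is written at the very start of the file; the real engine naturally validates the signature and version as well, which is omitted here.

uses
  System.Classes, System.SysUtils;

// Reads the fixed-size header record from the start of the database file.
// Afterwards dhMetadata, dhRecList and dhBitBuffer hold the first block
// number of each support sequence.
function ReadHeaderSketch(const Stream: TStream): TDbLibFileHeader;
begin
  Stream.Position := 0;
  if Stream.Read(Result, SizeOf(TDbLibFileHeader)) <> SizeOf(TDbLibFileHeader) then
    raise Exception.Create('File is too small to contain a database header');
end;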
Metadata
There are quite a few classes we need to introduce, so let’s start at the top, namely the meta-data classes. In our case the metadata is (first and foremost):
- Tables
- Fields
- Database name
The reason I have isolated each of these as classes is to make it easier to expand the system later. So let’s have a look at the classes:
IExtendedPersistence = interface
  ['{282CC310-CD3B-47BF-8EB0-017C1EDF0BFC}']
  procedure ObjectFrom(const Reader: TDbLibReader);
  procedure ObjectFromStream(const Stream: TStream; const Disposable: boolean);
  procedure ObjectFromData(const Data: TDbLibBuffer; const Disposable: boolean);
  procedure ObjectFromFile(const Filename: string);
  procedure ObjectTo(const Writer: TDbLibWriter);
  function ObjectToStream: TStream; overload;
  procedure ObjectToStream(const Stream: TStream); overload;
  function ObjectToData: TDbLibBuffer; overload;
  procedure ObjectToData(const Data: TDbLibBuffer); overload;
  procedure ObjectToFile(const Filename: string);
end;

IDbLibFields = interface
  ['{0D6A9FE2-24D2-42AE-A343-E65F18409FA2}']
  function IndexOf(FieldName: string): integer;
  function ObjectOf(FieldName: string): TDbLibRecordField;
  function Add(const FieldName: string; const FieldClass: TDbLibRecordFieldClass): TDbLibRecordField;
  function Addinteger(const FieldName: string): TDbLibFieldInteger;
  function AddStr(const FieldName: string): TDbLibFieldString;
  function Addbyte(const FieldName: string): TDbLibFieldbyte;
  function AddBool(const FieldName: string): TDbLibFieldboolean;
  function AddCurrency(const FieldName: string): TDbLibFieldCurrency;
  function AddData(const FieldName: string): TDbLibFieldData;
  function AddDateTime(const FieldName: string): TDbLibFieldDateTime;
  function AddDouble(const FieldName: string): TDbLibFieldDouble;
  function AddGUID(const FieldName: string): TDbLibFieldGUID;
  function AddInt64(const FieldName: string): TDbLibFieldInt64;
  function AddLong(const FieldName: string): TDbLibFieldLong;
end;

TDbLibPersistent = class(TDbLibObject, IExtendedPersistence)
private
  FObjId: Longword;
  FUpdCount: integer;
strict protected
  // Implements:: IExtendedPersistence
  procedure ObjectTo(const Writer: TDbLibWriter);
  procedure ObjectFrom(const Reader: TDbLibReader);
  procedure ObjectFromStream(const Stream: TStream; const Disposable: boolean);
  function ObjectToStream: TStream; overload;
  procedure ObjectToStream(const Stream: TStream); overload;
  procedure ObjectFromData(const Binary: TDbLibBuffer; const Disposable: boolean);
  function ObjectToData: TDbLibBuffer; overload;
  procedure ObjectToData(const Binary: TDbLibBuffer); overload;
  procedure ObjectFromFile(const Filename: string);
  procedure ObjectToFile(const Filename: string);
protected
  procedure BeforeUpdate; virtual;
  procedure AfterUpdate; virtual;
strict protected
  // Persistency Read/Write methods
  procedure BeforeReadObject; virtual;
  procedure AfterReadObject; virtual;
  procedure BeforeWriteObject; virtual;
  procedure AfterWriteObject; virtual;
  procedure WriteObject(const Writer: TDbLibWriter); virtual;
  procedure ReadObject(const Reader: TDbLibReader); virtual;
strict protected
  // Standard persistence
  function ObjectHasData: boolean; virtual;
  procedure ReadObjBin(Stream: TStream); virtual;
  procedure WriteObjBin(Stream: TStream); virtual;
  procedure DefineProperties(Filer: TFiler); override;
public
  property UpdateCount: integer read FUpdCount;
  procedure Assign(Source: TPersistent); override;
  function ObjectIdentifier: longword;
  function BeginUpdate: boolean;
  procedure EndUpdate;
  class function classIdentifier: longword;
  constructor Create; override;
end;

[ComponentPlatformsAttribute(CNT_ALL_PLATFORMS)]
TDbLibRecordField = class(TDbLibBufferMemory)
private
  FName: string;
  FNameHash: Int64;
  FOnRead: TNotifyEvent;
  FOnWrite: TNotifyEvent;
  FOnRelease: TNotifyEvent;
  procedure SetRecordName(NewName: string);
protected
  function GetDisplayName: string; virtual;
  procedure BeforeReadObject; override;
  procedure ReadObject(Reader: TReader); override;
  procedure WriteObject(Writer: TWriter); override;
  procedure DoReleaseData; override;
protected
  procedure SignalWrite;
  procedure SignalRead;
  procedure SignalRelease;
public
  function AsString: string; virtual; abstract;
  property DisplayName: string read GetDisplayName;
  property FieldSignature: Int64 read FNameHash;
published
  property OnValueRead: TNotifyEvent read FOnRead write FOnRead;
  property OnValueWrite: TNotifyEvent read FOnWrite write FOnWrite;
  property OnValueRelease: TNotifyEvent read FOnRelease write FOnRelease;
  property FieldName: string read FName write SetRecordName;
end;

TDbLibFieldboolean = class(TDbLibRecordField)
private
  function GetValue: boolean;
  procedure SetValue(const NewValue: boolean);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
published
  property Value: boolean read GetValue write SetValue;
end;

TDbLibFieldbyte = class(TDbLibRecordField)
private
  function GetValue: byte;
  procedure SetValue(const NewValue: byte);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
published
  property Value: byte read GetValue write SetValue;
end;

TDbLibFieldCurrency = class(TDbLibRecordField)
private
  function GetValue: Currency;
  procedure SetValue(const NewValue: Currency);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
published
  property Value: Currency read GetValue write SetValue;
end;

TDbLibFieldData = class(TDbLibRecordField)
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
end;

TDbLibFieldDateTime = class(TDbLibRecordField)
private
  function GetValue: TDateTime;
  procedure SetValue(const NewValue: TDateTime);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
published
  property Value: TDateTime read GetValue write SetValue;
end;

TDbLibFieldDouble = class(TDbLibRecordField)
private
  function GetValue: Double;
  procedure SetValue(const NewValue: Double);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
published
  property Value: Double read GetValue write SetValue;
end;

TDbLibFieldGUID = class(TDbLibRecordField)
private
  function GetValue: TGUID;
  procedure SetValue(const NewValue: TGUID);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
published
  property Value: TGUID read GetValue write SetValue;
end;

TDbLibFieldInteger = class(TDbLibRecordField)
private
  function GetValue: integer;
  procedure SetValue(const NewValue: integer);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
published
  property Value: integer read GetValue write SetValue;
end;

TDbLibFieldInt64 = class(TDbLibRecordField)
private
  function GetValue: Int64;
  procedure SetValue(const NewValue: Int64);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
published
  property Value: Int64 read GetValue write SetValue;
end;

TDbLibFieldString = class(TDbLibRecordField)
private
  FLength: integer;
  FExplicit: boolean;
  function GetValue: string;
  procedure SetValue(NewValue: string);
  procedure SetFieldLength(Value: integer);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
  constructor Create(AOwner: TComponent); override;
published
  property Value: string read GetValue write SetValue;
  property Length: integer read FLength write SetFieldLength;
  property Explicit: boolean read FExplicit write FExplicit;
end;

TDbLibFieldLong = class(TDbLibRecordField)
private
  function GetValue: Longword;
  procedure SetValue(const NewValue: Longword);
protected
  function GetDisplayName: string; override;
public
  function AsString: string; override;
published
  property Value: Longword read GetValue write SetValue;
end;

TDbLibCustomRecord = class(TComponent, IDbLibFields, IStreamPersist)
strict private
  FObjects: TObjectList;
strict protected
  function GetItem(const Index: integer): TDbLibRecordField;
  procedure SetItem(const Index: integer; const Value: TDbLibRecordField);
  function GetField(const FieldName: string): TDbLibRecordField;
  procedure SetField(const FieldName: string; const Value: TDbLibRecordField);
  function GetCount: integer;
  property Fields[const FieldName: string]: TDbLibRecordField read GetField write SetField;
  property Items[const Index: integer]: TDbLibRecordField read GetItem write SetItem;
  property Count: integer read GetCount;
public
  function Add(const FieldName: string; const Fieldclass: TDbLibRecordFieldclass): TDbLibRecordField;
  function Addinteger(const FieldName: string): TDbLibFieldInteger;
  function AddStr(const FieldName: string): TDbLibFieldString;
  function Addbyte(const FieldName: string): TDbLibFieldbyte;
  function AddBool(const FieldName: string): TDbLibFieldboolean;
  function AddCurrency(const FieldName: string): TDbLibFieldCurrency;
  function AddData(const FieldName: string): TDbLibFieldData;
  function AddDateTime(const FieldName: string): TDbLibFieldDateTime;
  function AddDouble(const FieldName: string): TDbLibFieldDouble;
  function AddGUID(const FieldName: string): TDbLibFieldGUID;
  function AddInt64(const FieldName: string): TDbLibFieldInt64;
  function AddLong(const FieldName: string): TDbLibFieldLong;

  procedure WriteInt(const FieldName: string; const Value: integer);
  procedure WriteStr(const FieldName: string; const Value: string);
  procedure Writebyte(const FieldName: string; const Value: byte);
  procedure WriteBool(const FieldName: string; const Value: boolean);
  procedure WriteCurrency(const FieldName: string; const Value: currency);
  procedure WriteData(const FieldName: string; const Value: TStream);
  procedure WriteDateTime(const FieldName: string; const Value: TDateTime);
  procedure WriteDouble(const FieldName: string; const Value: double);
  procedure WriteGUID(const FieldName: string; const Value: TGUID);
  procedure WriteInt64(const FieldName: string; const Value: int64);
  procedure WriteLong(const FieldName: string; const Value: longword);

  procedure Clear; virtual;
  procedure Assign(source: TPersistent); override;

  function ToStream: TStream; virtual;
  function ToBuffer: TDbLibBuffer; virtual;
  procedure FromStream(const Stream: TStream; const Disposable: boolean = true);

  // Implements:: IStreamPersist
  procedure LoadFromStream(Stream: TStream); virtual;
  procedure SaveToStream(Stream: TStream); virtual;

  function IndexOf(FieldName: string): integer;
  function ObjectOf(FieldName: string): TDbLibRecordField;

  constructor Create(AOwner: TComponent); override;
  destructor Destroy; override;
end;

[ComponentPlatformsAttribute(CNT_ALL_PLATFORMS)]
TDbLibRecord = class(TDbLibCustomRecord)
public
  property Fields;
  property Items;
  property Count;
end;

TDbLibTable = class(TDbLibPersistent)
strict private
  FName: string;
  FParent: TDbLibMetaData;
  FPrototype: TDbLibRecord;
  FFieldDefs: IDbLibFields;
strict protected
  procedure BeforeReadObject; override;
  procedure WriteObject(const Writer: TDbLibWriter); override;
  procedure ReadObject(const Reader: TDbLibReader); override;
public
  constructor Create(const MetaData: TDbLibMetaData); reintroduce; virtual;
  destructor Destroy; override;
published
  property Parent: TDbLibMetaData read FParent;
  property FieldDefs: IDbLibFields read FFieldDefs;
  property TableName: string read FName write FName;
end;

TDbLibMetaData = class(TDbLibPersistent, IDbLibTables)
strict private
  [Weak] FParent: TDbLibDatabase;
  FDbName: string;
  FTables: TObjectList;
  function GetTable(index: integer): TDbLibTable;
  function GetTableCount: integer;
  function AddTable(TableName: string): TDbLibTable;
strict protected
  procedure SetDatabaseName(NewName: string);
  procedure BeforeReadObject; override;
  procedure WriteObject(const Writer: TDbLibWriter); override;
  procedure ReadObject(const Reader: TDbLibReader); override;
public
  property Database: TDbLibDatabase read FParent;
  property DatabaseName: string read FDbName write SetDatabaseName;
  property Tables[index: integer]: TDbLibTable read GetTable; default;
  property TableCount: integer read GetTableCount;
  function GetTableByName(TableName: string; var Table: TDbLibTable): boolean;
  constructor Create(const Database: TDbLibDatabase); reintroduce; virtual;
  destructor Destroy; override;
end;
Some might not like that I have made alterations to the standard Delphi persistency system. But if you look closely at the TDbLibPersistent class, you will see that it makes hosted data easier to handle. You have hooks like BeforeReadObject() and BeforeWriteObject() that make house-keeping a lot easier.
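To illustrate how those hooks fit together, here is a hypothetical descendant of TDbLibPersistent. Only the override pattern is taken from the declarations above; the reader and writer method names (WriteString, WriteInt, ReadString, ReadInt) are assumptions about TDbLibWriter and TDbLibReader, so check the actual units for the real names.

type
  // Hypothetical descendant used purely to show the persistence hooks
  TDbLibNamedItem = class(TDbLibPersistent)
  strict private
    FName: string;
    FSize: integer;
  strict protected
    procedure BeforeReadObject; override;
    procedure WriteObject(const Writer: TDbLibWriter); override;
    procedure ReadObject(const Reader: TDbLibReader); override;
  end;

procedure TDbLibNamedItem.BeforeReadObject;
begin
  inherited;
  // House-keeping before data is read back in: reset to a known state
  FName := '';
  FSize := 0;
end;

procedure TDbLibNamedItem.WriteObject(const Writer: TDbLibWriter);
begin
  inherited WriteObject(Writer);
  Writer.WriteString(FName); // assumed writer method name
  Writer.WriteInt(FSize);    // assumed writer method name
end;

procedure TDbLibNamedItem.ReadObject(const Reader: TDbLibReader);
begin
  inherited ReadObject(Reader);
  FName := Reader.ReadString; // assumed reader method name
  FSize := Reader.ReadInt;    // assumed reader method name
end;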
With the metadata out of the way, let’s have a peek at the actual database class. Or should I say the sequence persistency layer, because at the moment it can save sequences, remember them when you re-open the file, and keep track of the database name and the defined tables – but we have not yet turned the metadata into physical data.
TDbLibRecordList = class(TDbLibPersistent)
strict private
  FDbRecList: TList;
  function GetCount: integer; inline;
  function GetItem(const index: integer): longword; inline;
  procedure SetItem(const index: integer; const Value: longword); inline;
strict protected
  procedure BeforeReadObject; override;
  procedure WriteObject(const Writer: TDbLibWriter); override;
  procedure ReadObject(const Reader: TDbLibReader); override;
public
  property RecordList: TList read FDbRecList;
  property Items[const index: integer]: longword read GetItem write SetItem; default;
  property Count: integer read GetCount;
  constructor Create; override;
  destructor Destroy; override;
end;

TDbLibDatabase = class(TComponent)
strict private
  FHeader: TDbLibFileHeader;
  FBitbuffer: TDBLibBitBuffer;
  FDbFile: TDbLibBuffer;
  FDbFileAccess: TDbLibPartAccess;
  FDbRecList: TDbLibRecordList;
  FMetaData: TDbLibMetaData;
  FTableAccess: IDbLibTables;
  FActive: boolean;
  FFilename: string;
  FAccessMode: TDbLibAccessMode;
  function GetActive: boolean;
  procedure SetActive(const NewActive: boolean);
  function GetFileName: string;
  procedure SetFileName(NewFileName: string);
  function GetAccessMode: TDbLibAccessMode;
  procedure SetAccessMode(const NewMode: TDbLibAccessMode);
  function GetDbName: string;
  procedure SetDbName(NewDbName: string);
strict protected
  procedure ResetHeader;
  procedure ReadHeader;
  procedure WriteHeader;
public
  property MetaData: TDbLibMetaData read FMetaData;
  property Tables: IDbLibTables read FTableAccess;

  procedure Open;
  procedure Close;

  procedure GrowBy(Parts: longword);
  procedure ShrinkBy(Parts: longword);

  function GetRecordCount: integer; inline;
  function GetRecordSequence(const RecNo: integer): TDbLibSequence; inline;
  function GetSequenceBits(NumberOfBits: longword; var Bits: TDbLibSequence): boolean;

  procedure WriteSequence(const Sequence: TDbLibSequence; const Data: TStream); inline;
  function WriteDataAsSequence(const Data: TStream; var Sequence: TDbLibSequence): boolean;

  procedure MapSequence(StartPageIndex: longword; var Sequence: TDbLibSequence); overload; inline;
  function MapSequence(const StartPageIndex: longword): TDbLibSequence; overload; inline;

  function ReadDataFromSequence(SeqStart: longword): TStream; inline;

  procedure BeforeDestruction; override;
  constructor Create(AOwner: TComponent); override;
  destructor Destroy; override;
public
  property Active: boolean read GetActive write SetActive;
  property DatabaseName: string read GetDbName write SetDbName;
  property Filename: string read GetFileName write SetFileName;
  property AccessMode: TDbLibAccessMode read GetAccessMode write SetAccessMode;
end;
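As a usage sketch, this is roughly how the public sequence API could be exercised once a database is open. It assumes TDbLibSequence is an array of block numbers whose first element is the start of the chain; that detail is not visible in the declaration above, so treat it as an assumption.

uses
  System.Classes, System.SysUtils;

// Writes a chunk of data as a sequence and reads it back through the chain.
procedure SequenceRoundTrip(const Database: TDbLibDatabase);
var
  LSource, LResult: TStream;
  LSequence: TDbLibSequence;
begin
  // Something to store: an arbitrary 64 KB payload
  LSource := TStringStream.Create(StringOfChar('x', 64 * 1024));
  try
    LSource.Position := 0;
    if Database.WriteDataAsSequence(LSource, LSequence) then
    begin
      // Assumes LSequence[0] is the first block of the chain
      LResult := Database.ReadDataFromSequence(LSequence[0]);
      try
        Assert(LResult.Size = LSource.Size, 'Sequence round-trip size mismatch');
      finally
        LResult.Free;
      end;
    end;
  finally
    LSource.Free;
  end;
end;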
Our previous fidelity test was somewhat meager. This time we give the system a bit more to work with. So our initialization code now looks more akin to proper database work:
var
  LTable: TDbLibTable;
begin
  memoOut.Lines.Clear();
  memoOut.Lines.BeginUpdate();
  screen.Cursor := crHourGlass;
  try
    try
      FDatabase.Filename := 'c:\temp\mydb.db'; //'::memory::';
      FDatabase.AccessMode := TDbLibAccessMode.amCreate;
      FDatabase.DatabaseName := 'temp';

      LTable := FDatabase.Tables.AddTable('main');
      LTable.FieldDefs.Addinteger('id');
      LTable.FieldDefs.AddStr('name');
      LTable.FieldDefs.Add('data', TDbLibFieldData);

      FDatabase.Open();
    finally
      if FDatabase.Active then
        FDatabase.Close();
    end;
  finally
    memoOut.lines.EndUpdate();
    screen.Cursor := crDefault;
  end;
end;
Fork the code, Luke
The latest code has been uploaded to the BitBucket repository (same as last time), so simply update the fork you made earlier. Make sure you set plenty of breakpoints, and also grab a hex editor and examine the resulting database file.
Alright! I think that is enough for the third episode. In our next episode we have more than enough to explain and explore, but now the fundamental parts are in place!
Until next time!
Great tutorial.
They are very insightful.
Thanks for your sincere efforts.
Just needed one piece of advice: would it be advisable to modernize TurboPower’s B-Tree Filer (or FlashFiler for client/server)?
I personally think it is a full-featured, raw-power database engine.
The old BTree is pretty cool and it was blazingly fast if memory serves me right.
I for one would love to see it updated and given a nice polish. There is a lot of code out there that might be old but is in truth very effective.
I tend to roam around looking at older units, and you learn something every single time.
Old does not mean bad; that is a fetish created by marketing. The pyramids look pretty solid 🙂
yogiyang, modernizing FlashFiler would be an exercise in wasted time, because it has already been done. NexusDB is the direct descendant of FF, and you would need to do the equivalent of 15 years of development just to get even. 🙂
Eivind,
Maybe you are right as far as modernizing and using FF2 in a production environment is concerned, but when one wants to learn about DB internals it would prove to be a treasure trove!
And NexusDB is not open source, so one cannot learn from its 15 years of experience.