Archive

Archive for April 3, 2016

Smart Pascal record arrays, brute force search

April 3, 2016 6 comments

Having introduced the “self-less” class to my readers in an earlier post (the buddha class as I call it), I started thinking about the practical sides of this. Some may just go “oh, so you have an object with no self, so what?”). Well its actually a big deal, especially when talking with servers and dealing with the myriad of non-standard data formats out there.

Array of record

Dictionaries are great because, behind the scenes, they are organized in such a way that the key you use to read or write a value, can be more or less instantaniously verified, looked up and accessed.

Dictionaries can be made in a number of ways, using various formulas for turning a string-key into a meaningful number (which in turn points to the data it represents).

Finding stuff in unstructured data is fun

Finding stuff in unstructured data is fun

But this article is not about fancy dictionaries, but rather something resembling an array of unstructured record. Where each item in the array can actually differ from the other both in content and layout  (!) Just think about it: under JavaScript libraries throw different structures around like candy, and there is no such thing as a fixed standard.

So let’s say you call some NodeJS webservice and it nukes you back with 100, 10000 or 1000000 records with completely different structure and content — that would pose quite a challenge using ordinary Delphi or FreePascal (well, not difficult but time consuming at least).

Let’s see just how fast we can push JavaScript to search a large mass of unstructured data. Note: I will be using code which is not yet in the RTL here, just so you dont freak out. The Ticks functionality will be introduced in the next Smart Update.

Right, so let’s start with the basics, lets give things some names:

type
TW3Associate  = variant;
TW3Associates = variant;

Variant under Smart Pascal is not the same as a Delphi, C++ or C# variant. It is just a way of saying “we dont know the type” of something. It essentially maps to JObject which is the most basic, empty “object” JavaScript can muster.

Next, Let’s create some data to play with:

procedure TForm1.W3Button2Click(Sender: TObject);
var
  Associates: Array of variant;
begin

  for var x:=1 to 1000000 do
  begin
    Associates.add(
      class
      cssname := "Item#" + IntToStr(x);
      typevalue := x;
      end
    );
  end;

end;

This gives us an array of one million JavaScript objects, each with two properties. Next step is to use some old school, brute-force searching. We are going to use loop-expansion, meaning that instead of a simple for/next loop dealing with a single item at a time – we are going to process 8 items per cycle.


function SearchAssociates(const Field:String;const Value:String;
  const Associates:TW3Associates):TW3Associate;
begin
  result := unassigned;

  (* Setup look expansion *)
  var index    := 0;
  var LLongs   := associates.length shr 3;
  var LSingles := associates.length mod 8;

  (* Search batch of 8 items *)
  while (LLongs > 0) do
  begin
    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    dec(LLongs);
  end;

  (* Search odd single items at the end *)
  while (LSingles > 0) do
  begin
    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);
    dec(LSingles);
  end;
end;

And then let’s see how fast it can chew through these “untyped” arrays. And please remember that from JavaScripts’s point of view, each of these structures are unique. It cannot, like Delphi, take for granted that the next object is equal to the previous.

The test code now looks like:

procedure TForm1.W3Button2Click(Sender: TObject);
var
  Associates: Array of variant;
begin

  for var x:=1 to 1000000 do
  begin
    Associates.add(
      class
      cssname := "Item#" + IntToStr(x);
      typevalue := x;
      end
    );
  end;

  writeln('Performing brute force search:');

  var StartTime: Integer;
  var StopTime: Integer;

  StartTime := TW3Dispatch.Ticks;
  var data := SearchAssociates("cssname", "Item#999976",Variant(Associates));
  StopTime := TW3Dispatch.Ticks;

  if (data) then
  begin
    writeln("Found object:");
    writeln(Data);
  end else
  writeln("Search failed!");

  writeln("Ticks spent:" + (StopTime-StartTime).toString );
end;

Results

I ran the test 10 times, but only the first five produced different values – the engine then settled on 22016.

  • 40960
  • 22016
  • 20992
  • 20992
  • 24960

Ticks is calculated as the number of ms since 01.01.1970. There are 621355968000000000 ticks from Ist Jan 1900
to Ist Jan 1970 (hence the const). And 10000 ticks represents a millisecond:

const
  CNT_TICKS_BASE = 621355968000000000;

class function TW3Dispatch.Ticks:Integer;
begin
  result := new JDate().getTime() * 1000 + CNT_TICKS_BASE;
end;

Conclusion?

Old school pascal beats both ForEach() and Filter(), methods introduced into JavaScript to deal with this kind of scenario. Goes to show, object pascal may be old (C++ is 3 years older actually), but it’s a damn fine language and makes NodeJS and web programming fun!