Home > Delphi, JavaScript, nodeJS, Object Pascal, OP4JS, Smart Mobile Studio > Smart Pascal record arrays, brute force search

Smart Pascal record arrays, brute force search

Having introduced the “self-less” class to my readers in an earlier post (the buddha class as I call it), I started thinking about the practical sides of this. Some may just go “oh, so you have an object with no self, so what?”). Well its actually a big deal, especially when talking with servers and dealing with the myriad of non-standard data formats out there.

Array of record

Dictionaries are great because, behind the scenes, they are organized in such a way that the key you use to read or write a value, can be more or less instantaniously verified, looked up and accessed.

Dictionaries can be made in a number of ways, using various formulas for turning a string-key into a meaningful number (which in turn points to the data it represents).

Finding stuff in unstructured data is fun

Finding stuff in unstructured data is fun

But this article is not about fancy dictionaries, but rather something resembling an array of unstructured record. Where each item in the array can actually differ from the other both in content and layout  (!) Just think about it: under JavaScript libraries throw different structures around like candy, and there is no such thing as a fixed standard.

So let’s say you call some NodeJS webservice and it nukes you back with 100, 10000 or 1000000 records with completely different structure and content — that would pose quite a challenge using ordinary Delphi or FreePascal (well, not difficult but time consuming at least).

Let’s see just how fast we can push JavaScript to search a large mass of unstructured data. Note: I will be using code which is not yet in the RTL here, just so you dont freak out. The Ticks functionality will be introduced in the next Smart Update.

Right, so let’s start with the basics, lets give things some names:

type
TW3Associate  = variant;
TW3Associates = variant;

Variant under Smart Pascal is not the same as a Delphi, C++ or C# variant. It is just a way of saying “we dont know the type” of something. It essentially maps to JObject which is the most basic, empty “object” JavaScript can muster.

Next, Let’s create some data to play with:

procedure TForm1.W3Button2Click(Sender: TObject);
var
  Associates: Array of variant;
begin

  for var x:=1 to 1000000 do
  begin
    Associates.add(
      class
      cssname := "Item#" + IntToStr(x);
      typevalue := x;
      end
    );
  end;

end;

This gives us an array of one million JavaScript objects, each with two properties. Next step is to use some old school, brute-force searching. We are going to use loop-expansion, meaning that instead of a simple for/next loop dealing with a single item at a time – we are going to process 8 items per cycle.


function SearchAssociates(const Field:String;const Value:String;
  const Associates:TW3Associates):TW3Associate;
begin
  result := unassigned;

  (* Setup look expansion *)
  var index    := 0;
  var LLongs   := associates.length shr 3;
  var LSingles := associates.length mod 8;

  (* Search batch of 8 items *)
  while (LLongs > 0) do
  begin
    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);

    dec(LLongs);
  end;

  (* Search odd single items at the end *)
  while (LSingles > 0) do
  begin
    if Associates[index][Field] = Value then
    begin result := Associates[index]; exit; end;
    inc(index);
    dec(LSingles);
  end;
end;

And then let’s see how fast it can chew through these “untyped” arrays. And please remember that from JavaScripts’s point of view, each of these structures are unique. It cannot, like Delphi, take for granted that the next object is equal to the previous.

The test code now looks like:

procedure TForm1.W3Button2Click(Sender: TObject);
var
  Associates: Array of variant;
begin

  for var x:=1 to 1000000 do
  begin
    Associates.add(
      class
      cssname := "Item#" + IntToStr(x);
      typevalue := x;
      end
    );
  end;

  writeln('Performing brute force search:');

  var StartTime: Integer;
  var StopTime: Integer;

  StartTime := TW3Dispatch.Ticks;
  var data := SearchAssociates("cssname", "Item#999976",Variant(Associates));
  StopTime := TW3Dispatch.Ticks;

  if (data) then
  begin
    writeln("Found object:");
    writeln(Data);
  end else
  writeln("Search failed!");

  writeln("Ticks spent:" + (StopTime-StartTime).toString );
end;

Results

I ran the test 10 times, but only the first five produced different values – the engine then settled on 22016.

  • 40960
  • 22016
  • 20992
  • 20992
  • 24960

Ticks is calculated as the number of ms since 01.01.1970. There are 621355968000000000 ticks from Ist Jan 1900
to Ist Jan 1970 (hence the const). And 10000 ticks represents a millisecond:

const
  CNT_TICKS_BASE = 621355968000000000;

class function TW3Dispatch.Ticks:Integer;
begin
  result := new JDate().getTime() * 1000 + CNT_TICKS_BASE;
end;

Conclusion?

Old school pascal beats both ForEach() and Filter(), methods introduced into JavaScript to deal with this kind of scenario. Goes to show, object pascal may be old (C++ is 3 years older actually), but it’s a damn fine language and makes NodeJS and web programming fun!

Advertisements
  1. April 3, 2016 at 8:28 pm

    Try to do the same with an array of record. And you would see that JavaScript would reuse the same object prototype (thanks to how SMS compiler initialize its records). My guess is that it would use less memory and be faster than your code. And IMHO not need to unroll the loop in this case.
    See http://www.html5rocks.com/en/tutorials/speed/v8/ and https://www.youtube.com/watch?v=UJPdhx5zTaw
    You should ask Eric on such matter.

    • Jon Lennart Aasenden
      April 3, 2016 at 9:04 pm

      But the point was non-fixed (unknown) types, not fixed records — that would naturally be a completely different thing.
      And Eric and myself debate these things all the time, we are on the same team as you know. Records were added for delphi compatability, but they are compiler managed. Meaning that their internal field names will be altered. Anonymous classes will not have their field names changed.
      So there is a reason for this post beyond just array of record 🙂

    • Jon Lennart Aasenden
      April 4, 2016 at 1:21 am

      First of all, and you can ask eric if you feel like it, “records” were added just to help pascal code get ported. If you do a serialization on them, you will find that the names are changed by the compiler.
      Anonymous classes does not suffer that change. So ghost classes are better both for trans-language communication and being more flexible.
      Second, yes, there are always more ways to skin a car — so i would love to see your version of it, and i might learn something was well.
      Im not stating facts carved in stone, just showing that object pascal is a good web language 🙂
      But please, expand on the idea and show me some new stuff – i love learning new techniques so im all ears 🙂

  2. April 3, 2016 at 8:34 pm

    Hi, there’s potencial issue when we use static array with an anonymous class (the buddha class), for instance:

    var preloaded = class
    values : array [0..4] of array [0..1] of Integer = [
    [0, 100],
    [1000, 200],
    [2000, 300],
    [3000, 400],
    [4000, 500]
    ];
    end;

    WriteLn(JSON.stringify(preloaded));

    {“values”:[[4000,500],[1000,200],[2000,300],[3000,400],[4000,500]]}

    the correct should be…

    var
    valor: array [0..4] of array [0..1] of Integer = [
    [0, 100],
    [1000, 200],
    [2000, 300],
    [3000, 400],
    [4000, 500]
    ];

    begin
    var preloaded2 = class
    values := valor;
    end;

    WriteLn(JSON.stringify(preloaded2));
    {“values”:[[0,100],[1000,200],[2000,300],[3000,400],[4000,500]]}

    • Jon Lennart Aasenden
      April 4, 2016 at 1:21 am

      Hm. post a bugreport on that, that is an important find!

  3. April 3, 2016 at 10:44 pm

    In fact, a modern js engine would reuse the same object slots even for your buddha class code, just like records. You create the same fields so the same object “type” would be reused by the jit. See my links.

  1. No trackbacks yet.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: