Archive for November, 2019

NVidia Jetson Nano: Part 2

November 27, 2019

Last week I posted a review of the NVidia Jetson Nano IoT board, where I gave the board a score of 6 out of 10. This score was based on the fact that the CPU in particular was a joke compared to other, far better and more affordable boards on the market.

Ubuntu running on the NVidia Jetson Nano

Since my primary use for IoT boards involves web technology, WebAssembly and Asm.js (HTML5) in particular, the CPU did not deliver the performance I expected from an NVidia board. But as stated in my article, this board was never designed to be a multi-purpose maker board. This is a board for training and implementing A.I models (machine learning), which is why it ships with an astonishing 128 CUDA cores in its GPU.

However, two days after I posted said review, NVidia issued several updates. The graphics drivers were updated, followed by a custom Chrome browser build. Naturally I was eager to see if this would affect my initial experience, and to say it did would be an understatement.

Driver update

After this update the Jetson was able to render the HTML5 based desktop system I use for testing, and it passed more or less all my test-cases with flying colors. In fact, I was able to run two tests simultaneously. It renders Quake 3 (uses WebGL) in full-screen at 45 fps, with Tyrian (uses the GPU’s 2D functions) running at a whopping 35 fps (!).

Obviously that payload is significant, and all 4 CPU cores were taxed at 70% when running two such demanding programs inside the same viewport. Personally I’m not much of a gamer, but I find that testing a SoC by its ability to process HTML5 and JS, especially code that taps into the underlying GPU for both 3D and 2D, gives a good picture of the SoC’s capabilities.

I still think the CPU was a poor choice by NVidia. The extra production cost of fitting an A72 CPU is practically non-existent. The A72 would add 50 cents (USD 0.50) to the board’s retail price, but in return it would give us a much higher clock speed and a beefier CPU cache. The A72 is in practical terms twice as fast as the A57.

A better score

But, fair is fair, and I am changing the review score to 7 out of 10, which is the best score I have ever given to any IoT device.

The only board that has a chance of topping that is the ODroid N2, when (if ever) Hardkernel bothers to implement X drivers for the Mali GPU.

NVidia Jetson Nano

November 21, 2019

The sheer volume of technology you can pick up for $100 in 2019 is simply amazing. Back in 2012 when the IoT revolution began, people were amazed at how the tiny Raspberry PI SoC was able to run a modern browser and play Quake. Fast forward to 2019 and that power has been multiplied a hundredfold.

Of all the single board computers available today, the NVidia Jetson Nano exists in a whole different sphere compared to the rest. It ships with 4 GB of RAM, a mid-range 64-bit quad-core processor, and a whopping 128 GPU cores (!).

It does cost a little more than the average board, but not much. The latest Raspberry PI v4 sells for $79, just $20 less than the Jetson. But the VideoCore VI that powers the Raspberry PI v4 was not designed for the tasks that the Jetson can take on. Now don’t get me wrong, the Raspberry PI v4 is roughly the same SoC (the CPU on the PI is one revision higher), and the VideoCore GPU is a formidable graphics processor. But if you want to work with artificial intelligence, you need proper CUDA support; something the VideoCore can’t deliver.

The specs

Let’s have a look at the specs before we continue. To give you some idea of the power here, these are roughly the same specs as the Nintendo Switch. In fact, if we go up one model to the NVidia TX1, that is exactly what the Nintendo Switch uses. So if you are expecting something to replace your Intel i5 or i7 desktop computer, you might want to look elsewhere. But as an IoT device for artificial intelligence work, it does have a lot to offer:

  • 128-core Maxwell GPU
  • Quad-core Arm A57 processor @ 1.43 GHz
  • System Memory – 4GB 64-bit LPDDR4 @ 25.6 GB/s
  • Storage – microSD card slot (devkit) or 16GB eMMC flash (production)
  • Video Encode – 4K @ 30 | 4x 1080p @ 30 | 9x 720p @ 30 (H.264/H.265)
  • Video Decode – 4K @ 60 | 2x 4K @ 30 | 8x 1080p @ 30 | 18x 720p @ 30 (H.264/H.265)
  • Video Output – HDMI 2.0 and eDP 1.4 (video only)
  • Gigabit Ethernet (RJ45) + 4-pin PoE header
  • USB – 4x USB 3.0 ports, 1x USB 2.0 Micro-B port for power or device mode
  • M.2 Key E socket (PCIe x1, USB 2.0, UART, I2S, and I2C)
  • 40-pin expansion header with GPIO, I2C, I2S, SPI, UART signals
  • 8-pin button header with system power, reset, and force recovery related signals
  • Misc – Power LED, 4-pin fan header
  • Power Supply – 5V/4A via power barrel or 5V/2A via micro USB port

ARM for the future

Most of the single board computers (SBCs) on the market today sell somewhere around the $70 mark. As a consequence their processing power and graphical capacity are severely limited compared to traditional desktop computers. IoT boards are closely linked to the mobile phone industry, typically behind the latest SoC by a couple of years, which means you can expect to see ARM IoT boards that floor an Intel i5 sometime in 2022 (perhaps sooner).

Top of the line mobile phones, like the Samsung Galaxy Note 10, are presently shipping with the Snapdragon 855 SoC. The Snapdragon 855 delivers the same processing power as an Intel i5 CPU (actually slightly faster). It is amazing to see how fast ARM is catching up with Intel and AMD. Powerful technology is expensive, which is why IoT is brilliant; by using the same SoCs that the mobile industry has perfected, board makers are able to tap into the price drop that comes with mass production. IoT is quite frankly piggybacking on the ever-growing demand for faster mobile devices, which is a great thing for us consumers.

Intel and AMD are not resting on their laurels either, and Intel in particular has lowered its prices considerably to meet the onslaught of cheap ARM boards hitting the market. But despite their best efforts, there is no doubt where the market is heading; ARM is the future for mobile, desktop and server alike.

The Nvidia Jetson Nano

Out of all the boards available on the market for less than US $100, and that is a long list, one board deserves special attention: the NVidia Jetson Nano developer board. This was released back in March of this year. I wanted to get this board as it hit the stores, but since my ODroid XU4 and N2 have performed so well, there simply was no need for it; until recently.

First of all, the Jetson SBC is not a cheap $35 board. The GPU on this board is above and beyond anything available on competing single board computers. Most IoT boards are fitted with a relatively modest GPU (graphics processing unit), typically with 1, 2, 4 or 8 cores. The NVidia Jetson Nano though, ships with 128 GPU cores (!), making it by far the best option for creating, training and deploying artificial intelligence models.

The Jetson boards were designed for one thing, A.I. It’s not really a general purpose SBC

I mean, the IoT boards to date have all suffered from the same tradeoffs, where the makers must try to strike a balance between CPU, GPU and RAM. This doesn’t always work out like the architects have planned. Single board computers like the Asus Tinkerboard or NanoPI Fire 3 are examples of this; the Tinkerboard is probably the fastest 32-bit ARM board I have ever experienced, but the way the different parts have been integrated, coupled with poorly planned power distribution (as in electrical power), resulted in terrible stability problems. The NanoPI Fire 3 is also a paper-tiger that, despite the best of intentions, is useless for anything but Android.

The Jetson Nano is more expensive, but NVidia has managed to balance each part in a way that works well; at least for artificial intelligence work. I must underline that the CPU was a big disappointment for me personally. I misread the specs before I ordered and expected it to be fitted with a quad core A72 CPU @ 2 GHz. Instead it turned out to use the A57 CPU running @ 1.43 GHz. This might sound unimportant, but the A72 is almost twice as fast (and costs only 50 cents more). Why NVidia decided on that model of CPU remains a complete mystery.

In my view, this sends a pretty clear message that NVidia has no interest in competing with other mainstream boards, but instead aims solely at the artificial intelligence market.

Artificial intelligence

If working with A.I is something that you find interesting, then you are in luck. This is the primary purpose of the Jetson Nano. Training an A.I model on a Raspberry PI v4 or ODroid N2 is also possible, but it would be a terrible and frustrating experience because those boards are not designed for it. What those 128 CUDA cores can chew through in 2 hours would take a week for a Raspberry PI to process.

If we look away from the GPU, single board computers like the ODroid N2 are better in every way. The power distribution on the SoC is better, the boot options are better, the bluetooth and IR options are better, the network and disk controllers are more powerful, the RAM runs at higher clock-speeds, and last but not least, the “big.LITTLE” CPU architecture ODroid devices are known for just floors the Jetson. It literally has nothing to compete with if we ignore that GPU.

So if you have zero interest in A.I, CUDA or TensorFlow, then I urge you to look elsewhere. To put the CPU into perspective: the model used on the Jetson is only slightly faster than the one in the Raspberry PI 3b. The extra memory helps it when browsing complex HTML5 websites, but not enough to justify the price. By comparison the ODroid XU4 retails at $40 and will slice through those websites like it was nothing. So if you are looking for a multi-purpose board, the Jetson Nano is not it.

Upstream Ubuntu

Most IoT boards ship with a trimmed-down Linux distribution especially adapted for that board. This is true for more or less all the IoT boards I have tested over the last 18 months. The reason companies ship these slim and optimized setups is simply because running a full Linux distribution, like we have on x86, demands too much of the CPU (read: there will be little left for your code). These are micro computers after all.

The Jetson Ubuntu distro comes pre-loaded with drivers and various A.I libraries

The difference between SBCs like the PI, ODroid (or whatnot) and the Jetson Nano is that the Jetson comes with a full, upstream Ubuntu based distribution. In other words, the same distribution you would install on your desktop PC is somehow expected to perform miracles on a device only marginally faster than a Raspberry PI 3b.

I must say I am somewhat puzzled by Nvidia’s choice of Linux for this board, but against all odds the board does perform. The desktop is responsive and smooth despite the payload involved, most likely because the renderer and layout engine tap into the GPU cores. But optimal? Hardly.

Other distributions?

One thing that must be stressed when it comes to the Jetson Nano is that Nvidia is not shipping a system where everything is wide open. Just as the Raspberry PI’s VC (VideoCore) GPUs are proprietary and require closed-source drivers, so is the GPU on the Jetson. It’s not just a question of drivers in this case, but of a customized kernel.

The board is called the “Nvidia Jetson Nano developer kit”, which means Nvidia sells this exclusively to work with A.I under their standards. For this reason, there are no alternative distributions or flavours. You are expected to work with Ubuntu, end of story.

Having said that, the way to get around this is simply to work backwards, and strip the official disk image of services and packages you don’t need. A full upstream Ubuntu install means you have two browsers, email and news clients, music players, movie players, photo software, a full office suite, and more Python libraries than you will ever need. With a little bit of cleaning, you can trim the distro down and make it suitable for other tasks.

Practical uses

No two single board computers are identical. There are different CPUs, different GPUs, IO controllers and RAM, all of which help define what a board is good at.
The Jetson Nano is a powerful board, but only within the sphere it was designed for. The moment you move away from artificial intelligence, the board is instantly surpassed by cheaper models.

51 fps running Quake [JavaScript] is very good. But the general experience of HTML5 is not optimal due to the CPU not being the most powerful

As such, the only practical use this board has, is for running A.I software and models. Unless Nvidia decides to do a special build of Chrome where those cores are used to speed up HTML5 rendering, I can’t even recommend this for casual browsing. It is an excellent SoC for gaming though. It’s probably the only IoT board that delivers 35-50 fps when emulating Playstation and Sega Saturn games (which are very demanding). I also got 54 fps running Quake 3 compiled to JavaScript, which is pretty cool.

But if gaming is your sole interest, $100 will buy you a much better x86 board to play with. In one of my personal projects (Quartex Media Desktop), which is a cluster “supercomputer” consisting of many IoT boards connected through a switch, the Jetson was supposed to act as the master board. In a cluster design you typically have X number of slaves which run headless (no desktop or graphical output at all) and a master unit that deals with the display / desktop. The slave boards don’t need much in terms of GPU since they won’t run graphical tasks (so the GPU is often disabled), which puts enormous pressure on the master to represent everything graphically.

In my case the front-end is written in JavaScript and uses Chrome to render a very complex display. Finding an SBC with enough power to render modern and complex HTML5 has not been easy. I was shocked that the Jetson Nano had poorer performance than devices costing half its asking price.

It could be throttling based on heat, because the board gets really hot. It has a massive heatsink much like the ODroid N2, but I have a sneaking suspicion that active cooling is going to be needed to get the best out of this puppy. At worst I got 17 fps when running Quake 3 [JavaScript], but I think there was an issue with a driver. After an update was released, the frame-rate went back to 50+ fps. Extremely annoying, because you feel you can’t trust the SoC.

Final verdict

The Nvidia Jetson Nano is an impressive IoT board, and when it comes to GPU power and artificial intelligence, those 128 CUDA cores are a massive advantage. But computing is not just about fancy A.I models. And this is ultimately where the Jetson comes up short. When something as simple as browsing modern websites can unexpectedly bring the SoC to a grinding halt, then I cannot recommend the system for anything but what it was designed to do: artificial intelligence.

There are also some strange design decisions at the hardware level. On most boards the USB sockets have separate data-lines and power routes. But on the Jetson the four USB 3.0 sockets share a single data-line. In other words, everything you connect to the device competes for the same data throughput.

Nvidia also made the same mistake that Asus did, namely to power the USB ports and CPU from the same route. It’s not quite as bad as the Tinkerboard, though. Nvidia made sure the board has two power sockets, so should undervolting become an issue, you can plug a second 5V supply into the barrel socket (disabled by default).

The lack of eMMC support or a SATA interface is also curious. Hosting a full Ubuntu on a humble SD-card, with an active swapfile, is a recipe for disaster.

So is it worth the $99 asking price?

If you work with artificial intelligence or GPU based computing then yes, the board will help you prototype your products. But if you expect to do anything ordinary, like run demanding HTML5 apps or host CPU intensive system services – I find myself struggling to say yes.

It’s a cool board, and it also makes one hell of a games machine, but the hardware is already outdated. If you want a general purpose single board computer, you are much better off buying the Raspberry PI 4 or the ODroid N2, both of which are cheaper and deliver much better processing power. Obviously the GPU is hard to match.

I will give the board a score of 6 out of 10, purely because the board is brilliant for A.I and compute tasks. This might change as more distros become available. A full Ubuntu install is one hell of a load on the system after all.

Quartex “Cloud Ripper” hardware

November 10, 2019

For close to a year now I have been busy on a very exciting project, namely my own cloud system. While I have written about this project quite a bit these past months, mostly focusing on the software aspect, not much has been said about the hardware.

Quartex “Cloud Ripper” running neatly on my home-office desk

So let’s have a look at Cloud Ripper, the official hardware setup for Quartex Media Desktop.

Tiny footprint, maximum power

Despite its complexity, the Quartex Media Desktop architecture is surprisingly lightweight. The services that make up the baseline system (read: essential services) barely consume 40 megabytes of RAM per instance (!). And while there is a lot of activity going on between these services, most of that activity is message-dispatching. Sending messages costs practically nothing in CPU and network terms. This will naturally change the moment you run your cloud as a public service, or set up the system in an office environment for a team. The more users, the more signals are shipped between the processes, but with the exception of reading and writing large files, messages are delivered practically instantaneously and hardly use CPU time.

Quartex Media Desktop is based on a clustered micro-service architecture

One of the reasons I compile my code to JavaScript (Quartex Media Desktop is written from the ground up in Object Pascal, which is compiled to JavaScript) has to do with the speed and universality of Node.js services. As you might know, Node.js is powered by the Google V8 runtime engine, which means the code is first converted to bytecode, and further compiled into highly optimized machine code [courtesy of V8’s own JIT compiler]. When coded right, such JavaScript based services execute just as fast as those implemented in a native language. There simply are no perks to be gained from using a native language for this type of work. There are however plenty of perks from using Node.js as a service-host:

  • Node.js delivers the exact same behavior no matter what hardware or operating-system you are booting up from. In our case we use a minimal Linux setup with just enough infrastructure to run our services. But you can use any OS that supports Node.js. I actually have it installed on my Android based Smart-TV (!)
  • We can literally copy our services between different machines and operating systems without recompiling a line of code. So we don’t need to maintain several versions of the same software for different systems.
  • We can generate scripts “on the fly”, physically ship the code over the network, and execute it on any of the machines in our cluster. While possible to do with native code, it’s not very practical and would raise some major security concerns.
  • Node.js supports WebAssembly, so you can use the Elements Compiler from RemObjects to write service modules that execute blazingly fast yet remain platform and chipset independent.

The Cloud-Ripper cube

The principal design goal when I started the project was that it should be a distributed system. This means that instead of having one large service that does everything (read: a typical “native” monolithic design), we operate with a microservice cluster design: services that run on separate SBCs (single board computers). The idea here is to spread the payload over multiple micro-computers that combined become more than the sum of their parts.

Cloud Ripper – Based on the Pico 5H case and fitted with 5 x ODroid XU4 SBCs

So instead of buying a single, dedicated x86 PC to host Quartex Media Desktop, you can buy cheap, off-the-shelf, easily available single-board computers and daisy chain them together. Rather than spending $800 (just to pin a number) on x86 hardware, you can pick up $400 worth of cheap ARM boards and get better network throughput and identical processing power (*). In fact, since Node.js is universal you can mix and match between x86, ARM, MIPS and PPC as you see fit. Got an older PPC Mac-Mini collecting dust? Install Linux on it and get a few extra years out of these old gems.

(*) A single XU4 is hopelessly underpowered compared to an Intel i5 or i7 based PC. But in a cluster design there are more factors than just raw computational power. Each board has 8 CPU cores, bringing the total number of cores to 40. You also get 5 ARM Mali-T628 MP6 GPUs running at 533MHz. Only one of these will be used to render the HTML5 display, leaving 4 GPUs available for video processing, machine learning or compute tasks. Obviously these GPUs won’t hold a candle to even a mid-range graphics card, but the fact that we can use these chips for audio, video and computation tasks makes the system incredibly versatile.
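The totals in that footnote are simple to reproduce. Here is a small Python sketch that tallies them; the per-board figures come from the text, while the names `Board` and `cluster_totals` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    """Per-board figures quoted in the text for the ODroid XU4."""
    name: str
    cpu_cores: int
    gpus: int


def cluster_totals(board: Board, count: int, display_gpus: int = 1) -> dict:
    """Sum CPU cores and GPUs, setting aside the GPUs that render the display."""
    total_gpus = board.gpus * count
    return {
        "cpu_cores": board.cpu_cores * count,
        "gpus": total_gpus,
        "gpus_free_for_compute": total_gpus - display_gpus,
    }


xu4 = Board(name="ODroid XU4", cpu_cores=8, gpus=1)
totals = cluster_totals(xu4, count=5)
print(totals)
```

Changing `count` to 3 gives the figures for the smaller Pico 3H setup.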

Another design goal was to implement a UDP based Zero-Configuration mechanism. This means that the services will find and register with the core (read: master service) automatically, provided the machines are all connected to the same router or switch.
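To give a feel for how UDP based discovery can work, here is a minimal Python sketch of one possible arrangement: each service broadcasts a small announcement and the core listens for it. This is not the Quartex implementation (which runs on Node.js and is not shown in this post); the JSON message format, the port number 41234 and every function name are assumptions invented for the example.

```python
import json
import socket

DISCOVERY_PORT = 41234  # hypothetical port, not taken from Quartex


def build_announcement(service: str, http_port: int) -> bytes:
    """Encode the 'here I am' datagram a service would broadcast."""
    message = {"type": "announce", "service": service, "port": http_port}
    return json.dumps(message).encode("utf-8")


def parse_announcement(data: bytes):
    """Decode a datagram; return None for anything that is not an announcement."""
    try:
        message = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(message, dict) or message.get("type") != "announce":
        return None
    return message


def announce(service: str, http_port: int, address: str = "255.255.255.255") -> None:
    """Service side: send one announcement to the local network segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_announcement(service, http_port), (address, DISCOVERY_PORT))


def wait_for_service(timeout: float = 5.0):
    """Core side: return (host, message) for the first valid announcement.

    Returns None if no datagram at all arrives within `timeout` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", DISCOVERY_PORT))
        sock.settimeout(timeout)
        try:
            while True:
                data, (host, _port) = sock.recvfrom(4096)
                message = parse_announcement(data)
                if message is not None:
                    return host, message
        except socket.timeout:
            return None
```

Broadcast datagrams do not cross routers, which matches the requirement that all machines sit on the same router or switch.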

Put together your own supercomputer for less than $500

The first “official” hardware setup is a cluster based on 5 cheap ARM boards; namely the ODroid XU4. The entire setup fits inside a Pico Cube, which is a special case designed to house this particular model of single board computer. Pico offers several different designs, ranging from 3 boards to a 20 board super-cluster. You are not limited to ODroid XU4 boards if you prefer something else. I picked the XU4 boards because they represent the lowest possible specs you can run the Quartex Media Desktop on. While the services themselves require very little, the master board (the board that runs the QTXCore.js service) is also in charge of rendering the HTML5 display. And having tested a plethora of boards, the ODroid XU4 was the only model that could render the desktop properly (at that low a price range).

Note: If you are thinking about using a Raspberry PI 3B (or older) as the master SBC, you can pretty much forget it. The media desktop is a piece of very complex HTML5, and anything below an ODroid XU4 will only give you a terrible experience (!). You can use smaller boards as slaves, meaning that they can host one of the services, but the master should preferably be an ODroid XU4 or better. The ODroid N2 [with 4 GB RAM] is a much better candidate than a Raspberry PI v4. A Jetson Nano is an even better option due to its extremely powerful GPU.

Booting into the desktop

One of the things that confuses people when they read about the desktop project is how it’s possible to boot into the desktop itself and use Quartex Media Desktop as a ChromeOS alternative.

How can a “cloud platform” be used as a desktop alternative? Don’t you need access to the internet at all times? If it’s a server based system, how then can we boot into it? Don’t we need a second PC with a browser to show the desktop?

Accessing the desktop like a “web-page” from a normal Linux setup

To make a long story short: the “master” in our cluster architecture (read: the single-board computer defined as the boss) is set up to boot into a Chrome browser display under “kiosk mode”. When you start Chrome in kiosk mode, this removes all traces of the ordinary browser experience. There will be no toolbars, no URL field, no keyboard shortcuts, no right-click popup menus etc. It simply starts in full-screen and whatever HTML5 you load has complete control over the display.

What I have done is to set up a minimal Linux boot sequence. It contains just enough Linux to run Chrome. So it has all the drivers etc. for the device, but instead of starting the ordinary Linux Desktop (X or Wayland) we start Chrome in kiosk mode.
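As a rough illustration, the last step of such a boot sequence boils down to launching the browser with the `--kiosk` switch. The Python sketch below only builds that command line (and offers a helper to run it). The binary name `chromium-browser`, the port 8080 and the helper names are assumptions for the example, and whatever graphics stack the board's Chrome build needs is assumed to be up already.

```python
import subprocess


def kiosk_command(url: str, browser: str = "chromium-browser") -> list:
    """Build the argument list for a full-screen browser session without UI."""
    return [
        browser,
        "--kiosk",         # full screen, no toolbars or URL field
        "--no-first-run",  # skip the first-run welcome flow
        url,
    ]


def launch_kiosk(url: str, browser: str = "chromium-browser") -> subprocess.Popen:
    """Start the browser; meant to be called from the boot script."""
    return subprocess.Popen(kiosk_command(url, browser))


# The URL is assumed to point at a service running on the same machine.
command = kiosk_command("http://127.0.0.1:8080/")
print(" ".join(command))
```

Keeping the command construction separate from the launch makes it easy to check the arguments on a development machine before touching the boot scripts on the board.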

Booting into the same desktop through Chrome in Kiosk Mode. In this mode, no Linux desktop is required. The Linux boot sequence is altered to jump straight into Chrome

Chrome is set to load from 127.0.0.1 (this is a special address that always means “this machine”), which is where our QTXCore.js service resides, with its own HTTP/S and WebSocket servers. The client (HTML5 part) is loaded in under a second from the core, and the experience is more or less identical to starting your ChromeBook or NAS box. Most modern NAS (network attached storage) devices are much more than a file-server today. NAS boxes like those from Asustor Inc have HDMI out, ship with a remote control, and are designed to act as a media center. So you connect the NAS directly to your TV, and can watch movies and listen to music without any manual conversion etc.

In short, you can set up Quartex Media Desktop to do the exact same thing as ChromeOS does, booting straight into the web based desktop environment. The same desktop environment is available over the network. So you are not limited to visiting your Cloud-Ripper machine via a browser from another computer; nor are you limited to just using a dedicated machine. You can set up the system as you see fit.

Why should I assemble a Cloud-Ripper?

Getting a Cloud-Ripper is not forced on anyone. You can put together whatever spare hardware you have (or just run it locally under Windows). Since the services are extremely lightweight, any x86 PC will do. If you invest in an ODroid N2 board ($80 range) then you can install all the services on that if you like. So if you have no interest in clustering or building your own supercomputer, then any PC, laptop or IoT single-board computer(s) will do, provided it delivers at least as much power as the XU4 (!)

What you will experience with a dedicated cluster, regardless of putting the boards in a nice cube, is that you get excellent performance for very little money. It is quite amazing what $200 can buy you in 2019. And when you daisy chain 5 ODroid XU4 boards together on a switch, those 5 cheap boards will deliver the same serving power as an x86 setup costing twice as much.

The NVidia Jetson Nano SBC, one of the fastest boards available at under $100

Pico is offering 3 different packages. The most expensive option is the pre-assembled cube. This is for some reason priced at $750 which is completely absurd. If you can operate a screwdriver, then you can assemble the cube yourself in less than an hour. So the starter-kit case which costs $259 is more than enough.

Next, you can buy the XU4 boards directly from Hardkernel for $40 apiece, which will set you back $200. If you order the Pico 5H case as a kit, that brings the sub-total up to $459. But that price-tag includes everything you need except SD cards. So the kit contains power-supply, the electrical wiring, a fast gigabit ethernet switch [built into the cube], active cooling, network cables and power cables. You don’t need more than 8 GB SD cards, which cost practically nothing these days.
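For anyone who wants to play with the numbers, the sub-total can be worked out in a couple of lines of Python. The prices are the ones quoted in this post; the function name `cluster_cost` is just an illustrative choice.

```python
XU4_PRICE_USD = 40      # per board, as quoted from Hardkernel
PICO_5H_KIT_USD = 259   # starter-kit case, self-assembled
BOARD_COUNT = 5


def cluster_cost(board_price: int, board_count: int, case_price: int) -> int:
    """Return the sub-total in USD for boards plus case kit (SD cards excluded)."""
    return board_price * board_count + case_price


boards_total = XU4_PRICE_USD * BOARD_COUNT
sub_total = cluster_cost(XU4_PRICE_USD, BOARD_COUNT, PICO_5H_KIT_USD)
print(f"Boards: ${boards_total}, boards + case kit: ${sub_total}")
```

Swapping in a board price of 99 with a case price of 0 gives 495, which is the cost of five Jetson Nano boards on their own.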

Note: The Quartex Media Desktop “file-service” should have a dedicated disk. I bought a 256 GB SSD with a USB 3.0 interface, but you can just use a vanilla USB stick to store user-account data + user files.

As a bonus, such a setup is easy to recycle should you want to do something else later. Perhaps you want to learn more about Kubernetes? What about a Docker swarm? A Free Pascal build-server perhaps? Why not install FreeNAS, Plex, and a good backup solution? You can build this up as your budget allows. If 5 x ODroid XU4 is too much, then get 3 of them instead + the Pico 3H case.

So should Quartex Media Desktop not be for you, or you want to do something else entirely — then having 5 ODroid XU4 boards around the house is not a bad thing.

Oh and if you want some serious firepower, then order the Pico 5H kit for the NVidia Jetson Nano boards. Graphically those boards are beyond any other SoC on the market (in its price range). But as a consequence the Jetson Nano starts at $99. So for a full kit you will end up paying about $500 for the boards alone. But man, those are the proverbial Ferrari of IoT.