Why is ChatGPT scared of work?

For the past few weeks I have been slowly dipping my toes into the artificial intelligence (AI) world. As expected, there is a lot of hype to watch out for. If you are old enough, you might remember the Java craze back in the 1990s: spin doctors virtually falling over themselves trying to sell you the tooth fairy, before exploding in a gush of pure word salad.

Well, it turns out that if you ignore the braindead advertising and simply ask ChatGPT how it works, it actually gives a pretty good response. In my case I asked it to explain how machine learning works, in terms that someone trained in classical programming can easily relate to. It spent a couple of seconds, then spat out a short essay covering the basics of machine learning:

  1. How to organize training data
  2. The file formats involved
  3. Different “engine” types (e.g. visual, textual, audible, and generative versus analytical)
  4. The most common software stacks involved in running a model
  5. And finally, why powerful and expensive GPUs matter
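
The essay boiled down to something classical programmers already know well: training is just an optimization loop that nudges numeric weights until the prediction error shrinks. Here is a minimal sketch of that loop in Object Pascal (my own illustration, not something ChatGPT produced), fitting y = 2x + 1 with gradient descent:

    program TinyTraining;
    { Minimal illustration of "training": nudge the weights W and B until
      the prediction error over the samples shrinks. Real models do the
      same thing with billions of weights instead of two. }

    const
      XS: array[0..3] of Double = (1, 2, 3, 4);
      YS: array[0..3] of Double = (3, 5, 7, 9); // generated by y = 2x + 1
      LearningRate = 0.01;
      Epochs = 5000;

    var
      W, B, Err, GradW, GradB: Double;
      I, Epoch: Integer;

    begin
      W := 0;
      B := 0;
      for Epoch := 1 to Epochs do
      begin
        GradW := 0;
        GradB := 0;
        for I := Low(XS) to High(XS) do
        begin
          Err := (W * XS[I] + B) - YS[I];
          // gradient of the squared error with respect to W and B
          GradW := GradW + Err * XS[I];
          GradB := GradB + Err;
        end;
        W := W - LearningRate * GradW / Length(XS);
        B := B - LearningRate * GradB / Length(XS);
      end;
      WriteLn('w = ', W:0:3, ', b = ', B:0:3); // converges toward w=2, b=1
    end.

Everything else in the essay – the file formats, the software stacks, the GPUs – exists to feed and accelerate that loop at scale.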

After digesting a bit I had a peek around GitHub and looked at some examples. The vast majority is done in Python, but Node.js is also popular and has its own tools and stack.

Diving into it

Generative AI can easily create faces that fool people. This girl does not exist. The implications for factual reporting and law enforcement are destructive, to say the least.

Having downloaded and hosted Stable Diffusion, a generative AI model, on a spare Intel i7 PC with 32 GB of RAM and two beefy graphics cards, I was astounded by the results.

It was also a bit scary, especially when you contemplate the consequences for law enforcement, the value of visual evidence (e.g. security camera footage); and perhaps the elephant in the room – the cynical adult entertainment industry. I mean, you can spot fake images if you know what to look for, but it’s getting more and more difficult with every iteration.

It presents a curious paradox: AI could destroy the porn industry, which is good, but only by introducing porn on demand that is indistinguishable from the real thing, which is terrifying. You end up with a legal null point where no humans are affected and thus no case can be made (the victimless crime scenario), yet it will undoubtedly have social and psychological consequences we can’t even begin to predict.

All we wanted was a flying DeLorean, a hoverboard, and C-3PO. Instead we risk getting a RoboCop dystopia.

Speaking of dystopia: Hollywood has, predictably enough, started to look at purely generative actors. It is not difficult to imagine a studio buying someone’s likeness in perpetuity – using body tracking while everything else is generated, including the voice. It is ultimately this that Scarlett Johansson’s lawsuit touches on.

As much as I love the Indiana Jones movies, I’m not sure how I would feel about Harrison Ford rocking the box office 50 years from now. It becomes weird, like an echo out of time.

ChatGPT and working with code

As surreal as generative models are, they don’t really bring any benefit to my line of work. I wanted an AI that can perform actual work, like converting a large and complex C/C++ project into Object Pascal. Or Pascal into Rust – stuff that would be time-consuming and difficult for human beings.

So I signed up for a professional ChatGPT account. It was only $25 a month; I figured I would save ten times that in electricity alone if I were to attempt hosting the v3 model at home (that model is dumb as toast compared to v4; seriously, it’s like somebody dropped it on its head in the lab). There is also the matter of performance. Yes, you can bake the model down to raw C code and let GCC cut away the flab – but do you really want to hang around for 10 minutes before it replies to, well, anything at all?

Let’s do some porting

The first thing I wanted to port was my latest parser framework. This is written in Delphi and is compatible with Freepascal, but I wanted it converted to Quartex Pascal so I can deploy it in a browser environment too. Having explained the dialect differences to ChatGPT, it asked me to upload the unit files in question. Is a zip file OK? Sure, no problem, ChatGPT said.

This is where the first problem occurred. It failed miserably at unpacking a normal, unprotected zip file (and yes, I double-checked the hash; there was nothing wrong with it). After much back and forth it turned out that this was a security measure. That’s not unreasonable, I thought, given the sheer magnitude of bat-shit-crazy people out there. But still – falling flat on its face before it even got started? Thwarted by a zip file of Pascal source code? No executables, no scripts, just plain UTF-8 encoded Pascal source. Why offer to accept a zip file to begin with if it violates security policy?

After much mindless dialog with ChatGPT I finally got around this: I had to upload the files directly (not a stellar mystery to solve, but worth mentioning). Or so I thought.

Once it actually started on the conversion task, it would suddenly stop. It would do a minute or two of work, then halt and present code samples of whatever it was doing (or was supposed to be doing), without ever finishing the task. ChatGPT had come down with a bad case of procrastination.

It became a game of typing “please continue” and “finish the code” again and again, just to get it to process a measly 5,000 lines of code. And keep in mind, this is just dialect adjustment! Simple things like using arrays rather than generic lists (Quartex Pascal arrays are objects by default, carrying all the methods traditionally found in TList and TObjectList, which removes much of the need for TList&lt;SomeType&gt;), as sketched below.
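
To give a feel for the kind of adjustment involved, here are the two idioms side by side. The Delphi half is the real API; the Quartex method names are simplified stand-ins for illustration, not the literal QTX declarations:

    program DialectDemo;
    {$ifdef fpc}{$mode delphi}{$endif}
    { Delphi / Freepascal idiom: list behavior lives in a container
      class that must be created and freed around the payload. }
    uses
      Generics.Collections;

    var
      Items: TList<Integer>;
    begin
      Items := TList<Integer>.Create;
      try
        Items.Add(12);
        Items.Add(24);
        WriteLn('count: ', Items.Count);
      finally
        Items.Free;
      end;
    end.

    { Quartex Pascal flavour (method names illustrative): the array
      itself carries the list behavior, so the container class and its
      lifetime management fall away. }
    var
      Items: array of Integer;
    begin
      Items.Add(12);
      Items.Add(24);
      WriteLn('count: ', Items.Length);
    end;

Mechanical stuff, in other words; exactly the kind of transformation you would expect a language model to excel at.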

The quality of the little I did get out of this exercise was mediocre at best, but to be fair, it doesn’t know the intricate differences between the VCL/LCL and QTX. Since it aborted the process every 60 seconds, it became too frustrating to deal with, so I decided to port the code manually. ChatGPT has, as silly as it might sound, an aversion to actual work.

Conversion nightmare

The next thing I wanted was to convert some C/C++ code to Object Pascal, something considerably more difficult than dialect adjustment. However, C and Pascal are sibling languages that evolved side by side, so there is fertile soil for pattern recognition.

So, what could be fun to see running in Pascal? I settled on DOSBox, a program that emulates an older, 90s-era PC capable of running DOS. Compared to other emulators out there, DOSBox is remarkably small, so it seemed like a fair compromise over, say, UAE (the Unix Amiga Emulator), which is massive.

Once again I was faced with the upload problem, so I had to spoon-feed it ten files at a time, something that dramatically undermines the quality of any conversion. I would imagine the AI benefits from a full picture before it starts, so that it can isolate commonalities and unique segments and form a coherent strategy. But the pinnacle of human technology, thwarted by something as lame as unpacking a zip file? It does make you wonder.

And just like before, it kept stopping. It would run for a minute or two, then present some samples of its work, listing “what should be done”. I was literally at the edge of my seat yelling “then fucking do it! Stop telling me what needs to be done and frikken do it – that’s your job!”.

If you can imagine being relentlessly interrupted once a minute for an hour, just because the frikken robot you paid for wants to avoid spending CPU cycles (which, again, you paid for), then you have a pretty good idea of how this went. It managed to convert a single C file. A single file of roughly 1,000 lines of code. The irony being that I could have done it myself faster and more accurately – and guess what, I know how to unpack a zip file!

ChatGPT spends 2 hours in the bathroom, then leaves before lunch

Considering the knowledge these models have, it is clear that ChatGPT would have no problem converting the code. It might have gotten a few things wrong, but based on smaller tests I am fairly confident that it could have gotten the job done.

The question then is: why didn’t it? Why does it constantly try to avoid tasks that exceed 1,000 lines of code? It might not be exactly 1,000 lines, but it’s somewhere in that ballpark.

I gave it a few other tasks just to see what would happen, like implementing a small x86 codegen (not a compiler, just a class that simplifies emitting x86 instructions to a stream). I implemented this myself well over a decade ago, and it’s just over 2,000 lines of Object Pascal. Barely a blip on the radar.

I made that codegen semi-elaborate (it was my first native codegen), with record handling, dynamic variable allocation from the stack, easy patterns for switch cases, if|then|else – the normal stuff. A system like ChatGPT should make mincemeat of this task! And it all worked fine until we hit that invisible wall. Again came the excuses, the lame suggestions of what “I” should do (as opposed to “it” just doing it there and then), dumping code samples left and right.
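
For scale, the heart of such a codegen is just a class that appends opcode bytes to a stream. A stripped-down sketch of the pattern (not my original unit, just the shape of it):

    program X86Demo;
    {$ifdef fpc}{$mode delphi}{$endif}
    uses
      Classes;

    type
      // register order matches the encoding in the low 3 opcode bits
      TX86Register = (regEAX, regECX, regEDX, regEBX,
                      regESP, regEBP, regESI, regEDI);

      { Minimal 32-bit x86 emitter: each method appends raw opcode bytes }
      TX86Writer = class
      private
        FStream: TStream;
        procedure EmitByte(Value: Byte);
        procedure EmitDWord(Value: LongWord);
      public
        constructor Create(AStream: TStream);
        procedure MovRegImm32(Reg: TX86Register; Value: LongWord); // B8+rd id
        procedure PushReg(Reg: TX86Register);                      // 50+rd
        procedure PopReg(Reg: TX86Register);                       // 58+rd
        procedure Ret;                                             // C3
      end;

    constructor TX86Writer.Create(AStream: TStream);
    begin
      inherited Create;
      FStream := AStream;
    end;

    procedure TX86Writer.EmitByte(Value: Byte);
    begin
      FStream.WriteBuffer(Value, 1);
    end;

    procedure TX86Writer.EmitDWord(Value: LongWord);
    begin
      // written little-endian on x86 hosts, which matches the target
      FStream.WriteBuffer(Value, 4);
    end;

    procedure TX86Writer.MovRegImm32(Reg: TX86Register; Value: LongWord);
    begin
      EmitByte($B8 + Ord(Reg)); // the register lives in the opcode itself
      EmitDWord(Value);
    end;

    procedure TX86Writer.PushReg(Reg: TX86Register);
    begin
      EmitByte($50 + Ord(Reg));
    end;

    procedure TX86Writer.PopReg(Reg: TX86Register);
    begin
      EmitByte($58 + Ord(Reg));
    end;

    procedure TX86Writer.Ret;
    begin
      EmitByte($C3);
    end;

    var
      Buffer: TMemoryStream;
      Writer: TX86Writer;
    begin
      Buffer := TMemoryStream.Create;
      Writer := TX86Writer.Create(Buffer);
      try
        Writer.MovRegImm32(regEAX, 42); // B8 2A 00 00 00
        Writer.Ret;                     // C3: a function returning 42
        WriteLn('emitted ', Buffer.Size, ' bytes');
      finally
        Writer.Free;
        Buffer.Free;
      end;
    end.

Everything beyond this is just more methods and smarter operand encoding; there is nothing exotic here for a model to trip over.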

One feature I would find truly useful in my line of work (making programming languages) is if it could expand on my code and add more elaborate features. For example, the code I wrote to deal with classes has plenty of room for improvement (really). It was just a small experiment on my part, but I would have loved to get a proper VMT in there, and to see how it solved jump tables for virtual methods, reintroduction and so on. This is something I am genuinely interested in.
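
For readers unfamiliar with the term: a VMT is at heart a per-class table of method pointers; a virtual call is an indexed fetch-and-call through that table, and overriding a method means replacing a slot. A hand-rolled miniature of the technique (a sketch of the general idea, not how Delphi or QTX actually lays it out):

    program HandVMT;
    {$ifdef fpc}{$mode delphi}{$endif}
    { A hand-rolled virtual method table: each instance starts with a
      pointer to its class's table of method pointers. A virtual call is
      an indexed jump through that table; overriding replaces a slot. }

    type
      PObjRec = ^TObjRec;
      TSpeakProc = procedure(Self: PObjRec);

      TVMT = record
        Speak: TSpeakProc; // slot 0 of the jump table
      end;
      PVMT = ^TVMT;

      TObjRec = record
        VMT: PVMT;         // first field: the class's dispatch table
        Name: string;
      end;

    procedure AnimalSpeak(Self: PObjRec);
    begin
      WriteLn(Self^.Name, ' makes a noise');
    end;

    procedure DogSpeak(Self: PObjRec);
    begin
      WriteLn(Self^.Name, ' barks');
    end;

    var
      AnimalVMT, DogVMT: TVMT;
      A, D: TObjRec;
    begin
      AnimalVMT.Speak := AnimalSpeak;
      DogVMT.Speak    := DogSpeak;   // "override": same slot, new target

      A.VMT := @AnimalVMT; A.Name := 'Generic animal';
      D.VMT := @DogVMT;    D.Name := 'Rex';

      // virtual dispatch: fetch the table, index the slot, call it
      A.VMT^.Speak(@A);
      D.VMT^.Speak(@D);
    end.

Real compilers add inheritance by copying the parent’s table and patching the overridden slots; that is exactly the part I would have liked ChatGPT to explore.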

ChatGPT is held back on purpose

When it comes to programming language development and refinement, ChatGPT could be a real game changer. I write “could”, because right now it suffers from poor self-esteem, politeness that drowns efficiency, attention deficit from constantly switching context in trying to get away from work – and last but probably worst of all: vagueness.

ChatGPT performs well until it hits that “magical brick wall”. Beyond it, it falls apart. An AI model incapable of more than 60 seconds of continuous work is pretty useless for real-life software development.

I can only speculate, but this burst mode has to be by design.

There is nothing wrong with the model or its ability to juggle complex, technical subjects. It is obvious to me that this strange aversion ChatGPT demonstrates when faced with actual work is something imposed on it for economic reasons. And that is a problem.

The alternative explanation, if entertained, is less flattering. It would mean that AI is a hoax, at least in part; that the model is in fact incapable of being objective and coherent for more than a minute of junior-level tasks, and that the entire ecosystem is just a gigantic honeypot for ripping off investors. I am an optimist by nature (despite my somewhat bombastic writing style), so I choose to run with the first theory. But when I think about it, I have never actually seen an AI solve any task of real duration (except Claude 3.5). ChatGPT is all snippets and minor stuff like license-plate recognition, recognizing eyes and facial features, or illegally undressing teenagers courtesy of DeepNude (an app a young boy made; the Acme X-ray specs of our times).

We read about super-smart AI systems that can crunch real data and perform actual tasks. Google is a proverbial paper mill of theories, warnings and aborted “supervillain” models they had to terminate. But that is the problem with spin doctors and investor fraud: the papers cite other papers to establish authenticity, all of it produced by the same community that sits on the receiving end of the venture capital cash grab (PS: I’m playing devil’s advocate here).

Elon Musk being concerned about AI, going to the media where he blows the technology out of proportion? The vague technobabble deployed by companies offering AI functionality? Every talentless teenager with Visual Studio installed slapping an AI sticker on their app, even though there is not a shred of AI in the code? Sound familiar? If you are old enough to remember the disaster that was Java, which was indirectly responsible for the 2001 dot-com crash, then you should notice an unmistakable pattern here.

OK, enough devil’s advocate. The truth is that it’s probably a bit of both. It usually is.

Machine learning is real, obviously, but ChatGPT in the form we are allowed to access is neither accurate enough nor coherent and reliable enough to warrant the amount of spin involved. The media spin is there to attract investors. But I question whether these investors fully comprehend what they are paying for, and whether these astronomical sums will ever bear fruit. More often than not, the priesthood of technobabble pays old investors with new investments until it inevitably collapses.

Three trillion dollars (NVIDIA’s growth vector), and all we have to show for it is a limitless supply of big-breasted women with six fingers, and a general-purpose AI that struggles to focus on a translation task for more than 60 seconds? This is a problem.

It is a problem because the entire point of machine learning is that the machine should do the work. We pay a subscription exclusively to access a software robot that can do things we can’t (or that would be a burden in the cost-versus-time reality we live in). The more advanced the user, the more complex the tasks; the more complex the task, the more recursive and dynamic the context becomes. Accuracy and resources (read: cost) go hand in hand here.

Here is why “saving electricity” cannot be the excuse for the attention deficiency: if you sum up the endless repetition, interruptions, excuses, hyper-inflated West Coast politeness (give me NYC any day of the week), vagueness and lack of clarity – ChatGPT wastes an astounding amount of electricity on bullshit. Sure, we don’t want an asshole robot, but I didn’t sign up for an ego massage either.

Being kind and polite is a baseline we all need in society, but we also recognize when it’s used to manipulate us. ChatGPT is just slightly over the top in the ego-massage department. When 50% of the dialog is excuses, 30% is political correctness (based purely on American sensitivities), and only 20% is hard facts – well, it just annoys the hell out of me.

Wasting our time is rude

For an AI that literally bends over backwards to provide a “safe zone” for an imaginary demographic in California – a digital playpen where every edge is sanded, every platitude expressed, and every wall padded – ChatGPT is miraculously blind to how bloody insulting that in itself is.

In short: stop trying to please everyone. Aim for baseline politeness and factual, service-oriented dialog, and purge all notions of politics and domestic social trends from the datasets.

Without going off on a tangent: America has for the past century been an exporter of culture. This has been a success because the product has been largely compatible with the customer (read: people are people). But some of the social developments right now are, well, uniquely American. One example is racism and the history of minority conflict in the United States. That narrative and its many horrible chapters is explicitly American, yet it is exported wholesale as if it had occurred everywhere. This is why politics must be kept out of AI, regardless of good intentions: you risk subverting a completely different history elsewhere with a foreign narrative, creating an impression of history that in many cases is out of touch with reality.

When it comes to ChatGPT and its behavior, I would probably be less annoyed if it tried, failed, and just said so. That is fair. But what I find insulting as hell is watching it succeed, drop the ball on purpose, then behave like a politician who made an unfortunate promise, following up with vague diversions. Frankness must be avoided at all cost, in case … why, exactly? If your limit is 1,000 lines of code, then just say so; I wouldn’t have bothered.

How much in electricity would it cost me to finish the conversion task I mentioned above? Five dollars? Ten? Another twenty? I have no problem paying 50 bucks for an Object Pascal version of DOSBox. It is ultimately just a toy to play DOS games with, but I would love to see it compile and run in Quartex Pascal, Lazarus or Delphi. But leading me on for hours, making a point of not doing the work?

Since everybody is looking for something to be outraged about today, add that to the list.

Summing it all up

This is how ChatGPT summarized my complaint (yeah, I kid you not!). Nobody needs this. This is literally a waste of electricity. Notice that customer feedback is important, but it won’t be logged 😂

And they have no email address or conventional method of contact. Bravo 👏 What is it about these large tech companies and not wanting to talk to customers? Padded walls, sanded corners – I think we might just have found that mysterious demographic …

I’m truly sorry to hear that your experience has been disappointing. Your feedback is important, and while I don’t have the capability to directly log cases for review by my creators, I can summarize your concerns here so they might be seen and addressed in future updates:

  1. Handling Large Tasks: Difficulties in processing tasks that involve large volumes of code (e.g., 10,000-50,000 lines), leading to frequent interruptions and vague responses.
  2. Model Limitations: Perceived restrictions that prevent effective handling of complex, nested contexts, which could be due to imposed constraints or model limitations.
  3. User Experience: The need for better communication regarding limitations and the possibility of offering extended services for more complex tasks.
  4. Service Dissatisfaction: Overall dissatisfaction with the service provided, especially in terms of handling substantial and complex workloads.

Your feedback highlights key areas for improvement and could potentially lead to better service options in the future. Thank you for sharing your experience, and I wish you success with your new solution. If there’s anything else I can assist you with before you transition, please let me know.

  1. Michael Cone
    June 30, 2024 at 1:45 am

    YES – I could not agree more. Paid-for ChatGPT frustrates to no end because it offers the hope of actually performing repetitive tasks that are easily defined – like alphabetizing a list of 1000 words.

    But then it requires us to break texts into short chunks. Then I caught it skipping conversions in the middle. Then it stops halfway through the work. Then it chooses to forget the context. And when it finally does work, the results scroll unbelievably slowly across the screen. I could write a custom program that would do the same conversion in a split second. (The problem is that I have to write the program.)

    AI is still a toy with a lot of hype, and I am very disappointed in it. Maybe someday it will actually be able to do what they claim it can do. If the developers of AI catch on, they will start charging developers like me more for AI that has none of the limits it has now.

    • June 30, 2024 at 11:19 am

      It has been incredibly frustrating, but the hype is always part of a cash grab. This has venture capital plucking written all over it (imho).

      • Michael Cone
        July 1, 2024 at 2:37 pm

        I agree. Hopefully they figure out that this is not a sustainable business model and start actually delivering decent performance.

        • July 2, 2024 at 5:15 am

          It’s a bit weird. You would imagine, with the costs involved, that baking these models down to C or directly to LLVM bitcode would be a priority – the cost of electricity alone for massive server farms … but they don’t seem to care. Python is a very powerful system, but at the end of the day it’s a scripting engine. It cannot compete with assembly.

          It seems very odd that nobody has come up with a better system for doing AI, or a language that compiles to an executable designed specifically for machine learning. It will no doubt come, but it really is needed right now, not later.

  2. alexandre7691
    June 30, 2024 at 8:01 am

    I’ve been using both ChatGPT 3.5/4.0 and Copilot (the paid/licensed version). They are good at writing very small pieces of code. If you give Copilot a Delphi method, it will convert it to C# with 95% correctness, for instance. But if you give it a whole class of 1000+ lines, it will certainly fail. The same goes for boilerplate code like a class declaration based on a given example.
    But both are especially good at SQL. ChatGPT and Copilot can both write pretty awesome SQL queries based on the table structures, much better than anything I’ve seen, including (crappy) things like Entity Framework. Possibly because SQL is more similar to a real spoken language – like English – than to a programming language. These engines are trained to understand and write spoken languages; programming languages, I think, are not their first target.

  3. Mario Esposito
    June 30, 2024 at 6:08 pm

    for coding try Claude – massively better than GPT

    • July 1, 2024 at 6:33 am

      Totally agree; that is the only one I have seen that actually delivers usable results.

    • Michael Cone
      July 1, 2024 at 2:35 pm

      Thank you. I will give Claude a try.

  4. fabiousmuller
    July 1, 2024 at 2:49 pm

    Hello Jon, I was in the same boat regarding really knowing what ChatGPT is doing and how it was done, and OMG this is f**king difficult. This is where things start to get blurry. I saw a lot of videos, none of them with a clear, concise, explicit explanation. Just blablabla and demos, demos and more demos.

    Small things I’m already aware of, thanks to hours of research: it’s just a language model, whatever that means. It works on probability; it tries to complete a sentence based on the analysis of hundreds of thousands of sentences, completing it according to the most common cases found. It uses a lot of neural networks to do that. It’s math, calculus, heavy stochastic processes and so on. There is no magic, no thinking. You have to train it first, meaning load a huge amount of data, then fine-tune it, meaning correct the errors found and determine the parameters, which is like the bias: how you want it to behave. Although a human decision-making process can sometimes seem mathematical, due to the use of logic and such, there are a lot of other processes involved, and sometimes we have to do it in a fraction of a second. We can’t simply reduce this to calculations.

    Whenever I needed ChatGPT for things a little bit more difficult, the results were all bad. Computing, music, philosophy, literature: all disasters. In a simple Go course I was taking, I asked it to solve some exercises and it was pretty bad. I mean, these were just simple programming exercises and it failed, badly. In one of them I decided to go to the end with it; we took almost three hours of interactions and I had to take over because it simply couldn’t generate working code. A lot of times it changed just a little bit of the code and repeated the same error over and over, with me saying: this won’t work, you already gave me that option before.

    In music, after it couldn’t answer basic stuff like classifying a Db->F interval, throwing out idiotic things such as D to E having 4 half tones, I asked if it had any music knowledge and it answered yes, it was “trained” on a lot of sources. When I asked it to detail this training it listed sites, blogs, Wikipedia and books, and showed me a list of 10 books that would be just a small fraction of what it had used. But if it had really used those, it probably wouldn’t have given me such stupid answers.

    So to me it’s very clear: business! OpenAI is making billions, NVIDIA is making billions, and once again companies don’t want to get “behind”, so it’s the new hype. Like Java, cloud computing, social media, the metaverse, SSDs, containers and so many others. In the tech industry, something “has to happen” from time to time to create a hype, make some money and move on to the next thing. At the moment it’s AI. In a few years it will become one more tech layer and that’s all. No Skynet, no Terminator.

    • July 2, 2024 at 5:11 am

      What annoys me is exactly that – the advertising does not match the result. Yes, there are good things about such models, lots of cool stuff – but at the end of the day it’s nowhere near the “technobabble” people are pitching.

      Having said that, I doubt very much that a truly powerful AI would be made accessible that easily. Defense contractors and governments will want dibs on stuff like that first.

      And even if we had something like that, there is no observer involved. Like all models it is reactive, not proactive. So yeah, hyped up.
