Updated Review of LLM Based Development

I tried developing using GPT mid-2022. While I was amazed by the potential, I was not impressed enough to add it to my daily development workflow. It fell off my radar as a development tool, outpaced by a far more impactful use of text generation and image creation. A toolset that has significantly changed my day-to-day productivity.

Recently, a group of peers convinced me to give coding with LLM another shot. They loved using A.I. to develop code in languages they were not comfortable with. Or, as a manager, a way to better explain what they wanted to see in a project from their team. What convinced me to try it again was their highlighting of how well the results were formatted, syntactically correct, and well documented from the get-go. While they admitted the development of code may not be faster, the prospect of all those benefits culminating into a cleaner, well formatted final product convinced me to develop with GPT again in earnest.

I began my reexamination of the tooling via demos, as we often do. I was very impressed. I converted code into PowerShell (which I don’t know well) and re-created functionality I came across in weeks prior. I was so impressed, I showed my team examples of how the work they completed in the past could’ve been done with the help of GPT instead.

After those successes, I committed to using GPT to develop. Over the next few weeks I made sure to use it on projects I was working on.

While the technology showed incredible advancements since I tried it last year, it still hasn’t become my go-to in the same way using ChatGPT has for writing content.

Here are some areas I was both impressed with but left wanting:

  1. Code completion
    • Pro: Impressive. The look-ahead came in handy similarly to code-completion functionality of the past, with the added benefit of more contextual relevance that was not just a “cookie cutter” snippet.
    • Con: It gave me a useless hint quite a bit and I found myself typing almost as much as before with the incumbent “dumb completion”. I think it is because my mind is moving ahead to what I want the code to do, not necessarily what it is doing on the console at the moment. In the end, it is using patterns to make predictions. So, any new code that is a result of changes to my approach, or my on-the-fly reworking to fix a bug (that was not due to syntax issues) took as much time to develop as non-GPT-based code completion.
  2. Testing
    • Pro: When it comes to testing an existing application, the A.I. hits it out of the park. Ask it to “write a test for myFunction() using Jest” it creates an awesome base test case that I would have hated to write for each function.
    • Con: Some of the same issues outlined in the “Code Completion” and Functional Development” can be problematic here. It doesn’t always create a great test for code I haven’t written yet. (i.e. TDD) However, if the code is already there, it uses that context I’ve provided and its LLM to unpack what it the function is suppose to do and generate all the mocks and assertions needed to create a well written unit test.
  3. Functional Development
    • Pro: Much like helping me get past the dreaded blank page in text generation, I found it more useful than Google searches and StackOverflow reviews to develop a series of functions I wanted, without developing entirely from scratch. Better than code snippets, the snippets A.I. gave were pre-filled based on my prompts, variables, and existing object definitions. That was appreciated. I didn’t have to review the documentation to tweeze out the result I wanted. The A.I. pulled it all together for me.
      Additionally, the fact that it almost always has an answer goes under appreciated in other reviews I’ve read. The part that makes it so advanced, is it fills in a lot of grey area even if I (as a stupid human) carelessly leave out an instruction that is critical in generating a solution. If I were to get the response, “could not understand your request” due to my laziness, I would never use it. The assumptions it makes are close enough to my intent that I am either using the solution, learn a new path, or see what my explanation is missing so I can improve how I communicate with it.
    • Con: The end result did not work out of the gate most of the time. Sometimes it never got it correct and I had to Google the documentation to figure the issue. This was due to what I think was more than one documentation existing for various versions of the library I was using. I’m not sure. While the syntax was correct, the parameters it assumed I needed, or the way the calls were made to interface with a library/API led to errors.
  4. Debugging
    • Pro: Per the “functional development” points above, I was impressed at how I could respond to a prompt result with “I got this error when using the code above: [error]”. It recognized where it went wrong, and attempted to rewrite the code based on that feedback.
    • Con: Each response had a different result than the original. So, instead of fixing what I found was wrong (like a param missing) it also added or removed other features from the code that were correct. This made the generated result difficult to utilize. In some cases, it could never understand the issue well enough to generate working code.

One limitation I am not too surprised about, and am hopeful to see evolve in the future, is the AI’s understanding of a project in its entirety. Done so in a way that context is used in its function creation, making the solutions it provides “full stack”. Imagine a Serverless.com config, for an AWS deployment, that generates files and code that creates and deploys workflows using Lambda, DynamoDB, S3 and so on, all being developed based on prompts. With the yearly (and more recently) weekly leaps, I don’t think we are to far away.

As of today, I find myself going to GPT when filling in starter templates for a new project. I find it’s a much better starting point than starting from cookie cutter function as I set up my core, early, “re-inventing the wheel”-type, skeleton.

For example, I will use a Gitlab template for my infrastructure (be it GL Pages, Serverless, React, nodejs or Python and on and on), then fill in the starter code an tests via a few GPT prompts, and copying them over. Beyond that copy, I find myself detaching from GPT for the most part, and returning to occasionally “rubber duck” new framework functions.

Examples referenced above

Here I asked for a non-3rd-party use of promises (only await/async) which worked. Then I asked to modify the code by adding a zip task, and it re-introduced the promisify utility when it added the zip process.

Finding Nebo

In my journey from being “a writer hater to a writer lover”, finding the Nebo app was a defining moment. Of all the apps I tried, only Nebo could recognize my chicken scratch, retain my handwritten texts for review, and allow me an edit the original text before converting it to type.

Nebo beautifully melds the written form, digital tech, and typography. Its edit-gestures feel incredibly natural, the digital ink flows like your favorite pen, and the final product is compatible with the modern world. I’ve gained in all mediums and compromised in none.

It’s rare to find an app drives you to create opportunities to find excuses to use it. Especially when the app exists to enhance the mind, not rot it.

Feature Highlights

Chicken scratch interpreter

I’m amazed at how well the interpreter is, able to convert my god awful handwriting to text. It seems to combine A.I. OCR handwriting with grammar to assume a nearest approximation of what I’m trying to write. Whatever the methodology, the results far better than apple’s built-in note taking app.

Inline Editing in ink or between typed notes

In the event the interpreter fails, the editing features makes correcting easy and fun. For example, handwritten text is retained until you double tap it to convert to type.This allows you to review your writing before conversion. You can also preview the converted text in a banner scrolling horizontally above.

Wish List & Nits

Night/Dark Mode

Sometimes inspiration strikes right as I’m getting to bed. I reach for my iPad, open Nebo, and BAMB! An intensely white screen blinds me as my eyes try to adjust.

More Heading sizes

Simply put, H1 looks like H2, and I can’t bold text on Its own line without it becoming a heading. So, new H3?

Clearer New Line & Erase Interpreters

I like the natural feel of gestures. However, I find my self trying to gesture a new line over and over to no avail. I think the app can be smarter. Why not assume one new line is available, infinitely until converted. So, when I hit the end of a line I don’t need to gesture in the first place.