Determining my head’s direction from a webcam

Is my face pointing up or down? That was the question I set out to answer using only static, web-based client infrastructure. My first lead pointed me to Google’s MediaPipe Face Mesh docs, which link to CodePen demos in several languages for leveraging their AI face-modeling solution. The demo was useful for quickly getting a grasp of how the library works and how to manipulate it.

The mesh is essentially a collection of points organized in an array. Each index in this array corresponds to a specific point on the 3D face we create. To make things easier, I’ve labeled these points in my code example. That’s where the live tags in my demo come in handy, a massive aid in navigating the mesh.
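
To make that structure concrete, here is a minimal sketch assuming the legacy @mediapipe/face_mesh JavaScript solution and its onResults callback. The nose-tip index is an assumption taken from common Face Mesh references, not something defined in this post.

```javascript
// Minimal sketch, assuming the legacy @mediapipe/face_mesh JS solution.
import { FaceMesh } from '@mediapipe/face_mesh';

const NOSE_TIP = 1; // assumed index for the nose-tip landmark (common reference)

const faceMesh = new FaceMesh({
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`,
});

// Frames are fed in elsewhere via faceMesh.send({ image: videoElement }).
faceMesh.onResults((results) => {
  // multiFaceLandmarks holds one array of landmark points per detected face;
  // each point has x, y, z coordinates normalized to the video frame.
  const face = results.multiFaceLandmarks && results.multiFaceLandmarks[0];
  if (!face) return; // no face in this frame

  const noseTip = face[NOSE_TIP];
  console.log(`nose tip: x=${noseTip.x.toFixed(3)} y=${noseTip.y.toFixed(3)} z=${noseTip.z.toFixed(3)}`);
});
```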

At this point, I understood the face mesh structure and had the code to transform into React components. However, I still needed to capture the orientation of the face. I experimented with several approaches, employing techniques like tracking the direction of the irises or the position of the eyebrows via their landmark indices (i.e. FaceLandmarker.FACE_LANDMARKS_IRIS). Unfortunately, this approach was unreliable …
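
For illustration only, this is the flavor of heuristic I was experimenting with: comparing the vertical positions of a few landmarks to guess pitch. The indices (forehead ≈ 10, nose tip ≈ 1, chin ≈ 152) and thresholds are assumptions, and this sketch is one of the unreliable experiments described above, not the approach the full article lands on.

```javascript
// Illustrative sketch only; indices and thresholds are assumptions that need tuning.
const NOSE_TIP = 1;   // tip of the nose
const FOREHEAD = 10;  // top of the forehead
const CHIN = 152;     // bottom of the chin

// Rough pitch guess: as the head tilts up, the protruding nose tip projects
// closer to the forehead relative to the chin; tilting down does the opposite.
function guessPitch(face) {
  const nose = face[NOSE_TIP];
  const span = face[CHIN].y - face[FOREHEAD].y;   // total face height in the frame
  const ratio = (face[CHIN].y - nose.y) / span;   // where the nose sits within that span
  if (ratio > 0.6) return 'up';
  if (ratio < 0.4) return 'down';
  return 'level';
}
```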

Check out the full story, code, and demo on Medium https://medium.com/@sshadmand/a-simple-and-efficient-face-direction-detection-in-react-e02cd9d547e5

An Updated Review of LLM-Based Development

I first tried developing with GPT in mid-2022. While I was amazed by the potential, I was not impressed enough to add it to my daily development workflow. It fell off my radar as a development tool, outpaced by a far more impactful use, text generation and image creation, a toolset that has significantly changed my day-to-day productivity.

Recently, a group of peers convinced me to give coding with an LLM another shot. They loved using A.I. to develop code in languages they were not comfortable with, or, as managers, to better explain what they wanted to see from their team on a project. What convinced me to try it again was their point that the results were well formatted, syntactically correct, and well documented from the get-go. While they admitted that developing code this way may not be faster, the prospect of all those benefits culminating in a cleaner, better-formatted final product convinced me to develop with GPT again in earnest.

I began my reexamination of the tooling via demos, as we often do, and I was very impressed. I converted code into PowerShell (which I don’t know well) and re-created functionality I had come across in the weeks prior. I was so impressed that I showed my team examples of how work they had completed in the past could have been done with the help of GPT instead.

After those successes, I committed to using GPT to develop. Over the next few weeks I made sure to use it on projects I was working on.

While the technology has shown incredible advancement since I tried it last year, it still hasn’t become my go-to the way ChatGPT has for writing content.

Here are some areas where I was impressed but also left wanting:

  1. Code completion
    • Pro: Impressive. The look-ahead came in handy much like the code-completion features of the past, with the added benefit of contextual relevance rather than just a “cookie cutter” snippet.
    • Con: It gave me useless hints quite often, and I found myself typing almost as much as I did with the incumbent “dumb completion”. I think that is because my mind is moving ahead to what I want the code to do, not necessarily what it is doing on screen at the moment. In the end, it uses patterns to make predictions, so any new code that resulted from a change in my approach, or from on-the-fly reworking to fix a bug (one not caused by syntax), took as much time to develop as it would have with non-GPT code completion.
  2. Testing
    • Pro: When it comes to testing an existing application, the A.I. hits it out of the park. Ask it to “write a test for myFunction() using Jest” and it creates an awesome base test case that I would have hated to write by hand for each function.
    • Con: Some of the same issues outlined under “Code Completion” and “Functional Development” can be problematic here. It doesn’t always create a great test for code I haven’t written yet (i.e. TDD). However, if the code is already there, it uses the context I’ve provided and its LLM to unpack what the function is supposed to do and generates all the mocks and assertions needed for a well-written unit test (see the sketch after this list).
  3. Functional Development
    • Pro: Much like helping me get past the dreaded blank page in text generation, I found it more useful than Google searches and StackOverflow reviews for developing a series of functions I wanted, without building entirely from scratch. Better than canned code snippets, the snippets the A.I. gave were pre-filled based on my prompts, variables, and existing object definitions. That was appreciated. I didn’t have to review the documentation to tease out the result I wanted. The A.I. pulled it all together for me.
      Additionally, the fact that it almost always has an answer goes underappreciated in other reviews I’ve read. Part of what makes it so advanced is that it fills in a lot of grey area even if I (as a careless human) leave out an instruction that is critical to generating a solution. If I got the response “could not understand your request” because of my laziness, I would never use it. The assumptions it makes are close enough to my intent that I either use the solution, learn a new path, or see what my explanation is missing so I can improve how I communicate with it.
    • Con: The end result did not work out of the gate most of the time. Sometimes it never got it right and I had to Google the documentation to figure out the issue. I suspect this was because more than one set of documentation exists for the various versions of the library I was using, but I’m not sure. While the syntax was correct, the parameters it assumed I needed, or the way it made calls to interface with a library/API, led to errors.
  4. Debugging
    • Pro: Per the “functional development” points above, I was impressed by how I could respond to a prompt result with “I got this error when using the code above: [error]”. It recognized where it went wrong and attempted to rewrite the code based on that feedback.
    • Con: Each response differed from the original in more than just the fix. So, instead of only correcting what I had found was wrong (like a missing param), it also added or removed other parts of the code that were correct. This made the regenerated result difficult to use. In some cases, it never understood the issue well enough to generate working code.
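
To make the testing point above concrete, this is roughly the shape of scaffold that a prompt like “write a test for myFunction() using Jest” tends to produce. myFunction, its module paths, and the mocked dependency are hypothetical placeholders, and the block is a sketch of typical output, not something copied from the model.

```javascript
// Hypothetical module and dependency; the names are placeholders, not from this post.
const { myFunction } = require('./myModule');
const api = require('./apiClient');

jest.mock('./apiClient'); // mock the dependency that myFunction calls

describe('myFunction', () => {
  beforeEach(() => jest.clearAllMocks());

  it('returns the transformed record on success', async () => {
    api.fetchRecord.mockResolvedValue({ id: 42, name: 'example' });

    const result = await myFunction(42);

    expect(api.fetchRecord).toHaveBeenCalledWith(42);
    expect(result).toEqual({ id: 42, label: 'example' });
  });

  it('throws when the record cannot be found', async () => {
    api.fetchRecord.mockResolvedValue(null);

    await expect(myFunction(99)).rejects.toThrow('record not found');
  });
});
```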

One limitation I am not too surprised about, and am hopeful to see evolve in the future, is the AI’s understanding of a project in its entirety, done in a way that uses that context when creating functions and makes the solutions it provides “full stack”. Imagine a Serverless.com config for an AWS deployment that generates the files and code to create and deploy workflows using Lambda, DynamoDB, S3, and so on, all developed from prompts. With the yearly (and more recently, weekly) leaps, I don’t think we are too far away.

As of today, I find myself going to GPT to fill in starter templates for a new project. It’s a much better starting point than cookie-cutter functions when I set up my core, early, “re-inventing the wheel”-type skeleton.

For example, I will use a GitLab template for my infrastructure (be it GL Pages, Serverless, React, Node.js, Python, and so on), then fill in the starter code and tests via a few GPT prompts and copy them over. Beyond that copy, I find myself detaching from GPT for the most part, returning only occasionally to “rubber duck” new framework functions.

Examples referenced above

Here I asked for promise handling without third-party libraries (only await/async), which worked. Then I asked it to modify the code by adding a zip task, and it re-introduced the promisify utility when it added the zip process.
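
For reference, this is roughly the shape of the answer I was after. Reading “zip” loosely as compression, Node’s built-in stream/promises pipeline keeps the added step fully await/async with no promisify needed; the file paths are placeholders.

```javascript
// Sketch of the await/async-only version; no util.promisify required.
const { createReadStream, createWriteStream } = require('node:fs');
const { createGzip } = require('node:zlib');
const { pipeline } = require('node:stream/promises'); // promise-based pipeline (Node 15+)

async function gzipFile(src, dest) {
  // pipeline() returns a promise, so the compression step can simply be awaited.
  await pipeline(createReadStream(src), createGzip(), createWriteStream(dest));
}

async function main() {
  await gzipFile('./build/output.txt', './build/output.txt.gz'); // placeholder paths
  console.log('zip task complete');
}

main().catch((err) => {
  console.error('zip task failed:', err);
  process.exit(1);
});
```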

Steps in overcoming the great blank page. Write more, write well, write often.

I’ve experimented with a variety of methods to provoke myself to write more and, with any luck, write better. My latest writing routine is a combination of my Apple iPad Pro and Apple Pencil, text-to-speech tools, and A.I. content generation. With this latest addition I am writing more, writing more confidently, and feeling freer than ever to set loose the thoughts swirling in my mind.

Step 1: I capture thoughts and ideas stream-of-consciousness in my Nebo app. If a new thought pops up mid-dump, I scroll down the page and jot the branching thought down, then return to my previous cursor position and carry on.

Step 2: After I’ve exhausted my train of thought, I review my handwritten notes, making corrections via the A.I. handwriting interpreter built into Nebo and expanding on incomplete or unclear ideas. I avoid self-criticism about what I am saying. Much like throwing ingredients into a crockpot, I am not concerned with specific measurements.

Step 3: Next, I head to a product that leverages GPT, like Konch.ai or OpenAI’s ChatGPT. I trim, paste, move, and add to the A.I. responses. In some cases I ask GPT to merge the notes into a single post. Other times I am happy with the foundation and move on to editing.

I find that GPT is like an always-available writing coach, willing to review what I have created and give me back something with more polish or a suggestion for a new arrangement. I can disagree with it (“the third paragraph misses the point of how X affected Y in the story”), or delete sections and start over (“remove the intro”).

For whatever reason, it is easier for me to try and write something that “prompts” the A.I. to understand me and what I am trying to say, as opposed to writing to myself through a blank page and knowing that the feedback won’t come until after I post the entry. Much like with coding in an IDE, or learning languages via Duolingo, I get a tremendous amount of motivation and satisfaction from instant feedback.

Step 4: Now comes the fine tuning. At this point I have separated from A.I. and I’m excited and engaged with my story. Sort of how one gets going when they are surrounded by a group of friends interested in the same subject and a couple beers deep.

At this point the heavy lifting of the slabs of marble into my studio is done, the bulk of the shape is formed, and I am left with my small chisels to smooth out the edges and bring the work to life.

Step 5: I have heard that reading one’s own writing aloud improves it. I know that’s right, but reading my own post aloud over and over is uncomfortable and exhausting; it is like being a bit shy of myself. But text-to-speech has come a long way. Tools like AWS Polly, or the free, web-based TTSReader, convert my text into a voice that is more pleasant to hear than my own. I copy and paste my text, sit back and listen, and correct any mistakes I’ve missed up to this point. The fact that the A.I. reads the text exactly as it is written and punctuated makes mistakes sound like a scratch on a chalkboard.

Step 6: To complete the post, I either find an image online or use Konch.ai to leverage AI-based image generation via amazing technological breakthroughs like DALL-E and Stable Diffusion.

The key to advancing my writing proficiency is overcoming the roadblocks that keep me from the work. Just as a new pair of sneakers gets me wanting to run again, tooling like this gets me excited to write, and before I know it, I can’t stop the words from flowing to my keyboard. I no longer fear the blank white page! By using AI to handle the things I am not naturally the best at, I am able to write, post, and share my ideas much more efficiently. The end result is that I am writing more, making clearer points, and feeling more confident in my writing abilities. So, while I admit I am not respecting our literary history by slogging through with a feather pen and an ink blotter, I am perfectly happy skipping ahead to the fun parts to get the job done.

Don’t sell the sale

Being on either end of a sales call can be tricky. The aim is either to engage potential customers and sell or, as a buyer, to get the transparency you need and walk away with the best bang for your buck. One of the most effective strategies adopts the “simple” art of not talking. It may sound easy, but the drive to make conversation is deeply embedded in our culture. Filling space, or finding it awkward to rest in open space, can push us further from our goal. There are a few simple ways to remind yourself to avoid falling into conversational land mines that work against your best interests.

Selling the Sale

The first example of this mistake is what my group of friends calls “selling the sale”. One of us will try to convince the others to take part in an activity. Say you want to convince your friends to go on a ski trip. On the call you get through the first couple of reasons you’ve prepared, and your friends unexpectedly agree. But you are so excited to present the rest of your “great reasons” that you keep going, continuing to pitch the idea even after they have said yes. In that moment you may hear my group call you out: “Hey man, I said yes. Don’t sell the sale.” Why do my friends call this moment out? Well, once a person says “yes” they are “in”; anything else out of your mouth can only work against you. You have gone from summiting a mountain of agreement to barreling down a hill filled with land mines. In short, once an agreement is reached, don’t sell the sale and create an opportunity to lose the ground you’ve gained. Don’t forget that your goal is to convince them, not to show them how great a sales pitch you can make. In other words, don’t sell the sale, sell the product.

Silence is Powerful

Another advantage of creating space in a conversation is that humans have a bias to assume silence means “disagreement.” It means no such thing. For example, I was once on a call with a vendor. The sales rep ended their pitch and gave me their price. I said nothing. Honestly, I had no idea whether it was expensive or not. Moments later I heard, “…but we can do cheaper if that’s too high.”

I have seen this uncomfortable silence change rates, contracts, and features with not so much as a whisper.

Literally.

By allowing statements to sit, and breathe, you give the other person time to air out what is running through their mind, be it doubt, logic, or ethics. At the end of that thought process they may realize, for example, that their asking price is too high, or that their proposed agreement is too strict, because their conscience felt icky once the words left their mouth and they wish they could take them back. This approach can sound like a silly game, but it is not. It simply allows non-verbal communication to cut through any snake-oil quips or rehearsed phrases, and it lets the other party turn their asks into a discussion. Best of all, it requires very little added effort from you.

Invest

Finally, while I haven’t researched it, I have found in practice that giving a person space to speak creates a sense of comfort. They remember the experience as having gone well.

Whether it is by building rapport, establishing a connection, or creating a sense of trust and collaboration, you can make your goals on a call much more achievable by practicing the art of silence. The next time you are on a call, try allowing longer gaps of silence and see the difference it can make.

“I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops.”

― Stephen Jay Gould

The Future of Work: We are not giving up, we are finally letting go

In our rapidly advancing technological age, it’s not uncommon to hear discussions about which jobs and tasks will be taken over by machines. I tend to look at it from a flipped perspective: what if we assume every task you deal with today is meant for machines? Humans are born burdened, unnecessarily, with the repetitive and labor-intensive processes of work. Our ancestors could not advance without physical labor, but this is a temporary state we deal with until we figure out the best way to, inevitably, hand these tasks off to machines. From the beginning of human history, we have always been simply the “in-between”.

Reframing our problems and ideas allows us to remove walls that are set only by tradition or cultural perspective. Once we find ways to break free from those binds, we can more easily identify ways to advance. The goal is to increase our happiness and ease of existence, not to savor the burdens we are born with or that have been passed down.

Many people are familiar with the concept of the “mechanical Turk,” where human labor is used to perform individual tasks instead of relying on a machine. However, isn’t everything a mechanical Turk? Isn’t that definition backwards? Isn’t every task not done by a machine simply an example of us imitating machinery? From making eggs to driving to work, filling out spreadsheets, targeting investments, and delivering a baby, these are all jobs that could be broken down into simpler, repetitive tasks. We are not losing tasks to machines; we are freeing ourselves from machine-appropriate tasks so we can do and live as freely and unburdened as possible.

By assuming that everything is meant for machines and that humans are merely the in-between, we can more easily identify the tasks that should be handed off to machines to improve our quality of life. This shift in perspective can help us reframe problems and ideate new products and procedures that are more efficient and beneficial for humanity.