
Saturday, March 9, 2024

The Waiting Is The Hardest Part



For the past few weeks, I have been in awe of what OpenAI's Sora video model has been able to create. The early examples released to the public show great promise. At the time the Sora team dropped the first videos, AI video was not good enough for me to focus on. I was focused on the early stages of planning my first graphic novel, something I have wanted to do since I got swept up in the AI frenzy last spring.

However, when I glimpsed those first Sora videos, I realized that such a tool would allow me to make movies. Short movies to start, but movies nonetheless. Not only have I written novels, novellas, short stories, movie scripts, and TV scripts, but I have also illustrated a book using AI tools, acted on stage and screen, and directed, produced, edited, and designed the music for several short films. While I would never claim to be an expert at all of these, I feel confident that, combined with what I have been learning about AI tools, I can create not only graphic novels but also movies.

There are a ton of people who are much better than I am at using these generative AI tools, and I cannot claim to be an expert at prompt engineering either. Many of these bright people are a lot younger than me and grew up learning about programming. A decade ago, I took a couple of simple coding courses because I knew it might be a skill worth having. I did not stick with it, and I regret not having done more at the time. Last year, I considered taking a few more coding courses as I was learning about machine learning, deep learning, and LLMs. My goal was to understand the technology so that I could better use it myself.

What I learned is that the engineers who created these tools are not making them just for people like themselves. They are constantly trying to make these tools usable for everyone, whether you have a degree in data science or not. My hope was to get out in front of the technological wave so that I might gain an advantage, not to become a machine learning engineer.

I made some sacrifices over the past year that I would not have made had it not been for the rise of AI. Last spring, I, like most of us, could see where things were headed. While I have jumped in with both feet to try and adapt to the inevitable change, not everyone has. A lot of people are scared of AI. And I get it. My first reaction when I realized where things were headed was mixed with pessimism, having seen movies where AI goes rogue and destroys humanity. That is definitely in our collective psyche, and we are right to be apprehensive. We have no idea what things will be like once Artificial General Intelligence (AGI) is achieved. Will the public be allowed to interact with it? If so, how long will we have to wait?

AGI is only the next step with AI, and there is not even an agreed-upon definition of what it is. Some say that it may not be smarter than us but that it must be self-aware. Others say that it must be smarter than us in every way. For me, the next step in AI, which may have already arrived with Claude 3 or may come with GPT-5, needs to be the ideal assistant. GPT-4 has been pretty good, but it seems to have regressed a bit, taking several prompts to understand what was clearly explained from the start.

I am a storyteller, and while I have no intention of letting AI write my stories, I need it to be better at completing the tasks I ask of it. For example, I have been working on pitch documents for a TV series that I created. It is tedious work, and there is a lot of trial and error, like most writing. I have fed GPT-4 a script and other documents about the story to see how it might do at reviewing what I have written. I did make some changes to the bible and pitch deck based on its feedback, and, during the chat, I also discovered more about interactive storytelling.

While the interactions are usually helpful in some way, the model takes a lot of prompting and input to create usable material, and even then it usually gives you less than you need. For instance, what would help me out a lot is if I could provide GPT-4 with the pilot script and the bible, which is a detailed explanation of the TV series, and have it create the pitch deck. A pitch deck is much shorter than a bible and serves more as a summary with plenty of images. While GPT-4 can summarize the bible, it cannot take everything I provided and create a pitch deck, even though it can generate images. It is okay at summarizing things and can definitely help in the overall process of creating scripts, bibles, and pitch decks. However, it cannot do any of these well enough to reflect what I want and actually save me time.

This will not be the case in the near future. Soon, a multimodal AI model will be a more competent assistant, one that will not only help me brainstorm and summarize what I have already written but actually take that material and help me create things like a pitch deck. I know these new models will be able to do even more, but I want to maintain control of the writing of the story. I enjoy the story-creation process more than anything. However, I am open to having these tools make my process even more efficient so that I can write even more stories instead of being bogged down creating all of this other material.

What a current model like GPT-4 can do for me right now is help me create graphic novels. It is very good at reformatting screenplays into graphic novel scripts, from which it can then help generate the panel images that make up graphic novel pages. It is not perfect. The main limitation here is DALL-E 3, which is a decent AI image generator and can handle some text, but it is by no means perfect. I see future models being much, much better at this, making it a hell of a lot easier for me to turn my screenplays into graphic novels -- something I have always wanted to do.
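For the technically inclined, here is a rough sketch of how that screenplay-to-panels workflow could be automated with the OpenAI Python SDK, using GPT-4 for the reformatting step and DALL-E 3 for the panel art. This is just an illustration, not my actual process; the file name, prompts, and style direction are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Step 1: ask GPT-4 to reformat a screenplay scene into a graphic novel script
# made up of numbered panel descriptions.
scene = open("screenplay_scene.txt").read()  # placeholder file name
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You reformat screenplay scenes into graphic novel scripts "
                    "with numbered panel descriptions."},
        {"role": "user",
         "content": f"Break this scene into 4 to 6 numbered panels:\n\n{scene}"},
    ],
)
panel_script = response.choices[0].message.content

# Step 2: generate a draft image for the first panel with DALL-E 3.
first_panel = panel_script.splitlines()[0]
image = client.images.generate(
    model="dall-e-3",
    prompt="Graphic novel panel, ink and watercolor style: " + first_panel,
    size="1024x1024",
    n=1,  # DALL-E 3 generates one image per request
)
print(image.data[0].url)
```

In practice, every panel still needs hand-tuned prompts and plenty of regeneration, which is exactly the tedium I expect future models to take off my plate.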

These current models will only improve, which makes this a frustrating time because we can all see what will be possible. While I was right to get my hopes up last year, things aren't quite where I need them to be for me to create a media empire all on my own just yet. We are close, very close. That is the frustrating part. Because I am ready. 

Some people are attempting this with pre-Sora AI video content, but I don't want my stories told with that dodgy-looking stuff. Then I saw these Sora videos, and they had me thinking I might need to change my plans for the year and focus on AI video. That was until I saw the following interview with the Sora team (Bill Peebles, Tim Brooks, and Aditya Ramesh) last night. They said that Sora would not be made available to the public anytime soon. My heart sank, as I had been desperate to get my hands on it. The good news is that they are working on making it more user-friendly by adding controls to the output, so that you are not just stuck with whatever a single prompt produces. That is great! Then team lead Aditya chimed in with their main goal, saying, "Modeling reality is the first step to be able to transcend it."

Their interview with Marques Brownlee starts at 53:00. 

Not only will that help me as a filmmaker, but it will also help OpenAI reach AGI, if they haven't already under one of the many definitions out there. It also means they are likely thinking about interactive experiences that are not yet possible. This is something I am also very keen on. Star Trek: The Next Generation gave us the Holodeck, something we have wanted ever since: a place that would let you go anywhere and do anything and make it seem like you are really there. It would be like The Matrix or Ready Player One, a virtual reality indistinguishable from our own, a place where our digital selves could live forever, as long as the power stays on. I wrote a story about that with my longtime co-writer Chuck Thomas over a decade ago. It might be time to revisit it.

Anyway, it sounds like we are a ways off from that, as well as from having access to Sora. This means there is no need for me to jump into AI video other than to keep up with how to integrate these tools into projects. While I am not keen on all-AI video projects at this moment, I am interested in the people who are combining live action with some AI video. I can see this being an acceptable way to tell a story, but it only seems feasible if you are already shooting films with actors. I haven't done that as an independent filmmaker in a very long time, and I don't anticipate going back down that path unless I have funding up front. Not that I wouldn't like to; it's just that it doesn't make sense at this time. That is why I am so keen to create AI films. I have the content and the ability; I'm just waiting for the tools to progress to the next level and be made available to me.

In the meantime, I will revert to my plan of focusing on the illustrated novel series. Sora had me dreaming of all sorts of AI video possibilities, but that will have to wait. I just need to keep learning and preparing for the time when I will be able to transform my screenplays not only into graphic novels but also into AI movies and TV shows. We are getting so close that it is okay to dream about all of that being possible. I am so excited after years of thinking I would just have to be satisfied with having written these amazing stories that no one would ever get to experience. Buckle up, world. AI is improving, and with it I will be creating and releasing all kinds of new material. Exciting times.
