The popular ChatGPT is described by its developer, OpenAI, as follows:
“ChatGPT is an AI-powered language model developed by OpenAI, capable of generating human-like text based on context and past conversations.”
According to this definition, it is an AI language model, developed by OpenAI, that can generate “human-like” text based on context and past conversations. Hence the name ChatGPT, where GPT stands for “Generative Pre-trained Transformer.” What captures my interest here are two words: “human-like” and “generation.”
Both “human-like” and “generation” are deeply linked, and both carry significant humanistic weight. We perceive something as “human-like” not because it repetitively presents what is already established according to a fixed principle, but because it continuously generates relevant meanings within a relational context over time. It is this dialogic relationship through particular mediums (above all language, though images are possible too) that makes us feel the entity is “human-like.”
While “human-like,” in the context of ChatGPT’s definition, leans toward the humanities, “generation” should be examined in the context of how AI technology is actually implemented. That these two words appear in a single sentence, however, is no coincidence.

Let’s first trace the meaning of “generation” as it appears in this definition. Simply put, the question is: what is a “generative model,” or “generative artificial intelligence”? The dictionary sense of “generation” implies the emergence of something new, not a mere repetition or reproduction of what already exists; it is a term charged with the ambitions of the AI industry. Yet what matters to us is not our ‘feeling’ toward the word but how it is actually used.
In essence, ‘generation’ here names what current AI technology aims to achieve. A ‘generative’ model defines a probability distribution through a set of parameters and learns those parameters so that this model distribution comes as close as possible to the distribution of real data. The goal is to minimize the difference between the two distributions, a difference typically measured by the KL divergence, and maximum likelihood estimation (MLE) is the standard learning method for doing so. However, since we never have access to the true data distribution itself, only to samples drawn from it, comparing the two distributions directly is impractical. The workable alternative is to maximize the ‘likelihood’: the probability that the parameter-defined model distribution assigns to the data actually observed.
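A sketch of the standard argument behind this equivalence: for a data distribution $p_{\text{data}}$ and a model distribution $p_\theta$, the KL divergence splits into a term that does not depend on $\theta$ and the expected log-likelihood,

$$
D_{\mathrm{KL}}(p_{\text{data}} \,\|\, p_\theta) = \mathbb{E}_{x \sim p_{\text{data}}}[\log p_{\text{data}}(x)] - \mathbb{E}_{x \sim p_{\text{data}}}[\log p_\theta(x)],
$$

so minimizing the divergence over $\theta$ is the same as maximizing $\mathbb{E}_{x \sim p_{\text{data}}}[\log p_\theta(x)]$, which in practice is estimated by the average log-likelihood over training samples. This is exactly why MLE serves as the workable stand-in for an impossible direct comparison of distributions.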
MLE-based generative modeling has a history of its own. Autoregressive (AR) models came first. Consider that every digital image is composed of pixels: to “generate” a realistic image, one would need the full joint probability distribution over all pixels, which cannot be written down directly. AR models instead build up the ‘joint probability’ step by step, as a product of ‘conditional probabilities,’ each pixel conditioned on the pixels generated before it. A critical limitation, however, was the difficulty of choosing the ‘order’ in which to compute these probabilities. What sequence over the pixels is the right one? This was a significant challenge.
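The factorization at work here is just the chain rule of probability. For an image of $n$ pixels $x_1, \dots, x_n$ taken in some fixed order,

$$
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1}).
$$

The identity holds for any ordering of the pixels, which is precisely the problem: the choice of order is a modeling decision with real consequences, not something the mathematics settles for us.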
Variational autoencoders (VAEs) emerged in response. Rather than building up a probability distribution in any particular sequence, they generate an image in a single pass through latent variables. To generate a human face, for instance, one would define the latent variables needed to describe a face, such as gender, ethnicity, hair color, or skin tone, and draw an image from the probability distribution those variables define. This method has its own limitation, however: modeling the actual probability distribution exactly takes too long mathematically, so in practice it can only be approximated.
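A minimal sketch of this one-pass generation step in Python, with a toy linear ‘decoder’ standing in for the trained neural network of a real VAE (all names, dimensions, and weights here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "decoder": a fixed linear map from a 4-dimensional latent
# space (stand-ins for attributes like hair color or skin tone)
# to a tiny 8x8 "image". In a real VAE this map is a trained
# neural network; the random weights here are purely illustrative.
latent_dim, side = 4, 8
W = rng.normal(size=(side * side, latent_dim))

def decode(z):
    """Map a latent vector z to an 8x8 grid of pixel intensities."""
    return (W @ z).reshape(side, side)

# Generation in one pass: sample z from the prior, then decode.
z = rng.normal(size=latent_dim)   # z ~ N(0, I), the latent prior
image = decode(z)
print(image.shape)                # (8, 8)
```

The point of the sketch is the shape of the procedure: no pixel-by-pixel ordering, just one draw from the latent space followed by one decoding step.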
To overcome these issues, energy-based models (EBMs) were introduced. They adapt the energy functions of physics, originally used to describe interactions such as the magnetic energy between materials, to define probability distributions. This permits the use of almost arbitrary functions and offers high mathematical stability, though the downside is performance degradation due to long computation times.
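The standard construction here is the Boltzmann distribution of statistical physics: any energy function $E_\theta$ can be turned into a probability distribution,

$$
p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta}, \qquad Z_\theta = \int e^{-E_\theta(x)}\, dx,
$$

where low energy corresponds to high probability. The freedom to choose $E_\theta$ almost arbitrarily is the flexibility mentioned above; the normalizing constant $Z_\theta$, an integral over every possible $x$, is the source of the long computation times, since it is generally intractable.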
Score-based models (SBMs) appeared to address this. Even without knowing the probability of the current state, these models learn, through a process called score matching, the direction in which a point becomes more similar to the actual distribution, and they repeatedly nudge the point that way. Eventually the point converges where the similarity is highest, but the method has its limits: if the random noise is too large, learning breaks down; if too small, the movement is too slow.
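A minimal runnable sketch of this sampling idea, using a toy one-dimensional target whose score is known in closed form (in a real score-based model, a neural network trained by score matching would supply the score function):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: the standard normal, whose score (the gradient of
# the log-density) is known exactly: d/dx log p(x) = -x.
def score(x):
    return -x

def langevin_sample(steps=1000, eps=0.01):
    """Langevin dynamics: nudge a point along the score, plus noise.

    The step size eps plays the role the text describes: too large
    and the chain diverges, too small and it barely moves.
    """
    x = rng.normal() * 5.0                        # start far away
    for _ in range(steps):
        x += eps * score(x) + np.sqrt(2 * eps) * rng.normal()
    return x

samples = [langevin_sample() for _ in range(500)]
print(np.mean(samples), np.std(samples))          # ~0.0 and ~1.0
```

Each update moves the point toward higher probability while the injected noise keeps it exploring, which is exactly the convergence-versus-noise trade-off described above.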
After the MLE family of generative models, alternatives emerged to address its shortcomings. MLE methods aim to maximize ‘likelihood,’ and so they generate images that are ‘plausible’ under real-world probability distributions. What they often fail to generate are events that are unlikely yet still possible.
Models like GANs were developed to overcome this limitation by incorporating two opposing models within a single system. The generative model continuously produces ‘realistic fakes,’ while the discriminator model judges whether each is real or fake. If the competition between them stays balanced, the result is images that cannot be distinguished as real or fake. This approach has its own problems, however: training can oscillate between the two models, and since the generator only needs to fool the discriminator, not satisfy human perception, it can collapse onto a narrow set of strange outputs (the phenomenon known as mode collapse). To address these limitations, models like CycleGAN emerged, performing domain-to-domain translation and requiring not only that real be distinguished from fake but also that an image translated into another domain can be restored to its original state on the way back. Yet this very requirement makes the model averse to deviation, so the MLE-style limitation of failing to generate ‘unlikely events’ persists.
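In the original GAN formulation (Goodfellow et al.), the competition is a minimax game between the generator $G$ and the discriminator $D$:

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))].
$$

CycleGAN then adds a cycle-consistency term for the two translators $G$ and $F$ between domains, for example $\mathcal{L}_{\text{cyc}} = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big]$, which is precisely the ‘restorable to the original state’ requirement described above.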
In conclusion, tracing the history of AI technology under the heading of ‘generation’ shows that it essentially means ‘sampling from a probability distribution.’ What we see is an evolution of thought on how to shrink the difference from the actual distribution in order to generate realistic images.
Now it is worth carrying this concept of image generation over to ChatGPT as a language model. The key questions are whether this ‘generation’ truly lives up to the word’s original sense, and how it now relates to the concept of being “human-like.”
We use “human-like” in many contexts: something looks like a human, makes rational judgments, shows sensitivity to pain or pleasure, chooses autonomously, assigns value to itself, or endures disadvantage for the sake of certain understandings or goals.
In ChatGPT’s case, the sense of being “human-like” stems entirely from its language ability: because it appears to make rational judgments, it seems thoughtful and intelligent, and we feel we are facing an entity with a personality.
However, there are significant gaps between each link in that chain of impressions. A more detailed discussion will follow, but for now it is crucial to note that too readily equating language ability, rational judgment, thought, intellectual existence, and personhood is a prejudice inherited from modernity.
Once we loosen and broaden the connections and contexts under which situations or conditions count as “human-like,” the uncritical, blind linkage of “human-like” and “generation” in ChatGPT’s definition no longer looks self-evident.
Nonetheless, as dependence on generative AI visibly increases, a shortsightedness is spreading in which people hand over even uniquely human, noble tasks wholesale.
The point is not to avoid AI but to use it wisely when needed, without depending on it for our thoughts or our lives. If this discussion has widened, even slightly, the gap between “generation” and “human-like,” then let us strengthen our capacity to use AI wisely, as masters of our own lives. After all, ‘humanitas,’ the broad concept of humanity, can amply encompass ‘artificial intelligence.’
