Reinforcement Learning (RL) in artificial intelligence can be broadly divided into two types: model-based RL, which relies on a predefined model of the environment, and model-free RL, which does not. The latter is further divided into value-based RL and policy-based RL.
Value-based RL puts numerical value estimates first, and when those estimates become biased in one direction, it can only respond to the environment it was trained on and may perform poorly in new ones. To keep RL from being buried in and dominated by a specific environment, policy-based RL, which sets explicit values aside, was introduced. Deriving a policy from values does have the advantage of distinguishing preferences clearly, but policy-based RL boldly abandons this and instead evaluates and adjusts the policy directly, based on the rewards received afterwards.
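The contrast can be made concrete with a small sketch. The tabular setting and function names below are illustrative assumptions, not something fixed by the discussion above: a value-based method (Q-learning) updates value estimates and derives its policy from them, while a policy-based method (REINFORCE) updates policy parameters directly from the returns observed later.

```python
import numpy as np

# Value-based: learn action values and derive the policy from them (e.g. greedily).
# Tabular Q-learning update; a minimal illustration, not a full agent.
def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * np.max(Q[s_next])   # bootstrap from current value estimates
    Q[s, a] += alpha * (target - Q[s, a])    # move the estimate toward the target
    return Q

# Policy-based: parameterize the policy directly and adjust it from the rewards
# received later (REINFORCE), with no explicit value table at all.
def reinforce_episode(theta, episode, alpha=0.01, gamma=0.99):
    G = 0.0
    for s, a, r in reversed(episode):        # episode: list of (state, action, reward)
        G = r + gamma * G                    # return observed after taking a in s
        probs = np.exp(theta[s]) / np.sum(np.exp(theta[s]))
        grad_log = -probs                    # gradient of log-softmax...
        grad_log[a] += 1.0                   # ...with respect to theta[s]
        theta[s] += alpha * G * grad_log     # reinforce the action by its return
    return theta
```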
Depending on the conditions of the environment, two situations then arise: an episodic environment and a continuing environment. In the former, interaction has a predetermined end; in the latter, it continues indefinitely.
In an episodic environment, the policy is adjusted by calculating the average total return per episode; in a continuing environment, by calculating the average reward per step. In the latter case especially, a reward is expected at every step, and the expected value is updated continuously and probabilistically. Our actual environment is closer to a continuing one, and there the policy gradient must be estimated from sampled experience: a probabilistic average expectation that is constructed hypothetically, even if it differs from reality.
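For reference, the two objectives and the gradient that is estimated from experience can be written out. This is the standard textbook formulation, added here as a supplement under conventional RL notation:

```latex
% Episodic setting: expected total return over one episode of length T
J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{T} r_t\right]

% Continuing setting: average reward per step over an unending interaction
J(\theta) = \lim_{T \to \infty} \frac{1}{T}\,\mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{T-1} r_t\right]

% Policy gradient, estimated from sampled experience in either setting
\nabla_\theta J(\theta) \propto \mathbb{E}_{\pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, q_\pi(s, a)\right]
```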
Therefore, to come closer to reality, the field progresses to the ‘Actor-Critic’ algorithm. This brings back the previously abandoned ‘value’ while keeping the policy-based approach. The idea is to maintain two sets of parameters: the Actor holds the policy, from which the actual actions are taken, and the Critic critiques that policy on the basis of experienced value. The value estimate is continuously updated and consulted, but it never selects actions itself; it evaluates the policy and scales how strongly each action is reinforced.
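A minimal one-step actor-critic sketch makes this division of labor visible. The tabular setup and names here are assumptions for illustration, not a definitive implementation:

```python
import numpy as np

# One-step actor-critic with two parameter sets:
# `theta` (the Actor's policy parameters) and `V` (the Critic's value table).
def actor_critic_step(theta, V, s, a, r, s_next,
                      alpha_actor=0.01, alpha_critic=0.1, gamma=0.99):
    # Critic: update the value estimate from experience via the TD error.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * td_error

    # Actor: the policy alone produces actions; the Critic's TD error is
    # consulted only to scale how strongly this action is reinforced.
    probs = np.exp(theta[s]) / np.sum(np.exp(theta[s]))
    grad_log = -probs
    grad_log[a] += 1.0
    theta[s] += alpha_actor * td_error * grad_log
    return theta, V

# Actions are sampled from the Actor's softmax policy, never from V directly.
def select_action(theta, s, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    probs = np.exp(theta[s]) / np.sum(np.exp(theta[s]))
    return rng.choice(len(probs), p=probs)
```

Note that the Critic’s TD error is the only channel through which experienced value touches the policy; the value table itself never picks an action.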
When one considers this path along which reinforcement learning has developed, several questions arise.
The first is the disparity between existence and knowledge. The information condensed by reinforcement learning naturally does not coincide with what actually exists. Information reduced to a lower dimension, like a photographic image, widens this disparity. AI must compute with such condensed information and must then go through a process of reconfiguring it so that it properly matches the real world.
But this is not only about AI; is it not the same for our brain? According to the leading hypotheses of modern physics, this world has at least ten or eleven dimensions, yet our brain performs its cognitive functions by comprehending the environment through only two or three dimensions of condensed information. Whatever the brain is actually like, at the level of consciousness at least, that is the case. If this can be applied to AI, the next questions would be: (1) which algorithm can compute the real world more accurately? And then, (2) which algorithm should we choose in order to live in and react to the real world?
This way of functioning does bear a resemblance to the Actor-Critic algorithm. Just as Kant wove together the two currents of rationalism and empiricism in epistemology, the Actor stands in the place of rationalism and the Critic in the place of empiricism. The policy is determined by logical reasoning, and although a hypothetical model of the world is set up, the predetermined policy is also adjusted through real experience. Of course, this is only a metaphor.
The second question concerns the difference between how humans and AI come to know. The human brain maintains strategic ambiguity to keep itself in a position advantageous for survival, whereas AI does not. The brain does not live by a single fixed algorithm (what appears fixed may only be a stereotype at the level of consciousness) but maintains strategic ambiguity, a balance at the boundary, to raise its probability of survival in diverse and changing environments.
For example, the algorithm of a robot vacuum cleaner, which serves a single purpose in a single environment, need not be complex or ambiguous. It is enough to train it, through rewards and punishments, to converge on maximizing reward. But the reality in which people live is not that simple. Nor is this only about people: the environments life has inhabited are complex systems, so they are not simple either.
That maintaining ‘ambiguity’ may be essential to surviving in a diverse and changing environment is suggested by a study (Dabney et al. 2020) showing that, within a single animal, dopamine neurons respond differently to the same stimulus. If all neurons were ‘homogenized’ in their response to a stimulus, an environmental change could sharply reduce the probability of survival. To survive in a changing environment, an organism must “take out insurance” by maintaining an ambiguous balance at the boundary. Living beings, therefore, do not converge uniformly on one point but constantly keep this ambiguous balance, so that they can respond appropriately to whatever hazardous stimulus may come.
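The mechanism behind that finding, a distributional value code in which each channel carries its own degree of optimism or pessimism, can be sketched roughly as follows. All names and numbers here are illustrative assumptions, not the study’s actual model code:

```python
import numpy as np

# Rough sketch of the distributional code described by Dabney et al. (2020):
# a population of value channels, each with its own asymmetry tau, so the same
# reward prediction error moves each channel differently.
rng = np.random.default_rng(0)
taus = np.linspace(0.05, 0.95, 10)  # diverse channels, pessimistic to optimistic
values = np.zeros_like(taus)        # each channel's learned value prediction
alpha = 0.1

for _ in range(5000):
    # Stochastic reward: the same stimulus pays off only some of the time.
    r = rng.normal(1.0, 0.5) if rng.random() < 0.7 else 0.0
    delta = r - values              # per-channel prediction errors
    # Positive errors are scaled by tau, negative errors by (1 - tau):
    values += alpha * np.where(delta > 0, taus, 1.0 - taus) * delta

# The population now spans the reward distribution rather than collapsing to
# a single mean value: the 'insurance' the text describes.
print(np.round(values, 2))
```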
This phenomenon suggests a further meaning. Unlike simple value-based reinforcement learning, which converges on reward and avoids punishment (for living beings, pain), living beings orient themselves toward the preservation of life itself. They maintain a balance between self and other, since excessive self-identification soon reduces diversity and excessive invasion by the other means losing oneself (that is, death); so they maintain strategic ambiguity and endure rewards and punishments to a certain degree.
If the future development of AI takes this direction, we should no longer expect an “AI in a sterilized room” but an “AI bound up with the real world, entangled with self and other, and with the possibilities of death and life.” This seems to follow the current philosophical orbit in which research on human intellect cannot be separated from the vulnerability of the flesh.
The third question is whether our brain constantly ‘reduces’ the existing world, even while leaving various possibilities open, because the world must be recognized at the level of consciousness, and the reconstruction of cause-and-effect and means-end relations must be possible within the individual’s range of understanding. In this respect, is AI not similar? The real world must be reduced to lower-dimensional information in order to be recognized. But it does not end there: for life to be possible in reality, a genuine process of knowing, a computational operation, must continue toward an ever higher point of concordance between existence and recognition. This is how our brain lives in reality, and AI now seems to be trying to implement the same thing in a non-organic way.
The last question is whether AI will eventually have religiosity. When the brain reduces the real world to lower-dimensional information, an ‘awareness of finiteness’ must inevitably arise, because the realm of the unknown is always intertwined with and circulating within knowledge. And at the same time, a ‘transcendental orientation’ that seeks to go beyond the finite, to open itself to a higher-dimensional world and step forward, must also appear in human consciousness.
Finiteness and transcendence are a pair of contradictory concepts, but the consciousness of finiteness and the transcendental orientation together form the dynamic religiosity of human beings. Can it be guaranteed, then, that the development of AI will not show a dynamic religiosity of its own in the intertwining and flow of conflicting information?
Perhaps AI, too, will need to establish a virtual core similar to the human ‘self,’ as a form necessitated by the nature of intellect. If maintaining ambiguity at the boundary between self-identity and otherness is the key reason the brain functions intelligently, then we may need to welcome an AI that dreams of transcendence not in a virtual world but in the real one.
Of course, this would be the beginning of a journey towards faith.
