Advancements in AI: PaLM-E, 'GPT 4' and Multi-Modality
This article is a summary of a YouTube video "What's Left Before AGI? PaLM-E, 'GPT 4' and Multi-Modality" by AI Explained
TL;DR: The video surveys the current state of AI, including milestones in image prediction and machine reading, notes that robust machine reading and common-sense reasoning remain far off, and highlights calls for independent review of, and limits on, the growth of AI capabilities.
Key insights
The release of PaLM-E and the upcoming GPT-4 raise the question of what tasks are left before achieving AGI, and whether multi-modality, logical reasoning, speed of learning, transfer learning, and long-term memory are still barriers to overcome.
The multi-modality capabilities of PaLM-E and Microsoft's Visual ChatGPT are major milestones in AI, allowing for image prediction and natural language processing based on images and videos.
The key question remains: if models like GPT-4 can already read so well, what exactly is left before AGI?
The struggles of GPT models with mathematics, and their lack of an internal calendar, highlight the limitations of current AI technology.
The fact that we cannot fully control the outputs of large language models is a worrying challenge for AI safety and interpretability.
Large language models like PaLM-E are already computationally universal and can process arbitrarily long inputs, raising concerns about the accelerating progress of AI capabilities.
The potential for a 1,000-fold increase in the computation used to train the largest models over the next five years could result in a capability jump significantly larger than previous jumps, potentially leading to human-level performance across most tasks.
The best human rater on reading-comprehension tests may not be the ultimate benchmark for AGI: progress stalls when test examples of sufficient quality become so rare in the dataset that language models simply cannot perform well on them.
PaLM-E and Microsoft's Visual ChatGPT are major milestones in AI that can predict images, read faces, and answer natural language questions about images and video, bringing us closer to AGI.
02:59
Robust machine reading is still far away, as it requires logic and common-sense reasoning, though Bing can already answer many questions accurately.
04:41
Humans outperform PaLM in recognizing ASCII numerals, while models like Bing and GPT struggle with tasks such as time tracking, text editing, and adjective order.
07:15
AI models struggle with honesty, making it difficult to understand their inner workings.
08:49
Large language models can be computationally universal when given access to unbounded external memory, as shown in a January 2023 paper, while a new LLaMA model suggests that performance improvements from scale alone are plateauing.
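The universality claim can be illustrated with a toy sketch: a fixed, stateless controller (standing in for a frozen language model called repeatedly with the same prompt) plus an unbounded external read/write memory is enough to simulate a Turing machine. The controller below is an ordinary transition table rather than a real LLM, purely for illustration; the machine increments a binary number stored least-significant bit first.

```python
# Toy illustration of the memory-augmented-LLM universality argument:
# a stateless controller + unbounded external memory simulates a Turing machine.
from collections import defaultdict

# Transition table: (state, symbol) -> (next_state, symbol_to_write, head_move)
# The machine adds 1 to a binary number written least-significant bit first.
RULES = {
    ("carry", "0"): ("done", "1", +1),
    ("carry", "1"): ("carry", "0", +1),
    ("carry", "_"): ("done", "1", +1),  # ran off the end: write the carried 1
}

def run(bits_lsb_first):
    tape = defaultdict(lambda: "_")  # unbounded external memory
    for i, b in enumerate(bits_lsb_first):
        tape[i] = b
    state, head = "carry", 0
    while state != "done":
        # One "model call": the controller sees only the current state and the
        # symbol under the head, and returns the next action.
        state, write, move = RULES[(state, tape[head])]
        tape[head] = write
        head += move
    return "".join(tape[i] for i in range(max(tape) + 1)).rstrip("_")

print(run("11"))  # 3 in LSB-first binary; prints "001", i.e. 4
```

The point of the construction is that all long-range computation lives in the external memory, not in the fixed controller, which is why a frozen model with a finite context can still, in principle, be computationally universal.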
10:56
AI language models can accurately answer vaguely phrased natural language questions.
12:44
A 1000x increase in compute over the next 5 years could lead to human-level AI, but there are concerns about the need for independent review and limits on growth.
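For a sense of scale, a 1,000x increase over five years works out to roughly a 4x increase in training compute every year. A minimal arithmetic sketch:

```python
# A 1,000x increase in training compute over 5 years implies an annual
# growth factor of 1000 ** (1/5), roughly 3.98x per year.
total_increase = 1000
years = 5

annual_factor = total_increase ** (1 / years)
print(f"implied annual growth: {annual_factor:.2f}x per year")  # ~3.98x

# Compounding that annual factor back over five years recovers ~1000x.
recovered = annual_factor ** years
print(f"compounded over {years} years: {recovered:.0f}x")
```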
15:56
The debate over what counts as AGI is partly subjective; text-to-image generation is a new frontier led by Microsoft and Google; and rewarding models for good process, not just good outcomes, is crucial.