In the past couple of years, advances in large language models (LLMs) have continued to push the limits of what is possible in natural language processing (NLP). However, there’s a concerning gap between the hype and what LLMs actually are.
At its core, a language model is tasked with predicting what word comes next. These unsupervised models are trained on huge amounts of text data, and their main objective is to learn a language’s statistical patterns and assign probabilities to word sequences given the surrounding context. By starting with these pre-trained models and fine-tuning them for more specific tasks, LLMs can enhance many NLP tasks and workflows.
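The next-word-prediction objective described above can be illustrated with a toy bigram model. This is far simpler than the transformer networks behind modern LLMs, but it captures the same idea: estimate the probability of each next word from counts over raw text. The tiny corpus here is invented purely for illustration.

```python
from collections import defaultdict, Counter

# A tiny made-up corpus, pre-tokenized into words.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each preceding word (bigram counts).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(context):
    """Return the estimated conditional distribution P(word | context)."""
    total = sum(counts[context].values())
    return {word: c / total for word, c in counts[context].items()}

# After "the", the model has seen four different continuations, each once.
print(next_word_probs("the"))
# → {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
```

Note that the model knows nothing about cats or rugs; it only reproduces the statistics of its training text, which is exactly why larger models trained on uncurated web data inherit that data’s errors and biases.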
For all their benefits, it’s important to note that these models have no deep understanding of what they’re reading or writing. LLMs have significant limitations when it comes to factual outputs, scientific reasoning, and causal relationships, to name a few. Ignoring these shortcomings leads to misrepresentation of what LLMs can accomplish.
To overcome these limitations, the industry is racing to train ever-larger language models, as measured by parameter count and training-data size. GPT-3 is one of the most popular LLMs: an autoregressive language model with over 175 billion parameters, trained on a large dataset drawn from Common Crawl, web texts, books, and Wikipedia (1). Among the latest to join the trend is Meta AI’s Galactica, with 120 billion parameters trained on 48 million examples from sources including scientific papers, textbooks, lecture notes, websites, and encyclopedias (2).
Risks and Downsides
But there are risks and even downsides to training larger and larger language models (3). Galactica was introduced as an LLM “that can store, combine and reason about scientific knowledge” (2). Indeed, marketing around the model’s launch advertised it as a practical research tool that could even write complete scientific papers. In practice, however, and as expected of a large language model, Galactica’s outputs were not grounded in real facts. After three days of intense criticism, Meta took down the public demo of Galactica.
Galactica’s failure illustrates one of the main limitations of LLMs: they are prone to producing content that sounds factual and scientific but is in fact subjective, inaccurate, or even fictional. The massive amount of training data LLMs require is a root cause of this problem. When training LLMs, researchers collect as much relevant data as they can from the internet, so limited curation and verification are applied to the training data. As a result, these models reproduce whatever biases their training data contains. The data also frequently includes conflicting sources, making it difficult for the models to distinguish fact from fiction. Moreover, this training data is relatively static: it does not capture the development and shifting of human knowledge, or the fluidity of events that provide context for reasoning and understanding. This is evident in both social movements and scientific discoveries. For example, what was considered scientific fact when the Covid-19 pandemic began might not be scientifically accepted now. Given the high computational cost of training these LLMs, it’s likely unfeasible to retrain them often enough to keep up with ever-changing knowledge.
Lastly, another pitfall of LLMs is “hallucination.” The general-purpose nature of LLMs, combined with their lack of contextual awareness, leads the models to blend information and produce nonsensical outputs. Galactica, for example, was claimed to be able to “summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins” (2), but the LLM made up fake articles (4), such as one on the history of bears in space (5) and another on the benefits of eating crushed glass (6). In other words, the tendency of LLMs to create fictional output makes them unfit for organizations looking for AI systems that produce reliable, factual results.
Need for Specialization
Just as humanity has gone from generalization to specialization, the AI landscape also needs to evolve to be more specialized. It is therefore necessary to develop ensembles of smaller, more focused AI models. This was one of the main inspirations driving the development of the Charli AI Ancaeus platform, a breakthrough technology that supports the orchestration, tracking, and interaction of ensembles of targeted AI models. A network of specialized AI models yields fact-based, verifiable outcomes because each model is trained and applied to precise, contextually relevant data.
In stark contrast to the static nature of LLMs, the Charli AI Ancaeus platform enables users to provide intuitive, relevant feedback at every stage, and it is designed to learn and adapt based on these human interactions and feedback loops. Baseline features are available out of the box, and because the networked models are small and specialized, the platform can continuously retrain and customize them for individual users, teams, organizations, and industries.
At a time when enterprises need to increase revenue and decrease costs, generative AI has a big role to play. However, professional organizations’ requirement for factual, verifiable results limits the viability of one-size-fits-all LLMs, and other solutions are required.
Charli AI is the world’s smartest AI-powered content services platform for enterprises. Founded in 2019, Charli AI (Charli) is a first-of-its-kind, AI-driven intelligent content management platform that tracks, organizes, understands, interprets, analyzes, and automates business content enterprise-wide, unlocking knowledge, improving collaboration across teams, and streamlining operations. Charli AI provides businesses with a competitive advantage in today’s digital world by reducing content chaos and manual effort, allowing workforces to focus on contributing their expertise. Today, Charli AI can integrate with over 500 applications, giving it broad appeal across industry sectors.
1. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
2. Taylor, R., Kardas, M., Cucurull, G., Scialom, T., Hartshorn, A., Saravia, E., … & Stojnic, R. (2022). Galactica: A large language model for science. arXiv preprint arXiv:2211.09085.
3. Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? 🦜. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623).