April 30, 2024

TechNewsInsight


A data library has been released that catalogs more than 10,000 LLMs and visualizes their download counts and similarities in an easy-to-understand way – GIGAZINE




Since the second half of 2022, countless large language models (LLMs) and AI services such as `ChatGPT` and `Bard` have appeared, and users around the world have begun to actively use generative AI. Many of these large language models are published on Hugging Face, a repository of machine learning models and datasets, and researchers at Stanford University have released a new visualization of the Hugging Face data.

[2307.09793] On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
https://doi.org/10.48550/arXiv.2307.09793

Constellation
https://constellation.sites.stanford.edu/

Go to “Constellation” above and click on “Access Constellation”.


Next, choose which LLMs to display. The upper number is the minimum number of downloads: change it to show only models whose Hugging Face download count exceeds that value, which narrows the view to widely used LLMs. The lower number is the number of clusters, which specifies how many groups the LLMs are divided into. LLMs are grouped by similarity.
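To make the two settings concrete, here is a minimal sketch of what a minimum-download filter followed by similarity grouping could look like. It is not Constellation's actual code: the function names, the sample data, the use of character trigrams of the model name as the similarity measure, and the average-linkage merging are all assumptions chosen for illustration.

```python
from itertools import combinations

def trigrams(name):
    """Return the set of character trigrams in a lower-cased model name."""
    s = name.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def filter_and_cluster(models, min_downloads, n_clusters):
    """Group model names into n_clusters clusters by name similarity.

    models: dict of model name -> download count (hypothetical data).
    """
    # Keep only models with at least min_downloads downloads.
    kept = [name for name, downloads in models.items() if downloads >= min_downloads]
    grams = {name: trigrams(name) for name in kept}
    clusters = [[name] for name in kept]

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        total = sum(jaccard(grams[a], grams[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two most similar clusters (agglomerative clustering).
    while len(clusters) > max(n_clusters, 1):
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical download counts, for illustration only.
sample = {"gpt2": 1000, "gpt2-medium": 500, "falcon-7b": 800,
          "falcon-7b-instruct": 700, "tiny-model": 5}
print(filter_and_cluster(sample, min_downloads=100, n_clusters=2))
```

Raising the minimum download count shrinks the input before any grouping happens, which is why the resulting graphs become easier to read.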


For this walkthrough, tick the checkbox to display the word cloud and click “Run Combinations”. After a short wait, several graphs appear.


The first graph is a tree diagram that organizes all the LLMs matching these settings. It is hard to read at the default scale, but you can zoom in to inspect it.


For Vicuna-13b, an LLM with accuracy comparable to ChatGPT and Google Bard, you can check which models it is derived from, which language models are similar to it, and so on.


This graph shows the LLMs divided into several communities using the Louvain method. Closely related LLMs form a community, which is outlined by a thin circle. Hovering over a node (an LLM) shows the model's name, its rank by downloads, its download count, the number of “likes” it has received on Hugging Face, and its parameter count.
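The Louvain method works by greedily moving nodes between communities to increase a score called modularity. The sketch below computes only that score for a given partition of an undirected, unweighted graph. It is not the full Louvain algorithm and not the researchers' code, and the function name and the toy graph are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    # Q = sum over communities of (internal edge fraction - expected fraction).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
print(modularity(toy_edges, two_groups), modularity(toy_edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping every node together, which is the kind of improvement Louvain searches for when it forms communities.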



Next comes a ranked list of the top 20 LLMs with the most connections to other LLMs. Falcon models, which are open source and available for commercial use, occupy the top three positions.


Then you will see a list of LLMs sorted by community size. The largest is Falcon-7B-Anarch. The second largest is gpt-neo-125m, a model from GPT-Neo, a project that aims for an open source language model with performance close to GPT-3.


A word cloud is also shown for each group, indicating which model names stand out within it.


The last graph plots the number of downloads against the number of likes. The standout was OpenAI's open source LLM, GPT-2.


LLMs are often given similar names containing words such as “GPT” or “Model”. The researchers who created this data library compiled a detailed list of how often each word appears in LLM names, which is described in the paper.
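A word-frequency list of this kind can be sketched in a few lines. The example below is only an illustration: the function name, the sample names, and the rule of splitting names on any non-alphanumeric character are assumptions, and the researchers' own tokenization may differ.

```python
import re
from collections import Counter

def name_word_counts(model_names):
    """Count how often each word appears across a list of model names."""
    counts = Counter()
    for name in model_names:
        # Split on anything that is not a letter or digit, e.g. "/", "-", "_".
        words = [w for w in re.split(r"[^a-z0-9]+", name.lower()) if w]
        counts.update(words)
    return counts

# Hypothetical model names, for illustration only.
names = ["gpt2", "gpt-neo-125m", "distil-gpt-model", "my_model"]
for word, count in name_word_counts(names).most_common(3):
    print(word, count)
```

The same counts could feed a word cloud, with each word sized by its frequency.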

