OpenAssistant releases its open-source ChatGPT competitor


OpenAssistant aims to be a genuine open-source alternative to OpenAI’s ChatGPT. The first models, training data, and code are now available.

The OpenAssistant project started in December 2022, shortly after OpenAI released ChatGPT. Its goal is to create an open-source AI assistant with the same capabilities. To that end, the team spent months collecting a “human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages distributed across 66,497 conversation trees, in 35 different languages, annotated with 461,292 quality ratings,” with the help of more than 13,500 volunteers.

Now OpenAssistant models, training data, and code are available.

OpenAssistant releases models with up to 30 billion parameters

The OpenAssistant team used the collected instruction data to fine-tune several language models, including variants of Meta’s LLaMA model and EleutherAI’s Pythia model. The largest variant is based on the LLaMA model with 30 billion parameters. Like Alpaca or Vicuna, the models are “instruction-tuned” and have not been further improved with reinforcement learning from human feedback (RLHF).


Nevertheless, according to a comparative study with human volunteers, the results generated by the chatbots approach those of ChatGPT’s gpt-3.5-turbo model. Initial experiments with plugins such as Google search are already underway. The team also plans to train and release a LLaMA-30B model with RLHF in the future.

The Pythia models are already available and the LLaMA models will be released soon. While the LLaMA models cannot be used commercially due to Meta’s licensing, the Pythia models are licensed for commercial use.
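The Pythia-based models can be loaded with the standard Hugging Face transformers API. The following is a minimal sketch; the model id and the chat-style prompt tokens shown here are assumptions based on the released SFT checkpoints and should be checked against the model card on Hugging Face.

```python
# Minimal sketch: loading an OpenAssistant Pythia-based SFT model with transformers.
# The model id and prompt format below are assumptions -- verify them on the Hugging Face hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5"  # example id, check the hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # device_map needs accelerate

# The OpenAssistant SFT models expect a chat-style prompt with special role tokens.
prompt = "<|prompter|>What is instruction tuning?<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```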

In addition to the models, the team is also releasing the code and, under the name OpenAssistant Conversations, the collected dataset. All models can also be tried out via a web interface, where conversations can be rated and used to further improve the models.
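For readers who want to inspect the released data, the sketch below loads the conversation corpus with the Hugging Face datasets library. The dataset id and field names are assumptions based on the public release; consult the dataset card for the authoritative schema.

```python
# Minimal sketch: browsing the OpenAssistant Conversations release via the datasets library.
# Dataset id and field names are assumptions -- check the dataset card on Hugging Face.
from datasets import load_dataset

ds = load_dataset("OpenAssistant/oasst1", split="train")

# Each record is a single message in a conversation tree, linked to its parent message.
first = ds[0]
print(first["role"], first["lang"])
print(first["text"][:200])
```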

OpenAssistant speaks openly about current limitations

According to an accompanying paper, the models exhibit the well-known problems of large language models, such as hallucinations. It also says that the training data collected was mostly contributed by male annotators, with a median age of 26. “This demographic profile may inadvertently introduce biases in the dataset, as it is bound to reflect the values, perspectives, and interests of the annotators,” the paper states.

The team has also taken steps to detect and remove harmful messages in the dataset, but the system is not infallible, it says. “Given the limitations discussed above, we advocate for the use of our LLMs in academic research contexts only,” the paper states. “We strongly encourage researchers to thoroughly investigate the safety and bias of the models before employing them in downstream tasks. It is important to recognize that the released models may exhibit unsafe behavior and are likely susceptible to prompt injection attacks.”


A demo is available via the OpenAssistant web interface. The code and more details are available on GitHub, and the models can be found on Hugging Face.

OpenAssistant was founded by Andreas Köpf, Yannic Kilcher, Huu Nguyen, and Christoph Schumann and includes a team of over 20 developers, data and security experts, and a moderation and documentation team. The project is supported with computational resources, tools, and other assistance by Redmond AI, Hugging Face, Weights & Biases, Stability AI, and LAION.
