CodeGen is a family of standard transformer-based autoregressive language models for program synthesis, which the authors define as a method for generating computer programs that solve specified problems, using input-output examples or natural language descriptions.
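As a minimal sketch of program synthesis from a natural-language description, the CodeGen checkpoints published on Hugging Face can be driven through the standard transformers text-generation API. The checkpoint name, prompt, and generation settings below are illustrative assumptions, not part of the CodeGen paper:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; CodeGen is released in several sizes and variants.
checkpoint = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Natural-language description of the desired program, given as a code comment.
prompt = "# Return a list of the first n Fibonacci numbers\ndef fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```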
Like many other LLMs, Phi-2 is a transformer-based model with a next-word prediction objective that
is trained on billions of tokens. At 2.7 billion parameters, Phi-2 is a relatively small language model,
but it achieves outstanding performance on a variety of tasks, including common sense reasoning,
language understanding, math, and coding. For reference, GPT-3.5 has 175 billion parameters and the
smallest version of LLaMA-2 has 7 billion parameters.
According to Microsoft, Phi-2 matches or outperforms models up to 25 times larger, thanks to more
carefully curated training data and improvements in model scaling.
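Because Phi-2 is a standard causal language model, it can be queried through the Hugging Face transformers library like any other next-word predictor. A brief sketch, with the checkpoint name and prompt chosen here as assumptions for illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face checkpoint for Phi-2.
model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

# Simple instruction-style prompt; the model completes it token by token.
prompt = "Instruct: Explain why the sky is blue.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```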
ChatGLM-6B is an open bilingual (Chinese-English) language model with 6.2 billion parameters. It is
optimized for Chinese conversation and is based on the General Language Model (GLM) architecture. GLM is a
pretraining framework that seeks to combine the strengths of autoencoder models (like BERT) and
autoregressive models (like GPT). The GLM framework randomly blanks out continuous spans of
tokens from the input text (autoencoding methodology) and trains the model to sequentially
reconstruct the spans (autoregressive pretraining methodology).
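The blank-infilling objective is easier to see on a toy example. The sketch below is pure Python with made-up marker tokens, intended only to illustrate the idea rather than reproduce GLM's actual preprocessing: a contiguous span is blanked out of the input (the autoencoding side), and the training target is the autoregressive reconstruction of that span.

```python
import random

def glm_style_corruption(tokens, span_len=3):
    """Toy illustration of GLM-style blank infilling (not the real implementation)."""
    start = random.randrange(len(tokens) - span_len)
    span = tokens[start:start + span_len]
    # Autoencoding side: the span is replaced by a single [MASK] placeholder.
    corrupted = tokens[:start] + ["[MASK]"] + tokens[start + span_len:]
    # Autoregressive side: the model generates the missing tokens left to right,
    # between hypothetical start-of-span and end-of-span markers.
    model_input = corrupted + ["[sop]"]
    target = span + ["[eop]"]
    return model_input, target

tokens = "the quick brown fox jumps over the lazy dog".split()
model_input, target = glm_style_corruption(tokens)
print("model input:", model_input)
print("target     :", target)
```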
MusicGen is an autoregressive, transformer-based model that predicts the next segment of a piece of
music based on the previous segments, much as a language model predicts the next token.
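As an illustrative sketch (assuming the facebook/musicgen-small checkpoint and the MusicGen support in the transformers library), generating audio from a text prompt looks much like text generation, except that each predicted token corresponds to a short chunk of audio:

```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Assumed checkpoint; MusicGen is also released in medium and large sizes.
model_name = "facebook/musicgen-small"
processor = AutoProcessor.from_pretrained(model_name)
model = MusicgenForConditionalGeneration.from_pretrained(model_name)

inputs = processor(
    text=["lo-fi hip hop beat with soft piano"],
    padding=True,
    return_tensors="pt",
)
# Audio tokens are predicted autoregressively, then decoded to a waveform.
audio_values = model.generate(**inputs, max_new_tokens=256)
print(audio_values.shape)  # (batch, channels, samples)
```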