What Is a Corpus?
A corpus (plural: corpora) is a large, structured collection of text used for linguistic analysis or for training machine learning models. It may be drawn from books, websites, articles, or transcribed speech. The size of a corpus (how much text it contains) and its quality (how clean and representative that text is) strongly affect the performance of models trained on it.
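Corpus size is commonly reported as the number of documents, the number of tokens (running words), and the number of types (distinct words). The minimal sketch below computes these three figures for a tiny made-up corpus. The three example sentences and the naive lowercase, punctuation-stripping, whitespace-splitting tokenizer are illustrative assumptions; real LLM pipelines typically use subword tokenizers, so their token counts would differ.

```python
from collections import Counter

# Hypothetical toy corpus: each string stands in for one document
# (e.g. a book chapter, web page, article, or speech transcript).
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog met.",
]


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption): lowercase, replace punctuation with spaces, split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def corpus_stats(docs: list[str]) -> dict[str, int]:
    """Count documents, tokens (running words), and types (distinct words)."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return {
        "documents": len(docs),
        "tokens": sum(counts.values()),
        "types": len(counts),
    }


stats = corpus_stats(corpus)
print(stats)  # {'documents': 3, 'tokens': 18, 'types': 10}
```

Even in this toy example the token count grows faster than the vocabulary, because common words such as "the" repeat; the same pattern holds, far more strongly, in real corpora.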