HELM

- [Instructor] When organizations like OpenAI and Google train large language models and make them available, they often spend millions of dollars doing so. And when these models are used in products like Google Search, they can impact billions of users. One thing we don't have is a standard way to compare these models. Although we're interested in how good a model is at a task, that alone doesn't tell us whether the same model generates false information. Instead of looking at just one metric, Stanford University researchers proposed HELM, or the Holistic Evaluation of Language Models, in their paper. The paper covers a lot of ground around different scenarios, metrics, and benchmarks; we'll focus on comparing the large language models for now. At the time of this recording, GPT-4 does not appear in this research, but it's still important to understand how to evaluate large language models in general. Now, I know it isn't very easy to…
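
To make the idea concrete, here is a minimal Python sketch of what comparing models across several metric categories, rather than by a single score, might look like. The category names follow the HELM paper; the model names and score values are purely illustrative placeholders, not actual HELM results.

```python
# A minimal sketch of holistic evaluation: instead of ranking models by a
# single metric, compare them across several metric categories.
# Category names follow the HELM paper; model names and scores below are
# illustrative placeholders, not actual HELM results.

METRIC_CATEGORIES = [
    "accuracy", "calibration", "robustness",
    "fairness", "bias", "toxicity", "efficiency",
]

# Hypothetical scores in [0, 1]; for simplicity, higher is treated as better
# for every category here (real HELM metrics differ in scale and direction).
scores = {
    "model_a": {"accuracy": 0.81, "calibration": 0.62, "robustness": 0.74,
                "fairness": 0.70, "bias": 0.66, "toxicity": 0.90, "efficiency": 0.55},
    "model_b": {"accuracy": 0.77, "calibration": 0.71, "robustness": 0.69,
                "fairness": 0.75, "bias": 0.72, "toxicity": 0.88, "efficiency": 0.80},
}

def rank_per_metric(scores: dict) -> dict:
    """Return, for each metric category, the models ordered from best to worst."""
    rankings = {}
    for metric in METRIC_CATEGORIES:
        rankings[metric] = sorted(scores, key=lambda m: scores[m][metric], reverse=True)
    return rankings

if __name__ == "__main__":
    # Print one ranking line per metric category to show that no single
    # model necessarily wins everywhere.
    for metric, ordered in rank_per_metric(scores).items():
        print(f"{metric:12s}: {' > '.join(ordered)}")
```

The point of the sketch is simply that a model which leads on accuracy may trail on calibration or efficiency, which is why HELM reports results per category rather than collapsing them into one number.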