Overview: Top Python frameworks streamline the entire lifecycle of artificial intelligence projects from research to ...
Welcome to the lm-evaluation-harness! This application provides a simple way to evaluate autoregressive language models. Whether you're a researcher or just someone interested in experimenting with ...
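A minimal sketch of launching an evaluation from Python, assuming the harness exposes the simple_evaluate entry point (as in recent versions); the model name and task below are illustrative, not prescribed by the snippet above:

```python
# Minimal sketch: evaluate a small Hugging Face causal LM on one task
# with lm-evaluation-harness. Model name and task are illustrative;
# consult the harness docs for the full set of supported arguments.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # any HF causal LM
    tasks=["hellaswag"],                             # one built-in task
)

# Per-task metrics (e.g. accuracy) keyed by task name
print(results["results"])
```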
PatchEval is a benchmark designed to systematically evaluate LLMs and agents on the task of automated vulnerability repair. It includes 1,000 vulnerabilities sourced from CVEs reported between 2015 ...