Curator by NVIDIA NeMo
Free · 1.6k GitHub stars
Platform & Framework: Agnostic, File System
Overview
Curator is a scalable data preprocessing and curation toolkit for preparing the datasets used with large language models (LLMs). It is aimed at data scientists and machine learning engineers who want to improve data quality and streamline data preparation workflows.