Trafilatura
Free · 6.0k GitHub stars
Agent Tool · Agnostic · Web Scraper
Overview
Trafilatura is a Python package and command-line tool for gathering text and metadata from the web through crawling and scraping. It suits researchers and developers who need to extract main content from web pages and export it in formats such as CSV, JSON, and HTML.
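As a rough illustration of the Python side, the sketch below wraps Trafilatura's `extract` function so that it only accepts the output formats named above. The helper `extract_page` and the constant `SOURCE_FORMATS` are hypothetical names introduced for this example and are not part of the library. The sketch assumes the `trafilatura` package is installed and that the installed release supports the requested format, which may vary by version.

```python
from typing import Optional

# Output formats named in the overview above; the library may support others.
SOURCE_FORMATS = ("csv", "json", "html")


def extract_page(html: str, output_format: str = "json") -> Optional[str]:
    """Extract main text and metadata from an HTML string.

    Hypothetical convenience wrapper, not a Trafilatura API.
    Returns the serialized result, or None when nothing could be extracted.
    """
    if output_format not in SOURCE_FORMATS:
        raise ValueError(
            f"unsupported format {output_format!r}; expected one of {SOURCE_FORMATS}"
        )

    # Imported lazily so this module loads even when the optional
    # dependency is missing; the import error surfaces on first use.
    import trafilatura

    # with_metadata=True asks the extractor to include fields such as
    # title, author, and date when it can find them.
    return trafilatura.extract(
        html,
        output_format=output_format,
        with_metadata=True,
    )
```

To process a live page, the HTML would first be downloaded, for example with `trafilatura.fetch_url(url)`, which returns `None` on failure, and the result passed to `extract_page`. Very short or boilerplate-only pages may also yield `None`, so callers should check the return value before using it.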