MinerU HTML
Free | 248 GitHub stars
Agent Tool | Agnostic | Web Scraper
Overview
MinerU HTML extracts the main content from HTML pages using a small language model (SLM) and outputs a clean HTML body. It is well suited to deep research agents, RAG applications, and training data generation.