| The fast, flexible, and elegant library for parsing and manipulating HTML a... | Pushed 2 days ago, 135 contributors, created 13 years ago | 27.8k |
| The scalable web scraping and crawling library for JavaScript/Node.js. Enab... | Pushed a day ago, 87 contributors, created 8 years ago | 12.3k |
| Extract the Readable Content from an HTML Document | Pushed 2 days ago, 73 contributors, created 9 years ago | 8.15k |
| The next web scraper. See through the `<html>` noise. | Pushed 4 years ago, 41 contributors, created 9 years ago | 5.84k |
| Extract meaningful content from the chaos of a web page | Pushed a year ago, 57 contributors, created 8 years ago | 5.28k |
| A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Mar... | Pushed 6 days ago, 19 contributors, created 6 years ago | 4.13k |
| A Node.js scraper for humans. | Pushed 19 days ago, 19 contributors, created 8 years ago | 3.99k |
| Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitt... | Pushed 4 days ago, 36 contributors, created 8 years ago | 2.24k |
| Download website to local directory (including all css, images, js, etc.) | Pushed 13 days ago, 16 contributors, created 10 years ago | 1.51k |
| Extract main article, main image and meta data from URL | Pushed 5 days ago, 16 contributors, created 8 years ago | 1.41k |
| A super simple site crawler and broken link checker | Pushed 2 months ago, 24 contributors, created 5 years ago | 987 |
| Metadata scraper with support for oEmbed, Twitter Cards and Open Graph Prot... | Pushed 3 months ago, 22 contributors, created 7 years ago | 468 |