Repositories
-
scrapy-autounit
Automatic unit test generation for Scrapy.
-
scrapy-poet
Page Object pattern for Scrapy
-
scrapy-autoextract
Scrapinghub AutoExtract API integration for Scrapy
-
dateparser
Python parser for human-readable dates
-
splash
Lightweight, scriptable browser as a service with an HTTP API
-
shub
Scrapinghub Command Line Client
-
sample-projects
Sample projects showcasing Scrapinghub tech
-
shublang
Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
-
spidermon
Scrapy extension for monitoring spider execution
-
extruct
Extract embedded metadata from HTML markup
-
crawlera-headless-proxy
A complementary proxy that helps you use Crawlera with headless browsers
-
scrapinghub-autoextract
Python clients for Scrapinghub AutoExtract API
-
number-parser
Parse numbers written in natural language
-
crawlera-clients
Crawlera HTTPS clients collection
-
autoextract-poet
web-poet definitions for AutoExtract
-
kafka-docker
Forked from wurstmeister/kafka-docker
-
js2xml
Convert JavaScript code to an XML document
-
frontera
A scalable frontier for web crawlers
-
scrapinghub-stack-scrapy
Software stack with the latest Scrapy and updated dependencies
-
article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
-
autoextract-spiders
Pre-built Scrapy spiders for AutoExtract