Scrapy comes with a built-in web server for monitoring and controlling a running Scrapy process.
The web console is a built-in Scrapy extension which is enabled by default, but you can disable it if you’re running tight on memory.
For more information about this extension see Web console extension.
Writing a web console extension is similar to writing any other Scrapy extension, except that the extension class must also define the following attributes and methods:
webconsole_id: The id by which the Scrapy web interface will know this extension, and also the base URL path under which this extension’s interface is served. For example, assuming the Scrapy web server is listening on http://localhost:8000/ and webconsole_id = 'extension1', the main page for that extension’s interface will be:
http://localhost:8000/extension1/
webconsole_render(wc_request): wc_request is a twisted.web.http.Request object containing the HTTP request sent to the web console.
It must return a str with the web page to render, typically containing HTML code.
Here’s an example of a simple web console extension that just displays a “Hello world!” text:
from scrapy.xlib.pydispatch import dispatcher

from scrapy.management.web import webconsole_discover_module

class HelloWorldConsole(object):
    webconsole_id = 'helloworld'
    webconsole_name = 'Hello world'

    def __init__(self):
        # Register this extension with the web console
        dispatcher.connect(self.webconsole_discover_module,
                           signal=webconsole_discover_module)

    def webconsole_discover_module(self):
        return self

    def webconsole_render(self, wc_request):
        return "<html><head></head><body><h1>Hello world!</h1></body></html>"
If you start Scrapy with the web console enabled on http://localhost:8000/ and you access the URL:
http://localhost:8000/helloworld/
You will see a page containing a big “Hello world!” text.
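Since wc_request is a regular twisted.web.http.Request, an extension can inspect attributes such as args (the parsed query string, a dict mapping argument names to lists of values) to render dynamic content. The following sketch extends the example above; the FakeRequest class is a hypothetical stand-in used only to keep the example self-contained, since inside Scrapy the real Request object is supplied by the web console:

```python
# Sketch: a web console extension that reads a query-string argument.
# FakeRequest is a hypothetical stand-in for twisted.web.http.Request,
# used here only so the example is self-contained.

class FakeRequest(object):
    def __init__(self, args):
        # twisted stores query arguments as a dict of lists,
        # e.g. ?name=Scrapy becomes {'name': ['Scrapy']}
        self.args = args

class GreetingConsole(object):
    webconsole_id = 'greeting'
    webconsole_name = 'Greeting'

    def webconsole_render(self, wc_request):
        # Take the first value of the "name" argument, defaulting to "world"
        name = wc_request.args.get('name', ['world'])[0]
        return "<html><body><h1>Hello %s!</h1></body></html>" % name

console = GreetingConsole()
print(console.webconsole_render(FakeRequest({'name': ['Scrapy']})))
```

With the extension enabled, visiting http://localhost:8000/greeting/?name=Scrapy would render the greeting with the supplied name.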
Here is a list of built-in web console extensions.
Display a list of all pending Requests in the Scheduler queue, grouped by domain/spider.
Display a table with stats of all spiders crawled by the current Scrapy run, including:
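The general shape of such a stats extension can be sketched as follows. This is an illustrative assumption, not Scrapy's actual implementation: the stats dict, its keys, and the SpiderStatsConsole class are all hypothetical.

```python
# Illustrative sketch (not Scrapy's actual code): a web console extension
# that renders per-spider stats as an HTML table. The stats dict and its
# keys ('pages_crawled', 'items_scraped') are hypothetical examples.

class SpiderStatsConsole(object):
    webconsole_id = 'spiderstats'
    webconsole_name = 'Spider stats'

    def __init__(self, stats):
        # e.g. {'example.com': {'pages_crawled': 10, 'items_scraped': 5}}
        self.stats = stats

    def webconsole_render(self, wc_request):
        rows = []
        for domain, domain_stats in sorted(self.stats.items()):
            # One table cell per stat, in a stable (sorted) key order
            cells = "".join("<td>%s</td>" % domain_stats[k]
                            for k in sorted(domain_stats))
            rows.append("<tr><td>%s</td>%s</tr>" % (domain, cells))
        return "<html><body><table>%s</table></body></html>" % "".join(rows)

console = SpiderStatsConsole({'example.com': {'pages_crawled': 10,
                                              'items_scraped': 5}})
print(console.webconsole_render(None))
```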