WikiBot help

WikiBot is an IRC bot that helps archive MediaWiki and DokuWiki sites.

WikiBot and WikiTeam3+WARC support are maintained by DigitalDragons. WikiTeam3 and DokuWikiDumper are maintained by yzqzss and the saveweb project.

Every job requires the --explain parameter, followed by a short explanation of why the job was started (e.g. --explain no coverage).

MediaWiki (WikiTeam3)

Archive MediaWiki sites with WikiTeam3.

Usage: !mediawikisingle <options>
Alias: !mw

On/off switches are OFF by default.

Option                          Type    Explanation
--url                           URL     URL of the MediaWiki instance; the API and index.php pages are auto-detected
--api                           URL     URL of the wiki's api.php; auto-detected if possible when not specified
--index                         URL     URL of the wiki's index.php; auto-detected if possible when not specified
--delay                         double  Delay, in seconds, between requests (e.g. --delay 1.5; default 1.5)
--retries                       int     How many times a failed request is retried before the job fails (default 5)
--api_chunksize                 int     Maximum number of revisions the bot tries to request from the MediaWiki API at once (default 50)
--xml                           on/off  Download XML (page content)
--xmlapiexport                  on/off  Export XML via the MediaWiki API; requires --xml
--xmlrevisions                  on/off  Export XML via the MediaWiki Revisions API; requires --xml (recommended, fastest method, requires MW 1.27+)
--curonly                       on/off  Only download the latest version of every page
--insecure                      on/off  Ignore bad certificates on the web server and other SSL errors
--images                        on/off  Download images
--bypass-cdn-image-compression  on/off  Attempt to bypass CDN image compression (e.g. Cloudflare Polish) and get the uncompressed file
--disable-image-verify          on/off  (DANGEROUS!) Disable verifying the hash of downloaded images against the one sent by the server
--force                         on/off  Bypass the "This wiki has already been uploaded to IA this year" and "This is a Wikimedia Foundation wiki" protections
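
As a rough illustration of how the options above combine into one command line, here is a small Python sketch. The helper build_command is hypothetical and not part of WikiBot, and wiki.example.org is a placeholder. It assumes that an on/off switch is turned on by passing the bare flag, and that the explanation can be given unquoted as in the "--explain no coverage" example.

```python
def build_command(command, url, explain, switches=(), values=None):
    """Assemble one IRC command line for the bot (hypothetical helper)."""
    parts = [command, "--url", url]
    # Valued options such as --delay are written as "--name value".
    for name, value in (values or {}).items():
        parts += [f"--{name}", str(value)]
    # Assumption: an on/off switch is enabled by passing the bare flag.
    parts += [f"--{name}" for name in switches]
    # The explanation goes last, unquoted, mirroring "--explain no coverage".
    parts += ["--explain", explain]
    return " ".join(parts)


line = build_command(
    "!mw",
    "https://wiki.example.org",  # placeholder wiki URL
    "no coverage",
    switches=("xml", "xmlrevisions", "images"),
    values={"delay": 2.0},
)
print(line)
```

The printed line is what would be sent in the IRC channel; whether the bot accepts options in this exact order is an assumption.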

(BETA) You may also request that pages be saved in WARC format. This is very slow compared to normal dumping, and it still will not make wikis playable in the Wayback Machine.

Option                Type    Explanation
--warc-images         on/off  Save images to WARC; requires --images and does not take any extra time
--warc-pages          on/off  Save the current version of every page to WARC; requires --xml or --xmlapiexport
--warc-pages-history  on/off  Save every page, with full history, to WARC; requires --xml or --xmlapiexport
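
Several MediaWiki options only make sense together with another one ("requires --xml", "requires --images", and so on). The following Python sketch checks a set of switches against those stated requirements before a command is sent. It is a client-side illustration only: the REQUIRES table is transcribed from the option descriptions, missing_requirements is a hypothetical name, and the bot's own validation may behave differently.

```python
# Requirements as stated in the option tables: each key needs at least one
# of the listed options to be switched on as well.
REQUIRES = {
    "xmlapiexport": {"xml"},
    "xmlrevisions": {"xml"},
    "warc-images": {"images"},
    "warc-pages": {"xml", "xmlapiexport"},
    "warc-pages-history": {"xml", "xmlapiexport"},
}


def missing_requirements(enabled):
    """Return {option: options it needs} for every unmet requirement."""
    enabled = set(enabled)
    return {
        option: sorted(needs)
        for option, needs in REQUIRES.items()
        # Unmet when the option is on and none of its prerequisites is on.
        if option in enabled and not (needs & enabled)
    }


problems = missing_requirements({"xml", "xmlrevisions", "warc-images"})
print(problems)
```

Here --warc-images is reported because --images is not switched on, while --xmlrevisions passes because --xml is present.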

DokuWiki (DokuWikiDumper)

Archive DokuWiki sites with DokuWikiDumper.

Usage: !dokusingle <options>
Alias: !dw

On/off switches are OFF by default.

Option                               Type    Explanation
--url                                URL     URL of the DokuWiki instance
--auto                               on/off  Automatically apply common settings: dumps content, media and HTML, with threads=5 and ignore-action-disabled-edit (threads can be overridden)
--retry                              int     Number of times to retry on bad responses (default 5)
--hard-retry                         int     Number of times to retry on "hard errors" (default 3)
--ignore-disposition-header-missing  on/off  Ignore missing DokuWiki disposition headers; helpful for sites on older DokuWiki versions
--delay                              double  Delay, in seconds, between requests (e.g. --delay 1.5; default 0)
--threads                            int     Number of threads to use when downloading (default 1)
--ignore-action-disabled-edit        on/off  Ignore errors caused by editing being disabled on the wiki
--insecure                           on/off  Ignore bad certificates on the web server and other SSL errors
--current-only                       on/off  Download only the newest version of every page
--content                            on/off  Download page content
--media                              on/off  Download page media
--html                               on/off  Download page HTML
--pdf                                on/off  Download PDFs
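
To make the effect of --auto concrete, the Python sketch below models how the resulting settings might be resolved: documented defaults first, then the --auto presets, then anything given explicitly. This is only an illustration of the table above. The names BASE_DEFAULTS, AUTO_PRESETS and effective_settings are hypothetical, and the order in which DokuWikiDumper really applies these values is an assumption.

```python
# Defaults taken from the option table (switches not listed here are off).
BASE_DEFAULTS = {"retry": 5, "hard-retry": 3, "delay": 0, "threads": 1}

# What --auto is documented to set.
AUTO_PRESETS = {
    "content": True,
    "media": True,
    "html": True,
    "threads": 5,
    "ignore-action-disabled-edit": True,
}


def effective_settings(auto=False, explicit=None):
    """Merge defaults, --auto presets and explicitly passed options."""
    settings = dict(BASE_DEFAULTS)
    if auto:
        settings.update(AUTO_PRESETS)
    # Assumption: explicit options win, which is how an explicit --threads
    # would override the threads=5 preset.
    settings.update(explicit or {})
    return settings


with_auto = effective_settings(auto=True)
overridden = effective_settings(auto=True, explicit={"threads": 2})
print(with_auto["threads"], overridden["threads"])
```

Under these assumptions, "!dw --auto --threads 2" would dump content, media and HTML with two threads instead of five.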

You may also download wikis in bulk. To do so, use the !dokubulk command and, for your URL, pass a .txt list of the wikis, one per line. Optionally, any text after the wiki URL on the same line is set as the explanation for that wiki. Any --options you set in your command are passed to every job.
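
The list format described above (one wiki URL per line, with optional explanation text after it) can be sketched in Python as follows. The function parse_bulk_list is a hypothetical helper for checking a list before submitting it, not WikiBot's own parser. Skipping blank lines and splitting at the first whitespace are assumptions, and the example.org and example.net URLs are placeholders.

```python
def parse_bulk_list(text):
    """Split a bulk list into (url, explanation) pairs, one per wiki."""
    jobs = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # The URL ends at the first whitespace; the rest is the explanation.
        parts = line.split(None, 1)
        url = parts[0]
        explanation = parts[1] if len(parts) > 1 else None
        jobs.append((url, explanation))
    return jobs


sample = (
    "https://wiki.example.org/ no coverage\n"
    "\n"
    "https://docs.example.net/wiki/\n"
)
jobs = parse_bulk_list(sample)
for url, explanation in jobs:
    print(url, "|", explanation)
```

A line without explanation text yields None here; how the bot treats such a line is not stated above.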

Other bot commands

!status [job ID] - Shows the status of a job. If no job ID is passed, it shows the number of running and queued jobs.

!abort <job ID> - Abort a job in progress.

!reupload <job ID> - Retry uploading the wiki to the Internet Archive. Useful when the archive is busy and causes uploads to fail.

!check <search> - Generates a search link.