WikiBot is an IRC bot that helps archive MediaWiki and DokuWiki sites. View all WikiTeam downloads - View only WikiBot downloads.
WikiBot and WikiTeam3+WARC support are maintained by DigitalDragons. WikiTeam3 and DokuWikiDumper are maintained by yzqzss and the saveweb project.
Every job requires the --explain parameter, with a short explanation of why the job was started (e.g. --explain no coverage).
Archive MediaWiki sites with WikiTeam3.
Usage: !mediawikisingle <options>
Alias: !mw
on/off switches are OFF by default
Option | Type | Explanation |
--url | URL | URL of the MediaWiki instance - it will auto-detect the API and index.php pages |
--api | URL | URL of the MediaWiki's api.php. If not specified, it will be automatically detected if possible. |
--index | URL | URL of the MediaWiki's index.php. If not specified, it will be automatically detected if possible. |
--delay | double | Delay, in seconds, between requests (e.g. --delay 1.5; default 1.5) |
--retries | int | How many times failed requests should be retried before the job fails (default 5) |
--api_chunksize | int | The maximum number of revisions that the bot should try to request from the MediaWiki API (default 50) |
--xml | on/off | Whether the bot should download XML (page content) |
--xmlapiexport | on/off | Export XML via the MediaWiki API? Requires --xml |
--xmlrevisions | on/off | Export XML via the MediaWiki Revisions API? Requires --xml. (Recommended, fastest method, requires MW 1.27+) |
--curonly | on/off | Only download the latest version of every page |
--insecure | on/off | Ignore bad certificates on the web server, and other SSL errors. |
--images | on/off | Whether the bot should download images |
--bypass-cdn-image-compression | on/off | Attempts to bypass CDN image compression (e.g. Cloudflare Polish) and get the uncompressed file. |
--disable-image-verify | on/off | (DANGEROUS!) Disable verifying the hash of downloaded images with the one sent by the server. |
--force | on/off | Bypass "This wiki has already been uploaded to IA this year" and "This is a Wikimedia Foundation wiki" protections. |
(BETA) You may also request pages be saved in WARC format. Please note that this is very slow compared to normal dumping, and that it still will not make wikis playable in the Wayback Machine.
Option | Type | Explanation |
--warc-images | on/off | Save images to WARC - requires --images and does not take any extra time |
--warc-pages | on/off | Save the current version of every page to WARC - requires --xml or --xmlapiexport |
--warc-pages-history | on/off | Save every page, with full history, to WARC - requires --xml or --xmlapiexport. |
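The "requires" notes in the two tables above (for example, --xmlrevisions needing --xml, or --warc-images needing --images) can be checked before a command is sent. The following is a minimal Python sketch and not part of WikiBot: `build_mw_command` and the `REQUIRES` table are hypothetical names, the rules are transcribed only from the tables above, the option order is an arbitrary choice, and the URL is a placeholder.

```python
# Hypothetical helper: assembles a !mw command and checks the
# "requires" notes from the option tables. Not part of WikiBot itself.

# Each switch maps to its alternatives; at least one must also be set.
REQUIRES = {
    "--xmlapiexport": ["--xml"],
    "--xmlrevisions": ["--xml"],
    "--warc-images": ["--images"],
    "--warc-pages": ["--xml", "--xmlapiexport"],
    "--warc-pages-history": ["--xml", "--xmlapiexport"],
}


def build_mw_command(url, explain, switches=(), values=None):
    """Return a !mw command string, or raise ValueError on a missing dependency."""
    if not explain.strip():
        raise ValueError("--explain is required for every job")
    switches = list(switches)
    for switch in switches:
        needed = REQUIRES.get(switch)
        if needed and not any(alt in switches for alt in needed):
            raise ValueError(f"{switch} requires {' or '.join(needed)}")
    parts = ["!mw", "--url", url]
    # Valued options such as --delay or --retries.
    for name, value in (values or {}).items():
        parts += [name, str(value)]
    parts += switches
    # Placing --explain last is an assumption made here for readability.
    parts += ["--explain", explain]
    return " ".join(parts)


print(build_mw_command(
    "https://wiki.example.org/",
    "no coverage",
    switches=["--xml", "--xmlrevisions", "--images"],
    values={"--delay": 2},
))
```

Asking for --warc-images without --images, for instance, raises a ValueError in this sketch instead of producing a command.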
Archive DokuWiki sites with DokuWikiDumper.
Usage: !dokusingle <options>
Alias: !dw
on/off switches are OFF by default
Option | Type | Explanation |
--url | URL | URL of the DokuWiki instance |
--auto | on/off | Automatically set common settings. Dumps: content+media+html, threads=5, ignore-action-disabled-edit. (threads is overridable) |
--retry | int | Number of times to retry on bad responses (default 5) |
--hard-retry | int | Number of times to retry on "hard errors" (default 3) |
--ignore-disposition-header-missing | on/off | Ignore missing DokuWiki disposition headers. Helpful for sites on older DokuWiki versions. |
--delay | double | Delay, in seconds, between requests (e.g. --delay 1.5; default 0) |
--threads | int | Number of threads to use when downloading. (default 1) |
--ignore-action-disabled-edit | on/off | Ignore errors caused by editing being disabled on the wiki. |
--insecure | on/off | Ignore bad certificates on the web server, and other SSL errors. |
--current-only | on/off | Download only the newest version of every page. |
--content | on/off | Download page content. |
--media | on/off | Download page media. |
--html | on/off | Download page HTML. |
on/off | Download PDFs. |
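As an illustration of how the --auto row reads, the sketch below models its presets and the note that threads can be overridden. It is only a Python model of the table text, with hypothetical names (`AUTO_SETTINGS`, `effective_settings`); the bot's actual internals may differ.

```python
# Illustrative model of the --auto description in the table above.
AUTO_SETTINGS = {
    "content": True,
    "media": True,
    "html": True,
    "threads": 5,
    "ignore-action-disabled-edit": True,
}


def effective_settings(auto, threads=None):
    """Return the settings implied by --auto plus an optional --threads value."""
    # Without --auto, only the documented thread default (1) is modelled here.
    settings = dict(AUTO_SETTINGS) if auto else {"threads": 1}
    # An explicit --threads value replaces the preset of 5.
    if threads is not None:
        settings["threads"] = threads
    return settings


print(effective_settings(auto=True, threads=2))
```

Under this reading, a command with --auto --threads 2 would still dump content, media and HTML, but with two threads instead of five.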
You may also request to download wikis in bulk. In this case, use the !dokubulk command and, for your URL, pass a .txt list of the wikis, one per line. Optionally, any text after the wiki URL on the same line will be set as the explanation for that wiki. Any --options you set in your command will be passed to every job.
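The list format described above (one wiki per line, with optional explanation text after the URL) can be generated with a few lines of Python. This is a minimal sketch; the URLs are placeholders and `bulk_list_lines` is a hypothetical helper name.

```python
# Placeholder wikis; the second value is the optional per-wiki explanation.
wikis = [
    ("https://wiki.example.org/", "no coverage"),
    ("https://docs.example.net/dokuwiki/", ""),
]


def bulk_list_lines(entries):
    """Format entries as 'URL explanation', one wiki per line."""
    lines = []
    for url, explanation in entries:
        # Text after the URL on the same line becomes the explanation;
        # strip() drops the trailing space when no explanation is given.
        lines.append(f"{url} {explanation}".strip())
    return lines


print("\n".join(bulk_list_lines(wikis)))
```

The printed lines would be saved as a .txt file, and that list is what the URL given to !dokubulk should point at.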
!status [job ID] - Shows the status of a job. If a job ID is not passed, it shows the number of running and queued jobs.
!abort <job ID> - Abort a job in progress.
!reupload <job ID> - Retry uploading the wiki to the Internet Archive. Useful when the Internet Archive is busy and uploads fail.
!check <search> - Generates a search link.