Mirroring a Website
The following command will download a full local copy of a website, converting all links so they work offline:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent URL
Replace URL with the address of the site or page you want to mirror. Here is what each flag does:
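--mirror : enables mirroring mode, currently equivalent to -r -N -l inf --no-remove-listing: recursive downloading, timestamping, and infinite recursion depth all at once. wget will follow links and pull the full structure of the site, and because of timestamping a later re-run only fetches files that are newer on the server.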
--convert-links : after downloading, converts all links in the HTML to point to the local files instead of the original web addresses. This is what makes the archived site browsable offline: clicking a link opens the local copy rather than trying to fetch from the internet.
--adjust-extension : adds the correct file extension (e.g. .html) to downloaded files that do not already have one, so your browser knows how to open them.
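--page-requisites : downloads everything needed to display each page correctly: images, stylesheets, scripts, and other assets. By default wget stays on the starting host, so assets served from a different domain (a CDN, for example) are skipped unless you also pass --span-hosts, ideally restricted to known hosts with --domains.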
--no-parent : restricts downloading to the URL you specified and anything below it in the directory tree. Without this, wget could wander up to the site's root and start pulling the entire domain. This keeps the download focused.
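Each of the long flags above has a short form, which can be easier to type. The sketch below only prints the resulting command so it can be reviewed before running; mirror_cmd is a hypothetical helper name and https://example.com/docs/ is a placeholder address, neither of which comes from wget itself.

```shell
#!/bin/sh
# Hypothetical helper: print the short-flag form of the mirror command
# instead of running it, so the flags can be checked first.
mirror_cmd() {
    # -m  = --mirror
    # -k  = --convert-links
    # -E  = --adjust-extension
    # -p  = --page-requisites
    # -np = --no-parent
    printf 'wget -m -k -E -p -np %s\n' "$1"
}

# Placeholder URL; replace with the site or page you want to mirror.
mirror_cmd "https://example.com/docs/"
```

Copying the printed line into the terminal runs the same download as the long-flag command.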
Choosing a Download Location
By default, wget saves files into the current directory, organised into subfolders named after the domain. To store the mirror somewhere else, change into the target directory before running the command:
cd ~/Documents/Archives
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent URL
Or specify an output directory directly with -P:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent -P ~/Documents/Archives URL
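Since wget creates a subfolder named after the host inside the -P directory, it can help to know in advance where the mirror will land. The following is a minimal sketch under simple assumptions: the URL has no username or unusual port, and mirror_root and ARCHIVE_BASE are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: predict the top-level folder that `wget -P PREFIX URL` creates.
# mirror_root and ARCHIVE_BASE are illustrative names, not part of wget.
ARCHIVE_BASE="${ARCHIVE_BASE:-$HOME/Documents/Archives}"

mirror_root() {
    host=${1#*://}      # drop the scheme, e.g. "https://"
    host=${host%%/*}    # drop the path, keeping only the host
    printf '%s/%s\n' "$ARCHIVE_BASE" "$host"
}

# Placeholder URL; prints the expected mirror folder for it.
mirror_root "https://example.com/docs/"
```

With the default base directory, the mirror of https://example.com/docs/ would be expected under ~/Documents/Archives/example.com.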
Troubleshooting and Tips
The download is taking forever or pulling too much : some sites are enormous. You can limit recursion depth with --level=N (e.g. --level=2 to only go two links deep) or add a wait between requests with --wait=1 to be polite to the server and avoid being rate-limited.
wget is being blocked by the server : some servers reject requests that do not look like a browser. Try spoofing a user agent by adding --user-agent="Mozilla/5.0" to your command.
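Links still point to the live site after downloading : make sure you included --convert-links. wget rewrites links only after the whole download has finished, so an interrupted run can also leave them unconverted. If you forgot the flag, re-run the full command with --convert-links added; because --mirror enables timestamping, wget should skip files that have not changed on the server rather than download them again. Note that -c (--continue) only resumes partially downloaded files and does not trigger link conversion.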
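The depth, wait, and user-agent tips above can be combined into one command. The sketch below prints that command rather than running it; DEPTH, WAIT, UA, and polite_mirror_cmd are hypothetical names used here for illustration, and the URL is a placeholder. It places --level after --mirror on the assumption that later options override earlier ones, so the depth limit replaces the infinite depth that --mirror sets.

```shell
#!/bin/sh
# Hypothetical wrapper: print a depth-limited, rate-friendly mirror command.
# DEPTH, WAIT and UA are illustrative shell variables, not wget settings.
DEPTH="${DEPTH:-2}"
WAIT="${WAIT:-1}"
UA="${UA:-Mozilla/5.0}"

polite_mirror_cmd() {
    # --level follows --mirror so that it overrides the infinite depth.
    printf 'wget --mirror --level=%s --wait=%s --user-agent="%s" ' "$DEPTH" "$WAIT" "$UA"
    printf '%s\n' "--convert-links --adjust-extension --page-requisites --no-parent $1"
}

# Placeholder URL; replace with the site or page you want to mirror.
polite_mirror_cmd "https://example.com/docs/"
```

Raising WAIT or lowering DEPTH before calling the function gives a gentler or shallower crawl without retyping the whole command.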