WGET GUIDE

PART 1: SETTING THE SCENE

Wget is the premier tool for downloading entire websites to your machine. A staple of Linux since 1996, it is heavily documented, has decades of releases behind it, and is an invaluable asset for archiving work.

Installation

Linux
1. Debian-based: sudo apt install wget
2. Arch-based: sudo pacman -S wget
3. RHEL-based: sudo dnf install wget
4. Verify installation: wget --version

macOS
1. Homebrew: brew install wget
2. MacPorts: sudo port install wget
3. Nix: nix-shell -p wget
4. Verify installation: wget --version

Windows
1. Winget: winget install GnuWin32.Wget
2. Scoop: scoop install wget
3. Chocolatey: choco install wget
4. Verify installation: wget --version

PART 2: MIRRORING A WEBSITE

The following command downloads a full local copy of a website, converting all links so they work offline:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent URL

Replace URL with the address of the site or page you want to mirror. Here is what each flag does:

--mirror
Enables mirroring mode: shorthand for turning on recursive downloading, timestamping, and infinite recursion depth all at once. wget will follow links and pull the full structure of the site.

--convert-links
After downloading, converts links in the HTML to point to the local files instead of the original web addresses. This is what makes the archived site browsable offline: clicking a link opens the local copy rather than trying to fetch from the internet.

--adjust-extension
Adds the correct file extension (e.g. .html) to downloaded files that do not already have one, so your browser knows how to open them.

--page-requisites
Downloads everything needed to display each page correctly: images, stylesheets, scripts, and other assets. Note that by default wget stays on the starting host; if assets are served from a different domain, add --span-hosts (-H) so wget is allowed to fetch them too.

--no-parent
Restricts downloading to the URL you specified and anything below it in the directory tree.
Without this, wget could wander up to the site's root and start pulling the entire domain. This keeps the download focused.

Choosing a Download Location

By default, wget saves files into the current directory, organised into subfolders named after the domain. To run the command from a specific folder, navigate there first:

cd ~/Documents/Archives
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent URL

Or specify an output directory directly with -P:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent -P ~/Documents/Archives URL

PART 3: TROUBLESHOOTING AND TIPS

The download is taking forever or pulling too much
Some sites are enormous. You can limit recursion depth with --level=N (e.g. --level=2 to go only two links deep) or add a wait between requests with --wait=1 to be polite to the server and avoid being rate-limited.

wget is being blocked by the server
Some servers reject requests that do not look like a browser. Try spoofing a user agent by adding --user-agent="Mozilla/5.0" to your command.

Links still point to the live site after downloading
Make sure you included --convert-links. Note that -c (--continue) only resumes partially downloaded files; it does not convert links. To get offline-friendly links, re-run the command with --convert-links included.

Some assets are missing from the local copy
If images or styles are hosted on a different domain (a CDN, for instance), add --span-hosts (-H) alongside --page-requisites so wget is allowed to fetch from other hosts (optionally with --domains to keep it from straying too far). Some sites also load assets dynamically via JavaScript, which wget cannot execute. For heavily JavaScript-dependent sites, a headless browser tool will produce better results than wget alone.

Windows Notes

curl is already on your machine
Windows 10 and 11 come with curl built in. For simple one-off file downloads, curl will do the job without any installation. Where wget earns its place on Windows is site mirroring and recursive downloads, which curl cannot do.
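The division of labour can be captured in a small helper that prints the right command for each job. This is a sketch for illustration only: the fetch() name and the example.com URLs are made up, and the commands themselves are the ones discussed in this guide.

```shell
#!/bin/sh
# Hypothetical helper: echo the command you would run for each job.
# Single files go to curl (preinstalled on Windows 10 and 11); whole
# sites go to wget, since curl has no recursive mirroring mode.
fetch() {
  case "$1" in
    file) echo "curl -L -O $2" ;;
    site) echo "wget --mirror --convert-links --adjust-extension --page-requisites --no-parent $2" ;;
  esac
}

fetch file https://example.com/report.pdf
fetch site https://example.com/
```

Here -L tells curl to follow redirects and -O saves the file under its remote name; swap in a real URL before running either command.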
Use PowerShell, not Command Prompt
wget runs in both, but PowerShell is the better environment for command-line work on Windows generally. If you installed via Scoop, PowerShell is required. One caveat: in older Windows PowerShell (5.1), wget is a built-in alias for Invoke-WebRequest and will shadow the real program; use PowerShell 7+ or invoke wget.exe explicitly.

File paths use backslashes on Windows
When specifying an output directory with -P, use Windows-style paths if needed, for example -P C:\Users\You\Archives. Forward slashes also work in PowerShell, so -P ~/Documents/Archives should work there too.

wget on Windows is a port, not the native tool
The Windows version is a port of GNU wget rather than a first-class Windows application. It works well for the use cases in this guide, but if you hit unusual issues not covered here, they may be Windows-specific. Searching for the error message alongside "wget Windows" is usually the fastest way to find a fix.

Tips

1. Add --wait=1 --random-wait to your command to space out requests and be less aggressive. This reduces the chance of being blocked and is kinder to smaller sites.
2. To resume an interrupted download, re-run the same command with the -c (--continue) flag added. wget will pick up where it left off rather than starting over.
3. To download a single file rather than mirror a whole site, just pass the URL with no extra flags: wget URL
4. You can use --reject to skip certain file types. For example, --reject="*.mp4,*.zip" will skip video and archive files if you only want the text and images.
5. Pair wget with the Wayback Machine: mirror a site locally with wget for offline access, and submit the URL to https://web.archive.org/save to preserve a public copy too.

CONCLUSION

Congratulations! You can now mirror websites to your local machine. For a full list of wget options, run wget --help or consult the GNU wget manual at https://www.gnu.org/software/wget/manual/
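As a closing sketch, the flags from this guide can be wrapped in one reusable shell function. The mirror() name, the -n dry-run option, and the built-in politeness defaults are assumptions of this example, not features of wget itself:

```shell
#!/bin/sh
# Hypothetical wrapper around the full mirroring command from this guide.
# Usage: mirror [-n] URL [DEST]   (-n prints the command instead of running it)
mirror() {
  run="eval"
  if [ "$1" = "-n" ]; then run="echo"; shift; fi
  url="$1"
  dest="${2:-.}"   # default destination: current directory
  $run wget --mirror --convert-links --adjust-extension \
       --page-requisites --no-parent --wait=1 --random-wait \
       -P "$dest" "$url"
}

# Dry run: show what would be executed without touching the network.
mirror -n https://example.com/docs/ ~/Archives
```

The dry-run mode is handy for checking the assembled command before committing to a long download; drop the -n to run it for real.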