Monday, September 15, 2008

Downloading Files with wget

Sometimes you need to download a file from a remote server using the command line. For example, you find a link to an RPM software package, but the link goes through several HTTP redirects that prevent rpm from installing straight from HTTP. Or you may want to script the automated download of a file, such as a log file, every night.

The wget command can download files from web servers (HTTP and HTTPS) and FTP servers. With a server that doesn’t require authentication, a wget command can be as simple as the command itself followed by the location of the file you want to download:

$ wget https://help.ubuntu.com/7.04/common/img/headerlogo.png
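
Returning to the scripted nightly download mentioned above, wget drops easily into a cron job. The URL and destination directory below are made up for illustration; -q silences the progress output and -P sets the directory to save into:

$ wget -q -P /var/backups/logs http://server.example.com/logs/access.log

A crontab line such as 0 2 * * * wget -q -P /var/backups/logs http://server.example.com/logs/access.log would then fetch the file at 2 a.m. every night.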

If, for example, an FTP server requires a login and password, you can enter that information on
the wget command line in the following forms:

$ wget ftp://user:password@ftp.example.com/path/to/file
$ wget --user=user --password=password ftp://ftp.example.com/path/to/file

For example:
$ wget ftp://chris:mykuulpwd@ftp.linuxtoys.net/home/chris/image.jpg
$ wget --user=chris --password=mykuulpwd \
ftp://ftp.linuxtoys.net/home/chris/image.jpg
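
Keep in mind that a password typed on the command line can end up in your shell history and in the process list. If that’s a concern, most builds of wget will also look up FTP credentials in a ~/.netrc file when none are given on the command line; a minimal entry (using the same example account as above) looks like this:

machine ftp.linuxtoys.net
login chris
password mykuulpwd

Make the file readable only by you (chmod 600 ~/.netrc), and a plain wget ftp://ftp.linuxtoys.net/home/chris/image.jpg will then work without exposing the password.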

You can use wget to download a single web page as follows:
$ wget http://www.wiley.com Download only the Web page

If you open the resulting index.html, you’ll have all sorts of broken links. To download all the images and other elements required to render the page properly, use the -p option:

$ wget -p http://www.wiley.com Download Web page and other elements

But if you open the resulting index.html in your browser, chances are you will still see broken links even though all the images were downloaded. That’s because the links in the page still point at the remote server and need to be translated to point to your local files. The -k option does that translation, so instead, do this:

$ wget -pk http://www.wiley.com Download pages and use local file names

And if you’d like wget to keep the original file and also do the translation, type this:

$ wget -pkK http://www.wiley.com Rename to local names, keep original

Sometimes an HTML file you download does not have an .html extension, but ends in .asp or .cgi instead. That may result in your browser not knowing how to open your local copy of the file. You can have wget append .html to those files using the -E option:

$ wget -E http://www.aspexamples.com Append .html to downloaded files

With the wget command, you can recursively mirror an entire web site. The -m option copies files and directories to the full depth of the server’s file structure, adds timestamping, and keeps FTP directory listings. (Use this with caution, because it can take a lot of time and space.)

$ wget -m http://www.linuxtoys.net
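
If you’re curious what -m actually does, the wget manual describes it as shorthand that turns on recursion and timestamping, sets infinite recursion depth, and keeps FTP directory listings, so (depending on your wget version) the spelled-out equivalent is roughly:

$ wget -r -N -l inf --no-remove-listing http://www.linuxtoys.net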

Using some of the options just described, the following command line results in the most usable local copy of a web site:

$ wget -mEkK http://www.linuxtoys.net
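
Mirroring can put a real load on the remote server and on your own connection. If that’s a concern, wget’s --limit-rate and -w (wait seconds between requests) options can slow the mirror down; the values below are only examples:

$ wget -mEkK --limit-rate=100k -w 2 http://www.linuxtoys.net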

If you have ever had a large file download (such as a CD or DVD image file) disconnect before it completed, you may find the -c option to wget to be a lifesaver. Using -c, wget resumes where it left off, continuing an interrupted file download. For example:

$ wget http://example.com/DVD.iso Begin downloading large file
...

95%[========== ] 685,251,583 55K/s Download killed before completion
$ wget -c http://example.com/DVD.iso Resume download where stopped
...
HTTP request sent, awaiting response... 206 Partial Content
Length: 699,389,952 (667M), 691,513 (66M) remaining [text/plain]

Because of the continue feature (-c), wget can be particularly useful for those with slow Internet connections who need to download large files. If you have ever had a several-hour download get killed just before it finished, you’ll know what we mean. (Note that if you don’t use -c when you mean to resume a file download, the whole file will be downloaded again and saved under a different name: the original name with a .1 appended to it.)
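
If the connection is flaky enough that the download keeps dying, you can also leave the retrying to wget instead of rerunning the command by hand. Combining -c with -t 0 (retry an unlimited number of times) works on the same example URL:

$ wget -c -t 0 http://example.com/DVD.iso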

