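The wget command
-r : Retrieves recursively, following subdirectories.
Example: wget -r ftp://ftp.ncbi.nlm.nih.gov/blast/db/
Description: This retrieves every file under ftp://ftp.ncbi.nlm.nih.gov/blast/db/ while preserving the directory structure.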
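-nd : Does not create directories. Handy for pulling the contents of a hierarchically organized website into a single directory, and especially useful together with -r.
Example: wget -nd -r ftp://ftp.ncbi.nlm.nih.gov/blast/db/
Description: This downloads everything inside ftp://ftp.ncbi.nlm.nih.gov/blast/db/ straight into the current folder.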
-A, --accept= : Retrieves only files with the given extension.
Example: wget -nd -r --accept=fna ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/
Description: This retrieves only the files with the .fna extension from ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria and saves them in the current directory. (Without -nd, the folder structure is preserved.)
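-R, --reject= : Retrieves everything except files with the given extension.
Example: wget -nd -r --reject=fna ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/
Description: This retrieves everything from ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria except the files with the .fna extension and saves it in the current directory.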
-l, --level= : When -r (recursive retrieval of subdirectories) is in use, sets the maximum depth to descend.
Example: wget -nd -r --accept=fna --level=3 ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/
Description: This retrieves only the files with the .fna extension from ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria, descending at most 3 levels below the starting directory.
-N : Downloads a file only when the remote copy is newer than the one already on your disk.
-m : Mirror. Use it to copy a website with its folder structure intact while downloading only what has been updated since the last run.
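The recursive options above can be combined in one small helper. The following is only a sketch: the function name fetch_by_suffix and the WGET override variable are made up for illustration (WGET=echo previews the command instead of running it), and only flags already described in this post are used.

```shell
# Sketch: recursive, flattened, filtered download.
# WGET is a hypothetical override; set WGET=echo to preview the command.
fetch_by_suffix() {
    local suffix="$1" depth="$2" url="$3"
    # -r        recurse into subdirectories
    # -nd       do not recreate the remote directory tree locally
    # --accept  keep only files with this suffix
    # --level   descend at most this many levels
    "${WGET:-wget}" -r -nd --accept="$suffix" --level="$depth" "$url"
}
```

For example, WGET=echo fetch_by_suffix fna 3 ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/ prints the arguments that wget would receive without touching the network.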
Specifying an FTP ID and password: wget ftp://id:password@website
- Automation shell script
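#!/bin/bash
# $1 is the remote directory to fetch; quoted so that spaces or glob characters are passed through intact.
wget -nd -r "ftp://id:password@website/$1/"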
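- This downloads everything beneath the given directory of the URL, including its subdirectories; because of -nd, all files land in the current directory.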
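A slightly more defensive variant of the script above might look like the sketch below. It rejects a missing argument and passes the credentials with --ftp-user and --ftp-password instead of embedding them in the URL. The function name and the FTP_USER, FTP_PASS, FTP_HOST and WGET variables are hypothetical names chosen for this example.

```shell
# Sketch: fetch one remote FTP directory into the current directory.
# FTP_USER, FTP_PASS, FTP_HOST and WGET are hypothetical variables.
fetch_ftp_dir() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "usage: fetch_ftp_dir REMOTE-DIR" >&2
        return 1
    fi
    "${WGET:-wget}" -r -nd \
        --ftp-user="$FTP_USER" --ftp-password="$FTP_PASS" \
        "ftp://$FTP_HOST/$dir/"
}
```

Calling fetch_ftp_dir with no argument prints a usage line and returns a non-zero status instead of requesting the server root.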
1. Download Single File with wget
The following example downloads a single file from the internet and stores it in the current directory.
$ wget http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
While downloading it will show a progress bar with the following information:
- Percentage of the download completed (e.g. 31% as shown below)
- Total number of bytes downloaded so far (e.g. 1,213,592 bytes as shown below)
- Current download speed (e.g. 68.2K/s as shown below)
- Estimated time remaining (e.g. eta 34 seconds as shown below)
Download in progress:
$ wget http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
Saving to: `strx25-0.9.2.1.tar.bz2.1'
31% [=================> 1,213,592 68.2K/s eta 34s
Download completed:
$ wget http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
Saving to: `strx25-0.9.2.1.tar.bz2'
100%[======================>] 3,852,374 76.8K/s in 55s
2009-09-25 11:15:30 (68.7 KB/s) - `strx25-0.9.2.1.tar.bz2' saved [3852374/3852374]
2. Download and Store With a Different File name Using wget -O
By default wget takes the file name from whatever follows the last forward slash in the URL, which is not always appropriate.
Wrong: The following example downloads the file and stores it under the name download_script.php?src_id=7701. (The URL is quoted so that the shell does not treat ? as a wildcard.)
$ wget 'http://www.vim.org/scripts/download_script.php?src_id=7701'
Even though the downloaded file is in zip format, it is stored under the name shown below.
$ ls
download_script.php?src_id=7701
Correct: To fix this, specify the output file name with the -O option:
$ wget -O taglist.zip 'http://www.vim.org/scripts/download_script.php?src_id=7701'
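The default naming rule described above (everything after the last forward slash) can be reproduced with plain shell parameter expansion. This is an illustrative sketch, not wget's own code, and default_name is a made-up helper name.

```shell
# Sketch: derive the name wget would pick by default, i.e. the part
# of the URL after the last "/". default_name is a hypothetical helper.
default_name() {
    local url="$1"
    # "##*/" removes the longest prefix that ends in a slash.
    printf '%s\n' "${url##*/}"
}

default_name 'http://www.vim.org/scripts/download_script.php?src_id=7701'
# prints: download_script.php?src_id=7701
```

Running it on a URL before downloading shows whether -O is worth adding.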
3. Specify Download Speed / Download Rate Using wget --limit-rate
By default, wget tries to use all the bandwidth available. That may be unacceptable when you download huge files on a production server, so you can limit the download speed with --limit-rate as shown below.
In the following example, the download speed is limited to 200k.
$ wget --limit-rate=200k http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
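To get a feel for what a limit costs, the shortest possible download time can be estimated with integer arithmetic. The sketch below assumes that the k suffix means 1024 bytes per second. min_seconds is a hypothetical helper, and the real transfer will usually take longer than this lower bound.

```shell
# Sketch: lower bound on download time under --limit-rate.
# Assumption: "k" means 1024 bytes per second.
min_seconds() {
    local bytes="$1" rate_k="$2"
    local per_sec=$(( rate_k * 1024 ))
    # Round up so a partial final second still counts.
    echo $(( (bytes + per_sec - 1) / per_sec ))
}

min_seconds 3852374 200
# prints: 19
```

Under that assumption, the 3,852,374-byte tarball from the first example needs at least 19 seconds at 200k.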
4. Continue the Incomplete Download Using wget -c
Resume a download that stopped partway through with the -c option, as shown below.
$ wget -c http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
This is very helpful when a very large download is interrupted partway through. Instead of starting the whole download again, you can continue from where it stopped with the -c option.
Note: If a download is interrupted and you restart it without -c, wget automatically appends .1 to the file name because a file with the original name already exists. If a file ending in .1 also exists, the new download is saved with .2 at the end, and so on.
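The numbering rule in the note can be imitated in a few lines of shell, which helps when you need to predict where a restarted download will end up. This sketch only mimics the behaviour described above and is not taken from wget. next_free_name is a hypothetical helper.

```shell
# Sketch: first unused name in the sequence NAME, NAME.1, NAME.2, ...
# This mimics the suffix rule described in the note above.
next_free_name() {
    local name="$1" n=1
    if [ ! -e "$name" ]; then
        printf '%s\n' "$name"
        return 0
    fi
    while [ -e "$name.$n" ]; do
        n=$(( n + 1 ))
    done
    printf '%s\n' "$name.$n"
}
```

If strx25-0.9.2.1.tar.bz2 and strx25-0.9.2.1.tar.bz2.1 both exist in the current directory, next_free_name strx25-0.9.2.1.tar.bz2 prints the .2 name.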
5. Download in the Background Using wget -b
For a huge download, put the download in background using wget option -b as shown below.
$ wget -b http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
Continuing in background, pid 1984.
Output will be written to `wget-log'.
This starts the download and returns the shell prompt to you. You can check the status of the download at any time with tail -f, as shown below.
$ tail -f wget-log
Saving to: `strx25-0.9.2.1.tar.bz2.4'
0K .......... .......... .......... .......... .......... 1% 65.5K 57s
50K .......... .......... .......... .......... .......... 2% 85.9K 49s
100K .......... .......... .......... .......... .......... 3% 83.3K 47s
150K .......... .......... .......... .......... .......... 5% 86.6K 45s
200K .......... .......... .......... .......... .......... 6% 33.9K 56s
250K .......... .......... .......... .......... .......... 7% 182M 46s
300K .......... .......... .......... .......... .......... 9% 57.9K 47s
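When only the current progress matters, the latest percentage can be pulled out of the log instead of watching it scroll. The sketch below assumes the log lines look like the sample output above (a number followed by %). last_percent is a hypothetical helper name.

```shell
# Sketch: print the most recent "NN%" found in a wget log.
# Assumes the log format shown above; defaults to ./wget-log.
last_percent() {
    local log="${1:-wget-log}"
    grep -o '[0-9]\{1,3\}%' "$log" | tail -n 1
}
```

Run against the sample log above, last_percent wget-log would print 9%.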
6. Mask User Agent and Display wget like Browser Using wget --user-agent
Some websites refuse to serve a page when they detect that the user agent is not a browser. You can mask the user agent with the --user-agent option so that wget presents itself as a browser, as shown below.
$ wget --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3" URL-TO-DOWNLOAD
7. Test Download URL Using wget --spider
When you schedule a download, check beforehand that it will work at the scheduled time. To do so, copy the command line exactly from the schedule and add the --spider option.
$ wget --spider DOWNLOAD-URL
If the URL is correct, wget reports:
$ wget --spider download-url
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
This confirms that the download will succeed at the scheduled time. If you give a wrong URL, you get the following error instead.
$ wget --spider download-url
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!
You can use the spider option in the following scenarios:
- Checking a URL before scheduling a download.
- Monitoring at regular intervals whether a website is available.
- Checking a list of pages from your bookmarks to find out which ones still exist.
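The bookmark-checking scenario can be sketched as a loop over a file with one URL per line, relying on wget's exit status in spider mode. check_urls and the WGET override variable are hypothetical names used only for this example.

```shell
# Sketch: report which URLs in a list still respond.
# Reads one URL per line. WGET is a hypothetical override
# (for instance WGET=true to dry-run the loop).
check_urls() {
    local list="$1" url
    while IFS= read -r url; do
        [ -z "$url" ] && continue   # skip blank lines
        if "${WGET:-wget}" -q --spider "$url"; then
            echo "OK      $url"
        else
            echo "BROKEN  $url"
        fi
    done < "$list"
}
```

With a file such as bookmarks.txt (a placeholder name), check_urls bookmarks.txt prints one OK or BROKEN line per URL.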
8. Increase Total Number of Retry Attempts Using wget --tries
If the internet connection is unreliable and the file is large, the download may fail. By default wget retries 20 times to complete the download.
If needed, you can increase the number of retries with the --tries option as shown below.
$ wget --tries=75 DOWNLOAD-URL
9. Download Multiple Files / URLs Using Wget -i
First, store the URLs to download in a text file, one per line:
$ cat > download-file-list.txt
URL1
URL2
URL3
URL4
Next, pass download-file-list.txt to wget with the -i option as shown below.
$ wget -i download-file-list.txt
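In a script, the list is easier to create with a here-document than by typing into cat. The sketch below writes two of the URLs used earlier in this post into a file. write_url_list is a hypothetical helper name.

```shell
# Sketch: create the URL list non-interactively with a here-document.
# The quoted 'EOF' keeps the shell from expanding anything in the URLs.
write_url_list() {
    local out="$1"
    cat > "$out" <<'EOF'
http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
http://www.vim.org/scripts/download_script.php?src_id=7701
EOF
}

# Usage: write_url_list download-file-list.txt && wget -i download-file-list.txt
```

Because the URLs sit in a file rather than on the command line, the ? in the second URL needs no quoting.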
10. Download a Full Website Using wget --mirror
The following command line downloads a full website and makes it available for local viewing.
$ wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
- --mirror : turn on options suitable for mirroring.
- -p : download all files that are necessary to properly display a given HTML page.
- --convert-links : after the download, convert the links in the documents for local viewing.
- -P ./LOCAL-DIR : save all the files and directories to the specified directory.
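According to the GNU Wget manual, --mirror is currently shorthand for -r -N -l inf --no-remove-listing, so the command above can be spelled out as in the sketch below. Treat the expansion as version-dependent. mirror_site and the WGET override variable are hypothetical names.

```shell
# Sketch: the mirror command with --mirror written out in full.
# Assumption: --mirror equals "-r -N -l inf --no-remove-listing",
# as the wget manual currently documents it.
mirror_site() {
    local dir="$1" url="$2"
    "${WGET:-wget}" -r -N -l inf --no-remove-listing \
        -p --convert-links -P "$dir" "$url"
}
```

Spelling the options out makes it easy to swap -l inf for a finite depth when a full mirror would be too large.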
11. Reject Certain File Types while Downloading Using wget --reject
If you have found a useful website but do not want to download its images, you can reject them by extension as follows.
$ wget --reject=gif WEBSITE-TO-BE-DOWNLOADED
12. Log messages to a log file instead of stderr Using wget -o
Use this when you want the log written to a file instead of the terminal.
$ wget -o download.log DOWNLOAD-URL
13. Quit Downloading When it Exceeds Certain Size Using wget -Q
To stop downloading once the total exceeds 5 MB, use the following wget command line.
$ wget -Q5m -i FILE-WHICH-HAS-URLS
Note: The quota has no effect when you download a single URL; a single file is always downloaded in full regardless of the quota. The quota applies only to recursive downloads and to URLs read from an input file with -i.
14. Download Only Certain File Types Using wget -r -A
You can use this in the following situations:
- Download all images from a website
- Download all videos from a website
- Download all PDF files from a website
$ wget -r -A.pdf http://url-to-webpage-with-pdfs/
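The -A option also accepts a comma-separated list, which covers the "all images" case in one run. The sketch below joins the suffixes passed as arguments. accept_list, fetch_types and the WGET override variable are hypothetical names.

```shell
# Sketch: build a comma-separated accept list from arguments.
accept_list() {
    local IFS=,
    # "$*" joins the arguments with the first character of IFS.
    printf '%s\n' "$*"
}

# Recursively fetch only the given file types from a URL.
# WGET is a hypothetical override; set WGET=echo to preview.
fetch_types() {
    local url="$1"
    shift
    "${WGET:-wget}" -r -A "$(accept_list "$@")" "$url"
}
```

For example, fetch_types http://url-to-webpage-with-images/ jpg jpeg png gif runs wget -r -A jpg,jpeg,png,gif on that URL, which is a placeholder in the same style as the one above.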
15. FTP Download With wget
You can use wget to perform an FTP download as shown below.
Anonymous FTP download using wget:
$ wget ftp-url
FTP download using wget with username and password authentication.
$ wget --ftp-user=USERNAME --ftp-password=PASSWORD DOWNLOAD-URL
With -O, the file is saved under the name you specify.
wget also has an option to limit the transfer rate, though I doubt I will ever need it on the Ubuntu machine I use as a personal desktop.
wget --limit-rate=200k http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
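The biggest headache when downloading is a large file that stops partway through because of some unexpected error. Had I not known about this option, I would probably have downloaded the whole file again. I am rarely in a hurry, so it would not have bothered me much, but in any case wget has the -c option.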
wget -c http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
Internet connections in Korea are so fast that most files download within a few seconds.
Still, there are times when you want to download several files at once, and wget supports that too.
wget -b http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
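Run it as a background process with the -b option.
Does running in the background make it awkward to check on the downloads? Not at all: wget writes its progress to wget-log, which you can follow with tail -f.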