Using wget to recursively fetch a directory with arbitrary files in it

big-blog 2020. 2. 10. 22:16

I have a web directory where I store some config files. I'd like to use wget to pull those files down and keep their current structure. For instance, the remote directory looks like this:

http://mysite.com/configs/.vim/

.vim holds multiple files and directories. I want to replicate it on the client using wget, but I can't find the right combination of wget flags to get this done. Any ideas?

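You have to pass the -np/--no-parent option to wget (in addition to -r/--recursive, of course); otherwise it will follow the link in the directory index on my site to the parent directory. So the command would look like this: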

wget --recursive --no-parent http://example.com/configs/.vim/

To avoid downloading the auto-generated index.html files, use the -R/--reject option:

wget -r -np -R "index.html*" http://example.com/configs/.vim/
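
The trailing `*` in the pattern matters because auto-generated listings (Apache's, for example) often link to sorted variants of the index such as `index.html?C=N;O=D`, which wget would otherwise save as separate files. The reject list uses shell-style wildcards, so the effect of the pattern can be previewed with a plain `case` statement. This is only a rough sketch: `would_reject` is a hypothetical helper, the file names are sample values, and nothing here calls wget.

```shell
# Hypothetical helper: succeed if a file name matches the reject pattern
# used above ("index.html*"). Emulation only; it does not call wget.
would_reject() {
    case $1 in
        index.html*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Sample names (assumed for illustration).
for name in 'index.html' 'index.html?C=N;O=D' 'vimrc'; do
    if would_reject "$name"; then
        echo "reject: $name"
    else
        echo "keep:   $name"
    fi
done
```

Only `vimrc` is kept; both index variants match the pattern.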

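To download a directory recursively while rejecting index.html* files, and without recreating the host name, the parent directories, and the rest of the directory structure, use the command below. Keep the trailing slash on the URL: for HTTP, --no-parent relies on it to tell a directory from a file.

wget -r -nH --cut-dirs=2 --no-parent --reject="index.html*" http://mysite.com/dir1/dir2/data/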
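
To see how -nH and --cut-dirs interact, the directory mapping can be emulated in a few lines of shell. This is a sketch of the rule described in the wget manual, not wget's own code: `local_dir` is a made-up name, it only handles the directory part of a URL, and it never touches the network.

```shell
# Hypothetical helper: predict the local directory wget would create.
#   $1 = URL of a directory (trailing slash optional)
#   $2 = value passed to --cut-dirs (default 0)
#   $3 = "nH" to emulate -nH / --no-host-directories
# The body runs in a subshell so its variables stay local.
local_dir() (
    url=$1; cut=${2:-0}; nohost=${3:-}
    rest=${url#*://}                 # drop the scheme
    host=${rest%%/*}                 # host name
    case $rest in
        */*) path=${rest#*/} ;;      # directory components after the host
        *)   path= ;;
    esac
    path=${path%/}                   # ignore one trailing slash

    # Drop the first $cut directory components.
    i=0
    while [ "$i" -lt "$cut" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *)   path= ;;
        esac
        i=$((i + 1))
    done

    if [ "$nohost" = nH ]; then out=; else out=$host; fi
    if [ -n "$path" ]; then out=${out:+$out/}$path; fi
    printf '%s\n' "${out:-.}"
)

local_dir http://mysite.com/dir1/dir2/data/ 0      # mysite.com/dir1/dir2/data
local_dir http://mysite.com/dir1/dir2/data/ 0 nH   # dir1/dir2/data
local_dir http://mysite.com/dir1/dir2/data/ 2 nH   # data
local_dir http://mysite.com/dir1/dir2/data/ 2      # mysite.com/data
```

The third call corresponds to the command above: with -nH and --cut-dirs=2, only the data directory is recreated locally.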

For anyone else having similar issues: wget follows robots.txt, which might not allow you to grab the site. No worries, you can turn it off:

wget -e robots=off http://www.example.com/

http://www.gnu.org/software/wget/manual/html_node/Robot-Exclusion.html

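You should use the -m (mirror) flag, as it takes care not to mess with timestamps and recurses indefinitely. Per the wget manual, -m is shorthand for -r -N -l inf --no-remove-listing.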

wget -m http://example.com/configs/.vim/

Adding the points mentioned by others in this thread, it would be:

wget -m -e robots=off --no-parent http://example.com/configs/.vim/

Here is the complete wget command that worked for me to download files from a server's directory (ignoring robots.txt):

wget -e robots=off --cut-dirs=3 --user-agent=Mozilla/5.0 --reject="index.html*" --no-parent --recursive --relative --level=1 --no-directories http://www.example.com/archive/example/5.3.0/

If --no-parent does not help, you can use the --include option (an abbreviation of -I / --include-directories).

Directory structure:

http://<host>/downloads/good
http://<host>/downloads/bad

And you want to download the downloads/good directory but not downloads/bad:

wget --include downloads/good --mirror --execute robots=off --no-host-directories --cut-dirs=1 --reject="index.html*" --continue http://<host>/downloads/good

wget -r http://mysite.com/configs/.vim/

works for me.

Perhaps you have a .wgetrc that is interfering with it?
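
A quick way to check is to list the startup files wget may be reading. The sketch below assumes the usual locations (the file named by the WGETRC environment variable, ~/.wgetrc, and a system-wide wgetrc under /etc or /usr/local/etc); a particular build may use other paths, and `list_wgetrc` is a hypothetical helper name.

```shell
# Hypothetical helper: print every candidate wgetrc file that exists.
# The system-wide paths are common defaults, not guaranteed for every build.
list_wgetrc() {
    for f in "${WGETRC:-}" "$HOME/.wgetrc" /etc/wgetrc /usr/local/etc/wgetrc; do
        if [ -n "$f" ] && [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

list_wgetrc
```

If a file shows up, look in it for settings such as recursive, reclevel, or robots. Recent wget releases also accept --no-config to skip startup files for a single run.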

To fetch a directory recursively with a username and password, use the following command:

wget -r --user=(put username here) --password='(put password here)' --no-parent http://example.com/

All you need is two flags: "-r" for recursion and "--no-parent" (or -np) so that wget does not follow the '.' and '..' links up into the parent directory. Like this:

wget -r --no-parent http://example.com/configs/.vim/

That's it. The files are downloaded into the local tree ./example.com/configs/.vim. If you do not want the first two directories, add the --cut-dirs=2 flag, as suggested in earlier replies:

wget -r --no-parent --cut-dirs=2 http://example.com/configs/.vim/

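With that command the host directory is kept and both configs and .vim are cut, so the file tree lands directly under ./example.com/. To get ./.vim/ instead, combine -nH with --cut-dirs=1.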

In fact, I got the first line of this answer straight from the wget manual; there is a very clean example towards the end of section 4.3.

You should be able to do it simply by adding -r:

wget -r http://stackoverflow.com/

Wget 1.18 may work better. For example, I got bitten by a bug in version 1.12, where

wget --recursive (...)

... only retrieves index.html instead of all files.

The workaround was to notice some 301 redirects and try the new location. Given the new URL, wget fetched all the files in the directory.
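
One way to spot such a redirect is to look at the response headers before starting the recursive download. The sketch below assumes headers in the form printed by wget --server-response (or curl -I); `redirect_target` is a hypothetical helper, and the sample response is invented for illustration.

```shell
# Hypothetical helper: print the value of the last Location header on stdin.
redirect_target() {
    awk 'tolower($1) == "location:" { loc = $2 }
         END { if (loc != "") print loc }' | tr -d '\r'
}

# Invented sample: a directory URL requested without its trailing slash.
printf '  HTTP/1.1 301 Moved Permanently\r\n  Location: http://example.com/configs/.vim/\r\n' |
    redirect_target

# Against a real server (not run here; wget prints headers on stderr):
#   wget --server-response --spider http://example.com/configs/.vim 2>&1 | redirect_target
```

If a URL is printed, pass that URL to the recursive wget command instead of the original one.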

This version downloads recursively and does not create the parent directories.

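wgetod() {
    local nslash ncut
    # Count the slashes in the URL path (scheme and host stripped, trailing slash ignored).
    nslash="$(echo "$1" | perl -pe 's|.*://[^/]+(.*?)/?$|\1|' | grep -o / | wc -l)"
    # Cut every leading directory except the last one.
    ncut=$((nslash > 0 ? nslash - 1 : 0))
    wget -r -nH --user-agent=Mozilla/5.0 --cut-dirs="$ncut" --no-parent --reject="index.html*" "$1"
}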

Usage:

Add the function to ~/.bashrc or paste it into a terminal, then run:
wgetod "http://example.com/x/"
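
To make the arithmetic in wgetod concrete: the function counts the slashes in the URL path (ignoring a trailing one) and cuts one fewer directory than that, so only the last directory is recreated locally. The sketch below reproduces that count without perl; `cut_count` is a hypothetical name and is not part of the original function.

```shell
# Hypothetical helper: compute the --cut-dirs value wgetod would use.
# The body runs in a subshell so its variables stay local.
cut_count() (
    rest=${1#*://}                   # strip the scheme
    case $rest in
        */*) path=/${rest#*/} ;;     # path after the host, with leading slash
        *)   path= ;;
    esac
    path=${path%/}                   # ignore one trailing slash
    n=$(printf '%s' "$path" | tr -cd '/' | wc -c)
    echo $(( n > 0 ? n - 1 : 0 ))
)

cut_count "http://example.com/x/"       # 0
cut_count "http://example.com/a/b/c/"   # 2
cut_count "http://example.com/"         # 0
```

For http://example.com/a/b/c/ the result is 2, so together with -nH the files end up under ./c/.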

The following options seem to be the perfect combination when dealing with recursive downloads:

wget -nd -np -P /dest/dir --recursive http://url/dir1/dir2

Relevant snippets from the man page, for convenience:

   -nd
   --no-directories
       Do not create a hierarchy of directories when retrieving recursively.  With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the
       filenames will get extensions .n).


   -np
   --no-parent
       Do not ever ascend to the parent directory when retrieving recursively.  This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.

Source: https://stackoverflow.com/questions/273743/using-wget-to-recursively-fetch-a-directory-with-arbitrary-files-in-it
