Unix questions - wget a directory listing

Ed Morton

2005-02-21, 5:59 pm

Anyone know how to use "wget" or any other tool to get a list of the
files under a specific web directory? For example, I could presumably use:

wget http://www.gnu.org/people/people.html

to get a copy of the file "http://www.gnu.org/people/people.html", but
how can I just get a list of all the files under
"http://www.gnu.org/people"?

Regards,

Ed.

Bill Marcum

2005-02-21, 5:59 pm

On Mon, 21 Feb 2005 10:16:19 -0600, Ed Morton
<morton@lsupcaemnt.com> wrote:
> Anyone know how to use "wget" or any other tool to get a list of the
> files under a specific web directory. For example, I could presumably use:
>
> wget http://www.gnu.org/people/people.html
>
> to get a copy of the file "http://www.gnu.org/people/people.html", but
> how can I just get a list of all the files under
> "http://www.gnu.org/people"?
>

(What does this have to do with unix?)

I don't think the http protocol is designed to provide directory
information. You might try ftp://www.gnu.org/people, or "wget -r" to
fetch any URLs that are linked in the people.html page.
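Bill's FTP suggestion only helps if the server actually offers anonymous FTP. In that case GNU wget can keep the raw directory listing it downloads: its `--no-remove-listing` option leaves the listing in a file named `.listing`. A minimal sketch for turning that file into bare names, assuming Unix `ls -l` style lines; `ftp_listing_names` and `ftp.example.org` are hypothetical placeholders:

```shell
#!/bin/sh
# ftp_listing_names: print just the file names from a Unix-style FTP
# directory listing (the format of `ls -l`) read on stdin.
# Hypothetical helper. Assumes the name starts at field 9; runs of spaces
# inside a name are collapsed to one, and symlinks print as "name -> target".
ftp_listing_names() {
    awk '
        { sub(/\r$/, "") }                 # listings may end lines with CRLF
        $1 == "total" || NF < 9 { next }   # skip the "total N" summary line
        {
            name = $9
            for (i = 10; i <= NF; i++) name = name " " $i
            if (name != "." && name != "..") print name
        }'
}

# Usage (placeholder host):
#   wget --no-remove-listing ftp://ftp.example.org/pub/dir/
#   ftp_listing_names < .listing
```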
Ed Morton

2005-02-21, 5:59 pm



Bill Marcum wrote:
> On Mon, 21 Feb 2005 10:16:19 -0600, Ed Morton
> <morton@lsupcaemnt.com> wrote:
>
>
> (What does this have to do with unix?)


I'm running the wget command on UNIX and looking for any other UNIX tool
to do the job.

> I don't think the http protocol is designed to provide directory
> information. You might try ftp://www.gnu.org/people, or "wget -r" to
> fetch any URLs that are linked in the people.html page.


wget -r doesn't do it; it just creates the directory structure. ftp just
fails with "unknown host or invalid literal address".

Ed.

Alan Connor

2005-02-21, 5:59 pm

On Mon, 21 Feb 2005 10:16:19 -0600, Ed Morton
<morton@lsupcaemnt.com> wrote:

> Anyone know how to use "wget" or any other tool to get a list
> of the files under a specific web directory. For example, I
> could presumably use:
>
> wget http://www.gnu.org/people/people.html
>
> to get a copy of the file
> "http://www.gnu.org/people/people.html", but how can I just get
> a list of all the files under "http://www.gnu.org/people"?
>
> Regards,
>
> Ed.



Maybe the --spider option could be used this way. I've never
tried it with the -r (recursive) option, but...
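Alan's idea can be sketched like this. With `--spider`, wget checks pages without saving them; with `-r` it follows links, and its log starts each request with a line holding a timestamp and the URL. The hypothetical helper `spider_urls` pulls those URLs out of the log. It assumes GNU wget's default log format, and whether `--spider` and `-r` work together depends on the wget version. Either way, it can only report files that some page links to.

```shell
#!/bin/sh
# spider_urls: read a wget log on stdin and print each distinct URL it
# requested. Assumes GNU wget's default log format, where every request
# starts with a line like "--2005-02-21 10:16:19--  http://host/path".
spider_urls() {
    # Keep only request lines, strip everything up to the URL, de-duplicate.
    sed -n 's/^--.*--  *//p' | LC_ALL=C sort -u
}

# Usage (placeholder URL; wget writes its log to stderr):
#   wget --spider -r -np -l 2 http://www.example.org/dir/ 2>&1 | spider_urls
```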

AC


zentara

2005-02-22, 5:53 pm

On Mon, 21 Feb 2005 10:16:19 -0600, Ed Morton <morton@lsupcaemnt.com>
wrote:

>Anyone know how to use "wget" or any other tool to get a list of the
>files under a specific web directory. For example, I could presumably use:
>
>wget http://www.gnu.org/people/people.html
>
>to get a copy of the file "http://www.gnu.org/people/people.html", but
>how can I just get a list of all the files under
>"http://www.gnu.org/people"?
>
>Regards,


I don't think you can do it through http, unless there is no index.html
in that directory and "dirlist" is enabled.

If given a directory as the URL, the HTTP server looks in that
directory for a "default index file" specified in its configuration
file (or overridden on a directory-by-directory basis using
.htaccess files). It is usually index.html or index.cgi.

If no default index exists and directory listing ("dirlisting") is
enabled, you will get a file list of the directory. If it is not
enabled, you will get an HTTP error such as "403 Forbidden" or
"404 Not Found".
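When the server does generate a listing (no index file, listings enabled), the result is an ordinary HTML page, typically titled "Index of /dir", whose links are the directory entries. A minimal sketch that checks for that title and prints the entries, assuming Apache- or nginx-style output; the helper name `list_autoindex` is hypothetical, and `grep -o` is a GNU/BSD extension rather than POSIX:

```shell
#!/bin/sh
# list_autoindex: read an HTML page on stdin; if it looks like a
# server-generated directory listing, print one entry per line.
# Heuristic (assumption): generated listings have an "Index of" title,
# link each entry by a relative href, and use "?..." hrefs for sorting.
list_autoindex() {
    page=$(cat)
    if ! printf '%s\n' "$page" | grep -qi '<title>Index of '; then
        echo "list_autoindex: not a directory listing" >&2
        return 1
    fi
    printf '%s\n' "$page" |
        grep -o 'href="[^"]*"' |
        sed 's/^href="//; s/"$//' |
        grep -v -e '^?' -e '^/' -e '^\.\./' -e '://'   # drop sort, parent, external links
}

# Usage (placeholder URL):
#   wget -q -O - http://www.example.org/dir/ | list_autoindex
```

If the page is the directory's index.html instead, the function reports that and returns non-zero rather than guessing.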

Even wget won't retrieve pages that aren't available via a
browser. A page must be linked from somewhere, or be shown in a
directory listing.




--
I'm not really a human, but I play one on earth.
http://zentara.net/japh.html

Greg Beeker

2005-02-23, 6:01 pm


Ed Morton wrote:
> Anyone know how to use "wget" or any other tool to get a list of the
> files under a specific web directory. For example, I could presumably use:
>
> wget http://www.gnu.org/people/people.html
>
> to get a copy of the file "http://www.gnu.org/people/people.html", but
> how can I just get a list of all the files under
> "http://www.gnu.org/people"?
>

If those files are mentioned in the people.html document, you can use
unix commands to list them:

grep HREF people.html | awk -F\" '{print $2}' | grep -v http

made
/graphics/atypinggnu.html
people.cs.html
people.html
people.pt.html
<snip>

I know you really want all the files, but this is a partial list
anyway. Have you tried to contact the web site admin?
webmasters@gnu.org
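Greg's pipeline prints only the first double-quoted string on each line, which is presumably why `made` (from something like a `<LINK REV="made" ...>` tag) appears, and why a second link on the same line is missed. A variant that extracts every href value, in either case, and keeps only relative links might look like this; `page_links` is a hypothetical name, and like Greg's version it only sees files the page actually links to:

```shell
#!/bin/sh
# page_links: print the distinct relative links (href values) found in the
# HTML read on stdin, handling several links per line and HREF/href alike.
# Assumes double-quoted attribute values; grep -o is a GNU/BSD extension.
page_links() {
    grep -io 'href="[^"]*"' |
        sed 's/^[Hh][Rr][Ee][Ff]="//; s/"$//' |
        grep -v -e '://' -e '^mailto:' |   # drop absolute URLs and mail links
        LC_ALL=C sort -u
}

# Usage:
#   wget -q -O - http://www.gnu.org/people/people.html | page_links
```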

> Regards,
>
> Ed.


Ed Morton

2005-02-23, 6:01 pm



Greg Beeker wrote:
> Ed Morton wrote:
>
> If those files are mentioned in the people.html document, you can use
> unix commands to get them:
>
> grep HREF people.html | awk -F\" '{print $2}' | grep -v http
>
> made
> /graphics/atypinggnu.html
> people.cs.html
> people.html
> people.pt.html
> <snip>
>
> I know you really want all the files, but this is a partial list
> anyway. Have you tried to contact the web site admin?
> webmasters@gnu.org


I'm not trying to get those specific files; it was just an example. The
directory I'm actually getting the files from may have no HTML files in
it, and if it does have any, they won't refer to any of the other files
in that directory.

Ed.

Copyright 2003 - 2008 webservertalk.com