This utility scans a site domain, HTML page by HTML page, for links to geospatial data, identifying them by their file extensions. In addition to putting the geospatial data links into a database, it also writes them out to an HTML page. So at the end of a run you have a text file of HTML links indicating where the robot has been and an HTML page indicating what the robot has picked up along the way. The robot also takes the <title> </title> text from each traversed HTML page and uses it as a description for the links on that page. The following file formats are supported.
On some sites the geospatial datasets are linked from HTML pages that are generated by CGI scripts. The HTTP utility that I use cannot access these pages and therefore cannot parse them.
Note that the robot traverses HTML pages by taking links off pages that it has previously visited. If the initial page has no HTML links on it, the robot cannot get past that page. You can check this by pressing Ctrl-U in your browser and looking at the <a href=" "> </a> fields. It's best to start the robot off on pages that have lots of links within the site domain you entered.
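As a rough illustration of what happens on a single page, here is a minimal Python sketch. It is not the robot's actual code: the names PageScanner, scan_page and GEO_EXTENSIONS are made up for this example, and the extensions in the set are placeholders rather than the real list of supported formats. It pulls the <title> text, collects the <a href> links, and splits them into data links (matched by file extension) and same-domain page links to visit next.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Placeholder extensions for illustration only (an assumption);
# this is NOT the robot's real list of supported formats.
GEO_EXTENSIONS = {".shp", ".e00", ".tif"}


class PageScanner(HTMLParser):
    """Collects the <title> text and every <a href> link on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_page(html, base_url, domain):
    """Return (title, data_links, page_links) for one HTML page."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    data_links, page_links = [], []
    for link in scanner.links:
        parsed = urlparse(link)
        ext = posixpath.splitext(parsed.path)[1].lower()
        if ext in GEO_EXTENSIONS:
            data_links.append(link)      # looks like a dataset
        elif parsed.netloc == domain:
            page_links.append(link)      # same-domain page to visit later
    return scanner.title.strip(), data_links, page_links
```

If page_links comes back empty for the starting page, a crawler built this way has nowhere to go next, which is the situation described above.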
NOTE: PAGES WITH HTML FRAMES USUALLY DON'T WORK BECAUSE THERE AREN'T ANY LINKS ON SUCH PAGES FOR THE ROBOT TO GRAB. PRESS CTRL-U ON SUCH A PAGE AND YOU'LL SEE WHAT I MEAN.
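To see the frames problem without reading the page source by eye, a small check along these lines may help. This is only a sketch; FrameCheck and is_frames_only are hypothetical names and are not part of the robot. It counts the <a href> links on a page and records the src of any <frame> or <iframe> tags.

```python
from html.parser import HTMLParser


class FrameCheck(HTMLParser):
    """Counts <a href> links and records <frame>/<iframe> sources."""

    def __init__(self):
        super().__init__()
        self.anchors = 0
        self.frame_sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.anchors += 1
        elif tag in ("frame", "iframe") and attrs.get("src"):
            self.frame_sources.append(attrs["src"])


def is_frames_only(html):
    """True if the page has frames but no links of its own."""
    check = FrameCheck()
    check.feed(html)
    return check.anchors == 0 and bool(check.frame_sources)
```

On a frames page the links live in the documents named by the frame src attributes, not in the frameset page itself, so a page for which is_frames_only returns True gives a link-following robot nothing to grab.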
Email: anp@geo.ed.ac.uk