This utility scans a site's domain, one HTML page at a time, for links to geospatial data, identifying them by their file extensions. As well as putting the geospatial data links into a database, it also writes them out to an HTML page. So at the end of a run you have a text file of the HTML links showing where the robot has been, and an HTML page showing what the robot has picked up along the way. The robot also takes the text between the <title> and </title> tags of each traversed HTML page and uses it as the description for the links on that page. The following file formats are supported.
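A minimal sketch of the per-page step, in Python, assuming the page's HTML has already been fetched: it collects the <title> text and keeps only the hrefs whose extension marks them as geospatial data. The names `PageScanner`, `scan_page` and `GEO_EXTENSIONS` are hypothetical, and the extension tuple is a placeholder rather than the robot's actual list of supported formats.

```python
from html.parser import HTMLParser

# Hypothetical placeholder extensions; NOT the robot's real list of
# supported formats, which is not reproduced here.
GEO_EXTENSIONS = (".shp", ".e00", ".dxf", ".dem")


class PageScanner(HTMLParser):
    """Collects the <title> text and every <a href> on a single HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def scan_page(html):
    """Return (description, geospatial_links) for one page.

    The page title becomes the description for every geospatial
    link found on that page.
    """
    scanner = PageScanner()
    scanner.feed(html)
    description = scanner.title.strip()
    # Case-insensitive extension match, so "ROADS.SHP" is also caught.
    geo_links = [h for h in scanner.hrefs
                 if h.lower().endswith(GEO_EXTENSIONS)]
    return description, geo_links
```

For example, `scan_page("<title>Test data</title><a href='roads.shp'>r</a><a href='about.htm'>a</a>")` returns `('Test data', ['roads.shp'])`.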
The robot is fully operational and can scan any web site submitted to it. However, as you know, many sites run to hundreds or even thousands of pages, which means the robot will literally take hours to complete its run on our server. The size of such sites, and the fact that anyone could submit numerous processes over the web, led me to build this 'canned' demonstration of how the robot operates. In actual operation you would submit a starting HTML page, the associated FTP site, your name and your email address; I would then run the robot with that information. With this demonstration, however, you can see the robot in action on about 10 to 12 HTML pages. The starting page is http://www.geo.ed.ac.uk/~anp/public/pindex.htm; note that this page contains duplicate as well as relative URLs. Also, some of these pages contain 'dummy' geospatial data links, which were added to facilitate testing of the software.
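Duplicate and relative URLs like those on the starting page are typically handled by resolving every href against the page it appears on and remembering which pages have already been queued. The Python sketch below does this with a breadth-first traversal capped at a small page count, similar in spirit to the 10 to 12 page demo. The names `crawl`, `normalise` and the injected `fetch_links` callable are hypothetical; restricting the crawl to the starting host mirrors the "site domain" wording above but is otherwise an assumption.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def normalise(base_url, href):
    """Resolve a possibly relative href against the page it appeared on
    and drop any #fragment, so duplicate links compare equal."""
    return urldefrag(urljoin(base_url, href)).url


def crawl(start_url, fetch_links, max_pages=12):
    """Breadth-first traversal restricted to the start URL's host.

    fetch_links(url) is a hypothetical callable returning the raw hrefs
    of the HTML pages linked from url; injecting it keeps the traversal
    testable without network access. Returns the visited URLs in
    traversal order (the 'traverse' log).
    """
    host = urlparse(start_url).netloc
    start = urldefrag(start_url).url
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            link = normalise(url, href)
            if urlparse(link).netloc != host or link in seen:
                continue  # off-site, or already queued: skip it
            seen.add(link)
            queue.append(link)
    return visited
```

Passing a dictionary-backed `fetch_links` makes it easy to check that a repeated href and a relative `../` link each lead to exactly one visit.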
For the robot to operate in this demo, I've had to remove all the existing information about the 'demo site' from the database, so the information placed into the database as a result of the demo won't show up. I did this by enabling an SQL "ROLLBACK" command in the software robot, which undoes the demo's changes to the database instead of committing them; that is why files like FTP.TXT appear over and over again. However, I've put another test page in there if you would like to try querying the database and see what the output looks like. For the demo, you can see the 'traverse' and 'html links' pages described above.
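The effect of ROLLBACK can be illustrated with Python's built-in `sqlite3` module. This is only an illustration: the robot's actual database system and schema are not described here, so the `geo_links` table and the `store_links` function are hypothetical. Rows inserted inside a transaction disappear when that transaction is rolled back, which is why a demo run leaves nothing behind in the database.

```python
import sqlite3


def store_links(conn, description, links, keep=True):
    """Insert one row per geospatial link, then COMMIT or ROLLBACK.

    Hypothetical helper: with keep=False the inserted rows are discarded
    by ROLLBACK, so the same demo can be run repeatedly without its
    results accumulating in the database.
    """
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO geo_links (url, description) VALUES (?, ?)",
        [(link, description) for link in links],
    )
    if keep:
        conn.commit()
    else:
        conn.rollback()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geo_links (url TEXT, description TEXT)")
conn.commit()

# Demo-style run: the insert is rolled back, so nothing is stored.
store_links(conn, "Demo page", ["ftp://example.org/FTP.TXT"], keep=False)
count = conn.execute("SELECT COUNT(*) FROM geo_links").fetchone()[0]
print(count)  # 0
```

Calling `store_links` with `keep=True` instead commits the row, which corresponds to the robot's normal, non-demo operation.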
Email: anp@geo.ed.ac.uk