NAME
    nsite - tool for generating WWW site maps

SYNOPSIS
    nsite.pl [ -verbose ] [ -help ] [ -doc ] [ -depth ] [ -proxy ]
             [ -[no]envproxy ] [ -agent ] [ -authen ] [ -format ]
             [ -summary ] [ -title ] [ -email ] [ -index ] [ -nolinks ]
             [ -stats ] [ -output ] [ -altstart ] -url

DESCRIPTION
    nSite generates site maps for a given WWW site. It walks a site from
    the root URL and generates an HTML, TEXT, or XML link page which
    illustrates the structure of the site.

    The HTML site map consists of the page url, title, unique fingerprint,
    summary, and list of internal and external links. The links are
    'clickable' with the internal links in blue and the external links in
    orange.

    The TEXT site map consists of the page url, title, and unique
    fingerprint.

    The XML site map is a list of XML / structures.

    The structure reflects the depth from the root page to the pages
    listed; i.e., the first-level bullets are pages accessible directly
    from the root page, the next level holds pages accessible from those
    pages, and so on. nSite assumes a typical, breadth-first, top-down
    site structure, so pages may appear in a different order than
    originally intended.

OPTIONS
    -url
        Option to specify a root URL to generate a site map for. This
        option is required.

    -depth
        Option to specify the depth of the site map generated. If not
        specified, nSite will generate a site map of unlimited depth.

    -email
        Option to specify the email address which the robot reports to
        the site it gets pages from.

    -proxy
        Specify an HTTP proxy to use.

    -[no]envproxy
        If -envproxy is set, the proxy specified by the $http_proxy
        environment variable will be used (this is the default
        behaviour). Use -noenvproxy to suppress this. -proxy takes
        precedence over -envproxy.

    -agent
        Allows the user to specify an agent for the robot to pretend to
        be (e.g. 'Mozilla/4.5'). This can be necessary for sites that use
        browser sniffing to serve particular content.

    -format
        Option for specifying the output format of the site map.
        Possible values are:

        html
            Simple HTML bulleted list (default). Consists of the page
            url, title, unique fingerprint, summary, and list of internal
            and external links. The links are 'clickable' with the
            internal links in blue and the external links in orange.

        text
            Plain text with indenting. Consists of the page url, title,
            and a unique fingerprint.

        xml
            An XML graph of linkage between pages. Consists of a list of
            XML / structures.

        none
            Do not output the site map. Useful when you want to output
            just the stats file (see -stats).

    -summary
        Automatically extract a summary to display with the title. This
        will be truncated at the specified number of characters
        (default: 200). To disable the summary display, set the number
        of characters to -1.

    -title
        Option to specify a page title for the site map.

    -authen
        Option to use LWP::AuthenAgent to get HTML pages. This allows the
        user to type a username / password for pages that are access
        controlled.

    -index
        Option to display an index (table of contents) for the site map.

    -nolinks
        Option to disable the display of the internal and external links
        for each page in the site map.

    -altstart
        Option to start the mapping at a specific file instead of the
        default index file.

    -stats
        Option to output a statistics file with lines containing the
        following fields: URL FINGERPRINT NUMBER_OF_LINKS DEPTH TITLE.

    -output
        Option to output the site map to a file. (Defaults to standard
        output.)

    -help
        Display a help message to standard output, with a brief
        description of nSite and its command-line switches.

    -doc
        Display the full documentation for nSite, generated from the
        embedded pod format documentation.

    -version
        Print out the current version number for nSite.

    -verbose
        Turn on verbose messages.

ENVIRONMENT
    nSite makes use of the `$http_proxy' environment variable, if it is
    set.

PREREQUISITES
    HTML::Entities
    Getopt::Long
    LWP::AuthenAgent
    LWP::UserAgent
    Pod::Usage

BUGS
    XML support is very basic.

    It has been tested only on some Linux, Windows, and Irix systems.
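
EXAMPLE
    The depth-based structure described under DESCRIPTION, and the effect
    of limiting it with -depth, can be illustrated with a short
    breadth-first walk. This is only a sketch in Python, not nSite's own
    Perl code. The walk_site function, the links dictionary (a stand-in
    for fetching and parsing real pages), and the example paths are all
    assumptions made for illustration. How nSite itself counts depth is
    not stated above; here the root page is taken to be depth 0.

```python
from collections import deque


def walk_site(root, links, max_depth=None):
    """Breadth-first walk from root; returns [(url, depth), ...] in visit order.

    links maps each page to the internal pages it links to (hypothetical
    in-memory data, no network access). max_depth=None means unlimited.
    """
    seen = {root}
    order = [(root, 0)]
    queue = deque([(root, 0)])
    while queue:
        url, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not follow links from pages at the depth limit
        for target in links.get(url, []):
            if target not in seen:  # each page is listed once, at its shallowest depth
                seen.add(target)
                order.append((target, depth + 1))
                queue.append((target, depth + 1))
    return order


# Hypothetical site: /b is linked from both / and /a.
EXAMPLE_LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/b"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}

if __name__ == "__main__":
    for url, depth in walk_site("/", EXAMPLE_LINKS, max_depth=2):
        print(depth, url)
```

    In this sketch /b is recorded at depth 1 because the root links to it
    directly, even though /a links to it as well. This is the sense in
    which a breadth-first walk may list pages in a different order than
    the site's author intended.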

AUTHOR
    Steve Horsburgh

CREDITS
    This script is based on the 1997 sitemapper.pl script by Ave Wrigley.

COPYRIGHT
    Copyright (c) 2000, Horsburgh.com. All rights reserved. This script
    is free software; you can redistribute it and/or modify it under the
    GNU GPL. (See the file COPYING.)