-----------------------------------------------------------------------
# THEKLA v12.8.29 - Web spider for web application pentesting
# (c) 2009- Volker Tanger
#
# http://www.wyae.de/software/thekla/
#
-----------------------------------------------------------------------

When performing a web application security test, a web pen tester
needs a list of all forms and dynamic URLs to check.

Thekla is a web spider/crawler designed to extract parameter-containing
URLs and forms. The results are text and CSV files listing the URLs of
dynamic web pages and their referers.


Requirements:
-------------

* Python 2.x (tested on 2.4 and 2.5; newer 2.x versions will probably
  work, too)


Usage:
------

You need to configure the target specifications at the top of the
script file. First, all starting URLs from which the spider should
start:

    all_urls = [ 'http://www.wyae.de/' ]

Then the domains the spider must not leave (as many as you like):

    valid_urls = ['http://www.wyae.de','http://www.xarsim.org/']

and the URLs the spider must not visit (as many as you like):

    forbidden_urls = ['http://www.wyae.de/software','http://www.wyae.de/lists']

If you need cookies, set them in the "cookie" variable. If undefined,
PHPSESSID will be set automatically.

Then you can call the Python script, which will spider the web site,
showing URLs found, downloaded and to-be-worked-on.

The results are saved in two sorted files each:

    .TXT - containing only the target URLs
    .CSV - tab-separated target URLs and their referers

TODO - PROXIES DON'T WORK YET!
This is due to a change from URLLIB to URLLIB2

You can configure a proxy in the top section of the file. If you need
to use HTTPS/SSL via proxy, try using STUNNEL. Native HTTPS/SSL via
proxy is not supported (yet?). The default is a direct connection:

    proxies = None
    # proxies = {'http': 'http://localhost:8080'}

You can configure the filenames at the top of the script.
The files listing parameter-containing URLs are:

    filename_paramurls       = 'paramurls.txt'    (URLs only)
    filename_paramurlreferer = 'paramurls.csv'    (URLs+referers)

The files listing forms are:

    filename_forms        = 'forms.txt'    (URLs only)
    filename_formsreferer = 'forms.csv'    (URLs+referers)

Interesting URLs pointing to external domains (i.e. not matching the
valid_urls parameter) are listed as unsorted CSV files:

    filename_ext_paramurls = 'external_paramurls.csv'
    filename_ext_forms     = 'external_forms.csv'

All requests are logged to

    filename_logfile = 'T_logfile.txt'

and the URLs actually visited to

    filename_logurls = 'T_logurls.txt'

Errors and JavaScript stuff found are written to

    filename_errors = 'T_errors.txt'
    filename_js     = 'T_javascript.txt'


Known Bugs/Limitations:
-----------------------

* cannot handle HTTPS/SSL via proxy
* does not recognize redirects, ignores them
  - <meta> tag
  - JavaScript (window.location= )
* does not do much session-handling
  - no Basic/Digest/NTLM auth
  - no session-headers
* ignores /robots.txt and meta:nofollow tags
* not overly fast


Wishlist:
---------

* scan for JavaScript
  - WebSocket *(
  - .innerHTML *=
  - eval *(
  - .submit *()
  - .write *(
  - location.href
* scan for JavaScript events
  - captureEvents *(
  - handleEvent *(
  - routeEvent *(
  - onabort
  - onblur
  - onchange
  - onclick
  - ondblclick
  - onerror
  - onfocus
  - onkeydown
  - onkeypress
  - onkeyup
  - onload
  - onmousedown
  - onmousemove
  - onmouseout
  - onmouseover
  - onmouseup
  - onreset
  - onselect
  - onsubmit
  - onunload
* scan for funky URLs (ws:// file:// etc)
  - blacklist for known URL schemes (http, https, ftp, mailto),
    note everything else
* retry-loop ("#nr html failures, retry them (Y/N)?") for failed pages
* calls/urls of known toolsets (jQuery, prototype, ...)
* better eval: forms / paramurls to test ==> in TODO-Lists
* scan for "notable" files:
  - robots.txt
  - crossdomain.xml
  - .htaccess
  - .htpasswd
  - phpinfo.php
  - info.php
* scan for disclosures:
  - private IP addresses
  - emails into separate file
  - path disclosure (C:\ ... D:\ ...)
  - numbers (phone, creditcard, ...)


Results / Evaluation
--------------------

Thekla only does the numbing web site enumeration. It will not perform
any security checks or vulnerability tests. It only extracts the URLs
a web application tester should check because they were found to
contain parameters.

Beware: Depending on the web application architecture, Thekla can run
into endless loops. Keep an eye on it while it is running and abort if
that seems to be the case.

If you need a security consultant to check your network, systems or
architecture, or to help you with a security problem/incident or
(a better approach) check your security architecture and risks, simply
contact me. ;-)


-----------------------------------------------------------------------
# THEKLA - Web spider for web application pentesting
# (c) 2009- Volker Tanger
#
# http://www.wyae.de/software/thekla/
# All rights reserved. Distributable under "Modified BSD" license
-----------------------------------------------------------------------

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote
   products derived from this software without specific prior written
   permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

-----------------------------------------------------------------------