Intelligent Crawler ver 1.1

ICrawler (Intelligent Crawler) is an open-source project with three main features:

Automatically and efficiently extracts the necessary content, including the headline, summary, main content and other content
Automatically determines crawling depth
Quickly discovers new hyperlinks in web pages
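
To make the third feature concrete, here is a minimal Python sketch of discovering new hyperlinks in a page. It is not ICrawler's actual implementation: the names LinkCollector and discover_new_links are hypothetical, and the sketch assumes that a "new" link is simply an absolute http(s) URL that is not yet in a set of already crawled URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects absolute, de-duplicated hyperlinks from <a href=...> tags.

    Hypothetical helper for illustration; not part of ICrawler.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # keeps links in discovery order
        self._seen = set()   # guards against duplicates within one page

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative URLs against the page URL and drop the #fragment.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        # Assumption: only http(s) links are worth crawling.
        if absolute.startswith(("http://", "https://")) and absolute not in self._seen:
            self._seen.add(absolute)
            self.links.append(absolute)


def discover_new_links(html, base_url, already_crawled):
    """Return links found in `html` that are not in `already_crawled`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links if link not in already_crawled]
```

A caller would pass the page source, the page's own URL and the set of URLs visited so far, then queue whatever comes back.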

The ARFF file used to train ICrawler is available for download.

Rule Editor

This application can be used to prepare rules for extracting content from a web page.
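
As an illustration of what an extraction rule can look like, the Python sketch below treats a rule as a (tag, attribute, value) triple and returns the text inside every matching element. This rule format is an assumption made for the example, not the Rule Editor's actual format, and RuleExtractor and extract are hypothetical names.

```python
from html.parser import HTMLParser


class RuleExtractor(HTMLParser):
    """Collects the text inside elements matching one (tag, attr, value) rule.

    Hypothetical rule format for illustration; the attribute value must
    match exactly (e.g. class="content", not class="content wide").
    """

    def __init__(self, tag, attr, value):
        super().__init__()
        self.rule = (tag, attr, value)
        self._depth = 0      # > 0 while inside a matching element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        rule_tag, rule_attr, rule_value = self.rule
        if self._depth > 0:
            # Count nested elements of the same tag so that the matching
            # close tag is the one that ends the capture.
            if tag == rule_tag:
                self._depth += 1
        elif tag == rule_tag and dict(attrs).get(rule_attr) == rule_value:
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth > 0 and tag == self.rule[0]:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)

    def text(self):
        # Join the captured pieces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def extract(html, rules):
    """Apply each named rule to `html` and return {name: extracted text}."""
    result = {}
    for name, (tag, attr, value) in rules.items():
        extractor = RuleExtractor(tag, attr, value)
        extractor.feed(html)
        result[name] = extractor.text()
    return result
```

A rule set for a news site could then map names such as "headline" and "main_content" to triples like ("h1", "class", "title") and ("div", "class", "content").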

The ICrawler paper has been published in Software: Practice and Experience.

Uzun, Erdinç; Güner, E. Serdar; Kılıçaslan, Yılmaz; Yerlikaya, Tarık & H. Agun, Volkan (2013) “An effective and efficient Web content extractor for optimizing the crawling process”, Software: Practice and Experience, DOI: 10.1002/spe.2195