ICrawler (Intelligent Crawler) is an open-source project with three main features:
- Automatically and efficiently extracts the relevant content of a web page, including the headline, summary, main content, and other content
- Automatically determines crawling depth
- Quickly discovers new hyperlinks in web pages
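As a rough illustration of the last feature, hyperlink discovery can be sketched with Python's standard library. This is not ICrawler's own implementation; the names `LinkCollector` and `discover_links` are hypothetical, and the sketch assumes that "new" simply means "not yet in a set of already seen URLs".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative URLs and drop the #fragment part.
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def discover_links(html, base_url, seen):
    """Return links in `html` that are not already in `seen`, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    new_links = []
    for link in parser.links:
        if link not in seen and link not in new_links:
            new_links.append(link)
    return new_links
```

A crawler loop would typically add the returned links to `seen` and to its queue of pages to fetch.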
The ARFF file used for training ICrawler is available for download.
Rule Editor
This application can be used to prepare rules for extracting content from a web page.
The ICrawler paper has been published in Software: Practice and Experience.
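To give a feel for what rule-based extraction can look like, here is a minimal Python sketch. The rule format is an assumption made for illustration only: each rule is a hypothetical `(tag, attribute, value)` triple naming the element that holds a piece of content. The actual rule format produced by the Rule Editor is not described here and may differ.

```python
from html.parser import HTMLParser


class RuleExtractor(HTMLParser):
    """Collects the text inside the first-level matches of one (tag, attr, value) rule."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.rule = (tag, attr, value)
        self.depth = 0      # nesting depth of `tag` inside a matched element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        rule_tag, attr, value = self.rule
        if tag != rule_tag:
            return
        if self.depth > 0:
            self.depth += 1          # nested element of the same tag name
        elif dict(attrs).get(attr) == value:
            self.depth = 1           # entering a matching element

    def handle_endtag(self, tag):
        if tag == self.rule[0] and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract(html, rules):
    """Apply each named rule to `html`; return a dict of whitespace-normalized text."""
    result = {}
    for name, (tag, attr, value) in rules.items():
        parser = RuleExtractor(tag, attr, value)
        parser.feed(html)
        parser.close()
        result[name] = " ".join("".join(parser.chunks).split())
    return result
```

With rules such as `{"headline": ("div", "class", "headline")}`, `extract` returns the text of the matching element under the key `headline`, and unmatched parts of the page are ignored.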
- Uzun, Erdinç; Güner, E. Serdar; Kılıçaslan, Yılmaz; Yerlikaya, Tarık & Agun, H. Volkan (2013). “An effective and efficient Web content extractor for optimizing the crawling process”, Software: Practice and Experience, DOI: 10.1002/spe.2195