Intelligent Crawler ver 1.1

ICrawler (Intelligent Crawler) is an open source project with three main features:

  • Automatically and efficiently extracts the relevant content of a web page, including the headline, summary, main content and other content
  • Automatically determines crawling depth
  • Quickly discovers new hyperlinks in web pages
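The third feature, hyperlink discovery, can be illustrated with a minimal sketch. This is not ICrawler's own implementation, which is described in the paper cited at the end of this page. It only assumes that the links of interest are the href values of <a> tags. The names LinkCollector and discover_links are hypothetical, and only Python's standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects absolute, de-duplicated hyperlinks from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # links in order of first appearance
        self._seen = set()   # fast membership test for de-duplication

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop "#fragment"
        # parts, so two anchors pointing at the same page count once.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in self._seen:
            self._seen.add(absolute)
            self.links.append(absolute)


def discover_links(html, base_url):
    """Return the HTTP(S) links found in html, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A crawler could pass each downloaded page to discover_links and enqueue only the URLs it has not visited yet. Non-HTTP schemes such as mailto: are skipped in this sketch.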

GitHub link

Click for application

Click for source code

Click for the ARFF file used to train ICrawler.

Rule Editor

This application can be used to prepare rules for extracting content from a web page.

The ICrawler paper has been published in Software: Practice and Experience.

  • Uzun, Erdinç; Güner, E.Serdar; Kılıçaslan, Yılmaz; Yerlikaya, Tarık & H.Agun, Volkan, (2013) “An effective and efficient Web content extractor for optimizing the crawling process”, Software: Practice and Experience, DOI: 10.1002/spe.2195