Changelog:
Pre-release 1.4 alpha (2015-02-09)
- New feature: postponed deletion, cleanup and purging of resource data, including content in disk storage and in the key-value DB, run periodically and limited by load level and number of items. Fully configurable schedule and selection of deletion candidates; a separate MySQL database for each site's tables; balanced purging tasks with optimized load level on multi-host systems, and more.
- New feature: fully separated management of crawling and processing tasks, including task queue processing, scheduling, load-level balancing, task competition configuration, and re-crawling and re-processing both on demand and on schedule, and more.
- New feature: fully multi-threaded re-crawl management with support for resource balancing and site state protection, including configurable cleanup, optimization and auto-tuning of the re-crawl period.
- New feature: fully separated purging of deleted resources from the system, including load balancing of purging tasks for multi-host configurations, and scheduling.
- New feature: support for MySQL-based locking of per-host DB operations, protecting database structures from overlapping multi-process operations.
- Improvements to the scraping algorithms and the processing core, including support for fully customized real-time crawling and processing requests with fixed scraping templates and scraper selection.
- Many fixes to crawling and scraping features.
The latest unstable bundle archive can be downloaded here.