HCE Project Python language Distributed Tasks Manager Application, Distributed Crawler Application and client API bindings.  2.0.0-chaika
Hierarchical Cluster Engine Python language binding
dc_crawler.Constants Namespace Reference

Variables

int FETCHER_TIME_LIMIT_MAX = 100
 
float CONNECTION_TIMEOUT = 1.0
 
int MAX_HTTP_REDIRECTS_LIMIT = 5
 
int MAX_HTTP_SIZE_UNLIMIT = 0
 
int MAX_HTML_REDIRECTS_LIMIT = 1
 
string DB_SITES = "dc_sites"
 
string DB_URLS = "dc_urls"
 
string RTC_FINALIZER_APP_NAME = "rtc-finalizer"
 
string RTC_PREPROCESSOR_APP_NAME = "rtc-preprocessor"
 
list pubdateFeedNames = ["pubdate", "published", "pubDate", "published_parsed", "updated_parsed"]
 
string pubdateRssFeedHeaderName = "X-pubdateRssFeed"
 
string rssFeedUrlHeaderName = "X-feed_url"
 
string baseUrlHeaderName = "X-base_url"
 
int HTTP_CODE_200 = 200
 
int HTTP_CODE_304 = 304
 
int HTTP_CODE_400 = 400
 
int HTTP_CODE_403 = 403
 
list REDIRECT_HTTP_CODES = [301, 302, 303, 304]
 
list REDIRECT_HEADER_FIELDS_FOR_REMOVE = ['referer', 'content-type', 'Location', 'cookie']
 
dictionary charsetDetectorMap
 
dictionary standardEncodings
 

Detailed Description

  HCE project, Python bindings, Distributed Tasks Manager application.
  Event objects definitions.

  @package: dc
  @file Constants.py
  @author Oleksii <developers.hce@gmail.com>
  @author madk <developers.hce@gmail.com>
  @link: http://hierarchical-cluster-engine.com/
  @copyright: Copyright © 2013-2014 IOIX Ukraine
  @license: http://hierarchical-cluster-engine.com/license/
  @since: 0.1

Variable Documentation

◆ baseUrlHeaderName

string dc_crawler.Constants.baseUrlHeaderName = "X-base_url"

Definition at line 34 of file Constants.py.

◆ charsetDetectorMap

dictionary dc_crawler.Constants.charsetDetectorMap
Initial value:
= {
  'win-1251':'windows-1251',
  'UTF-8':'utf8',
  'utf-8':'utf8'
}

Definition at line 45 of file Constants.py.
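
The map above pairs charset labels with replacement names. This page does not show how the crawler consults it, so the following is only a minimal sketch of one plausible use: a lookup that falls back to the original label when no mapping exists. The dictionary is copied from the initial value listed above, and normalize_charset is a hypothetical helper, not part of dc_crawler.

```python
# Values copied from the charsetDetectorMap listing on this page.
charsetDetectorMap = {
    'win-1251': 'windows-1251',
    'UTF-8': 'utf8',
    'utf-8': 'utf8',
}


def normalize_charset(name):
    """Hypothetical helper: return the mapped name, or the input if unmapped."""
    return charsetDetectorMap.get(name, name)


print(normalize_charset('win-1251'))
print(normalize_charset('iso-8859-1'))
```

The lookup is case-sensitive, which may be why the map lists both 'UTF-8' and 'utf-8' as separate keys.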

◆ CONNECTION_TIMEOUT

float dc_crawler.Constants.CONNECTION_TIMEOUT = 1.0

Definition at line 17 of file Constants.py.

◆ DB_SITES

string dc_crawler.Constants.DB_SITES = "dc_sites"

Definition at line 24 of file Constants.py.

◆ DB_URLS

string dc_crawler.Constants.DB_URLS = "dc_urls"

Definition at line 25 of file Constants.py.

◆ FETCHER_TIME_LIMIT_MAX

int dc_crawler.Constants.FETCHER_TIME_LIMIT_MAX = 100

Definition at line 16 of file Constants.py.

◆ HTTP_CODE_200

int dc_crawler.Constants.HTTP_CODE_200 = 200

Definition at line 36 of file Constants.py.

◆ HTTP_CODE_304

int dc_crawler.Constants.HTTP_CODE_304 = 304

Definition at line 37 of file Constants.py.

◆ HTTP_CODE_400

int dc_crawler.Constants.HTTP_CODE_400 = 400

Definition at line 38 of file Constants.py.

◆ HTTP_CODE_403

int dc_crawler.Constants.HTTP_CODE_403 = 403

Definition at line 39 of file Constants.py.

◆ MAX_HTML_REDIRECTS_LIMIT

int dc_crawler.Constants.MAX_HTML_REDIRECTS_LIMIT = 1

Definition at line 22 of file Constants.py.

◆ MAX_HTTP_REDIRECTS_LIMIT

int dc_crawler.Constants.MAX_HTTP_REDIRECTS_LIMIT = 5

Definition at line 19 of file Constants.py.

◆ MAX_HTTP_SIZE_UNLIMIT

int dc_crawler.Constants.MAX_HTTP_SIZE_UNLIMIT = 0

Definition at line 20 of file Constants.py.

◆ pubdateFeedNames

list dc_crawler.Constants.pubdateFeedNames = ["pubdate", "published", "pubDate", "published_parsed", "updated_parsed"]

Definition at line 31 of file Constants.py.
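
The list order suggests a priority among candidate publication-date fields, although this page does not confirm that the crawler reads it that way. As an illustration under that assumption, the sketch below scans a feed entry, modeled here as a plain dict, and returns the first listed field that has a value. The function first_pubdate and the sample entry are hypothetical.

```python
# Value copied from the pubdateFeedNames listing on this page.
pubdateFeedNames = ["pubdate", "published", "pubDate",
                    "published_parsed", "updated_parsed"]


def first_pubdate(entry):
    """Hypothetical helper: return (field, value) for the first non-empty
    publication-date field found in entry, or None if none is present."""
    for name in pubdateFeedNames:
        value = entry.get(name)
        if value:
            return name, value
    return None


# Sample entry invented for illustration only.
entry = {"title": "Example", "published": "Mon, 06 Jan 2014 10:00:00 GMT"}
print(first_pubdate(entry))
```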

◆ pubdateRssFeedHeaderName

string dc_crawler.Constants.pubdateRssFeedHeaderName = "X-pubdateRssFeed"

Definition at line 32 of file Constants.py.

◆ REDIRECT_HEADER_FIELDS_FOR_REMOVE

list dc_crawler.Constants.REDIRECT_HEADER_FIELDS_FOR_REMOVE = ['referer', 'content-type', 'Location', 'cookie']

Definition at line 42 of file Constants.py.
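
The name indicates that these header fields are dropped before a redirected request is issued. The list mixes lower-case names with a capitalized 'Location', and HTTP header names are case-insensitive, so a filter would reasonably compare names in lower case. The sketch below assumes headers are held in a plain dict. The function strip_redirect_headers is a hypothetical helper and may differ from what the crawler actually does.

```python
# Value copied from the REDIRECT_HEADER_FIELDS_FOR_REMOVE listing on this page.
REDIRECT_HEADER_FIELDS_FOR_REMOVE = ['referer', 'content-type', 'Location', 'cookie']


def strip_redirect_headers(headers):
    """Hypothetical helper: return a copy of headers without the listed
    fields, matching header names case-insensitively."""
    blocked = {name.lower() for name in REDIRECT_HEADER_FIELDS_FOR_REMOVE}
    return {key: value for key, value in headers.items()
            if key.lower() not in blocked}


# Sample headers invented for illustration only.
headers = {"Referer": "http://example.com/", "Cookie": "a=1", "User-Agent": "hce"}
print(strip_redirect_headers(headers))
```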

◆ REDIRECT_HTTP_CODES

list dc_crawler.Constants.REDIRECT_HTTP_CODES = [301, 302, 303, 304]

Definition at line 41 of file Constants.py.
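
One way these codes could combine with MAX_HTTP_REDIRECTS_LIMIT is a bounded redirect loop, sketched below. This is an assumption about usage, not the crawler's actual logic. The fetch callable, which returns a (status, location) pair, and the function follow_redirects are hypothetical. Note that 304 is "Not Modified" in HTTP and normally carries no Location header, so the sketch stops when no target is given.

```python
# Values copied from the listings on this page.
REDIRECT_HTTP_CODES = [301, 302, 303, 304]
MAX_HTTP_REDIRECTS_LIMIT = 5


def follow_redirects(url, fetch):
    """Hypothetical helper: follow at most MAX_HTTP_REDIRECTS_LIMIT redirects.

    fetch(url) is an assumed callable returning (status, location).
    Returns (final_url, status) or raises RuntimeError past the limit.
    """
    for _ in range(MAX_HTTP_REDIRECTS_LIMIT + 1):
        status, location = fetch(url)
        # Stop on a non-redirect status or when no target is supplied.
        if status not in REDIRECT_HTTP_CODES or not location:
            return url, status
        url = location
    raise RuntimeError("redirect limit exceeded")


# Fake responses invented for illustration only.
chain = {"a": (301, "b"), "b": (302, "c"), "c": (200, None)}
print(follow_redirects("a", lambda u: chain[u]))
```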

◆ rssFeedUrlHeaderName

string dc_crawler.Constants.rssFeedUrlHeaderName = "X-feed_url"

Definition at line 33 of file Constants.py.

◆ RTC_FINALIZER_APP_NAME

string dc_crawler.Constants.RTC_FINALIZER_APP_NAME = "rtc-finalizer"

Definition at line 27 of file Constants.py.

◆ RTC_PREPROCESSOR_APP_NAME

string dc_crawler.Constants.RTC_PREPROCESSOR_APP_NAME = "rtc-preprocessor"

Definition at line 28 of file Constants.py.

◆ standardEncodings

dictionary dc_crawler.Constants.standardEncodings

Definition at line 52 of file Constants.py.