HCE Project Python language Distributed Tasks Manager Application, Distributed Crawler Application and client API bindings.  2.0.0-chaika
Hierarchical Cluster Engine Python language binding
dc_crawler.Fetcher.ContentFetcher Class Reference

Public Member Functions

def open (self, url, **kwargs)
 
- Public Member Functions inherited from dc_crawler.Fetcher.BaseFetcher
def __init__ (self)
 
def open (self, url, method='get', headers=None, timeout=100, allow_redirects=True, proxies=None, auth=None, data=None, log=None, allowed_content_types=None, max_resource_size=None, max_redirects=CONSTS.MAX_HTTP_REDIRECTS_LIMIT, filters=None, executable_path=None, depth=None, macro=None)
 
def should_have_meta_res (self)
 
def getDomainNameFromURL (self, url, default='')
 

Additional Inherited Members

- Static Public Member Functions inherited from dc_crawler.Fetcher.BaseFetcher
def init (dbWrapper=None, siteId=None)
 
def get_fetcher (typ, dbWrapper=None, siteId=None)
 
- Public Attributes inherited from dc_crawler.Fetcher.BaseFetcher
 connectionTimeout
 
 logger
 
- Static Public Attributes inherited from dc_crawler.Fetcher.BaseFetcher
 fetchers = None
 
int TYP_NORMAL = 1
 
int TYP_DYNAMIC = 2
 
int TYP_URLLIB = 5
 
int TYP_CONTENT = 6
 
int TYP_AUTO = 7
 
float CONNECTION_TIMEOUT = 1.0
 

Detailed Description

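Fetcher that performs no network request: open() builds a Response directly from content the caller supplies in the inputContent keyword argument.

Definition at line 1574 of file Fetcher.py.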

Member Function Documentation

◆ open()

def dc_crawler.Fetcher.ContentFetcher.open (   self,
  url,
  **kwargs
)

Definition at line 1586 of file Fetcher.py.

  def open(self, url, **kwargs):
    try:
      localBuf = base64.b64decode(kwargs["inputContent"])
    except TypeError:
      localBuf = kwargs["inputContent"]
    res = Response()
    res.content_size = len(localBuf)
    res.headers = {}
    res.redirects = []
    res.status_code = 200
    res.url = url
    res.encoding = SimpleCharsetDetector().detect(localBuf)
    if res.encoding is None:
      res.encoding = "utf-8"
    res.unicode_content = localBuf
    res.str_content = localBuf
    res.rendered_unicode_content = localBuf

    return res
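
The behaviour of open() can be tried outside the crawler with a minimal sketch. It is not the project's code: the Response class below is a hypothetical stand-in for the one in Fetcher.py, detect_charset is a placeholder for SimpleCharsetDetector().detect() that always reports "unknown", and open_from_content is an illustrative name. The sketch assumes Python 3, where base64.b64decode signals malformed input with binascii.Error (a ValueError subclass), so it catches ValueError alongside the TypeError that the original listing catches.

```python
import base64


class Response(object):
    """Hypothetical stand-in for the Response class defined in Fetcher.py."""

    def __init__(self):
        self.content_size = None
        self.headers = None
        self.redirects = None
        self.status_code = None
        self.url = None
        self.encoding = None
        self.unicode_content = None
        self.str_content = None
        self.rendered_unicode_content = None


def detect_charset(buf):
    """Placeholder for SimpleCharsetDetector().detect(); always returns None."""
    return None


def open_from_content(url, **kwargs):
    """Build a Response from caller-supplied content, with no network access."""
    raw = kwargs["inputContent"]
    try:
        # Preferred path: the content arrives base64-encoded.
        local_buf = base64.b64decode(raw)
    except (TypeError, ValueError):
        # Fallback: the content is not valid base64, so use it unchanged.
        local_buf = raw

    res = Response()
    res.content_size = len(local_buf)
    res.headers = {}
    res.redirects = []
    res.status_code = 200          # always reported as a successful fetch
    res.url = url
    res.encoding = detect_charset(local_buf)
    if res.encoding is None:
        res.encoding = "utf-8"     # default when detection yields nothing
    res.unicode_content = local_buf
    res.str_content = local_buf
    res.rendered_unicode_content = local_buf
    return res


if __name__ == "__main__":
    encoded = base64.b64encode(b"<html>hi</html>").decode("ascii")
    res = open_from_content("http://example.com/", inputContent=encoded)
    print(res.status_code, res.encoding, res.str_content)
```

Note that base64.b64decode silently discards characters outside the base64 alphabet by default, so some plain-text inputs may decode without an error instead of taking the fallback path. Callers that cannot guarantee encoded input may want to encode it themselves before passing it in.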

The documentation for this class was generated from the following file: