HCE Project Python language Distributed Tasks Manager Application, Distributed Crawler Application and client API bindings.  2.0.0-chaika
Hierarchical Cluster Engine Python language binding
dc_crawler.RefererHeaderResolver.RefererHeaderResolver Class Reference

Public Member Functions

def __init__ (self, dbWrapper=None)
 
def fetchParentUrl (self, siteId, parentMd5, dbWrapper)
 
def resolveRefererHeader (self, headers, mode, url, siteId=None, parentMd5=None, dbWrapper=None)
 

Public Attributes

 dbWrapper
 

Static Public Attributes

int MODE_NONE = 0
 
int MODE_SIMPLE = 1
 
int MODE_DOMAIN = 2
 
int MODE_PARENT = 3
 
string HEADER_NAME = "Referer"
 

Detailed Description

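Resolves the HTTP Referer header for a crawler request. Depending on the mode, the header is left unset (MODE_NONE), set to the request URL (MODE_SIMPLE), set to the domain URL derived from the request URL (MODE_DOMAIN), or set to the parent page's URL looked up by site ID and parent MD5, falling back to the request URL when no parent is found (MODE_PARENT). A Referer header already present in the headers is never overwritten.

Definition at line 22 of file RefererHeaderResolver.py.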

Constructor & Destructor Documentation

◆ __init__()

def dc_crawler.RefererHeaderResolver.RefererHeaderResolver.__init__ (   self,
  dbWrapper = None 
)

Definition at line 31 of file RefererHeaderResolver.py.

def __init__(self, dbWrapper=None):
    self.dbWrapper = dbWrapper


Member Function Documentation

◆ fetchParentUrl()

def dc_crawler.RefererHeaderResolver.RefererHeaderResolver.fetchParentUrl (   self,
  siteId,
  parentMd5,
  dbWrapper 
)

Definition at line 40 of file RefererHeaderResolver.py.

def fetchParentUrl(self, siteId, parentMd5, dbWrapper):
    ret = None
    # A lookup needs all three inputs; otherwise None is returned.
    if siteId is not None and parentMd5 is not None and dbWrapper is not None:
        # Request the URL record identified by the parent's MD5 within the site.
        urlStatus = dc.EventObjects.URLStatus(siteId, parentMd5)
        urlStatus.urlType = dc.EventObjects.URLStatus.URL_TYPE_MD5
        drceSyncTasksCoverObj = DC_CONSTS.DRCESyncTasksCover(DC_CONSTS.EVENT_TYPES.URL_STATUS, [urlStatus])
        responseDRCESyncTasksCover = dbWrapper.process(drceSyncTasksCoverObj)
        row = responseDRCESyncTasksCover.eventObject
        # Use the first returned record, if any.
        if row is not None and len(row) > 0 and row[0] is not None:
            ret = row[0].url
    return ret

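The guard logic of fetchParentUrl() can be tried without a running database. The following is a minimal sketch, not the project's code: `FakeDbWrapper` and `fetch_parent_url` are hypothetical names, the request object is reduced to a plain tuple instead of `DRCESyncTasksCover`, and the response is assumed to expose only an `eventObject` list whose items carry a `url` attribute.

```python
from types import SimpleNamespace


class FakeDbWrapper:
    """Hypothetical stand-in for the real dbWrapper; only process() is modelled."""

    def __init__(self, rows):
        self.rows = rows

    def process(self, request):
        # Assumption: the real wrapper returns an object whose eventObject
        # attribute holds a list of URL records.
        return SimpleNamespace(eventObject=self.rows)


def fetch_parent_url(site_id, parent_md5, db_wrapper):
    """Mirror of the guards in fetchParentUrl, with a simplified request."""
    if site_id is None or parent_md5 is None or db_wrapper is None:
        return None
    rows = db_wrapper.process((site_id, parent_md5)).eventObject
    if rows is not None and len(rows) > 0 and rows[0] is not None:
        return rows[0].url
    return None


hit = FakeDbWrapper([SimpleNamespace(url="http://example.com/")])
miss = FakeDbWrapper([])

print(fetch_parent_url("site", "md5", hit))   # http://example.com/
print(fetch_parent_url("site", "md5", miss))  # None
print(fetch_parent_url(None, "md5", hit))     # None
```

In every case where the record cannot be found, the result is None, which is what lets the caller fall back to the request URL.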

◆ resolveRefererHeader()

def dc_crawler.RefererHeaderResolver.RefererHeaderResolver.resolveRefererHeader (   self,
  headers,
  mode,
  url,
  siteId = None,
  parentMd5 = None,
  dbWrapper = None 
)

Definition at line 61 of file RefererHeaderResolver.py.

def resolveRefererHeader(self, headers, mode, url, siteId=None, parentMd5=None, dbWrapper=None):
    mode = int(mode)

    # Leave the headers untouched if a Referer is already set (case-insensitive match).
    for headerName in headers:
        if headerName.lower() == self.HEADER_NAME.lower():
            logger.info(">>> Referer field already in dict headers")
            return

    if mode == self.MODE_NONE:
        pass
    elif mode == self.MODE_SIMPLE:
        headers[self.HEADER_NAME] = url
    elif mode == self.MODE_DOMAIN:
        headers[self.HEADER_NAME] = Utils.UrlParser.generateDomainUrl(url)
    elif mode == self.MODE_PARENT:
        # A dbWrapper passed per call overrides the one stored on the instance.
        parentUrl = self.fetchParentUrl(siteId, parentMd5, dbWrapper if dbWrapper is not None else self.dbWrapper)
        # Fall back to the request URL when no parent URL is found.
        headers[self.HEADER_NAME] = parentUrl if parentUrl is not None else url

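The effect of each mode is easiest to see on a concrete headers dict. The sketch below is a self-contained approximation rather than the class itself: `resolve_referer` and `domain_url` are hypothetical names, the parent URL is passed in directly instead of being fetched through a dbWrapper, and `domain_url` only assumes that `Utils.UrlParser.generateDomainUrl` reduces a URL to `scheme://host` (its real behaviour is not shown in this listing). Unlike the original method, which returns nothing and mutates `headers` in place, the sketch also returns the dict for convenience.

```python
from urllib.parse import urlsplit

MODE_NONE, MODE_SIMPLE, MODE_DOMAIN, MODE_PARENT = 0, 1, 2, 3
HEADER_NAME = "Referer"


def domain_url(url):
    # Hypothetical stand-in for Utils.UrlParser.generateDomainUrl;
    # assumed here to keep only "scheme://host".
    parts = urlsplit(url)
    return "{}://{}".format(parts.scheme, parts.netloc)


def resolve_referer(headers, mode, url, parent_url=None):
    """Mirror of the mode dispatch in resolveRefererHeader, minus the DB lookup."""
    mode = int(mode)  # modes may arrive as strings, e.g. from configuration
    # An existing Referer header (any letter case) is never overwritten.
    if any(name.lower() == HEADER_NAME.lower() for name in headers):
        return headers
    if mode == MODE_SIMPLE:
        headers[HEADER_NAME] = url
    elif mode == MODE_DOMAIN:
        headers[HEADER_NAME] = domain_url(url)
    elif mode == MODE_PARENT:
        headers[HEADER_NAME] = parent_url if parent_url is not None else url
    return headers


url = "http://example.com/news/item.html"
print(resolve_referer({}, MODE_NONE, url))    # {}
print(resolve_referer({}, "1", url))          # Referer is the request URL
print(resolve_referer({}, MODE_DOMAIN, url))  # {'Referer': 'http://example.com'}
print(resolve_referer({}, MODE_PARENT, url))  # no parent known: request URL
print(resolve_referer({"referer": "x"}, MODE_SIMPLE, url))  # {'referer': 'x'}
```

Note that a mode value outside 0 to 3 falls through every branch and leaves the headers unchanged, the same as MODE_NONE.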

Member Data Documentation

◆ dbWrapper

dc_crawler.RefererHeaderResolver.RefererHeaderResolver.dbWrapper

Definition at line 32 of file RefererHeaderResolver.py.
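The stored dbWrapper acts as a default that a per-call argument can override. A minimal sketch of that precedence follows; the `Resolver` class and its `pick_wrapper` method are hypothetical and exist only to isolate the fallback expression used when resolveRefererHeader() calls fetchParentUrl().

```python
class Resolver:
    """Hypothetical miniature of the class: only the dbWrapper fallback is modelled."""

    def __init__(self, dbWrapper=None):
        self.dbWrapper = dbWrapper

    def pick_wrapper(self, dbWrapper=None):
        # Same expression that resolveRefererHeader passes to fetchParentUrl.
        return dbWrapper if dbWrapper is not None else self.dbWrapper


default_db, override_db = object(), object()
resolver = Resolver(default_db)

print(resolver.pick_wrapper() is default_db)              # True
print(resolver.pick_wrapper(override_db) is override_db)  # True
print(Resolver().pick_wrapper() is None)                  # True
```

When neither the instance nor the call supplies a wrapper, the result is None, and the parent lookup is skipped.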

◆ HEADER_NAME

string dc_crawler.RefererHeaderResolver.RefererHeaderResolver.HEADER_NAME = "Referer"
static

Definition at line 28 of file RefererHeaderResolver.py.

◆ MODE_DOMAIN

int dc_crawler.RefererHeaderResolver.RefererHeaderResolver.MODE_DOMAIN = 2
static

Definition at line 26 of file RefererHeaderResolver.py.

◆ MODE_NONE

int dc_crawler.RefererHeaderResolver.RefererHeaderResolver.MODE_NONE = 0
static

Definition at line 24 of file RefererHeaderResolver.py.

◆ MODE_PARENT

int dc_crawler.RefererHeaderResolver.RefererHeaderResolver.MODE_PARENT = 3
static

Definition at line 27 of file RefererHeaderResolver.py.

◆ MODE_SIMPLE

int dc_crawler.RefererHeaderResolver.RefererHeaderResolver.MODE_SIMPLE = 1
static

Definition at line 25 of file RefererHeaderResolver.py.


The documentation for this class was generated from the following file: