HCE Project Python language Distributed Tasks Manager Application, Distributed Crawler Application and client API bindings.  2.0.0-chaika
Hierarchical Cluster Engine Python language binding
dc_processor.alchemy_extractor.AlchemyExtractor Class Reference
Inherits dc_processor.base_extractor.BaseExtractor.

Public Member Functions

def __init__ (self, config, templ=None, domain=None, processorProperties=None)
 
def extractTags (self, resource, reslt)
 
- Public Member Functions inherited from dc_processor.base_extractor.BaseExtractor
def __init__ (self, config, templ=None, domain=None, processorProperties=None)
 
def __str__ (self)
 
def __repr__ (self)
 
def loadScraperProperties (self, scraperPropFileName)
 
def isTagNotFilled (self, result, tagName)
 
def isTagValueNotEmpty (self, tagValue)
 
def tagValueElemValidate (self, tagValueElem, conditionElem)
 
def tagValueValidate (self, tagName, tagValue)
 
def addTag (self, result, tag_name, tag_value, xpath="", isDefaultTag=False, callAdjustment=True, tagType=None, allowNotFilled=False)
 
def calculateMetrics (self, response)
 
def rankReading (self, exctractorName)
 

Public Attributes

 name
 
- Public Attributes inherited from dc_processor.base_extractor.BaseExtractor
 config
 
 processorProperties
 
 name
 
 rank
 
 process_mode
 
 modules
 
 data
 
 db_dc_scraper_db
 
 DBConnector
 
 imgDelimiter
 
 tagsValidator
 

Additional Inherited Members

- Static Public Attributes inherited from dc_processor.base_extractor.BaseExtractor
 properties = None
 
dictionary tag
 
dictionary tagsMask
 

Detailed Description

Definition at line 21 of file alchemy_extractor.py.

Constructor & Destructor Documentation

◆ __init__()

def dc_processor.alchemy_extractor.AlchemyExtractor.__init__ (   self,
  config,
  templ = None,
  domain = None,
  processorProperties = None 
)

Definition at line 24 of file alchemy_extractor.py.

24    def __init__(self, config, templ=None, domain=None, processorProperties=None):
25      BaseExtractor.__init__(self, config, templ, domain, processorProperties)
26      self.name = CONSTS.EXTRACTOR_NAME_ALCHEMY
27      self.data["extractor"] = CONSTS.EXTRACTOR_NAME_ALCHEMY
28      logger.debug("Properties: %s", varDump(self.properties))
29
30      # set module rank from module's properties
31      self.rankReading(self.__class__.__name__)

def varDump(obj, stringify=True, strTypeMaxLen=256, strTypeCutSuffix='...', stringifyType=1, ignoreErrors=False, objectsHash=None, depth=0, indent=2, ensure_ascii=False, maxDepth=10)
Definition: Utils.py:410
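
The constructor listed above follows a common pattern: delegate to the base class, override the extractor name, then read the module rank by class name. The sketch below illustrates that pattern in isolation. It is not the HCE implementation: BaseExtractorSketch, AlchemyExtractorSketch, the EXTRACTOR_NAME_ALCHEMY value, and the layout of the properties dictionary are all assumptions made for illustration, since this page does not show them.

```python
import logging

logger = logging.getLogger("sketch")

# Hypothetical stand-in for CONSTS.EXTRACTOR_NAME_ALCHEMY; the real value is not shown here.
EXTRACTOR_NAME_ALCHEMY = "Alchemy extractor"


class BaseExtractorSketch(object):
    """Simplified stand-in for BaseExtractor (assumed behaviour only)."""

    # Class-level properties, mirroring the static `properties = None` attribute.
    properties = None

    def __init__(self, config, templ=None, domain=None, processorProperties=None):
        self.config = config
        self.processorProperties = processorProperties
        self.name = "Base extractor"
        self.rank = 0
        self.data = {"extractor": self.name}

    def rankReading(self, extractorName):
        # Assumption: ranks are stored per class name, e.g. {"ClassName": {"rank": 7}}.
        if self.properties and extractorName in self.properties:
            self.rank = self.properties[extractorName].get("rank", self.rank)


class AlchemyExtractorSketch(BaseExtractorSketch):
    def __init__(self, config, templ=None, domain=None, processorProperties=None):
        # Let the base class set up the shared state first.
        BaseExtractorSketch.__init__(self, config, templ, domain, processorProperties)
        # Then override the identity fields for this extractor.
        self.name = EXTRACTOR_NAME_ALCHEMY
        self.data["extractor"] = EXTRACTOR_NAME_ALCHEMY
        logger.debug("Properties: %s", self.properties)
        # Finally read the rank using the concrete class name as the key.
        self.rankReading(self.__class__.__name__)
```

Passing self.__class__.__name__ rather than a hard-coded string means a subclass would look up its own rank entry without overriding the constructor.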

Member Function Documentation

◆ extractTags()

def dc_processor.alchemy_extractor.AlchemyExtractor.extractTags (   self,
  resource,
  reslt 
)

Definition at line 34 of file alchemy_extractor.py.

34    def extractTags(self, resource, reslt):
35      try:
36        logger.info("AAAAAAA")
37        parser = AlchemyAPI()
38        logger.info("BBBBBBB")
39        text = parser.text("html", resource.raw_html)
40        logger.info("CCCCCCC")
41        logger.info("Article's corpus: %s", text)
42        self.addTag(result=reslt, \
43                    tag_name=CONSTS.TAG_CONTENT_UTF8_ENCODED, \
44                    tag_value=text)
45        logger.info("DDDDDDD")
46      except Exception, err:
47        logger.info(varDump(err))
48      return reslt

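
The listing above uses Python 2 syntax (`except Exception, err`) and placeholder log messages. As a rough illustration of the same flow in Python 3 (parse the raw HTML, store the text under a content tag, log any failure, and always return the result), consider the sketch below. Everything in it is a stand-in: StubParser replaces AlchemyAPI with a naive regex, TAG_CONTENT replaces CONSTS.TAG_CONTENT_UTF8_ENCODED, add_tag replaces BaseExtractor.addTag, and the result is assumed to be a plain dictionary. None of these reflect the actual HCE or AlchemyAPI interfaces.

```python
import logging
import re

logger = logging.getLogger("sketch")

# Hypothetical stand-in for CONSTS.TAG_CONTENT_UTF8_ENCODED.
TAG_CONTENT = "content_encoded"


class StubParser(object):
    """Hypothetical replacement for AlchemyAPI: strips tags with a naive regex."""

    def text(self, flavor, data):
        if flavor != "html":
            raise ValueError("unsupported flavor: %s" % flavor)
        return re.sub(r"<[^>]+>", " ", data).strip()


class Resource(object):
    """Minimal resource carrying only the raw HTML used by the extractor."""

    def __init__(self, raw_html):
        self.raw_html = raw_html


def add_tag(result, tag_name, tag_value):
    # Simplified stand-in for addTag: store the value under its tag name.
    result[tag_name] = tag_value


def extract_tags(resource, result, parser=None):
    parser = parser or StubParser()
    try:
        text = parser.text("html", resource.raw_html)
        logger.debug("Article's corpus: %s", text)
        add_tag(result, TAG_CONTENT, text)
    except Exception as err:  # Python 3 form of `except Exception, err`
        # logger.exception records the traceback along with the message.
        logger.exception("Tag extraction failed: %s", err)
    # The result is returned on both the success and the failure path.
    return result
```

A call such as `extract_tags(Resource("<p>Hello</p>"), {})` fills the content tag, while a resource whose raw_html is None leaves the result unchanged and logs the error instead of raising.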

Member Data Documentation

◆ name

dc_processor.alchemy_extractor.AlchemyExtractor.name

Definition at line 26 of file alchemy_extractor.py.


The documentation for this class was generated from the following file: