Public Member Functions
def	__init__ (self, config, templ=None, domain=None, processorProperties=None)

def	extractTags (self, resource, reslt)

Public Member Functions inherited from dc_processor.base_extractor.BaseExtractor
def	__init__ (self, config, templ=None, domain=None, processorProperties=None)

def	__str__ (self)

def	__repr__ (self)

def	loadScraperProperties (self, scraperPropFileName)

def	isTagNotFilled (self, result, tagName)

def	isTagValueNotEmpty (self, tagValue)

def	tagValueElemValidate (self, tagValueElem, conditionElem)

def	tagValueValidate (self, tagName, tagValue)

def	addTag (self, result, tag_name, tag_value, xpath="", isDefaultTag=False, callAdjustment=True, tagType=None, allowNotFilled=False)

def	calculateMetrics (self, response)

def	rankReading (self, exctractorName)

Public Attributes
	name

Public Attributes inherited from dc_processor.base_extractor.BaseExtractor
	config

	processorProperties

	name

	rank

	process_mode

	modules

	data

	db_dc_scraper_db

	DBConnector

	imgDelimiter

	tagsValidator

Additional Inherited Members
Static Public Attributes inherited from dc_processor.base_extractor.BaseExtractor
	properties = None

dictionary	tag

dictionary	tagsMask

Detailed Description

Definition at line 22 of file boilerpipe_extractor.py.

Constructor & Destructor Documentation

◆ __init__()

def dc_processor.boilerpipe_extractor.BoilerpipeExtractor.__init__	(	self,
		config,
		templ = `None`,
		domain = `None`,
		processorProperties = `None`
	)

Definition at line 25 of file boilerpipe_extractor.py.

   def __init__(self, config, templ=None, domain=None, processorProperties=None):
     BaseExtractor.__init__(self, config, templ, domain, processorProperties)
     self.name = CONSTS.EXTRACTOR_NAME_BOILERPIPE
     self.data["extractor"] = CONSTS.EXTRACTOR_NAME_BOILERPIPE
     logger.debug("Properties: %s", varDump(self.properties))
 
     self.rankReading(self.__class__.__name__)
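
The constructor listing above depends on call order: the base initializer has to run first so that `self.data` exists before `data["extractor"]` is written. Below is a minimal sketch of that pattern. It uses stand-in classes, not the real `dc_processor` code: `SketchBaseExtractor`, `SketchBoilerpipeExtractor`, the `"boilerpipe"` constant value, and the config-based rank lookup inside `rankReading` are all assumptions made for illustration.

```python
# Hypothetical stand-ins; these are NOT the real dc_processor classes.
EXTRACTOR_NAME_BOILERPIPE = "boilerpipe"  # assumed value of CONSTS.EXTRACTOR_NAME_BOILERPIPE


class SketchBaseExtractor(object):
    """Stand-in for BaseExtractor holding only the attributes the listing touches."""

    def __init__(self, config, templ=None, domain=None, processorProperties=None):
        self.config = config
        self.processorProperties = processorProperties
        self.name = "base"
        self.rank = 0
        self.data = {}  # must exist before a subclass writes data["extractor"]

    def rankReading(self, extractorName):
        # Assumed behaviour: read a per-class rank from the config,
        # keeping the current rank when no entry is present.
        self.rank = self.config.get("ranks", {}).get(extractorName, self.rank)


class SketchBoilerpipeExtractor(SketchBaseExtractor):
    def __init__(self, config, templ=None, domain=None, processorProperties=None):
        # Same order as the listing: base init, then name, data, rank.
        SketchBaseExtractor.__init__(self, config, templ, domain, processorProperties)
        self.name = EXTRACTOR_NAME_BOILERPIPE
        self.data["extractor"] = EXTRACTOR_NAME_BOILERPIPE
        self.rankReading(self.__class__.__name__)


extractor = SketchBoilerpipeExtractor({"ranks": {"SketchBoilerpipeExtractor": 5}})
print(extractor.name, extractor.data, extractor.rank)
```

Passing `self.__class__.__name__` rather than a literal means a further subclass would look up its own rank without overriding the constructor.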
 
 

Member Function Documentation

◆ extractTags()

def dc_processor.boilerpipe_extractor.BoilerpipeExtractor.extractTags	(	self,
		resource,
		reslt
	)

Definition at line 34 of file boilerpipe_extractor.py.

   def extractTags(self, resource, reslt):
     try:
       extractor = Extractor(extractor='ArticleExtractor', html=resource.raw_html)
       text = extractor.getText()
       logger.info("Article's corpus: %s", text)
       self.addTag(result=reslt, tag_name=CONSTS.TAG_CONTENT_UTF8_ENCODED, tag_value=text)
     except Exception as err:
       ExceptionLog.handler(logger, err, 'extractTags:', (err), \
                            {ExceptionLog.LEVEL_NAME_ERROR:ExceptionLog.LEVEL_VALUE_DEBUG})
     return reslt
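
The listing above follows a "best effort" shape: run the text extractor, store the text as a tag, and return the result object whether or not extraction succeeded. The following sketch isolates that control flow. It does not call boilerpipe; `extract_tags`, `strip_tags`, `broken_extractor`, and the `"content_encoded"` tag name are hypothetical, and a plain dict stands in for the result object that `addTag` fills in the real class.

```python
import logging
import re

logger = logging.getLogger("sketch")

# Hypothetical tag name; stands in for CONSTS.TAG_CONTENT_UTF8_ENCODED.
TAG_CONTENT = "content_encoded"


def extract_tags(raw_html, result, text_extractor):
    """Add extracted text to `result`; on any failure, log and return it unchanged."""
    try:
        text = text_extractor(raw_html)
        result[TAG_CONTENT] = text  # the real method calls self.addTag(...) here
    except Exception as err:
        logger.error("extract_tags: %s", err)
    return result


def strip_tags(html):
    # Toy extractor for illustration only; boilerpipe does far more than this.
    return re.sub(r"<[^>]+>", "", html).strip()


def broken_extractor(html):
    # Simulates an extractor that fails on its input.
    raise ValueError("cannot parse")


ok = extract_tags("<p>Hello world</p>", {}, strip_tags)
failed = extract_tags("<p>Hello world</p>", {"title": "kept"}, broken_extractor)
print(ok)
print(failed)
```

Because the exception is swallowed after logging, a caller cannot tell a failed extraction from an empty page except by checking whether the content tag was filled, which is presumably what `isTagNotFilled` is for.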
 

Member Data Documentation

◆ name

dc_processor.boilerpipe_extractor.BoilerpipeExtractor.name

Definition at line 27 of file boilerpipe_extractor.py.

The documentation for this class was generated from the following file:

sources/hce/dc_processor/boilerpipe_extractor.py
