HCE Project Python language Distributed Tasks Manager Application, Distributed Crawler Application and client API bindings.  2.0.0-chaika
Hierarchical Cluster Engine Python language binding
algorithms.MetricWCount.MetricWCount Class Reference

Public Member Functions

def __init__ (self, names)
 
def internalCalculating (self, dataDict, buf)
 
def precalculate (self, result, metricName)
 
- Public Member Functions inherited from algorithms.BaseMetric.BaseMetric
def __init__ (self, names)
 
def retForMultiNames (self, retDict, metricName)
 
def sortElementsByMetric (self, elements, metricName)
 
def selectElementsByMetric (self, elements, metricName, metricLimitMax, metricLimitMin)
 

Static Public Attributes

list CHAR_CATEGORIES_LIST = ['Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nd', 'Nl', 'No']
 
list CHAR_NOT_LATIN_LIST = ['Lt', 'Lm', 'Lo']
 
string RE_SPLITTER = '\s'
 
int MIN_LATIN_WORD_LEN = 3
 
int W_TYPE_LATIN = 0
 
int W_TYPE_NOT_LATIN = 1
 
int W_TYPE_NUMBER = 2
 
int W_TYPE_BAD = 3
 

Additional Inherited Members

- Public Attributes inherited from algorithms.BaseMetric.BaseMetric
 names
 

Detailed Description

Definition at line 28 of file MetricWCount.py.

Constructor & Destructor Documentation

◆ __init__()

def algorithms.MetricWCount.MetricWCount.__init__(self, names)

Definition at line 45 of file MetricWCount.py.

def __init__(self, names):
    super(MetricWCount, self).__init__(names)

Member Function Documentation

◆ internalCalculating()

def algorithms.MetricWCount.MetricWCount.internalCalculating(self, dataDict, buf)

Definition at line 53 of file MetricWCount.py.

def internalCalculating(self, dataDict, buf):
    if type(buf) is types.StringType:
        buf = unicode(buf)
    words = re.split(self.RE_SPLITTER, buf, flags=re.LOCALE)
    for word in words:
        wType = self.W_TYPE_LATIN
        for ch in word:
            chCategory = unicodedata.category(ch)
            if chCategory in self.CHAR_CATEGORIES_LIST:
                if chCategory in self.CHAR_NOT_LATIN_LIST:
                    wType = self.W_TYPE_NOT_LATIN
            else:
                wType = self.W_TYPE_BAD
                break
        if wType == self.W_TYPE_LATIN and len(word) < self.MIN_LATIN_WORD_LEN:
            wType = self.W_TYPE_BAD
        if wType != self.W_TYPE_BAD:
            dataDict["validWordsCount"] += 1
        dataDict["count"] += 1
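The listing for internalCalculating() can be read as a per-word decision procedure followed by two counters. The following is a minimal Python 3 sketch of that reading, not the project's code. The function names `classify_word` and `count_words` are hypothetical, and the constants are copied from the class attributes documented on this page. The `unicode()` conversion and the `re.LOCALE` flag are dropped, since Python 3 strings are already Unicode and Python 3 accepts `re.LOCALE` only with bytes patterns.

```python
import re
import unicodedata

# Constants copied from the class attributes documented on this page.
CHAR_CATEGORIES_LIST = ['Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nd', 'Nl', 'No']
CHAR_NOT_LATIN_LIST = ['Lt', 'Lm', 'Lo']
RE_SPLITTER = r'\s'
MIN_LATIN_WORD_LEN = 3
W_TYPE_LATIN, W_TYPE_NOT_LATIN, W_TYPE_BAD = 0, 1, 3


def classify_word(word):
    """Hypothetical helper: classify one token with the rules of the listing."""
    w_type = W_TYPE_LATIN
    for ch in word:
        category = unicodedata.category(ch)
        if category in CHAR_CATEGORIES_LIST:
            if category in CHAR_NOT_LATIN_LIST:
                w_type = W_TYPE_NOT_LATIN
        else:
            # A character outside the letter/number categories rejects the word.
            w_type = W_TYPE_BAD
            break
    # Short tokens made only of cased letters or digits are rejected as well.
    if w_type == W_TYPE_LATIN and len(word) < MIN_LATIN_WORD_LEN:
        w_type = W_TYPE_BAD
    return w_type


def count_words(text):
    """Hypothetical helper: return both counters for a single buffer."""
    counters = {"count": 0, "validWordsCount": 0}
    for word in re.split(RE_SPLITTER, text):
        if classify_word(word) != W_TYPE_BAD:
            counters["validWordsCount"] += 1
        counters["count"] += 1
    return counters


print(count_words("the quick fox, ok"))
```

In this sketch "fox," is rejected because of the comma and "ok" because it is shorter than MIN_LATIN_WORD_LEN, so two of the four tokens count as valid. W_TYPE_NUMBER is left out because the listing never assigns it.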

◆ precalculate()

def algorithms.MetricWCount.MetricWCount.precalculate(self, result, metricName)

Definition at line 78 of file MetricWCount.py.

def precalculate(self, result, metricName):
    ret = {"count": 0, "percent": 0, "validWordsCount": 0}
    for key in result.tags:
        if type(result.tags[key]) is types.DictType and "data" in result.tags[key]:
            if type(result.tags[key]["data"]) in types.StringTypes:
                self.internalCalculating(ret, result.tags[key]["data"])
            elif type(result.tags[key]["data"]) is types.ListType:
                for buf in result.tags[key]["data"]:
                    self.internalCalculating(ret, buf)
    if ret["count"] > 0:
        ret["percent"] = ret["validWordsCount"] * 100 / ret["count"]
    ret = self.retForMultiNames(ret, metricName)
    return ret
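The aggregation step in precalculate() can be illustrated with a minimal Python 3 sketch under stated assumptions. Here `tags` is a plain dict standing in for `result.tags`, and `count_buffer` is a hypothetical, simplified stand-in for internalCalculating(). The retForMultiNames() step is omitted because its behavior is not shown on this page. In Python 2, `/` applied to two integers floors the result, so the sketch uses `//` to keep the same integer percentage.

```python
def count_buffer(counters, buf):
    """Hypothetical stand-in for internalCalculating() with a simplified rule:
    a token is valid if it is alphanumeric and at least three characters long."""
    for word in buf.split(" "):
        if len(word) >= 3 and word.isalnum():
            counters["validWordsCount"] += 1
        counters["count"] += 1


def aggregate(tags):
    """Accumulate counters over every tag whose value is a dict with "data"."""
    ret = {"count": 0, "percent": 0, "validWordsCount": 0}
    for value in tags.values():
        # isinstance() is used here; the listing compares exact types instead.
        if isinstance(value, dict) and "data" in value:
            data = value["data"]
            if isinstance(data, str):
                count_buffer(ret, data)
            elif isinstance(data, list):
                for buf in data:
                    count_buffer(ret, buf)
    if ret["count"] > 0:
        # Integer percentage, matching Python 2 integer division.
        ret["percent"] = ret["validWordsCount"] * 100 // ret["count"]
    return ret


tags = {
    "title": {"data": "Hello world"},
    "body": {"data": ["one two", "a bc"]},
    "ignored": "not a dict",
}
print(aggregate(tags))  # {'count': 6, 'percent': 66, 'validWordsCount': 4}
```

The "ignored" entry is skipped because its value is not a dict, and the percentage is truncated to 66 rather than rounded to 67.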

Member Data Documentation

◆ CHAR_CATEGORIES_LIST

list algorithms.MetricWCount.MetricWCount.CHAR_CATEGORIES_LIST = ['Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nd', 'Nl', 'No']
static

Definition at line 31 of file MetricWCount.py.

◆ CHAR_NOT_LATIN_LIST

list algorithms.MetricWCount.MetricWCount.CHAR_NOT_LATIN_LIST = ['Lt', 'Lm', 'Lo']
static

Definition at line 32 of file MetricWCount.py.
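The entries of CHAR_CATEGORIES_LIST and CHAR_NOT_LATIN_LIST are Unicode general category codes as returned by `unicodedata.category`. The short Python 3 sketch below prints the category of a few sample characters. The sample set is chosen for illustration only and is not taken from the project. One consequence worth noting: a general category describes the kind of character, not its script, so cased non-Latin alphabets such as Cyrillic report 'Lu' or 'Ll' just like Latin letters do. The "Latin" and "not Latin" names are therefore probably best read as a heuristic.

```python
import unicodedata

# Illustrative sample characters (not taken from the project).
samples = [
    'A',        # Latin capital letter
    'a',        # Latin small letter
    '\u044f',   # Cyrillic small letter ya
    '\u4e2d',   # CJK ideograph
    '\u01c5',   # titlecase digraph
    '7',        # decimal digit
    '\u2167',   # Roman numeral eight
    '\u00bd',   # vulgar fraction one half
    '-',        # hyphen-minus
    '.',        # full stop
]

categories = {ch: unicodedata.category(ch) for ch in samples}
for ch, category in categories.items():
    print(repr(ch), category)
```

In this sample only the hyphen and the full stop fall outside CHAR_CATEGORIES_LIST, while the CJK ideograph ('Lo') and the titlecase digraph ('Lt') are the two that match CHAR_NOT_LATIN_LIST.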

◆ MIN_LATIN_WORD_LEN

int algorithms.MetricWCount.MetricWCount.MIN_LATIN_WORD_LEN = 3
static

Definition at line 34 of file MetricWCount.py.

◆ RE_SPLITTER

string algorithms.MetricWCount.MetricWCount.RE_SPLITTER = '\s'
static

Definition at line 33 of file MetricWCount.py.
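Because the pattern matches a single whitespace character, a run of whitespace yields empty tokens. Going by the internalCalculating() listing, such empty tokens are shorter than MIN_LATIN_WORD_LEN, so they would increase "count" without increasing "validWordsCount". The Python 3 sketch below only demonstrates the splitting behavior. The sample string is illustrative, and the `r'\s+'` and `str.split()` variants are shown for comparison, not as what the project uses.

```python
import re

RE_SPLITTER = r'\s'  # same pattern as the class attribute, written as a raw string

text = "two  spaces\there "

# One split per whitespace character: doubled and trailing whitespace leave '' tokens.
tokens = re.split(RE_SPLITTER, text)
print(tokens)      # ['two', '', 'spaces', 'here', '']

# For comparison only: collapsing runs still leaves the trailing empty token,
tokens_runs = re.split(r'\s+', text)
print(tokens_runs)  # ['two', 'spaces', 'here', '']

# while str.split() with no arguments drops all empty tokens.
tokens_plain = text.split()
print(tokens_plain)  # ['two', 'spaces', 'here']
```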

◆ W_TYPE_BAD

int algorithms.MetricWCount.MetricWCount.W_TYPE_BAD = 3
static

Definition at line 39 of file MetricWCount.py.

◆ W_TYPE_LATIN

int algorithms.MetricWCount.MetricWCount.W_TYPE_LATIN = 0
static

Definition at line 36 of file MetricWCount.py.

◆ W_TYPE_NOT_LATIN

int algorithms.MetricWCount.MetricWCount.W_TYPE_NOT_LATIN = 1
static

Definition at line 37 of file MetricWCount.py.

◆ W_TYPE_NUMBER

int algorithms.MetricWCount.MetricWCount.W_TYPE_NUMBER = 2
static

Definition at line 38 of file MetricWCount.py.


The documentation for this class was generated from the following file: