hce-node application
1.4.3
HCE Hierarchical Cluster Engine node application
|
#include <Refine.hpp>
Public Member Functions | |
Refine (ComponentType inType=CT_DEFAULT) | |
virtual | ~Refine () |
Poco::SharedPtr< DataBase > | process (const Poco::SharedPtr< DataBase > inData) |
Public Member Functions inherited from HCE::component::ComponentBase | |
ComponentBase (ComponentType inType=CT_DEFAULT) | |
const std::atomic_bool & | getIsBusy () |
void | setIsBusy (bool isBusy) |
virtual | ~ComponentBase () |
ComponentBase (ComponentType inType=CT_DEFAULT) | |
bool | getIsBusy () |
void | setIsBusy (bool isBusy) |
virtual | ~ComponentBase () |
Public Member Functions inherited from HCE::DataBase | |
DataBase (ComponentType inType=CT_DEFAULT) | |
ComponentType | getType () |
virtual | ~DataBase () |
DataBase (ComponentType inType=CT_DEFAULT) | |
ComponentType | getType () |
virtual | ~DataBase () |
Additional Inherited Members | |
Protected Attributes inherited from HCE::component::ComponentBase | |
std::atomic_bool | _isBusy |
bool | _isBusy |
Definition at line 45 of file Refine.hpp.
HCE::component::Refine::Refine | ( | ComponentType | inType = CT_DEFAULT | ) |
< instance of the smth
Define content processing schema If input message hasn't provide it's own content processing schema Refine component apply default one:
< tagger pos reduce
< split content into the tokens Set type of the split content on the tokens Available tokenizers:
< or
< or
< or
< detect language for each token
< perform normalize for Japanese tokens
< perform normalize for other languages
< Part Of Speech
< CRC64
Definition at line 33 of file Refine.cpp.
|
virtual |
Definition at line 114 of file Refine.cpp.
|
virtual |
< timer statistic
<
< main processing loop
< fill OutDataRefine
<
< for each token extracted from content
<
< cword's instance
That fields must be inserted
unsigned char black; //!< refine unsigned short simClass; //!< refine two bytes morphology ( MorphChangeGrad ) unsigned int hCrc; //!< refine CRC32 word ( CRC word for highlight on CDR ) unsigned int offset; //!< refine unsigned int sentenceNumber; //!< refine (deprecated) number word's sentence, start from begin unsigned char lingIntegrity; //!< refine valuable of the word in the content ( val/unval content ) unsigned int initWordLen; //!< refine std::string normWord; //!< refine POSMaskBitset<POS_NUM> _posMask;
< set word blacklist
< set word morphology
< set word CRC for highlighting
< set word offset
< set word's sentence number
< set word's linguistic integrity
< set init word length
< set original word form
< set normalized word form
< set Part-Of-Speech word's mask
< set word's type
< insert cword to vector
< rword's instance
That fields must be inserted
std::string _word; unsigned long long _crc64; POSMaskBitset<POS_NUM> _posMask; MorphChangeGradBitset<MCG_NUM> _morphChangeGrad;
< set word blacklist
Implements HCE::component::ComponentBase.
Definition at line 117 of file Refine.cpp.