#include <Refine.hpp>

Public Member Functions
	Refine (ComponentType inType=CT_DEFAULT)
virtual	~Refine ()
Poco::SharedPtr< DataBase >	process (const Poco::SharedPtr< DataBase > inData)
Public Member Functions inherited from HCE::component::ComponentBase
	ComponentBase (ComponentType inType=CT_DEFAULT)
const std::atomic_bool &	getIsBusy ()
void	setIsBusy (bool isBusy)
virtual	~ComponentBase ()
bool	getIsBusy ()
Public Member Functions inherited from HCE::DataBase
	DataBase (ComponentType inType=CT_DEFAULT)
ComponentType	getType ()
virtual	~DataBase ()

Additional Inherited Members
Protected Attributes inherited from HCE::component::ComponentBase
std::atomic_bool	_isBusy
bool	_isBusy

Detailed Description

Definition at line 45 of file Refine.hpp.

Constructor & Destructor Documentation

HCE::component::Refine::Refine ( ComponentType inType = CT_DEFAULT )

Defines the content processing schema. If the input message does not provide its own content processing schema, the Refine component applies the default one:

Strip tags from the raw content
Split the raw content into tokens
Detect the language mask for each token in the split content
Normalize Japanese tokens
Normalize tokens in European languages (Russian, English, etc.)
Determine the part of speech of each token
Compute the CRC64 of each token's normalized form
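
The default schema above can be pictured as an ordered list of steps with a fallback rule. The following is a minimal sketch under stated assumptions, not the HCE implementation: `Step`, `defaultSchema`, and `resolveSchema` are hypothetical names introduced for illustration, and a message without a schema is modelled as an empty vector.

```cpp
#include <vector>

// Hypothetical step identifiers mirroring the default schema listed above.
enum class Step {
    ReduceTags,
    Tokenize,
    DetectLanguage,
    NormalizeJapanese,
    NormalizeEuropean,
    PartOfSpeech,
    Crc64
};

// Returns the default schema in the documented order.
inline std::vector<Step> defaultSchema() {
    return {Step::ReduceTags, Step::Tokenize, Step::DetectLanguage,
            Step::NormalizeJapanese, Step::NormalizeEuropean,
            Step::PartOfSpeech, Step::Crc64};
}

// Uses the schema supplied by the message, or falls back to the default
// when the message carries none (assumption: "none" is an empty vector).
inline std::vector<Step> resolveSchema(const std::vector<Step>& fromMessage) {
    return fromMessage.empty() ? defaultSchema() : fromMessage;
}
```

Calling `resolveSchema({})` yields the seven default steps, while a non-empty schema is returned unchanged.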

Tagger POS reduce

Split the content into tokens. Sets the tokenizer type used to split the content. Available tokenizers:

ICU
Boost (methods: split and tokenizer)
MeCab
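
To illustrate what selecting a tokenizer and splitting content into tokens might look like, here is a rough sketch. `TokenizerType`, `tokenizerName`, and `splitOnWhitespace` are hypothetical names, not part of HCE, and the splitter is only a whitespace-based stand-in: it does not call ICU, Boost, or MeCab, each of which applies its own segmentation rules.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical selector for the tokenizers named above.
enum class TokenizerType { Icu, BoostSplit, BoostTokenizer, MeCab };

// Human-readable name for a selector value.
inline const char* tokenizerName(TokenizerType type) {
    switch (type) {
        case TokenizerType::Icu:            return "ICU";
        case TokenizerType::BoostSplit:     return "Boost split";
        case TokenizerType::BoostTokenizer: return "Boost tokenizer";
        case TokenizerType::MeCab:          return "MeCab";
    }
    return "unknown";
}

// Stand-in splitter: breaks on whitespace only. A real back end would
// apply language-aware segmentation instead.
inline std::vector<std::string> splitOnWhitespace(const std::string& content) {
    std::istringstream in(content);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```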

Detect the language of each token

Normalize Japanese tokens

Normalize tokens in other languages

Part of speech

CRC64

Definition at line 33 of file Refine.cpp.

HCE::component::Refine::~Refine ( )

virtual

Definition at line 114 of file Refine.cpp.

Member Function Documentation

Poco::SharedPtr< DataBase > HCE::component::Refine::process ( const Poco::SharedPtr< DataBase > inData )

virtual

Timer statistics

Main processing loop

Fill OutDataRefine

For each token extracted from the content

cword instance

The following fields must be set:

unsigned char black; //!< refine
unsigned short simClass; //!< refine two bytes morphology ( MorphChangeGrad )
unsigned int hCrc; //!< refine CRC32 word ( CRC word for highlight on CDR )
unsigned int offset; //!< refine
unsigned int sentenceNumber; //!< refine (deprecated) number word's sentence, start from begin
unsigned char lingIntegrity; //!< refine valuable of the word in the content ( val/unval content )
unsigned int initWordLen; //!< refine
std::string normWord; //!< refine
POSMaskBitset<POS_NUM> _posMask;

Set the word's blacklist flag

Set the word's morphology

Set the word's CRC for highlighting

Set the word's offset

Set the word's sentence number

Set the word's linguistic integrity

Set the initial word length

Set the original word form

Set the normalized word form

Set the word's part-of-speech mask

Set the word's type

Insert the cword into the vector
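
The per-token loop described above can be sketched as follows. This is an illustration only: `CWordSketch`, `fillWords`, and `kPosNum` are hypothetical, `std::bitset` stands in for `POSMaskBitset<POS_NUM>`, and the offset is assumed to be a byte offset into content whose tokens are separated by single spaces. Normalization, morphology, and blacklist lookup are left out.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kPosNum = 16;  // assumption: the real POS_NUM is not shown

// Hypothetical record holding the fields listed above.
struct CWordSketch {
    unsigned char black = 0;
    unsigned short simClass = 0;
    unsigned int hCrc = 0;
    unsigned int offset = 0;
    unsigned int sentenceNumber = 0;
    unsigned char lingIntegrity = 0;
    unsigned int initWordLen = 0;
    std::string normWord;
    std::bitset<kPosNum> posMask;  // stands in for POSMaskBitset<POS_NUM>
};

// Builds one record per token and appends it to the output vector.
inline std::vector<CWordSketch> fillWords(const std::vector<std::string>& tokens) {
    std::vector<CWordSketch> out;
    unsigned int offset = 0;
    for (const std::string& token : tokens) {
        CWordSketch word;
        word.offset = offset;
        word.initWordLen = static_cast<unsigned int>(token.size());
        word.normWord = token;        // real normalization is out of scope here
        out.push_back(word);
        offset += word.initWordLen + 1;  // assumes single-space separators
    }
    return out;
}
```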

rword instance

The following fields must be set:

std::string _word;
unsigned long long _crc64;
POSMaskBitset<POS_NUM> _posMask;
MorphChangeGradBitset<MCG_NUM> _morphChangeGrad;

Set the word's blacklist flag
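
For readers unfamiliar with how a `_crc64` value could be derived from a normalized word, here is a bitwise sketch. This page does not say which CRC64 variant the component uses, so the example assumes CRC-64/ECMA-182 (polynomial 0x42F0E1EBA9EA3693, zero initial value, no reflection, no final XOR). `crc64Ecma` is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Bitwise CRC-64/ECMA-182 over the bytes of a normalized word.
// Assumption: the component's actual CRC64 variant may differ.
inline std::uint64_t crc64Ecma(const std::string& normalized) {
    const std::uint64_t poly = 0x42F0E1EBA9EA3693ULL;
    std::uint64_t crc = 0;
    for (unsigned char byte : normalized) {
        // Feed the next byte into the top of the register.
        crc ^= static_cast<std::uint64_t>(byte) << 56;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift left; apply the polynomial when the top bit falls out.
            crc = (crc & 0x8000000000000000ULL) ? (crc << 1) ^ poly : (crc << 1);
        }
    }
    return crc;
}
```

A table-driven version would be faster for large inputs; the bitwise form is used here because it is short and easy to check.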

Implements HCE::component::ComponentBase.

Definition at line 117 of file Refine.cpp.
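
The relationship between the base class and `process()` can be sketched as below. Everything here is a simplified stand-in: `DataSketch`, `ComponentSketch`, and `RefineSketch` are hypothetical, `std::shared_ptr` replaces `Poco::SharedPtr`, and toggling the busy flag inside `process()` is an assumption about usage rather than documented behavior.

```cpp
#include <atomic>
#include <memory>

// Simplified stand-in for the data type passed between components.
struct DataSketch {
    virtual ~DataSketch() = default;
};

// Simplified stand-in for a component base with a busy flag.
class ComponentSketch {
public:
    virtual ~ComponentSketch() = default;
    virtual std::shared_ptr<DataSketch> process(std::shared_ptr<DataSketch> inData) = 0;
    bool getIsBusy() const { return _isBusy.load(); }
    void setIsBusy(bool isBusy) { _isBusy.store(isBusy); }

protected:
    std::atomic_bool _isBusy{false};
};

// Simplified stand-in for a component that implements process().
class RefineSketch : public ComponentSketch {
public:
    std::shared_ptr<DataSketch> process(std::shared_ptr<DataSketch> inData) override {
        setIsBusy(true);   // assumption: the flag brackets the processing call
        (void)inData;      // the real per-token work is omitted
        auto outData = std::make_shared<DataSketch>();
        setIsBusy(false);
        return outData;
    }
};
```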

The documentation for this class was generated from the following files:

sources/utils/refine/src/Refine.hpp
sources/utils/refine/src/Refine.cpp
