highlighter application  1.1
HCE project utils : highlighter
libstemmer.h File Reference

Typedefs

typedef unsigned char sb_symbol

Functions

const char ** sb_stemmer_list (void)
struct sb_stemmer * sb_stemmer_new (const char *algorithm, const char *charenc)
void sb_stemmer_delete (struct sb_stemmer *stemmer)
const sb_symbol * sb_stemmer_stem (struct sb_stemmer *stemmer, const sb_symbol *word, int size)
int sb_stemmer_length (struct sb_stemmer *stemmer)

Typedef Documentation

typedef unsigned char sb_symbol

Definition at line 7 of file libstemmer.h.

Function Documentation

void sb_stemmer_delete ( struct sb_stemmer *  stemmer)

Delete a stemmer object.

This frees all resources allocated for the stemmer. After calling this function, the supplied stemmer may no longer be used in any way.

It is safe to pass a null pointer to this function - this will have no effect.

int sb_stemmer_length ( struct sb_stemmer *  stemmer)

Get the length of the result of the last stemmed word. This should not be called before sb_stemmer_stem() has been called.

const char** sb_stemmer_list ( void  )

Returns an array of the names of the available stemming algorithms. Note that these are the canonical names: aliases (i.e., other names for the same algorithm) are not included in the list. The list is terminated with a null pointer.

The list must not be modified in any way.
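As an illustration of the null-terminated list contract described above, here is a minimal sketch of two read-only helpers. The names count_algorithm_names and has_algorithm_name are hypothetical and not part of libstemmer; they accept any array shaped like the one sb_stemmer_list() returns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the entries in a NULL-terminated array of
 * names, such as the one returned by sb_stemmer_list(). The array is
 * only read, never modified. */
size_t count_algorithm_names(const char **names)
{
    size_t n = 0;
    while (names[n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: return 1 if `name` appears in the list, 0 otherwise.
 * The comparison is case-sensitive, and because aliases are not listed,
 * only canonical names can match. */
int has_algorithm_name(const char **names, const char *name)
{
    for (size_t i = 0; names[i] != NULL; i++) {
        if (strcmp(names[i], name) == 0)
            return 1;
    }
    return 0;
}
```

With the library linked in, a call such as has_algorithm_name(sb_stemmer_list(), "english") would report whether that canonical name is available in the installed build.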

struct sb_stemmer* sb_stemmer_new ( const char *  algorithm,
const char *  charenc 
)

Create a new stemmer object, using the specified algorithm, for the specified character encoding.

All algorithms will usually be available in UTF-8, but may also be available in other character encodings.

Parameters
algorithm  The algorithm name. This is either the English name of the algorithm, or the 2- or 3-letter ISO 639 code for the language. Note that case is significant in this parameter: the value should be supplied in lower case.
charenc  The character encoding. NULL may be passed as this value, in which case UTF-8 encoding will be assumed. Otherwise, the argument may be one of "UTF_8", "ISO_8859_1" (i.e., Latin 1), "CP850" (i.e., MS-DOS Latin 1) or "KOI8_R" (Russian). Note that case is significant in this parameter.
Returns
NULL if the specified algorithm is not recognised, or the algorithm is not available for the requested encoding. Otherwise, returns a pointer to a newly created stemmer for the requested algorithm. The returned pointer must be deleted by calling sb_stemmer_delete().
Note
NULL will also be returned if an out of memory error occurs.
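
Because the charenc argument is case-sensitive, a caller may want to check a user-supplied encoding name before creating a stemmer. The sketch below is a hypothetical helper (is_documented_charenc is not a libstemmer function) that only mirrors the encoding names listed on this page; a given build of the library may support a different set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-check: return 1 if `charenc` is NULL (UTF-8 is then
 * assumed by sb_stemmer_new()) or exactly matches one of the encoding
 * names documented above, 0 otherwise. Matching is case-sensitive. */
int is_documented_charenc(const char *charenc)
{
    static const char *const known[] = {
        "UTF_8", "ISO_8859_1", "CP850", "KOI8_R"
    };

    if (charenc == NULL)
        return 1;

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(known[i], charenc) == 0)
            return 1;
    }
    return 0;
}
```

This check does not replace testing the result of sb_stemmer_new() against NULL: an algorithm may be unavailable for an otherwise valid encoding, and allocation can fail.
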
const sb_symbol* sb_stemmer_stem ( struct sb_stemmer *  stemmer,
const sb_symbol *  word,
int  size 
)

Stem a word.

The return value is owned by the stemmer - it must not be freed or modified, and it will become invalid when the stemmer is called again, or if the stemmer is freed.

The length of the return value can be obtained using sb_stemmer_length().

If an out-of-memory error occurs, this will return NULL.
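
Since the returned buffer is owned by the stemmer and becomes invalid on the next call, a caller that needs to keep a stem has to copy it first. The following is a minimal sketch of such a copy; copy_stem is a hypothetical helper, not part of libstemmer. This page does not say whether the returned buffer is NUL-terminated, so the sketch relies only on the length reported by sb_stemmer_length().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `length` symbols from a stemmer-owned buffer
 * into a freshly allocated, NUL-terminated string. `stem` has the type
 * const sb_symbol * (sb_symbol is unsigned char). Returns NULL if `stem`
 * is NULL, `length` is negative, or allocation fails. The caller owns
 * the result and must release it with free(). */
char *copy_stem(const unsigned char *stem, int length)
{
    if (stem == NULL || length < 0)
        return NULL;

    char *copy = malloc((size_t)length + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, stem, (size_t)length);
    copy[length] = '\0';
    return copy;
}
```

A typical call would pass the result of sb_stemmer_stem() together with sb_stemmer_length() for the same stemmer, before stemming the next word.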
