Class: BasicPreprocessor

Constructor

new BasicPreprocessor(splitOn, stopWords, punctuation) → {this}

Creates a new basic preprocessor.

Parameters:

Name	Type	Description
`splitOn`	RegExp	The pattern used to split a document into terms. Defaults to any whitespace.
`stopWords`	array	A list of common words to skip. Defaults to `[]`.
`punctuation`	RegExp	The punctuation characters to remove. Defaults to the common symbols on an English QWERTY keyboard.

Source:

index.js, line 39

Returns:

Type: this
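
A minimal sketch of how the constructor's documented defaults could be expressed with JavaScript default parameters. The default patterns below are assumptions: this page only describes them in words (any whitespace, common QWERTY symbols), so the exact regexes in `index.js` may differ.

```javascript
// Hypothetical sketch of BasicPreprocessor's constructor defaults.
// The regexes are assumptions that match the documented behavior, not
// copies of the patterns in index.js.
class BasicPreprocessor {
  constructor(
    splitOn = /\s+/, // any run of whitespace
    stopWords = [], // no stop words by default
    punctuation = /[!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~]/g // assumed QWERTY symbol set
  ) {
    this.splitOn = splitOn;
    this.stopWords = stopWords;
    this.punctuation = punctuation;
  }
}

// Usage: rely on the defaults, or override the split pattern and stop words.
const defaults = new BasicPreprocessor();
const custom = new BasicPreprocessor(/[\s,]+/, ["the", "a", "an"]);
```

Default parameters only apply when an argument is `undefined`, so passing `undefined` for `splitOn` keeps whitespace splitting while still letting you supply `stopWords`.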

Methods

appendTerm(terms, currentWord, wordOffset) → {array}

Conditionally appends a new term to the `terms` list. This is largely an _internal_ method. It lowercases the word, then cleans it. If any characters remain after cleaning, it creates a new `TermPosition` and appends it to the `terms` list **IN-PLACE**.

Parameters:

Name	Type	Description
`terms`	array	The existing term list.
`currentWord`	string	The word to be added.
`wordOffset`	int	The offset of the word within the document.

Source:

index.js, line 83

Returns:

Type: array
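
The following sketch illustrates the lowercase, clean, append-in-place flow described above. `TermPosition` is a hypothetical stand-in (its `term` and `position` fields are assumed), the punctuation pattern is assumed, `wordOffset` is treated as a word index, and the function is assumed to return the same `terms` array it mutated. Stop-word filtering is left out because this page does not say which method applies it.

```javascript
// Hypothetical stand-in for the library's TermPosition; field names are assumptions.
class TermPosition {
  constructor(term, position) {
    this.term = term;
    this.position = position;
  }
}

// Assumed punctuation pattern (common ASCII symbols); the real default may differ.
const PUNCTUATION = /[!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~]/g;

function clean(word) {
  return word.replace(PUNCTUATION, "");
}

// Sketch of appendTerm: lowercase first, then clean, and push a new
// TermPosition onto `terms` (mutating it) only if something survives.
// Returns the same array it was given.
function appendTerm(terms, currentWord, wordOffset) {
  const cleaned = clean(currentWord.toLowerCase());
  if (cleaned.length > 0) {
    terms.push(new TermPosition(cleaned, wordOffset));
  }
  return terms;
}

const terms = [];
appendTerm(terms, "Hello,", 0);
appendTerm(terms, "--", 1); // cleans to "", so nothing is appended
appendTerm(terms, "World!", 2);
```

After these calls, `terms` holds two entries, `hello` at offset 0 and `world` at offset 2; the `--` token is dropped because nothing remains after cleaning.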

clean(word) → {string}

Cleans a word. Currently, this only strips out basic punctuation characters.

Parameters:

Name	Type	Description
`word`	string	The word to be cleaned.

Source:

index.js, line 63

Returns:

Type: string
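
A sketch of punctuation stripping with `String.prototype.replace`, assuming the `punctuation` pattern is applied to the whole word. The default pattern and the standalone `clean(word, punctuation)` signature are assumptions for illustration, not the method's actual signature.

```javascript
// Assumed default: common ASCII symbols from an English QWERTY keyboard.
const DEFAULT_PUNCTUATION = /[!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~]/g;

// Hypothetical standalone version of clean(word).
function clean(word, punctuation = DEFAULT_PUNCTUATION) {
  // Without the g flag, replace() would strip only the first match,
  // so copy a non-global pattern with "g" added.
  const pattern = punctuation.global
    ? punctuation
    : new RegExp(punctuation.source, punctuation.flags + "g");
  return word.replace(pattern, "");
}

const a = clean("don't!"); // apostrophe and "!" removed
const b = clean("e-mail"); // hyphen removed
const c = clean("café."); // non-ASCII letters are kept; only the period goes
```

Adding the `g` flag to a caller-supplied pattern is a choice made by this sketch; whether `index.js` does the same is not documented here.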

process(doc) → {array}

Processes a document into a list of terms (`TermPosition` objects).

Parameters:

Name	Type	Description
`doc`	string	The text to preprocess for the engine.

Source:

index.js, line 107

Returns:

Type: array
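
Putting the pieces together, here is a self-contained sketch of what `process` might look like: split the document on `splitOn`, then pass each word and its offset to `appendTerm`. Everything here is hypothetical: the `TermPosition` fields, the default patterns, treating the offset as the word's index after splitting, and filtering stop words against the cleaned, lowercased term are all assumptions, not behavior confirmed by this page.

```javascript
// Hypothetical TermPosition stand-in; field names are assumptions.
class TermPosition {
  constructor(term, position) {
    this.term = term;
    this.position = position;
  }
}

class BasicPreprocessor {
  constructor(
    splitOn = /\s+/,
    stopWords = [],
    punctuation = /[!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~]/g // assumed default
  ) {
    this.splitOn = splitOn;
    this.stopWords = new Set(stopWords); // Set gives constant-time lookups
    this.punctuation = punctuation;
  }

  clean(word) {
    return word.replace(this.punctuation, "");
  }

  appendTerm(terms, currentWord, wordOffset) {
    const cleaned = this.clean(currentWord.toLowerCase());
    // Assumption: stop words are matched against the cleaned, lowercased form.
    if (cleaned.length > 0 && !this.stopWords.has(cleaned)) {
      terms.push(new TermPosition(cleaned, wordOffset));
    }
    return terms;
  }

  process(doc) {
    const terms = [];
    // Assumption: the offset is the word's index in the split result.
    doc.split(this.splitOn).forEach((word, i) => {
      this.appendTerm(terms, word, i);
    });
    return terms;
  }
}

const pre = new BasicPreprocessor(undefined, ["the"]);
const result = pre.process("The quick, brown fox!");
```

With a stop-word list of `["the"]`, processing `"The quick, brown fox!"` in this sketch yields the terms `quick`, `brown`, and `fox` at offsets 1, 2, and 3.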