Inheritance diagram for vsm.model.tfidf.TfIdf

Collaboration diagram for vsm.model.tfidf.TfIdf:

Public Member Functions
def	__init__

def	train

Static Public Member Functions
def	from_tf

Public Attributes
	context_type

	corpus

	matrix

	undefined_rows

Detailed Description

Transforms a term-frequency model into a term-frequency
inverse-document-frequency model.

A TF-IDF model is a term-frequency model whose rows, corresponding
to word types, are scaled by IDF values. The idea is that a word
type which occurs in most of the contexts (i.e., documents) does
less to distinguish the contexts semantically than a word type
which occurs in only a few of them. The document frequency of a
word type is the number of documents in which it occurs divided by
the total number of documents. The IDF is the log of the inverse of
the document frequency.
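As a rough illustration of these definitions, the following sketch computes one IDF value per row of a small dense term-frequency matrix. It assumes a NumPy array in place of the sparse matrix the class actually uses and the natural log (the text does not state a base); `idf_values` is a hypothetical helper, not part of vsm.

```python
import numpy as np


def idf_values(tf: np.ndarray) -> np.ndarray:
    """Return one IDF value per row (word type) of a term-frequency matrix.

    Hypothetical helper: rows are word types, columns are documents.
    """
    n_docs = tf.shape[1]
    # Number of documents (columns) in which each word type occurs.
    doc_counts = np.count_nonzero(tf, axis=1)
    # Document frequency: fraction of all documents containing the word type.
    df = doc_counts / n_docs
    # Word types that occur in no document keep NaN: their IDF is undefined.
    idf = np.full(tf.shape[0], np.nan)
    seen = doc_counts > 0
    # IDF: log of the inverse document frequency (natural log assumed).
    idf[seen] = np.log(1.0 / df[seen])
    return idf


tf = np.array([
    [1, 2, 0],  # occurs in 2 of 3 documents -> log(3/2)
    [3, 1, 4],  # occurs in every document   -> log(3/3) = 0
    [0, 0, 5],  # occurs in 1 of 3 documents -> log(3/1)
], dtype=float)
print(idf_values(tf))
```

A word type that occurs everywhere gets an IDF of exactly 0. Rows with no occurrences are marked NaN rather than silently dividing by zero.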

As with a term-frequency model, word types correspond to matrix
rows and contexts correspond to matrix columns.
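With word type t indexing rows and context d indexing columns, the scaling described above can be written as follows. Here N is the number of contexts and n_t is the number of contexts in which t occurs. These symbols are introduced for illustration, and the base of the log is not specified by the documentation.

```latex
\mathrm{tfidf}_{t,d} = \mathrm{tf}_{t,d} \cdot \mathrm{idf}_t,
\qquad
\mathrm{idf}_t = \log \frac{1}{n_t / N} = \log \frac{N}{n_t}
```

Equivalently, the TF-IDF matrix is diag(idf) times the TF matrix: each row is multiplied by a single scalar, and columns are never mixed.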

The data structure is a sparse float matrix.

:See Also: :class:`vsm.model.TfSeq`, :class:`vsm.model.base`,
    :class:`scipy.sparse.coo_matrix`

:notes:
    Apart from ordinary zero counts, a word type's entire row can be
    zero in two ways: (1) the word type occurs in every document, in
    which case its IDF value is 0; (2) the word type occurs in no
    document at all, in which case its IDF value is undefined.
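The two cases in the note can be told apart by counting, per row, the documents in which the word type occurs. This is a hypothetical sketch on a dense NumPy array. The documentation lists an `undefined_rows` attribute without describing it, so the assumption that it records rows like case (2) is only an assumption.

```python
import numpy as np


def classify_zero_rows(tf: np.ndarray) -> tuple[list[int], list[int]]:
    """Return (rows whose IDF is 0, rows whose IDF is undefined).

    Hypothetical helper: rows are word types, columns are documents.
    """
    n_docs = tf.shape[1]
    doc_counts = np.count_nonzero(tf, axis=1)
    # Case (1): the word type occurs in every document, so IDF = log(1) = 0.
    idf_zero = np.flatnonzero(doc_counts == n_docs).tolist()
    # Case (2): the word type occurs in no document, so log(n_docs / 0) is undefined.
    undefined = np.flatnonzero(doc_counts == 0).tolist()
    return idf_zero, undefined


tf = np.array([
    [2, 1, 3],  # in every document
    [0, 4, 0],  # in one document
    [0, 0, 0],  # in no document
], dtype=float)
print(classify_zero_rows(tf))  # ([0], [2])
```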

Constructor & Destructor Documentation

def vsm.model.tfidf.TfIdf.__init__(self, corpus=None, context_type=None, tf_matrix=None)

Initialize TfIdf.

:param corpus: A Corpus object containing the training data.
:type corpus: Corpus

:param context_type: A string specifying the type of context over
    which the model is trained.
:type context_type: string

:param tf_matrix: A matrix containing the term-frequency data.
:type tf_matrix: scipy.sparse matrix

Member Function Documentation

def vsm.model.tfidf.TfIdf.from_tf(tf_model)

static

Takes a `Tf` model object and generates a `TfIdf` model.
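The documentation does not show how `from_tf` is implemented. What follows is a hypothetical sketch only. It assumes that a `Tf` model exposes `corpus`, `context_type`, and `matrix` attributes, the same public attributes listed for `TfIdf`. `TfModelStandIn` and `TfIdfSketch` are illustrative names, not part of vsm, and the `"document"` context type is an example value.

```python
from dataclasses import dataclass
from typing import Any, Optional

import numpy as np


@dataclass
class TfModelStandIn:
    """Hypothetical stand-in for a trained Tf model (not part of vsm)."""
    corpus: Any
    context_type: Optional[str]
    matrix: np.ndarray


class TfIdfSketch:
    """Hypothetical class mirroring the documented TfIdf constructor."""

    def __init__(self, corpus=None, context_type=None, tf_matrix=None):
        self.corpus = corpus
        self.context_type = context_type
        # Assumption: the raw term frequencies are kept in self.matrix
        # until a train() step rescales them.
        self.matrix = tf_matrix

    @staticmethod
    def from_tf(tf_model):
        # Carry over the corpus and context type, and copy the matrix so
        # that later training does not modify the Tf model in place.
        return TfIdfSketch(corpus=tf_model.corpus,
                           context_type=tf_model.context_type,
                           tf_matrix=tf_model.matrix.copy())


tf_model = TfModelStandIn(corpus="toy corpus", context_type="document",
                          matrix=np.array([[1.0, 0.0], [2.0, 3.0]]))
model = TfIdfSketch.from_tf(tf_model)
print(model.context_type)  # document
```

Copying the matrix is a design choice of this sketch. It keeps the original term frequencies intact after the TF-IDF model is trained.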

def vsm.model.tfidf.TfIdf.train(self)

Computes the IDF values for the input term-frequency matrix,
scales the rows by these values and stores the results in
`self.matrix`.
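A minimal sketch of the documented steps (compute IDF per row, scale rows, store in `self.matrix`) might look like the following. It uses a dense NumPy array rather than the scipy.sparse matrix the class stores, and the natural log. It also assumes `undefined_rows` holds the indices of word types that occur in no document, and gives those rows an IDF of 0 here. `TfIdfTrainSketch` is a hypothetical name, not vsm's implementation.

```python
import numpy as np


class TfIdfTrainSketch:
    """Hypothetical stand-in illustrating the documented train() steps."""

    def __init__(self, tf_matrix: np.ndarray):
        # Rows are word types, columns are contexts (documents).
        self.matrix = np.asarray(tf_matrix, dtype=float)
        self.undefined_rows: list[int] = []

    def train(self) -> None:
        n_docs = self.matrix.shape[1]
        doc_counts = np.count_nonzero(self.matrix, axis=1)
        # Assumption: record word types whose IDF is undefined (no occurrences).
        self.undefined_rows = np.flatnonzero(doc_counts == 0).tolist()
        # Undefined rows get IDF 0 in this sketch; they are all zeros anyway.
        idf = np.zeros(self.matrix.shape[0])
        defined = doc_counts > 0
        idf[defined] = np.log(n_docs / doc_counts[defined])
        # Scale each row by its IDF (broadcast over columns) and store it.
        self.matrix = self.matrix * idf[:, np.newaxis]


model = TfIdfTrainSketch(np.array([
    [1, 2, 0],  # occurs in 2 of 3 documents
    [3, 1, 4],  # occurs in every document -> row becomes zeros
    [0, 0, 0],  # occurs in no document -> undefined IDF
]))
model.train()
print(model.undefined_rows)  # [2]
```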

The documentation for this class was generated from the following file:

vsm/vsm/model/tfidf.py
