Functions
def	export_corpus

def	import_corpus

def	import_model

def	export_model

Detailed Description

vsm.extensions.interop.ldac

Module containing functions for importing and exporting data between VSM and
lda-c, the original LDA implementation referenced in Blei, Ng, and Jordan (2003).
lda-c is available at: http://www.cs.princeton.edu/~blei/lda-c/

Function Documentation

def vsm.extensions.interop.ldac.export_corpus	(	corpus,
		outfolder,
		context_type = `'document'`
	)

Converts a vsm.corpus.Corpus object into lda-c compatible data files.
Creates two files:
1.  "vocab.txt" - contains the integer-word mappings
2.  "corpus.dat" - contains the corpus object in the format described in 
    [lda-c documentation](http://www.cs.princeton.edu/~blei/lda-c/readme.txt):

        Under LDA, the words of each document are assumed exchangeable.  Thus,
        each document is succinctly represented as a sparse vector of word
        counts. The data is a file where each line is of the form:
    
            [M] [term_1]:[count] [term_2]:[count] ...  [term_N]:[count]
    
        where [M] is the number of unique terms in the document, and the
        [count] associated with each term is how many times that term appeared
        in the document.  Note that [term_1] is an integer which indexes the
        term; it is not a string.

:param corpus: VSM Corpus object to convert to lda-c file
:type corpus: vsm.corpus.Corpus

:param outfolder: Directory in which to write "vocab.txt" and "corpus.dat"
:type outfolder: string
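
The "corpus.dat" line format quoted above can be illustrated with a minimal sketch. It is not the code used by export_corpus; the helper name `format_ldac_line` is hypothetical, and the sketch assumes each document is already available as a sequence of integer term ids.

```python
from collections import Counter


def format_ldac_line(term_ids):
    """Format one document as an lda-c data line (hypothetical helper).

    term_ids is a sequence of integer term indices, one per token.
    """
    counts = Counter(term_ids)
    # [M] is the number of unique terms in the document.
    fields = [str(len(counts))]
    # Each remaining field is [term]:[count]; terms are sorted for readability.
    for term, count in sorted(counts.items()):
        fields.append("{}:{}".format(term, count))
    return " ".join(fields)


# A document whose tokens map to term ids 0, 2, 2, 5, 0, 0
print(format_ldac_line([0, 2, 2, 5, 0, 0]))  # prints "3 0:3 2:2 5:1"
```

Term 0 occurs three times, term 2 twice, and term 5 once, so the line starts with 3 (three unique terms) followed by the three term:count pairs.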

def vsm.extensions.interop.ldac.import_corpus	(	corpusfilename,
		vocabfilename,
		context_type = `'document'`,
		path = `None`
	)

Converts an lda-c compatible data file into a VSM Corpus object.

:param corpusfilename: Path to the corpus file, in the format defined in the
lda-c documentation.
:type corpusfilename: string

:param vocabfilename: Path to the vocabulary file, one word per line
:type vocabfilename: string
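
As a rough illustration of what reading these two files involves, the sketch below parses one lda-c data line and maps its term indices back to words. It is not the implementation of import_corpus; `parse_ldac_line` and `read_vocab` are hypothetical helpers, and the sketch assumes the vocabulary index of a word is its zero-based line number in the vocabulary file.

```python
def read_vocab(lines):
    """Return the vocabulary as a list; index = zero-based line number (assumed)."""
    return [line.strip() for line in lines if line.strip()]


def parse_ldac_line(line):
    """Parse '[M] [term]:[count] ...' into a {term_index: count} dict."""
    fields = line.split()
    if not fields:
        raise ValueError("empty lda-c line")
    n_unique = int(fields[0])
    counts = {}
    for field in fields[1:]:
        term, count = field.split(":")
        counts[int(term)] = int(count)
    # Sanity check: [M] must match the number of term:count pairs.
    if len(counts) != n_unique:
        raise ValueError("expected {} terms, found {}".format(n_unique, len(counts)))
    return counts


vocab = read_vocab(["apple\n", "banana\n", "cherry\n"])
counts = parse_ldac_line("2 0:3 2:1")
word_counts = {vocab[term]: n for term, n in counts.items()}
print(word_counts)  # prints {'apple': 3, 'cherry': 1}
```

Checking [M] against the parsed pairs is a design choice of this sketch; it catches truncated or malformed lines early.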
