Sistema de Consulta Abierta
Open query system with semantic analysis module
vsm.extensions.interop.ldac Namespace Reference

Functions

def export_corpus
 
def import_corpus
 
def import_model
 
def export_model
 

Detailed Description

vsm.extensions.interop.ldac

Module containing functions for import/export between VSM and lda-c, which is
the original LDA implementation referenced in Blei, Ng, and Jordan (2003). 
lda-c is available at: http://www.cs.princeton.edu/~blei/lda-c/

Function Documentation

def vsm.extensions.interop.ldac.export_corpus (   corpus,
  outfolder,
  context_type = 'document' 
)
Converts a vsm.corpus.Corpus object into an lda-c compatible data file.
Creates two files:
1.  "vocab.txt" - contains the integer-word mappings
2.  "corpus.dat" - contains the corpus object in the format described in 
    [lda-c documentation](http://www.cs.princeton.edu/~blei/lda-c/readme.txt):

        Under LDA, the words of each document are assumed exchangeable.  Thus,
        each document is succinctly represented as a sparse vector of word
        counts. The data is a file where each line is of the form:
    
            [M] [term_1]:[count] [term_2]:[count] ...  [term_N]:[count]
    
        where [M] is the number of unique terms in the document, and the
        [count] associated with each term is how many times that term appeared
        in the document.  Note that [term_1] is an integer which indexes the
        term; it is not a string.

:param corpus: VSM Corpus object to convert to lda-c file
:type corpus: vsm.corpus.Corpus

:param outfolder: Directory to output "vocab.txt" and "corpus.dat"
:type outfolder: string
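
The sparse line format quoted above can be illustrated with a minimal sketch. This is not the vsm implementation: `to_ldac_lines` is a hypothetical helper, and it assumes that term indices are zero-based and assigned in order of first appearance, so that line i of a vocabulary file would hold the word with index i.

```python
from collections import Counter


def to_ldac_lines(documents):
    """Return (vocab, lines) for a list of tokenized documents.

    Hypothetical helper for illustration only; not part of vsm.
    """
    vocab = {}  # word -> integer index, in order of first appearance (assumption)
    lines = []
    for tokens in documents:
        counts = Counter()
        for word in tokens:
            # Assign the next free index the first time a word is seen.
            index = vocab.setdefault(word, len(vocab))
            counts[index] += 1
        # "[M] [term_1]:[count] ..." where M is the number of unique terms.
        pairs = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
        lines.append(f"{len(counts)} {pairs}".rstrip())
    return list(vocab), lines


docs = [["the", "cat", "sat", "the"], ["cat", "nap"]]
vocab, lines = to_ldac_lines(docs)
```

Here `vocab` plays the role of "vocab.txt" (one word per line) and `lines` the role of "corpus.dat" (one document per line).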
def vsm.extensions.interop.ldac.import_corpus (   corpusfilename,
  vocabfilename,
  context_type = 'document',
  path = None 
)
Converts an lda-c compatible data file into a VSM Corpus object.

:param corpusfilename: path to the corpus file, in the format defined in the
lda-c documentation.
:type corpusfilename: string

:param vocabfilename: path to the vocabulary file, one word per line
:type vocabfilename: string
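
As a rough sketch of what reading these two inputs involves, the snippet below decodes one lda-c line back into word counts. `parse_ldac_line` is a hypothetical helper rather than the code behind `import_corpus`, and it assumes that the zero-based line number in the vocabulary file is the term index.

```python
def parse_ldac_line(line, vocab):
    """Decode "[M] [term]:[count] ..." into a {word: count} dict.

    Hypothetical helper for illustration only; not part of vsm.
    """
    fields = line.split()
    n_unique = int(fields[0])
    pairs = [field.split(":") for field in fields[1:]]
    if len(pairs) != n_unique:
        raise ValueError(f"expected {n_unique} terms, found {len(pairs)}")
    # Assumption: term index i refers to the word on line i of the vocabulary.
    return {vocab[int(index)]: int(count) for index, count in pairs}


vocab_text = "the\ncat\nsat\nnap\n"  # stand-in for a vocabulary file's contents
vocab = vocab_text.split()           # one word per line
counts = parse_ldac_line("3 0:2 1:1 2:1", vocab)
```

Because the format stores only counts, the original word order within each document cannot be recovered from the data file.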