topic_coherence.probability_estimation – Probability estimation module
This module contains functions to perform probability estimation on a list of segmented topics.
gensim.topic_coherence.probability_estimation.p_boolean_document(corpus, segmented_topics)
This function performs the boolean document probability estimation. Boolean document estimates the probability of a single word as the number of documents in which the word occurs divided by the total number of documents.
| Parameters: | corpus, segmented_topics |
|---|---|
| Returns: | |
| Return type: | accumulator |
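
The estimate described above can be illustrated with a minimal pure-Python sketch. It is not gensim's implementation: the helper name `boolean_document_probabilities`, the toy corpus, and the plain dictionary it returns (instead of an accumulator object) are assumptions made for illustration. The corpus is assumed to be in bag-of-words form, one list of `(word_id, count)` pairs per document.

```python
from collections import Counter


def boolean_document_probabilities(corpus, relevant_ids):
    """Estimate P(w) as (documents containing w) / (total documents).

    Hypothetical helper for illustration only; not part of gensim.
    """
    doc_freq = Counter()
    num_docs = 0
    for bow in corpus:
        num_docs += 1
        # A set makes each document count at most once per word,
        # no matter how often the word occurs inside the document.
        present = {word_id for word_id, _count in bow}
        doc_freq.update(present & relevant_ids)
    if num_docs == 0:
        return {}
    return {w: doc_freq[w] / num_docs for w in relevant_ids}


# Hypothetical toy corpus: four documents as (word_id, count) pairs.
corpus = [
    [(0, 2), (1, 1)],
    [(1, 3)],
    [(0, 1), (2, 1)],
    [(1, 1), (2, 4)],
]
probs = boolean_document_probabilities(corpus, {0, 1, 2})
print(probs)
```

Word 1 occurs in three of the four documents, so its estimate is 0.75; the in-document counts (such as the 3 in the second document) do not affect the result.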
gensim.topic_coherence.probability_estimation.p_boolean_sliding_window(texts, segmented_topics, dictionary, window_size, processes=1)
This function performs the boolean sliding window probability estimation. Boolean sliding window determines word counts using a sliding window. The window moves over the documents one word token per step. Each step defines a new virtual document by copying the window content. Boolean document is applied to these virtual documents to compute word probabilities.
| Parameters: | texts, segmented_topics, dictionary, window_size, processes (default 1) |
|---|---|
| Returns: | |
| Return type: | accumulator |
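
The sliding-window procedure described above can be sketched in pure Python as follows. This is an illustrative sketch rather than gensim's code: the helper names `sliding_windows` and `boolean_sliding_window_probabilities` are hypothetical, the sketch works on raw tokens instead of dictionary ids, it runs in a single process, and it assumes that a text shorter than the window forms exactly one virtual document.

```python
from collections import Counter


def sliding_windows(tokens, window_size):
    """Yield every window of window_size tokens, advancing one token per step.

    Assumption: a text no longer than the window yields itself once.
    """
    if len(tokens) <= window_size:
        yield tokens
        return
    for start in range(len(tokens) - window_size + 1):
        yield tokens[start:start + window_size]


def boolean_sliding_window_probabilities(texts, relevant_words, window_size):
    """Apply the boolean document estimate to the virtual documents."""
    occurrences = Counter()
    num_windows = 0
    for tokens in texts:
        for window in sliding_windows(tokens, window_size):
            num_windows += 1
            # Each window is a virtual document: a word counts once per window.
            occurrences.update(set(window) & relevant_words)
    if num_windows == 0:
        return {}
    return {w: occurrences[w] / num_windows for w in relevant_words}


# Hypothetical toy texts, already tokenized.
texts = [
    ["human", "computer", "interface", "system"],
    ["graph", "trees"],
]
probs = boolean_sliding_window_probabilities(
    texts, {"computer", "graph", "system"}, window_size=3
)
print(probs)
```

With a window of three tokens, the first text produces two virtual documents and the second produces one, so every probability has a denominator of three.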
gensim.topic_coherence.probability_estimation.unique_ids_from_segments(segmented_topics)
Return the set of all unique ids in a list of segmented topics.
| Parameters: | segmented_topics – list of tuples of (word_id_set1, word_id_set2). Each word_id_set is either a single integer, or a numpy.ndarray of integers. |
|---|---|
| Returns: | set of unique ids across all topic segments. |
| Return type: | unique_ids |
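
As a rough illustration of the behaviour described in the table, the sketch below collects ids from segments whose members are either single integers or collections of integers. The helper name `collect_unique_ids` and the sample data are hypothetical, and plain lists stand in for `numpy.ndarray` so that the example needs no third-party imports; any iterable of integers is handled the same way.

```python
def collect_unique_ids(segmented_topics):
    """Return the set of ids found across all (word_id_set1, word_id_set2) tuples.

    Hypothetical helper for illustration only; not gensim's implementation.
    """
    unique_ids = set()
    for segment in segmented_topics:
        for word_ids in segment:
            if hasattr(word_ids, "__iter__"):
                # A collection of ids (a list here, an ndarray in gensim).
                unique_ids.update(word_ids)
            else:
                # A single integer id.
                unique_ids.add(word_ids)
    return unique_ids


# Hypothetical segments mixing single ids and collections of ids.
segmented_topics = [(1, 2), (2, 3), (4, [1, 2, 3]), ([5, 6], 4)]
ids = collect_unique_ids(segmented_topics)
print(sorted(ids))  # [1, 2, 3, 4, 5, 6]
```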