summarization.textcleaner – Summarization pre-processing

gensim.summarization.textcleaner.clean_text_by_sentences(text)
    Tokenizes a given text into sentences, applying filters and lemmatizing them. Returns a SyntacticUnit list.
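The sentence-level flow described above (split into sentences, filter each one, keep the original next to its processed form) can be sketched with the standard library alone. This is an illustration under stated assumptions, not gensim's implementation: `Unit`, `STOPWORDS`, `split_into_sentences`, `filter_sentence` and `clean_by_sentences_sketch` are hypothetical names, the sentence splitter is a naive regex, and lemmatization is left out entirely.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a SyntacticUnit: the original text plus its processed form.
Unit = namedtuple("Unit", ["text", "token"])

# Tiny illustrative stopword list (an assumption, not gensim's list).
STOPWORDS = {"a", "an", "the", "is", "of", "and", "it", "on"}


def split_into_sentences(text):
    # Naive split on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_sentence(sentence):
    # Lowercase, keep only alphanumeric runs, drop stopwords. No lemmatization here.
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return " ".join(w for w in words if w not in STOPWORDS)


def clean_by_sentences_sketch(text):
    units = []
    for sentence in split_into_sentences(text):
        token = filter_sentence(sentence)
        if token:  # skip sentences that are empty after filtering
            units.append(Unit(text=sentence, token=token))
    return units


units = clean_by_sentences_sketch("The cat sat on the mat. It is a sunny day!")
for unit in units:
    print(repr(unit.text), "->", repr(unit.token))
```

Keeping the original sentence alongside its filtered form lets a later stage rank sentences by their processed tokens while still returning the untouched text to the reader.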
gensim.summarization.textcleaner.clean_text_by_word(text, deacc=True)
    Tokenizes a given text into words, applying filters and lemmatizing them. Returns a dict of word -> SyntacticUnit.
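A rough sketch of the word-level variant follows, again using only the standard library. It assumes that `deacc=True` means accents are stripped before tokenizing, which the signature suggests but this page does not spell out. `Unit`, `strip_accents` and `clean_by_word_sketch` are hypothetical names, lowercasing stands in for the real filters, and no lemmatization is performed.

```python
import re
import unicodedata
from collections import namedtuple

# Hypothetical stand-in for a SyntacticUnit: the original word plus its processed form.
Unit = namedtuple("Unit", ["text", "token"])


def strip_accents(text):
    # Decompose characters, then drop the combining marks (Unicode category "Mn").
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clean_by_word_sketch(text, deacc=True):
    # Assumption: deacc=True strips accents before the text is split into words.
    if deacc:
        text = strip_accents(text)
    result = {}
    for word in re.findall(r"\w+", text):
        # Lowercasing stands in for the filtering step; the first occurrence wins.
        result.setdefault(word, Unit(text=word, token=word.lower()))
    return result


words = clean_by_word_sketch("Caf\u00e9 owners love caf\u00e9 culture")
for word, unit in words.items():
    print(word, "->", unit.token)
```

Keying the dict by the surface word makes it cheap to look up the processed form of any word seen in the text, for instance when building a word graph.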
gensim.summarization.textcleaner.get_sentences(text)
gensim.summarization.textcleaner.join_words(words, separator=' ')
gensim.summarization.textcleaner.merge_syntactic_units(original_units, filtered_units, tags=None)
gensim.summarization.textcleaner.replace_abbreviations(text)
gensim.summarization.textcleaner.replace_with_separator(text, separator, regexs)
gensim.summarization.textcleaner.split_sentences(text)
gensim.summarization.textcleaner.tokenize_by_word(text)
gensim.summarization.textcleaner.undo_replacement(sentence)