grub.base
Base functionality
class grub.base.CodeSearcher(search_store: Union[str, Mapping, module] = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(token_pattern=re.compile('\n # Find words in a string. Order matters!\n [A-Z]+(?=[A-Z][a-z]) | # All upper case before a capitalized word\n [A-Z]?[a-z]+ | # Capitalized words / all lower case\n [A-Z]+ | # All u, re.VERBOSE)), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))
Text searcher with fitters
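The default `token_pattern` in the signature above is cut off in this page, so the full pattern is not recoverable here. As a rough illustration of what its visible part does, the sketch below uses only the three alternatives that can be read in the signature; the name `VISIBLE_TOKEN_PATTERN` and the helper `split_identifier` are hypothetical, and the real pattern may contain further alternatives.

```python
import re

# Assumption: only the three alternatives visible in the truncated signature.
# The actual default pattern may include more; this is not grub's full regex.
VISIBLE_TOKEN_PATTERN = re.compile(
    r"""
    [A-Z]+(?=[A-Z][a-z]) |  # all upper case before a capitalized word
    [A-Z]?[a-z]+         |  # capitalized words / all lower case
    [A-Z]+                  # a run of upper-case letters
    """,
    re.VERBOSE,
)


def split_identifier(name):
    """Split a camelCase or snake_case identifier into word tokens."""
    return VISIBLE_TOKEN_PATTERN.findall(name)


print(split_identifier("HTTPResponseCode"))  # ['HTTP', 'Response', 'Code']
print(split_identifier("search_store"))      # ['search', 'store']
```

Because order matters in the alternation, an acronym followed by a capitalized word (`HTTP` + `Response`) is split before the generic upper-case rule gets a chance to swallow the `R`.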

class grub.base.CodeSearcherBase(search_store: Union[str, Mapping, module] = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(token_pattern=re.compile('\n # Find words in a string. Order matters!\n [A-Z]+(?=[A-Z][a-z]) | # All upper case before a capitalized word\n [A-Z]?[a-z]+ | # Capitalized words / all lower case\n [A-Z]+ | # All u, re.VERBOSE)), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))

class grub.base.SearchStore(store, n_neighbors: int = 10, tokenizer=<function camelcase_and_underscore_tokenizer>)
Build a search index for anything (that is given a mapping interface with string values).
A store is anything with a collections.abc.Mapping interface. Typically, a store’s backend comes from local files or a database wrapped into a mapping (see py2store for tools to do that!). For testing purposes though, we’ll use a dict here:

>>> store = {
...     "Nelson Mandela": "The greatest glory in living lies not in never falling, but in rising every time we fall.",
...     "Walt Disney": "The way to get started is to quit talking and begin doing.",
...     "Steve Jobs": "Your time is limited, so don't waste it living someone else's life."
... }
>>> search = SearchStore(store, n_neighbors=2)  # the store is small, so ask for fewer results than the default
>>> list(search('living'))
['Steve Jobs', 'Nelson Mandela']

A SearchStore instance is picklable.

>>> import pickle
>>> unserialized_search = pickle.loads(pickle.dumps(search))
>>> list(unserialized_search('living'))
['Steve Jobs', 'Nelson Mandela']
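
The description above says a store's backend typically comes from local files wrapped into a mapping. The sketch below shows one way such a wrapper could look using only the standard library; it is not py2store's API, and the class name `TextFilesStore` and its `pattern` parameter are hypothetical.

```python
from collections.abc import Mapping
from pathlib import Path


class TextFilesStore(Mapping):
    """Read-only mapping from relative file paths to file contents.

    Hypothetical helper for illustration; not part of grub or py2store.
    """

    def __init__(self, rootdir, pattern="*.txt"):
        self.rootdir = Path(rootdir)
        self.pattern = pattern

    def _paths(self):
        # Recursively collect matching files, in a stable order.
        return sorted(p for p in self.rootdir.rglob(self.pattern) if p.is_file())

    def __iter__(self):
        # Keys are POSIX-style paths relative to the root directory.
        for path in self._paths():
            yield path.relative_to(self.rootdir).as_posix()

    def __len__(self):
        return len(self._paths())

    def __getitem__(self, key):
        path = self.rootdir / key
        if not path.is_file():
            raise KeyError(key)
        return path.read_text(encoding="utf-8")
```

Since the values are strings and the object satisfies the mapping interface, an instance such as `TextFilesStore("some/folder")` should in principle be usable wherever a dict-like store is expected, though this has not been checked against grub itself.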

class grub.base.TextFilesSearcher(search_store: Union[str, Mapping] = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(token_pattern=re.compile('\n # Find words in a string. Order matters!\n [A-Z]+(?=[A-Z][a-z]) | # All upper case before a capitalized word\n [A-Z]?[a-z]+ | # Capitalized words / all lower case\n [A-Z]+ | # All u, re.VERBOSE)), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))
Fittable Text searcher with fitters

class grub.base.TextFilesSearcherBase(search_store: Union[str, Mapping] = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(token_pattern=re.compile('\n # Find words in a string. Order matters!\n [A-Z]+(?=[A-Z][a-z]) | # All upper case before a capitalized word\n [A-Z]?[a-z]+ | # Capitalized words / all lower case\n [A-Z]+ | # All u, re.VERBOSE)), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))

class grub.base.TfidfKnnFitMixin(search_store: Mapping = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))

class grub.base.TfidfKnnSearcher(search_store: Mapping = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))
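
The default `tfidf` and `knn` arguments in these signatures suggest the underlying technique: documents are turned into TF-IDF vectors and queries are answered by nearest neighbors under cosine distance. The following is a from-scratch, standard-library-only sketch of that idea, not grub's implementation. The function names `fit_tfidf` and `search` are hypothetical, the tokenizer is a simplified lower-case word splitter, and the smoothed IDF formula is assumed to match scikit-learn's `TfidfVectorizer` defaults.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (assumption): lower-case alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    # Scale a sparse vector (dict term -> weight) to unit L2 length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def fit_tfidf(store):
    """Return (idf, vectors) for a mapping of key -> text."""
    n_docs = len(store)
    counts = {key: Counter(tokenize(text)) for key, text in store.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = {
        key: normalize({t: tf * idf[t] for t, tf in c.items()})
        for key, c in counts.items()
    }
    return idf, vectors


def search(query, idf, vectors, n_neighbors=10):
    """Return the keys of the n_neighbors documents closest to the query."""
    q_counts = Counter(tokenize(query))
    q = normalize({t: tf * idf[t] for t, tf in q_counts.items() if t in idf})

    def cosine_distance(doc_vec):
        # Both vectors are unit length, so distance is 1 minus the dot product.
        return 1.0 - sum(w * doc_vec.get(t, 0.0) for t, w in q.items())

    ranked = sorted(vectors, key=lambda key: cosine_distance(vectors[key]))
    return ranked[:n_neighbors]


store = {
    "Nelson Mandela": "The greatest glory in living lies not in never falling, but in rising every time we fall.",
    "Walt Disney": "The way to get started is to quit talking and begin doing.",
    "Steve Jobs": "Your time is limited, so don't waste it living someone else's life.",
}
idf, vectors = fit_tfidf(store)
print(search("living", idf, vectors, n_neighbors=2))  # ['Steve Jobs', 'Nelson Mandela']
```

The shorter quote ranks first because, after L2 normalization, the shared term "living" carries a larger share of that document's vector.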