grub.base

Base functionality

class grub.base.CodeSearcher(search_store: Union[str, Mapping, module] = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(token_pattern=re.compile('\n # Find words in a string. Order matters!\n [A-Z]+(?=[A-Z][a-z]) | # All upper case before a capitalized word\n [A-Z]?[a-z]+ | # Capitalized words / all lower case\n [A-Z]+ | # All u, re.VERBOSE)), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))[source]

Code searcher with fit methods (fitters) for its TF-IDF vectorizer and nearest-neighbors index

class grub.base.CodeSearcherBase(search_store: Union[str, Mapping, module] = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(token_pattern=re.compile('\n # Find words in a string. Order matters!\n [A-Z]+(?=[A-Z][a-z]) | # All upper case before a capitalized word\n [A-Z]?[a-z]+ | # Capitalized words / all lower case\n [A-Z]+ | # All u, re.VERBOSE)), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))[source]
class grub.base.DfltSearchStore(*args, **kwds)[source]
class grub.base.SearchStore(store, n_neighbors: int = 10, tokenizer=<function camelcase_and_underscore_tokenizer>)[source]

Build a search index for anything that offers a mapping interface with string values.

A store is anything with a collections.Mapping interface. Typically, a store’s backend is local files or a database wrapped into a mapping (see py2store for tools to do that). For testing purposes, though, we’ll use a dict here:

>>> store = {
... "Nelson Mandela": "The greatest glory in living lies not in never falling, but in rising every time we fall.",
... "Walt Disney": "The way to get started is to quit talking and begin doing.",
... "Steve Jobs": "Your time is limited, so don't waste it living someone else's life."
... }
>>> search = SearchStore(store, n_neighbors=2)  # our store is small, so we restrict the result size accordingly
>>> list(search('living'))
['Steve Jobs', 'Nelson Mandela']

A SearchStore instance is picklable.

>>> import pickle
>>> deserialized_search = pickle.loads(pickle.dumps(search))
>>> list(deserialized_search('living'))
['Steve Jobs', 'Nelson Mandela']
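The signatures above show that these searchers pair a TfidfVectorizer with a cosine NearestNeighbors index: documents are embedded as TF-IDF vectors, and a query returns the keys of the nearest documents. The following is an illustrative pure-Python sketch of that mechanism, not grub's actual code; the idf formula is a common variant rather than scikit-learn's exact one, but the ranking it produces on this store is the same.

```python
import math
import re
from collections import Counter

store = {
    "Nelson Mandela": "The greatest glory in living lies not in never "
                      "falling, but in rising every time we fall.",
    "Walt Disney": "The way to get started is to quit talking and begin doing.",
    "Steve Jobs": "Your time is limited, so don't waste it living someone "
                  "else's life.",
}

def tokens(text):
    # Crude word tokenizer, standing in for the vectorizer's token_pattern.
    return re.findall(r'[a-z]+', text.lower())

def tfidf_vectors(docs):
    """L2-normalized tf-idf vector (token -> weight) for each doc key."""
    n = len(docs)
    df = Counter(t for doc in docs.values() for t in set(tokens(doc)))
    vectors = {}
    for key, doc in docs.items():
        tf = Counter(tokens(doc))
        vec = {t: c * (math.log(n / df[t]) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[key] = {t: w / norm for t, w in vec.items()}
    return vectors

def search(query, vectors, n_neighbors=2):
    # Cosine similarity of unit vectors is just the dot product,
    # so the nearest neighbors are the highest-scoring keys.
    q = tfidf_vectors({'_query': query})['_query']
    scores = {k: sum(q.get(t, 0.0) * w for t, w in v.items())
              for k, v in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_neighbors]

vectors = tfidf_vectors(store)
print(search('living', vectors))  # -> ['Steve Jobs', 'Nelson Mandela']
```

The shorter Steve Jobs quote ranks first because 'living' carries a larger share of its normalized vector's weight.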
class grub.base.SearchStoreMixin[source]
class grub.base.StoreMixin[source]
class grub.base.TextFilesSearcher(search_store: Union[str, Mapping] = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(token_pattern=re.compile('\n # Find words in a string. Order matters!\n [A-Z]+(?=[A-Z][a-z]) | # All upper case before a capitalized word\n [A-Z]?[a-z]+ | # Capitalized words / all lower case\n [A-Z]+ | # All u, re.VERBOSE)), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))[source]

Text-files searcher with fit methods (fitters)
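The search_store parameter accepts a Mapping (or a string, presumably a path). A minimal file-backed mapping of the kind described above can be sketched with the standard library; this is illustrative only — py2store provides production-ready store wrappers, and this class is not part of grub.

```python
from collections.abc import Mapping
from pathlib import Path

class TextFilesStore(Mapping):
    """Read-only mapping of filename -> file contents for *.txt files.

    An illustrative stand-in for the kind of Mapping a search_store
    parameter expects: keys are file names, values are text contents.
    """

    def __init__(self, rootdir):
        self.rootdir = Path(rootdir)

    def __getitem__(self, key):
        # Missing files raise FileNotFoundError rather than KeyError;
        # a production store would translate the exception.
        return (self.rootdir / key).read_text()

    def __iter__(self):
        return (p.name for p in self.rootdir.glob('*.txt'))

    def __len__(self):
        return sum(1 for _ in self)
```

Because it subclasses collections.abc.Mapping, the class gets `keys`, `values`, `items`, `get`, and containment checks for free from the three methods defined.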

class grub.base.TextFilesSearcherBase(search_store: Union[str, Mapping] = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(token_pattern=re.compile('\n # Find words in a string. Order matters!\n [A-Z]+(?=[A-Z][a-z]) | # All upper case before a capitalized word\n [A-Z]?[a-z]+ | # Capitalized words / all lower case\n [A-Z]+ | # All u, re.VERBOSE)), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))[source]
class grub.base.TfidfKnnFitMixin(search_store: Mapping = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))[source]
class grub.base.TfidfKnnSearcher(search_store: Mapping = <grub.base.DfltSearchStore object>, tfidf: sklearn.feature_extraction.text.TfidfVectorizer = TfidfVectorizer(), knn: sklearn.neighbors._unsupervised.NearestNeighbors = NearestNeighbors(metric='cosine', n_neighbors=10))[source]
fvs(error_callback=None)[source]

The feature vectors for the documents in search_store.

grub.base.camelcase_and_underscore_tokenizer(string)[source]
>>> print(*camelcase_and_underscore_tokenizer('thereIsSomething i_wanted toShow'))
there is something i wanted to show
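The token_pattern shown in the class signatures above suggests an implementation along these lines. This is a sketch built from the (truncated) regex visible in those signatures, not necessarily grub's exact function:

```python
import re

# camelCase / snake_case splitting pattern, reconstructed from the
# token_pattern repr in the signatures above (the repr is truncated there,
# so the tail of the real pattern may differ).
TOKEN_RE = re.compile(
    r'''
    # Find words in a string. Order matters!
    [A-Z]+(?=[A-Z][a-z]) |  # All upper case before a capitalized word
    [A-Z]?[a-z]+         |  # Capitalized words / all lower case
    [A-Z]+                  # All upper case
    ''',
    re.VERBOSE,
)

def camelcase_and_underscore_tokenizer(string):
    """Split camelCase and snake_case text into lowercase words."""
    # Underscores (and other non-word characters) fall between matches,
    # so they act as separators without needing their own branch.
    return [token.lower() for token in TOKEN_RE.findall(string)]

print(*camelcase_and_underscore_tokenizer('thereIsSomething i_wanted toShow'))
# there is something i wanted to show
```

The lookahead branch keeps acronym boundaries intact, e.g. 'HTMLParser' splits into 'html' and 'parser'.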