Metadata-Version: 2.1
Name: spacy-pattern-builder
Version: 0.0.1
Summary: Reverse engineer patterns for use with the SpaCy DependencyTreeMatcher
Home-page: https://github.com/cyclecycle/spacy-pattern-builder
Author: Nick Morley
Author-email: nick.morley111@gmail.com
License: MIT
Description: # SpaCy Pattern Builder
        
        Use training examples to build and refine patterns for use with SpaCy's DependencyTreeMatcher.
        
        ## Motivation
        
        Generating patterns programmatically from training data is more efficient than creating them manually.
        
        ## Installation
        
        With pip:
        
        ```bash
        pip install spacy-pattern-builder
        ```
        
        ## Usage
        
        ```python
        # Import a SpaCy model, parse a string to create a Doc object
        import en_core_web_sm
        
        text = 'We introduce efficient methods for fitting Boolean models to molecular data.'
        nlp = en_core_web_sm.load()
        doc = nlp(text)
        
        from spacy_pattern_builder import build_dependency_pattern
        
        # Provide a list of tokens we want to match.
        match_tokens = [doc[i] for i in [0, 1, 3]]  # [We, introduce, methods]
        
        ''' Note that these tokens must be fully connected. That is,
        all tokens must have a path to all other tokens in the list,
        without needing to traverse tokens outside of the list.
        Otherwise, spacy-pattern-builder will raise a TokensNotFullyConnectedError.
        You can get a connected set that includes your tokens with the following: '''
        from spacy_pattern_builder import util
        connected_tokens = util.smallest_connected_subgraph(match_tokens, doc)
        assert match_tokens == connected_tokens
        
        # Specify the token attributes / features to use
        feature_dict = {  # This here is equal to the default feature_dict
            'DEP': 'dep_',
            'TAG': 'tag_'
        }
        
        # Build the pattern
        pattern = build_dependency_pattern(doc, match_tokens, feature_dict=feature_dict)
        
        from pprint import pprint
        pprint(pattern)  # In the format consumed by SpaCy's DependencyTreeMatcher:
        '''
        [{'PATTERN': {'DEP': 'ROOT', 'TAG': 'VBP'}, 'SPEC': {'NODE_NAME': 'node1'}},
         {'PATTERN': {'DEP': 'nsubj', 'TAG': 'PRP'},
          'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node0'}},
         {'PATTERN': {'DEP': 'dobj', 'TAG': 'NNS'},
          'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node3'}}]
        '''
        
        # Create a matcher and add the newly generated pattern
        from spacy.matcher import DependencyTreeMatcher
        
        matcher = DependencyTreeMatcher(doc.vocab)
        matcher.add('pattern', None, pattern)
        
        # And match away
        matches = matcher(doc)
        for match_id, token_idxs in matches:
            tokens = [doc[i] for i in token_idxs]
            tokens = sorted(tokens, key=lambda w: w.i)
            print(tokens)  # [We, introduce, methods]
        
        ```
        
        ## Acknowledgements
        
        Uses:
        
        - [SpaCy](https://spacy.io)
        - [networkx](https://github.com/networkx/networkx)
Keywords: spacy-pattern-builder
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: PyPy
Description-Content-Type: text/markdown
