Metadata-Version: 2.1
Name: pynysiis
Version: 1.0.6
Summary: NYSIIS phonetic encoding algorithm.
Home-page: https://finbarrs.eu/
Author: Finbarrs Oketunji
Author-email: f@finbarrs.eu
License: MIT
Project-URL: Bug Tracker, https://github.com/0xnu/nysiis/issues
Project-URL: Changes, https://github.com/0xnu/nysiis/blob/main/CHANGELOG.md
Project-URL: Documentation, https://github.com/0xnu/nysiis/blob/main/README.md
Project-URL: Source Code, https://github.com/0xnu/nysiis
Keywords: nysiis,phonetic,encoding,algorithm,name matching,fuzzy matching,sound matching
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
License-File: LICENSE

pynysiis
=========

.. image:: https://badge.fury.io/py/pynysiis.svg
    :target: https://badge.fury.io/py/pynysiis
    :alt: NYSIIS Python Package Version


The ``pynysiis`` package provides a Python implementation of the `New York State Identification and Intelligence System`_ (NYSIIS) phonetic encoding algorithm. NYSIIS encodes a name according to how it sounds rather than how it is spelled, so variant spellings of the same name tend to receive the same code. This makes it useful for name matching and search.

.. _New York State Identification and Intelligence System: https://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System
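
For readers who want to see what the encoding does, the following sketch applies the classic NYSIIS rules as they are usually published (for example on the Wikipedia page linked above). It is an illustration only: ``nysiis_sketch`` is a hypothetical function, not part of ``pynysiis``. Its output may differ from ``NYSIIS().encode()``, particularly for multi-word input, which the sketch simply squeezes into one word.

.. code-block:: python

    import re

    VOWELS = frozenset("AEIOU")

    # Prefix and suffix rewrites from the classic rule set (first match wins).
    PREFIXES = (("MAC", "MCC"), ("KN", "NN"), ("K", "C"),
                ("PH", "FF"), ("PF", "FF"), ("SCH", "SSS"))
    SUFFIXES = (("EE", "Y"), ("IE", "Y"), ("DT", "D"), ("RT", "D"),
                ("RD", "D"), ("NT", "D"), ("ND", "D"))
    SIMPLE = {"Q": "G", "Z": "S", "M": "N"}


    def nysiis_sketch(name: str, max_len: int = 6) -> str:
        """Hypothetical helper: classic NYSIIS rules, not the pynysiis code."""
        word = re.sub(r"[^A-Z]", "", name.upper())  # letters only, upper case
        if not word:
            return ""

        for old, new in PREFIXES:
            if word.startswith(old):
                word = new + word[len(old):]
                break
        for old, new in SUFFIXES:
            if word.endswith(old):
                word = word[:-len(old)] + new
                break

        chars = list(word)
        key = chars[0]  # the first letter is kept unchanged

        # Translate the remaining letters from left to right.
        for i in range(1, len(chars)):
            c = chars[i]
            nxt = chars[i + 1] if i + 1 < len(chars) else ""
            ahead = "".join(chars[i:i + 3])
            if ahead.startswith("EV"):
                chars[i:i + 2] = ["A", "F"]
            elif c in VOWELS:
                chars[i] = "A"
            elif c in SIMPLE:
                chars[i] = SIMPLE[c]
            elif c == "K":
                chars[i] = "N" if nxt == "N" else "C"
            elif ahead == "SCH":
                chars[i:i + 3] = ["S", "S", "S"]
            elif ahead.startswith("PH"):
                chars[i:i + 2] = ["F", "F"]
            elif c == "H" and (chars[i - 1] not in VOWELS or nxt not in VOWELS):
                chars[i] = chars[i - 1]
            elif c == "W" and chars[i - 1] in VOWELS:
                chars[i] = chars[i - 1]
            # Append the letter unless it repeats the last letter of the key.
            if chars[i] != key[-1]:
                key += chars[i]

        # Tidy the ending: drop a trailing S, turn AY into Y, drop a trailing A.
        if len(key) > 1 and key.endswith("S"):
            key = key[:-1]
        if len(key) > 2 and key.endswith("AY"):
            key = key[:-2] + "Y"
        if len(key) > 1 and key.endswith("A"):
            key = key[:-1]

        # The original algorithm keeps at most six characters; max_len <= 0 keeps all.
        return key[:max_len] if max_len > 0 else key

With these rules ``nysiis_sketch("Watkins")`` returns ``WATCAN``: the vowels after the first letter become ``A``, ``K`` becomes ``C``, and the trailing ``S`` is dropped.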


Requirements
-------------

Python 3.8 or later.


Setup
------

Install the package from PyPI with ``pip``:

.. code-block:: bash

    $ pip install pynysiis

``easy_install`` is deprecated and has been removed from recent releases of setuptools, so ``pip`` is the recommended installer.

Basic Usage
-----------

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()
    name = "Watkins"
    encoded_name = encoder.encode(name)
    print(encoded_name)  # Output: WATCAN

Name Comparison
---------------

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()

    # Compare similar names
    name1 = "John Smith"
    name2 = "John Smyth"

    encoded_name1 = encoder.encode(name1)
    encoded_name2 = encoder.encode(name2)

    if encoded_name1 == encoded_name2:
        print("Names match phonetically")
    else:
        print("Names are phonetically different")

    # Output: Names match phonetically

Multi-Language Support
----------------------

The NYSIIS encoder handles names from various languages:

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()

    # Sample names from different languages
    names = [
        # English names
        "Watkins",
        "Robert Johnson",
        
        # Yoruba name
        "Olanrewaju Akinyele",
        
        # Igbo name
        "Obinwanne Obiora",
        
        # Hausa name
        "Abdussalamu Abubakar",
        
        # Hindi name
        "Virat Kohli",
        
        # Urdu name
        "Usman Shah"
    ]

    # Process each name
    for name in names:
        encoded_name = encoder.encode(name)
        print(f"{name:<20} -> {encoded_name}")

    # Output:
    # Watkins              -> WATCAN
    # Robert Johnson       -> RABART
    # Olanrewaju Akinyele  -> OLANRA
    # Obinwanne Obiora     -> OBAWAN
    # Abdussalamu Abubakar -> ABDASA
    # Virat Kohli          -> VARATC
    # Usman Shah           -> USNANS

Common Use Cases
----------------

Database Search Optimisation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    def find_similar_names(search_name, database_names):
        encoder = NYSIIS()
        search_code = encoder.encode(search_name)
        
        matches = [
            name for name in database_names
            if encoder.encode(name) == search_code
        ]
        return matches
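
``find_similar_names`` encodes every stored name again on each call. If the same list is searched many times, one possible refinement is to encode it once and keep a dictionary from code to names, so each search costs a single ``encode`` call and a dictionary lookup. A minimal sketch: ``build_phonetic_index`` and ``lookup`` are hypothetical helpers, and ``encoder`` is assumed to be any object with an ``encode(name)`` method, such as ``NYSIIS()``.

.. code-block:: python

    from collections import defaultdict


    def build_phonetic_index(names, encoder):
        """Hypothetical helper: map each phonetic code to the names sharing it."""
        index = defaultdict(list)
        for name in names:
            index[encoder.encode(name)].append(name)
        return dict(index)


    def lookup(index, search_name, encoder):
        """Hypothetical helper: stored names whose code equals search_name's code."""
        # Return a copy so callers cannot modify the index by accident.
        return list(index.get(encoder.encode(search_name), []))


    # Usage (assumes pynysiis is installed):
    #     from nysiis import NYSIIS
    #     encoder = NYSIIS()
    #     index = build_phonetic_index(database_names, encoder)
    #     matches = lookup(index, "Smyth", encoder)

The index has to be rebuilt, or updated, whenever the stored names change.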

Name Deduplication
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    def find_duplicates(names):
        encoder = NYSIIS()
        groups = {}  # phonetic code -> names that share it

        for name in names:
            code = encoder.encode(name)
            groups.setdefault(code, []).append(name)

        # Keep only codes shared by two or more names
        return {
            code: group
            for code, group in groups.items()
            if len(group) > 1
        }

Fuzzy Name Matching
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    def match_names(name1, name2, encoder=None):
        # Build an encoder only if the caller did not pass one in
        if encoder is None:
            encoder = NYSIIS()

        return encoder.encode(name1) == encoder.encode(name2)
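
The outputs listed under Multi-Language Support (for example ``Robert Johnson -> RABART``) suggest that a multi-word name is reduced to a single code of at most six characters, in which case the surname may barely affect the result. If that matters for your data, one hedged alternative is to encode each word separately and require every pair of words to match. ``match_names_by_word`` is a hypothetical helper, not part of ``pynysiis``; ``encoder`` is any object with an ``encode(name)`` method.

.. code-block:: python

    def match_names_by_word(name1, name2, encoder):
        """Hypothetical helper: True if both names have the same number of
        words and each pair of words, in order, has the same phonetic code."""
        words1, words2 = name1.split(), name2.split()
        # Empty names or different word counts never match
        if not words1 or len(words1) != len(words2):
            return False
        return all(
            encoder.encode(a) == encoder.encode(b)
            for a, b in zip(words1, words2)
        )

Word order matters here, so ``Smith John`` does not match ``John Smith``; sorting the words first is one way to relax that.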

Best Practices
--------------

Reuse the Encoder Instance
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    # Good - create once, use many times
    encoder = NYSIIS()
    for name in large_name_list:
        encoded = encoder.encode(name)

    # Less efficient - creating new instance repeatedly
    for name in large_name_list:
        encoded = NYSIIS().encode(name)
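
If the list also contains many repeated names, the codes can be memoised on top of reusing one encoder. A minimal sketch with ``functools.lru_cache``, assuming ``encode`` always returns the same code for the same string; ``make_cached_encoder`` is a hypothetical helper.

.. code-block:: python

    from functools import lru_cache


    def make_cached_encoder(encoder, maxsize=10_000):
        """Hypothetical helper: wrap encoder.encode with an LRU cache."""

        @lru_cache(maxsize=maxsize)
        def encode(name):
            # Only runs on a cache miss; repeated names reuse the stored code.
            return encoder.encode(name)

        return encode


    # Usage (assumes pynysiis is installed):
    #     from nysiis import NYSIIS
    #     encode = make_cached_encoder(NYSIIS())
    #     codes = [encode(name) for name in large_name_list]

The returned function also exposes ``cache_info()``, which reports cache hits and misses and helps judge whether caching pays off for a given data set.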

Handle Empty Inputs
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

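    from nysiis import NYSIIS

    encoder = NYSIIS()  # create once and reuse, as recommended above

    def process_name(name):
        # Treat None, "" and whitespace-only strings as missing
        if not name or not name.strip():
            return None

        return encoder.encode(name)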

Case Sensitivity
~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    # The encoder handles case automatically
    encoder = NYSIIS()
    print(encoder.encode("smith"))  # Same as "SMITH"
    print(encoder.encode("SMITH"))  # Same result

Reference
----------

.. code-block:: bibtex

    @inproceedings{Rajkovic2007,
      author    = {Petar Rajkovic and Dragan Jankovic},
      title     = {Adaptation and Application of Daitch-Mokotoff Soundex Algorithm on Serbian Names},
      booktitle = {XVII Conference on Applied Mathematics},
      editor    = {D. Herceg and H. Zarin},
      pages     = {193--204},
      year      = {2007},
      publisher = {Department of Mathematics and Informatics, Novi Sad},
      url       = {https://jmp.sh/hukNujCG}
    }


Additional References
----------------------

+ `Commission Implementing Regulation (EU) 2016/480`_
+ `Commission Implementing Regulation (EU) 2023/2381`_

.. _Commission Implementing Regulation (EU) 2016/480: https://www.legislation.gov.uk/eur/2016/480/contents
.. _Commission Implementing Regulation (EU) 2023/2381: https://eur-lex.europa.eu/eli/reg_impl/2023/2381/oj


License
--------

This project is licensed under the `MIT License`_.  

.. _MIT License: https://gist.github.com/0xnu/d11da49c85eeb7272517a9010bbdf1ab


Copyright
----------

Copyright |copy| 2024 `Finbarrs Oketunji`_. All Rights Reserved.

.. |copy| unicode:: 0xA9 .. copyright sign
.. _Finbarrs Oketunji: https://finbarrs.eu/
