[pypi-image]: https://badge.fury.io/py/cwest-polymer.svg
[pypi-url]: https://pypi.org/project/cwest-polymer/
[pypi-download]: https://static.pepy.tech/badge/cwest-polymer
[docs-image]: https://img.shields.io/badge/docs-latest-blue

# 'cwest-polymer' Polymer Analysis Package

[![PyPI Version][pypi-image]][pypi-url] [![pypi download][pypi-download]][pypi-url] 
[![DOI](https://zenodo.org/badge/851879885.svg)](https://doi.org/10.5281/zenodo.16746435)

This Python package reads, analyzes, and interprets polymer species in mass spectrometry data using 
fractional mass remainder (fMR), a generalized Kendrick mass defect (KMD) algorithm. It applies circular 
distance metrics with cluster analysis to classify polymer groups more efficiently.
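
As a rough illustration of the idea, the sketch below computes a fractional mass remainder and a circular distance for a few theoretical PEG masses. The expression `(mass * fraction / repeat_mass) % 1` is one common way to write a fractional remainder and is an assumption here, as are the helper names `fractional_mass_remainder` and `circular_distance`; the package's own transforms may define and scale these quantities differently.

```python
# Illustrative sketch only; not the cwest-polymer implementation.
PEG_REPEAT_MASS = 44.026215  # monoisotopic mass of C2H4O in Da
WATER_END_GROUP = 18.010565  # monoisotopic mass of H2O in Da


def fractional_mass_remainder(mass, repeat_mass, fraction=1):
    """Return the fractional part of mass / (repeat_mass / fraction), in [0, 1)."""
    return (mass * fraction / repeat_mass) % 1.0


def circular_distance(a, b):
    """Return the shortest distance between a and b on a circle of circumference 1."""
    diff = abs(a - b) % 1.0
    return min(diff, 1.0 - diff)


# Members of one polymer series differ only by whole repeat units,
# so they share (almost exactly) the same fractional remainder.
series = [WATER_END_GROUP + n * PEG_REPEAT_MASS for n in range(5, 10)]
fmr_values = [fractional_mass_remainder(m, PEG_REPEAT_MASS) for m in series]
spread = max(circular_distance(fmr_values[0], v) for v in fmr_values)

# Values just below 1 and just above 0 are neighbours on the circle,
# which a plain absolute difference would miss.
wrap = circular_distance(0.99, 0.01)
```

Here `spread` stays at floating-point noise level for the whole series, while `wrap` comes out near 0.02 rather than 0.98, which is why a circular metric suits remainder values that straddle the 1 to 0 boundary.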

`cwest-polymer` (Welsh for "polymer quest") uses the data science Python package [`piblin` (Welsh for 
"pipeline")](https://pypi.org/project/piblin/), which can comprehensively capture analytical data from a variety 
of sources along with their metadata. This package shows basic implementations of file readers and transforms for 
polymer analysis, but can be extended to include more complex data processing and analysis pipelines. Further examples
of piblin implementations can be found in [`hermes-rheo`, a rheological data analysis package](https://pypi.org/project/hermes-rheo/).

More details on these concepts can be found in the reprint below, which also includes literature references to KMD for
further background.
### [ASMS 2025 Poster Reprint: Improvements and Analysis of KMD](https://github.com/3mcloud/cwest-polymer/blob/main/ASMS_25_WP309_Improvements%20and%20Analysis%20of%20KMD.pdf)

# Installation
`cwest-polymer` is available on PyPI. You can install it using pip:
```
pip install cwest-polymer
```


# Fractional Mass Remainder (fMR) Transforms and Clustering Results
Cluster analysis was performed on imported .csv files. The results were generated by applying the data transformation pipelines described below:
## Theoretical PEG Cluster Analysis: [data folder here](https://github.com/3mcloud/cwest-polymer/tree/main/Results/spreadsheet_results)
PEG mass values were calculated from the theoretical repeat unit and arbitrary end-groups. The values were transformed with the scripts shown below and are colored by grouping.

<table>
    <tr>
       <td>
          <img src="https://github.com/3mcloud/cwest-polymer/blob/main/Results/spreadsheet_results/plot_custom_columns_C2%20H4%20O_1.png" width="500">
       </td>
    </tr>
</table>
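
A theoretical input of this kind could be generated along the following lines. The end-group masses, the range of repeat-unit counts, and the helper name `theoretical_series` are illustrative assumptions, not the values used to produce the figure above; the `mass` column name is one of the accepted headers listed under Package Parameters.

```python
import csv
import io

PEG_REPEAT_MASS = 44.026215  # monoisotopic mass of C2H4O in Da

# Hypothetical end-group masses in Da, chosen only for illustration.
END_GROUPS = {"H2O": 18.010565, "CH4O": 32.026215}


def theoretical_series(repeat_mass, end_group_mass, n_min, n_max):
    """Return masses for n_min..n_max repeat units plus one end group."""
    return [end_group_mass + n * repeat_mass for n in range(n_min, n_max + 1)]


masses = [
    round(mass, 6)
    for end_mass in END_GROUPS.values()
    for mass in theoretical_series(PEG_REPEAT_MASS, end_mass, 5, 14)
]

# Write to an in-memory buffer; use open("peg.csv", "w", newline="") to save a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["mass"])
writer.writerows([mass] for mass in masses)
csv_text = buffer.getvalue()
```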


## Lipids Clustered by Alkanes (CH2) / Alkenes (C2H2): [data folder here](https://github.com/3mcloud/cwest-polymer/tree/main/Results/lipid_results)
Fatty acid (FA) mass values were calculated by molecular formula. The values were transformed with the scripts shown below and are colored by grouping.


<table>
    <tr>
       <td>
          <img src="https://github.com/3mcloud/cwest-polymer/blob/main/Results/lipid_results/plot_lipid_data_alkanes_1.png" width="300">
       </td>
       <td>
          <img src="https://github.com/3mcloud/cwest-polymer/blob/main/Results/lipid_results/plot_lipid_data_alkenes_1.png" width="300">
       </td>
       <td>
          <img src="https://github.com/3mcloud/cwest-polymer/blob/main/Results/lipid_results/plot_lipid_data_dehydrogenate_1.png" width="300">
       </td>
    </tr>
</table>


# Package Parameters
#### Default repeat units based on package parameters.
```
from cwest_polymer import DEFAULT_REPEAT_UNITS

# contents of DEFAULT_REPEAT_UNITS, shown for reference (the import alone is sufficient)
DEFAULT_REPEAT_UNITS = {
    "PEG": "C2 H4 O",
    "PPG": "C3 H6 O",
    "PTHF": "C4 H8 O",
    "PET": "C10 H8 O4",
    "PE": "C2 H4",
    "PP": "C3 H6",
    "Perfluoro": "C F2",
    "PDMS": "C2 H6 Si O",
    "BPA": "C18 H20 O3",
    "Acrylamide": "C3 H5 N O",
    "Acrylic acid": "C3 H4 O2",
    "Nylon 6 6": "C12 H22 N2 O2",
}
```

#### Create transforms with different repeat units, given as formulas parsed by molmass or as float mass values
The `repeat_units` parameter can be a list or dictionary that supplies different repeat units for the transformation 
pipeline. The list or dictionary values can be any combination of formulas (str) and mass values (float). Dictionaries 
enable custom repeat unit labels. 

The `fractional_values` parameter defaults to 1 for singly charged ions, but can be a list of integers for multiply 
charged species. The `default_list` parameter adds all repeat units from `DEFAULT_REPEAT_UNITS` to the supplied values.

```
from cwest_polymer import transforms

fmr_transform1 = transforms.FractionalMRTransform.create(repeat_units=['C1 H2 O3'], fractional_values=1, default_list=True)
fmr_transform2 = transforms.FractionalMRTransform.create(repeat_units=[123.45, 67.89], fractional_values=[1,2,3], default_list=False, kmd=True)
```
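
To make the accepted input shapes concrete, the sketch below turns a list or dictionary of formulas and floats into a `{label: mass}` mapping. It is a stand-in for illustration only: the package is stated to parse formulas with molmass, whereas this example uses a tiny hand-written parser with a short monoisotopic mass table, and the names `formula_mass` and `normalise_repeat_units` are hypothetical.

```python
import re

# Monoisotopic masses in Da for a few elements; deliberately incomplete.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.99491462,
    "F": 18.99840322,
    "Si": 27.9769265,
}


def formula_mass(formula):
    """Sum element masses for a simple formula such as 'C2 H4 O' (no brackets or charges)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula.replace(" ", "")):
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total


def normalise_repeat_units(repeat_units):
    """Return {label: mass} from a list or dict whose values are formulas (str) or masses (float)."""
    if isinstance(repeat_units, dict):
        items = list(repeat_units.items())
    else:
        # Lists carry no labels, so the value itself is used as the label.
        items = [(str(unit), unit) for unit in repeat_units]
    return {
        label: formula_mass(value) if isinstance(value, str) else float(value)
        for label, value in items
    }


labelled = normalise_repeat_units({"PEG": "C2 H4 O", "custom": 123.45})
unlabelled = normalise_repeat_units(["C F2", 67.89])
```

Using a dictionary keeps readable labels such as `PEG` attached to each mass, which is the benefit the text above attributes to dictionary input.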

#### The following headers can be detected within a given spreadsheet (.csv and .xlsx)
Column headers are read using the `cwest_polymer.fmr_filereaders.fmr_mass_spreadsheet_reader.MassSpreadsheetReader` class. Custom column fields can be used to modify the reader; an example is shown [here](https://github.com/3mcloud/cwest-polymer/tree/main/Example).
```
from cwest_polymer import ACCEPTED_COLUMN_HEADERS

# contents of ACCEPTED_COLUMN_HEADERS, shown for reference (the import alone is sufficient)
ACCEPTED_COLUMN_HEADERS = ['mass', 'mz', 'm/z', 'rt', 'retention time', 'abundance', 'intensity', 'area', 'x_pos', 'y_pos']
```
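
The sketch below shows one way such header detection could work on a spreadsheet's first row. It does not reproduce `MassSpreadsheetReader`; the helper name `detect_columns` is hypothetical, and the case-insensitive, whitespace-trimmed matching is an assumption about how headers might be compared.

```python
import csv
import io

ACCEPTED_COLUMN_HEADERS = ['mass', 'mz', 'm/z', 'rt', 'retention time',
                           'abundance', 'intensity', 'area', 'x_pos', 'y_pos']


def detect_columns(header_row):
    """Map column index -> accepted header name for every recognised column."""
    accepted = set(ACCEPTED_COLUMN_HEADERS)
    detected = {}
    for index, header in enumerate(header_row):
        key = header.strip().lower()  # assumed normalisation, see lead-in
        if key in accepted:
            detected[index] = key
    return detected


# Small in-memory CSV with one unrecognised column ("Sample ID").
sample = "Sample ID,m/z,Intensity,RT\nA1,459.2801,1500,3.2\n"
header_row = next(csv.reader(io.StringIO(sample)))
columns = detect_columns(header_row)
```

In this example the `Sample ID` column is ignored and the remaining three columns are matched to `m/z`, `intensity`, and `rt`.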

# Implementing the fractional mass remainder (fMR) polymer detection algorithm
#### Python imports for file reading, transform set-up, and data transformation (`piblin` is used through `cwest_polymer`)
```
from cwest_polymer import MassSpreadsheetReader, transforms
from cwest_polymer import fmr_parameters as p

from pathlib import Path
import os
import pandas as pd
import numpy as np
```

#### Set parameters
```
spreadsheet_path = r"PATH/TO/DATA/FOLDER"
result_path = r"PATH/TO/RESULT/FOLDER"
ppm_tol = 10
mz_tol = 0.005
min_samples = 3

repeat_units = {
    'alkanes': 'C H2',
    'alkenes': 'C2 H2'
}
```

#### Read directories with the spreadsheet data file reader (.csv and .xlsx files)
```
data = MassSpreadsheetReader().data_from_filepath(filepath=spreadsheet_path)
```

#### Create transform classes to calculate fMR values and determine polymer clusters
```
# transform to fMR
calc_fmr = transforms.FractionalMRTransform.create(repeat_units=repeat_units, fractional_values=1, default_list=False)

# cluster based on fMR
cluster_data = transforms.ClusterTransform.create(mz_tol=mz_tol, ppm_tol=ppm_tol, min_samples=min_samples)

# filter by group size
filter_cluster = transforms.FilterByClusterSize.create(min_samples=min_samples)

# create transformation pipeline (thus the name piblin)
pipeline = calc_fmr + cluster_data + filter_cluster
```
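
For intuition about what clustering on a circular quantity involves, here is a simplified stand-in written in plain Python. It is not the logic of `ClusterTransform`: the function name `cluster_circular` and the single `tol` parameter are hypothetical, and how the package combines `mz_tol` and `ppm_tol` is not shown here. The sketch chains sorted values whose neighbours lie within `tol`, joins the two ends across the 1 to 0 boundary, and marks groups smaller than `min_samples` as noise.

```python
def cluster_circular(values, tol, min_samples):
    """Label values in [0, 1) by chaining neighbours within tol on the unit circle.

    Returns one label per input value; -1 marks points in groups smaller than min_samples.
    """
    labels = [-1] * len(values)
    if not values:
        return labels

    # Walk the values in sorted order and start a new run at every gap larger than tol.
    order = sorted(range(len(values)), key=lambda i: values[i])
    runs = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] <= tol:
            runs[-1].append(cur)
        else:
            runs.append([cur])

    # Wrap-around: the largest and smallest values may be neighbours on the circle.
    if len(runs) > 1 and values[order[0]] + 1.0 - values[order[-1]] <= tol:
        runs[0] = runs.pop() + runs[0]

    # Keep only runs that are large enough, numbering them from 0.
    next_label = 0
    for run in runs:
        if len(run) >= min_samples:
            for i in run:
                labels[i] = next_label
            next_label += 1
    return labels


example = [0.999, 0.001, 0.002, 0.5, 0.51, 0.3]
example_labels = cluster_circular(example, tol=0.005, min_samples=3)
```

In the example, 0.999, 0.001, and 0.002 form one group because they are adjacent across the boundary, while the remaining points are too isolated to reach `min_samples`.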

#### Apply transforms to data
```
# run pipeline on data
fmr_clusters = pipeline(data)
```

#### Creating results files
 - Export .csv results
 - Generate .png fMR plots


<table>
    <tr>
       <td>
          <img src="https://github.com/3mcloud/cwest-polymer/blob/main/Results/path_to_file.png" width="400">
       </td>
    </tr>
</table>

```
# split the clustered results so each source file is handled separately
results = fmr_clusters.split_by_condition_name('file_name')

for result in results:
    for measurement in result.measurements:
        # skip measurements with no points left after filtering
        if measurement.datasets[0].number_of_points() == 0:
            continue
        ru_name = measurement.details['repeat_unit_information'][0]

        # label outputs by source file (assumes 'file_name' is stored in measurement.conditions)
        name = Path(str(measurement.conditions['file_name'])).stem

        # export data to .csv file
        df = pd.DataFrame(np.array(measurement.datasets[0].data_arrays).T, columns=measurement.datasets[0].data_array_names)
        df.to_csv(os.path.join(result_path, f'result_{name}_{ru_name}_filtered.csv'))

        # generate fMR plot figure
        fig, _ = measurement.visualize()
        fig.savefig(os.path.join(result_path, f'plot_{name}_{ru_name}.png'), dpi=1000, bbox_inches='tight')
```
