Metadata-Version: 2.4
Name: bioflow-insight
Version: 1.1.5
Summary: A software to extract and analyze the structure and associated metadata from a Nextflow workflow.
Author: George Marchment
Project-URL: Homepage, https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight
Project-URL: Issues, https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: graphviz==0.20.1
Requires-Dist: networkx~=3.2.1; python_version >= "3.9"
Requires-Dist: networkx~=3.1; python_version == "3.8"
Requires-Dist: numpy~=1.26.1; python_version >= "3.9"
Requires-Dist: numpy~=1.24.4; python_version == "3.8"
Requires-Dist: sympy==1.9
Requires-Dist: parsimonious==0.10.0
Requires-Dist: click
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: black~=23.12.0; extra == "dev"
Dynamic: license-file

# BioFlow-Insight


[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-green.svg)](https://www.gnu.org/licenses/gpl-3.0) [![Version 1.0](https://img.shields.io/badge/version-v1.0-yellow)]()



## Description

**BioFlow-Insight** is a Python-based open-source command-line tool designed to automatically analyse Nextflow workflow code and gather useful information, particularly in the form of visual graphs that illustrate the workflow's structure and its various steps. It can also detect certain programming errors and generate an RO-Crate JSON-LD file that describes the workflow.

**BioFlow-Insight** can be installed as a CLI from [PyPI](https://pypi.org/project/bioflow-insight/). It is also available as a free [web service](https://bioflow-insight.pasteur.cloud/), which also hosts further documentation: for more information and to start using BioFlow-Insight, visit [https://bioflow-insight.pasteur.cloud/](https://bioflow-insight.pasteur.cloud/).


## Table of Contents

- [BioFlow-Insight](#bioflow-insight)
  - [Description](#description)
  - [Table of Contents](#table-of-contents)
  - [Installation](#installation)
    - [Installing via pip](#installing-via-pip)
    - [Using from source](#using-from-source)
  - [Usage](#usage)
    - [Input](#input)
    - [Output](#output)
  - [Citing BioFlow-Insight](#citing-bioflow-insight)
  - [License](#license)
  - [Funding](#funding)

## Installation

### Installing via pip

To install the **BioFlow-Insight** CLI with *pip*, run the following command:

```
pip install bioflow-insight
```


### Using from source

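To work with the source code, clone the [GitLab repository](https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight). **BioFlow-Insight** is written in Python 3 (Python 3.8 or later is required).

**BioFlow-Insight**'s dependencies are listed in the `requirements.txt` file and can be installed with `pip install -r requirements.txt`.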

> Note: rendering the graphs relies on the Graphviz system package. On Debian/Ubuntu-based Linux distributions, you may need to install it with `sudo apt install graphviz`.


## Usage



For an explanation of the different elements composing a Nextflow workflow, see [its documentation](https://www.nextflow.io/docs/latest/index.html).

**BioFlow-Insight** generates three graphs:

* *Specification graph:* **BioFlow-Insight** reconstructs the workflow’s specification graph from its source code, without having to execute it. The specification graph is a directed graph whose nodes are processes and operations and whose edges are channels, each directed from one node to another (reflecting the order of the workflow's steps). This graph **represents all the possible interactions between processes and operations through channels** that are defined in the workflow code. Within the specification graph, operations are categorised into two groups: *following operations*, which have at least one input, and *starting operations*, which have no inputs.
  
* *Dependency graph:* From the specification graph, **BioFlow-Insight** also generates the dependency graph, which **represents starting operations and processes (as nodes) along with their dependencies (edges)**. This graph is obtained by removing the following operations and linking the remaining elements if a path exists between them in the original specification graph. In this representation, **the edges no longer represent interactions between elements, but their dependencies**.

* *Process dependency graph:* Finally, **BioFlow-Insight** generates the process dependency graph, which represents only **processes (nodes) and their dependencies (edges)**. Like the dependency graph, it is constructed by removing operations (here, all of them), leaving only processes, and linking them based on their dependencies in the original specification graph. Again, **the edges no longer represent interactions between elements, but their dependencies**.
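
The relationship between the three graphs can be illustrated with a short sketch. It is not BioFlow-Insight's implementation: it represents the specification graph as a hypothetical `nodes` dict (node id mapped to `"process"` or `"operation"`) plus a list of directed channel edges, and all function names are made up for illustration. It also assumes that two kept elements are linked only when a path between them runs exclusively through removed operations, so the result is not the full transitive closure; the description above does not say which interpretation the tool uses.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def starting_operations(nodes: Dict[str, str], edges: List[Edge]) -> Set[str]:
    """Operations without any input, i.e. with no incoming channel."""
    has_input = {dst for _, dst in edges}
    return {
        n for n, kind in nodes.items() if kind == "operation" and n not in has_input
    }


def project(nodes: Dict[str, str], edges: List[Edge], keep: Set[str]) -> Set[Edge]:
    """Keep only the nodes in `keep` and link u -> v when v is reachable from u
    through removed nodes only (assumption: edges are not transitively closed)."""
    succ: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
    result: Set[Edge] = set()
    for u in keep:
        seen: Set[str] = set()
        queue = deque(succ[u])
        while queue:  # breadth-first search starting from u
            v = queue.popleft()
            if v in seen:
                continue
            seen.add(v)
            if v in keep:
                result.add((u, v))  # stop at the first kept node on this path
            else:
                queue.extend(succ[v])  # continue through a removed operation
    return result


def dependency_graph(nodes: Dict[str, str], edges: List[Edge]) -> Set[Edge]:
    """Processes plus starting operations; following operations are removed."""
    processes = {n for n, kind in nodes.items() if kind == "process"}
    return project(nodes, edges, processes | starting_operations(nodes, edges))


def process_dependency_graph(nodes: Dict[str, str], edges: List[Edge]) -> Set[Edge]:
    """Processes only; every operation is removed."""
    processes = {n for n, kind in nodes.items() if kind == "process"}
    return project(nodes, edges, processes)


# Hypothetical toy specification graph, loosely inspired by rnaseq-nf.
NODES = {
    "fromFilePairs": "operation",  # starting operation (no input)
    "INDEX": "process",
    "QUANT": "process",
    "FASTQC": "process",
    "mix": "operation",  # following operation (has inputs)
    "MULTIQC": "process",
}
EDGES = [
    ("fromFilePairs", "QUANT"),
    ("fromFilePairs", "FASTQC"),
    ("INDEX", "QUANT"),
    ("QUANT", "mix"),
    ("FASTQC", "mix"),
    ("mix", "MULTIQC"),
]

if __name__ == "__main__":
    print(sorted(dependency_graph(NODES, EDGES)))
    print(sorted(process_dependency_graph(NODES, EDGES)))
```

Under these assumptions, `mix` disappears from both projections and its inputs (`QUANT`, `FASTQC`) become directly linked to `MULTIQC`, while `fromFilePairs` survives only in the dependency graph because it is a starting operation.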

> For a more in-depth explanation of BioFlow-Insight's functionalities, visit its webpage: [https://bioflow-insight.pasteur.cloud/specification/](https://bioflow-insight.pasteur.cloud/specification/).

> To illustrate how **BioFlow-Insight** is used, let's take the rnaseq-nf workflow provided by Nextflow (its source code can be found [here](https://github.com/nextflow-io/rnaseq-nf/tree/8253a586cc5a9679d37544ac54f72167cced324b)). Examples of the output are given below.

### Input 

In this example, we use **BioFlow-Insight** to analyse the rnaseq-nf workflow. After installing **BioFlow-Insight** via pip and cloning the rnaseq-nf repository, simply run the following command:


```
bioflow-insight rnaseq-nf/main.nf
```

### Output

After the workflow has been analysed and the graphs generated, the outputs are saved in the `results` folder.

The folder is organised as follows:

```
.
├── debug
│   ├── calls.nf
│   ├── operations_in_call.nf
│   └── operations.nf
├── graphs
│   ├── dependency_graph.dot
│   ├── dependency_graph.json
│   ├── dependency_graph.mmd
│   ├── dependency_graph.png
│   ├── dependency_graph_wo_labels.dot
│   ├── dependency_graph_wo_labels.mmd
│   ├── dependency_graph_wo_labels.png
│   ├── dependency_graph_wo_orphan_operations.dot
│   ├── dependency_graph_wo_orphan_operations.mmd
│   ├── dependency_graph_wo_orphan_operations.png
│   ├── dependency_graph_wo_orphan_operations_wo_labels.dot
│   ├── dependency_graph_wo_orphan_operations_wo_labels.mmd
│   ├── dependency_graph_wo_orphan_operations_wo_labels.png
│   ├── metadata_dependency_graph.json
│   ├── metadata_process_dependency_graph.json
│   ├── metadata_specification_graph.json
│   ├── process_dependency_graph.dot
│   ├── process_dependency_graph.json
│   ├── process_dependency_graph.mmd
│   ├── process_dependency_graph.png
│   ├── specification_graph.dot
│   ├── specification_graph.json
│   ├── specification_graph.mmd
│   ├── specification_graph.png
│   ├── specification_graph_wo_labels.dot
│   ├── specification_graph_wo_labels.mmd
│   ├── specification_graph_wo_labels.png
│   ├── specification_wo_orphan_operations.dot
│   ├── specification_wo_orphan_operations.mmd
│   ├── specification_wo_orphan_operations.png
│   ├── specification_wo_orphan_operations_wo_labels.dot
│   ├── specification_wo_orphan_operations_wo_labels.mmd
│   └── specification_wo_orphan_operations_wo_labels.png
└── ro-crate-metadata.json
```

* The `ro-crate-metadata.json` describes the workflow following an extended Workflow [RO-Crate](https://www.researchobject.org/ro-crate/) profile. The description of this extended profile can be found [here](https://gitlab.liris.cnrs.fr/sharefair/posters/swat4hcls-2024).
* The `debug` folder contains intermediate files which are useful for debugging.
* The `graphs` folder contains the generated graphs. For each of the 3 graphs described above, **BioFlow-Insight** generates:
  * A `json` file describing the graph in a **BioFlow-Insight**-specific format.
  * A `json` file describing the metadata extracted from the graph (the `metadata_*.json` files).
  * Where possible, variants of the graph without labels on the operations and channels (`*_wo_labels`), as well as variants in which orphan operations (operations which have no inputs or outputs) are not represented (`*_wo_orphan_operations`).
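
The `ro-crate-metadata.json` file can be inspected with the Python standard library alone. The sketch below is not part of BioFlow-Insight: it relies only on the general RO-Crate convention that the metadata descriptor entity (`"@id": "ro-crate-metadata.json"`) points, through its `about` property, to the root data entity. Which further properties (such as `name` or `mainEntity`) are present depends on the extended profile, so they are read defensively with `.get()`; the function name `load_root_dataset` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_root_dataset(crate_file: Path) -> Dict[str, Any]:
    """Return the root data entity described by an RO-Crate metadata file."""
    with crate_file.open(encoding="utf-8") as fh:
        crate = json.load(fh)
    # Index every entity of the flattened JSON-LD "@graph" by its "@id".
    entities = {entity["@id"]: entity for entity in crate["@graph"]}
    # RO-Crate convention: the metadata descriptor's "about" names the root.
    descriptor = entities["ro-crate-metadata.json"]
    return entities[descriptor["about"]["@id"]]


if __name__ == "__main__":
    crate_path = Path("results/ro-crate-metadata.json")
    if crate_path.is_file():
        root = load_root_dataset(crate_path)
        # Assumed property names: not every profile is guaranteed to define them.
        print(root.get("name"), root.get("mainEntity"))
```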

> For each graph, **BioFlow-Insight** generates both a `mermaid` file (`.mmd`) and a Graphviz `dot` file (`.dot`). If the `render_graphs` option is set to `True`, a `png` image is also generated.
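
If the PNG images were not produced, the `.dot` files can still be rendered afterwards with the Graphviz `dot` executable (`dot -Tpng input.dot -o output.png`). The sketch below is a hypothetical helper, not a BioFlow-Insight feature: it looks for `.dot` files in `results/graphs` that have no matching `.png` and renders them by calling `dot` through `subprocess`. It assumes Graphviz is installed on the system, and the function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def dot_files_without_png(graphs_dir: Path) -> List[Path]:
    """Return the .dot files in graphs_dir that have no sibling .png file."""
    return sorted(
        p for p in graphs_dir.glob("*.dot") if not p.with_suffix(".png").exists()
    )


def render_to_png(dot_file: Path) -> Path:
    """Render one .dot file to PNG with the Graphviz `dot` executable."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    png_file = dot_file.with_suffix(".png")
    subprocess.run(["dot", "-Tpng", str(dot_file), "-o", str(png_file)], check=True)
    return png_file


if __name__ == "__main__":
    graphs = Path("results/graphs")
    if graphs.is_dir():
        for dot_file in dot_files_without_png(graphs):
            print("rendered", render_to_png(dot_file))
```

Calling Graphviz directly avoids re-running the whole analysis just to obtain images; the `render_graphs` option remains the built-in way to get them.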

Here are some of the graphs generated by **BioFlow-Insight**, rendered as PNG images with Graphviz.

| <img align="center" src="https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/raw/main/img/specification_graph.png" >  | <img align="center" src="https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/raw/main/img/dependency_graph.png">   | <img align="center" src="https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/raw/main/img/process_dependency_graph.png" >   |
|:-:|:-:|:-:|
| Specification Graph  |  Dependency Graph | Process Dependency Graph  |


## Citing BioFlow-Insight

**Please cite BioFlow-Insight in any research that uses or extends BioFlow-Insight.**

To cite **BioFlow-Insight**, please use the following publication:

> George Marchment, Bryan Brancotte, Marie Schmit, Frédéric Lemoine, Sarah Cohen-Boulakia, BioFlow-Insight: facilitating reuse of Nextflow workflows with structure reconstruction and visualization, *NAR Genomics and Bioinformatics*, Volume 6, Issue 3, September 2024, lqae092, [https://doi.org/10.1093/nargab/lqae092](https://doi.org/10.1093/nargab/lqae092)


## License

This project is licensed under the [GNU Affero General Public License](https://www.gnu.org/licenses/agpl-3.0.en.html).


## Funding

This work received support from the National Research Agency under the France 2030 program, with reference to ANR-22-PESN-0007.

___
<br>

<img align="left" src="https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/raw/main/img/logo.png" width="16%">
<img align="left" src="https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/raw/main/img/paris_saclay.png" width="16%">
<img align="left" src="https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/raw/main/img/lisn.png" width="16%">
<img align="left" src="https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/raw/main/img/pasteur.png" width="16%">
<img align="left" src="https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/raw/main/img/sharefair.png" width="16%">
<img align="left" src="https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight/-/raw/main/img/france2030.png" width="16%">

<br/><br/>
<br/><br/>

