Metadata-Version: 2.4
Name: BatchalignHK
Version: 0.7.19.post22
Summary: Python Speech Language Sample Analysis
Author: Brian MacWhinney, Houjun Liu
Author-email: macw@cmu.edu, houjun@cmu.edu
Classifier: Development Status :: 3 - Alpha
Classifier: Topic :: Utilities
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.4
Requires-Dist: nltk>=3.8
Requires-Dist: praatio<6.1.0,>=6.0.0
Requires-Dist: torch>=2.6.0
Requires-Dist: torchaudio
Requires-Dist: opencc-python-reimplemented
Requires-Dist: pydub
Requires-Dist: plotly>=5.3.0
Requires-Dist: transformers>=4.37
Requires-Dist: tokenizers>=0.14.1
Requires-Dist: pycountry>=22.3
Requires-Dist: stanza[transformers]>=1.10.1
Requires-Dist: scipy~=1.11
Requires-Dist: rev_ai>=2.18.0
Requires-Dist: rich~=13.6
Requires-Dist: click~=8.1
Requires-Dist: matplotlib<4.0.0,>=3.8.0
Requires-Dist: pyfiglet==1.0.2
Requires-Dist: setuptools>=78.1.1
Requires-Dist: soundfile~=0.12.0
Requires-Dist: rich-click>=1.7.0
Requires-Dist: typing-extensions
Requires-Dist: num2words
Requires-Dist: tiktoken
Requires-Dist: blobfile
Requires-Dist: sentencepiece
Requires-Dist: tencentcloud-sdk-python-common
Requires-Dist: tencentcloud-sdk-python-asr
Requires-Dist: googletrans
Requires-Dist: aliyun-python-sdk-core>=2.13.3
Requires-Dist: oss2
Requires-Dist: openai-whisper>=20240930
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Provides-Extra: train
Requires-Dist: accelerate~=0.27; extra == "train"
Provides-Extra: docs
Requires-Dist: mkdocs-material; extra == "docs"
Requires-Dist: mkdocs-click; extra == "docs"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: summary

# TalkBank | Batchalign2

Welcome! **Batchalign2** is a Python suite of language sample analysis (LSA) software from the TalkBank project. It is used to interact with conversation audio files and their transcripts, and provides a whole host of analyses within this space.

The TalkBank Project, of which Batchalign is a part, is supported by NIH grant HD082736.

----

## Quick Start

The following instructions provide a quick start to installing Batchalign. For most users aiming to process CHAT and audio with Batchalign, we recommend the more detailed guides on [usage](https://talkbank.org/0info/BA2-usage.pdf) and [human transcript cleanup](https://talkbank.org/0info/BA2-cleanup.pdf).

### Install and Update the Package
Batchalign is on PyPI (as `batchalign`). We recommend using uv to install Batchalign:

#### macOS / Linux

```
curl -LsSf https://astral.sh/uv/install.sh | sh
UV_PYTHON=3.11 uv tool install batchalign
```

#### Windows

```
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
uv tool install batchalign
```

### Rock and Roll
There are two main ways of interacting with Batchalign. Batchalign can be used as a program to batch-process CHAT (hence the name), or as a Python LSA library.

- to get started with the Batchalign program, [tap here](#quick-start-command-line)
- to get started on the Batchalign Library (assumes familiarity with Python), [tap here](#quick-start-python)

## Quick Start: Command Line

### Basic Usage 

Once installed, you can invoke the Batchalign program by typing `batchalign` into the Terminal (macOS) or Command Prompt (Windows).

It is used in the following basic way:

```
batchalign [verb] [input_dir] [output_dir]
```

Where `verb` includes:

1. `transcribe` - by placing only an audio or video file (`.mp3/.mp4/.wav`) in the input directory, this function performs ASR on the audio, diarizes utterances, identifies some basic conversational features like retracing and filled pauses, and generates word-level alignments. You must supply a language code flag: `--lang=[three letter ISO language code]` for the ASR system to know what language the transcript is in. You can choose the flag `--rev` to use Rev.AI, a commercial ASR service, or `--whisper` to use a local copy of OpenAI Whisper.
2. `align` - by placing both an audio or video file (`.mp3/.mp4/.wav`) and an *utterance-aligned* CHAT file in the input directory, this function recovers utterance-level time alignments (if they are not already annotated) and generates word-level alignments. The @Languages header in the CHAT file tells the program which language is in the transcript.
3. `morphotag` - by placing a CHAT file in the input directory, this function uses Stanford NLP Stanza to generate morphological and dependency analyses. The @Languages header in the CHAT file tells the program which language is in the transcript. You must supply a language code flag: `--lang=[three letter ISO language code]` for the alignment system to know what language the transcript is in. 
<!-- 4. `bulletize` - placing both an audio of video file (`.mp3/.mp4/.wav`) and an *unlinked* CHAT file in the input directory, generate utterance-level alignments through ASR -->

You can get a CHAT transcript to experiment with [at the TalkBank website](https://talkbank.org/), under any of the "Banks" that are available. You can also generate and parse a CHAT transcript via [the Python program](https://github.com/TalkBank/batchalign2?tab=readme-ov-file#chat).
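
The input requirements of the three verbs can be summarized in a few lines of Python. This is only an illustrative sketch: `suggest_verbs` is a hypothetical helper that is not part of Batchalign, and the `.cha` extension for CHAT files is assumed from the file names used elsewhere in this README. It looks only at file extensions, so it cannot tell whether a CHAT file is actually utterance-aligned.

```python
from pathlib import Path

# Media extensions are the ones named in the verb descriptions above.
MEDIA_EXTENSIONS = {".mp3", ".mp4", ".wav"}
# Assumed extension for CHAT transcripts (as in "chat.cha").
CHAT_EXTENSION = ".cha"


def suggest_verbs(filenames):
    """Return the verbs whose input requirements the given files satisfy."""
    suffixes = {Path(name).suffix.lower() for name in filenames}
    has_media = bool(suffixes & MEDIA_EXTENSIONS)
    has_chat = CHAT_EXTENSION in suffixes

    verbs = []
    if has_media and not has_chat:
        verbs.append("transcribe")  # media only: run ASR
    if has_media and has_chat:
        verbs.append("align")       # media plus CHAT: forced alignment
    if has_chat:
        verbs.append("morphotag")   # CHAT: morphosyntactic analysis
    return verbs
```

For example, `suggest_verbs(["session.wav"])` yields only `transcribe`, while a directory holding both `session.wav` and `session.cha` qualifies for `align` and `morphotag`.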

### Sample Commands
For input files located in `~/ba_input` (CHAT and audio for `align`, CHAT only for `morphotag`, and audio only for `transcribe`), with output written to `~/ba_output`, one could write:

#### ASR + Segmentation

```
batchalign transcribe --lang=eng ~/ba_input ~/ba_output
```

#### Morphosyntactic Analysis

```
batchalign morphotag ~/ba_input ~/ba_output
```

#### Forced Alignment

```
batchalign align ~/ba_input ~/ba_output
```


-----

Follow instructions from

```
batchalign --help
```

and 

```
batchalign [verb] --help
```

to learn more about other options.
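
If you would rather drive the command line program from a script, the commands above can be assembled with the standard library. The sketch below is a minimal example, not part of Batchalign: `build_command` and `run_batchalign` are hypothetical helper names, and the only flag used is `--lang`, placed where the sample `transcribe` command places it.

```python
import shutil
import subprocess


def build_command(verb, input_dir, output_dir, lang=None):
    """Assemble `batchalign [verb] [--lang=...] [input_dir] [output_dir]`."""
    command = ["batchalign", verb]
    if lang is not None:
        command.append(f"--lang={lang}")
    command += [str(input_dir), str(output_dir)]
    return command


def run_batchalign(verb, input_dir, output_dir, lang=None):
    """Run the CLI, raising if it is missing or exits with an error."""
    if shutil.which("batchalign") is None:
        raise FileNotFoundError("batchalign was not found on PATH")
    command = build_command(verb, input_dir, output_dir, lang)
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(command, check=True)
```

Because the command is passed as a list rather than through a shell, `~` is not expanded; call `os.path.expanduser` on paths such as `~/ba_input` before handing them to `run_batchalign`.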

### Verbosity

Placing one or more `-v` flags *directly after the word `batchalign`* (placing them after the `[verb]` will not work) increases the verbosity of Batchalign. The default mode and a single `-v` use the normal Batchalign interface, whereas more than one `-v` switches to the text-based "logging" interface.

For instance, here is the instruction for running Batchalign to perform forced-alignment:

```
batchalign align input output
```

With one `-v`, you can get stack trace information about any files that crash:

```
batchalign -v align input output
```

and with two (`-vv`), we will ditch the loading bar user interface and instead switch to a logging-based interface that has more information about what Batchalign is doing under the hood:

```
batchalign -vv align input output
```
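
When building commands programmatically, the placement rule is easy to get wrong, so here is a small sketch of it. Both functions are hypothetical helpers written for this README, not Batchalign APIs; they encode only what is described above: the flags go between `batchalign` and the verb, and more than one `-v` selects the logging interface.

```python
def with_verbosity(command, level):
    """Insert `-v` flags directly after the program name, before the verb."""
    if level < 0:
        raise ValueError("level must be non-negative")
    if level == 0:
        return list(command)
    # e.g. level 2 becomes the single argument "-vv"
    return [command[0], "-" + "v" * level, *command[1:]]


def interface_for(level):
    """Name the interface described above for a given number of `-v` flags."""
    return "logging" if level > 1 else "normal"
```

Calling `with_verbosity(["batchalign", "align", "input", "output"], 2)` produces the argument list for `batchalign -vv align input output`.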

## Quick Start: Python

Let's begin!

```python
import batchalign as ba
```

### Document
The `Document` is the most basic object in Batchalign. All processing pipelines expect a `Document` as input, and will spit out a `Document` as output.

```python
doc = ba.Document.new("Hello, this is a transcript! I have two utterances.", 
                      media_path="audio.mp3", lang="eng")

# navigating the document
first_utterance = doc[0]
first_form = doc[0][0]
the_comma = doc[0][1]

assert the_comma.text == ','
assert the_comma.type == ba.TokenType.PUNCT

# taking a transcript
sentences = doc.transcript(include_tiers=False, strip=True)
```

Notably, if you have audio that you haven't transcribed yet, you can still make a Document!

```python
doc = ba.Document.new(media_path="audio.mp3", lang="eng")
```

### Pipelines
<!-- You can process the language samples you got (perform ASR, forced alignment, utterance segmentation, and more!) via `BatchalignPipeline`. There are two levels of access to this API: you can either create a pipeline and use our default settings, or create and customize the underlying `BatchalignEngine`s yourself to perform processing. -->

#### Quick Pipeline
Say you wanted to perform ASR, and then tag morphology of the resulting output.

```python
nlp = ba.BatchalignPipeline.new("asr,morphosyntax", lang="eng", num_speakers=2)
doc = ba.Document.new(media_path="audio.mp3", lang="eng")
doc = nlp(doc) # this is equivalent to nlp("audio.mp3"), we will make the initial doc for you

first_word_pos = doc[0][0].morphology
first_word_time = doc[0][0].time
first_utterance_time = doc[0].alignment
```

The quick API (right now) has support for the following tasks, which you can pass in a comma-separated list in the first argument:

- `asr`: ASR!
- `morphosyntax`: PoS and dependency analysis
- `fa`: Forced Alignment (requires utterance-level timings already)

We will support many, many, many more tasks soon with this API. For now, to gain access to the whole suite of tools, use the second pipeline API discussed below.

#### Manual Pipeline
Batchalign ships with a number of engines which perform the actual processing. For instance, to recreate the demo we had above using the Engines API, we would write

```python
# ASR
whisper = ba.WhisperEngine(lang="eng")
# retracing and disfluency analysis
retrace = ba.NgramRetraceEngine()
disfluency = ba.DisfluencyReplacementEngine()
# morphosyntax
morphosyntax = ba.StanzaEngine()

# create a pipeline
nlp = ba.BatchalignPipeline(whisper, retrace, disfluency, morphosyntax)
                             
# and run it!                             
doc = nlp("audio.mp3") 
```

[Here's a list](https://github.com/TalkBank/batchalign2/blob/master/batchalign/pipelines/__init__.py) of available engines.

### Formats
We currently support reading and writing two transcript formats: TalkBank CHAT, and Praat TextGrid.

#### CHAT

Here's how to read and write a CHAT file to parse a TalkBank transcript!

```python
# reading
chat = ba.CHATFile(path="chat.cha")
doc = chat.doc

# writing
chat = ba.CHATFile(doc=doc)
chat.write("chat.cha")
```

We will automatically detect audio files located within the same directory as the CHAT file, and associate them with the Batchalign Document.
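
Putting the pieces together, the sketch below transcribes one recording with the quick pipeline and saves the result as CHAT. It uses only the calls shown in this README (`BatchalignPipeline.new`, calling the pipeline on a media path, and `CHATFile(doc=...).write`). The helper names and the output naming scheme (`audio.mp3` becomes `audio.cha`) are assumptions made for illustration, and the output directory is assumed to exist already.

```python
from pathlib import Path


def chat_path_for(media_path, output_dir):
    """Map e.g. `in/audio.mp3` to `<output_dir>/audio.cha` (assumed naming)."""
    return Path(output_dir) / (Path(media_path).stem + ".cha")


def transcribe_to_chat(media_path, output_dir, lang="eng", num_speakers=2):
    """Run ASR plus morphosyntax on one recording and save the result as CHAT."""
    # Imported lazily so that chat_path_for works without Batchalign installed.
    import batchalign as ba

    nlp = ba.BatchalignPipeline.new("asr,morphosyntax", lang=lang,
                                    num_speakers=num_speakers)
    doc = nlp(str(media_path))  # the pipeline creates the initial Document
    out_path = chat_path_for(media_path, output_dir)
    ba.CHATFile(doc=doc).write(str(out_path))
    return out_path
```

Keeping the path logic in its own function makes it easy to change the naming convention without touching the processing code.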

#### TextGrid

Importantly, there are two ways a TextGrid could be written: we can either place each **utterance** in an individual `IntervalTier`, or each **word** in its own `IntervalTier`; we leave that decision up to you. To learn more about TextGrid, [visit this page](https://github.com/timmahrt/praatIO).

```python
# reading; recall we can either interpret each IntervalTier as a word or utterance
tg_utterance = ba.TextGridFile("utterance", path="tg_ut.TextGrid", lang="eng")
tg_word = ba.TextGridFile("word", path="tg_w.TextGrid", lang="eng")

doc1 = tg_utterance.doc
doc2 = tg_word.doc

# writing
tg_utterance = ba.TextGridFile("utterance", doc=doc1)
tg_word = ba.TextGridFile("word", doc=doc2)

tg_utterance.write("tg_ut.TextGrid")
tg_word.write("tg_w.TextGrid")
```
## Questions?
If you have any questions or concerns, please reach out! If something isn't working right, [open an issue on GitHub](https://github.com/TalkBank/batchalign2/issues); if you need support, please feel free to email `houjun@cmu.edu` and `macw@cmu.edu`.

