# MalwareClassifier

MalwareClassifier is a Python package that provides a **template** for building a malware classification system with a **built-in logging system** and **configurable settings**.
It is designed to be **modular**, **extensible**, and **easy to install** using `pip`.

---

## Table of Contents

* [Installation](#installation)
* [Quick Start](#quick-start)
* [Configuration](#configuration)
* [Logging Usage](#logging-usage)
* [Publishing](#publishing)
* [License](#license)
* [Contact](#contact)

---

## Installation

### Option A: Install using pip

```bash
pip install MalwareClassifier
```

### Option B: Install from source

```bash
# Clone the repository
git clone git@github.com:cchunhuang/MalwareClassifier.git
cd MalwareClassifier

# Install
pip install .

# Install additional dependencies if needed
pip install -r requirements.txt
```

---

## Quick Start

```python
from MalwareClassifier import MalwareClassifier, setup_logging, get_logger

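class subMalwareClassifier(MalwareClassifier):
    def __init__(self, config_path="./config.json"):
        # Load the configuration first, then send logs to the configured folder.
        super().__init__(config_path)
        setup_logging(log_dir=self.config.folder.log)
        self.logger = get_logger(__name__)

    def getFeature(self):
        # Replace with your feature extraction logic.
        self.logger.info("Extracting features.")

    def getVector(self):
        # Replace with your vectorization logic.
        self.logger.info("Vectorizing.")

    def getModel(self, action: str = "train"):
        # Replace with your model logic for the requested action.
        self.logger.info(f"Using model for action: {action}.")

    def getPrediction(self):
        # Replace with your prediction logic.
        self.logger.info("Predicting.")


if __name__ == "__main__":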
    classifier = subMalwareClassifier()
    classifier.getFeature()
    classifier.getVector()
    classifier.getModel()
    classifier.getPrediction()
```

- The `MalwareClassifier` class in `MalwareClassifier.py` defines the **workflow skeleton**. Subclass it to override `getFeature()`, `getVector()`, `getModel()`, and `getPrediction()`.
- Pass a different `config_path` to the constructor to load your own configuration file. The example above defaults to `./config.json`.

---

## Configuration

The package includes a default `config.json`:

```json
{
  "file": {
    "label": "./dataset/label.csv"
  },
  "folder": {
    "log": "./output/log/",
    "dataset": "./dataset/",
    "feature": "./output/feature/",
    "vectorize": "./output/vectorize/",
    "model": "./output/model/",
    "predict": "./output/predict/"
  },
  "parameter": {
    "general": {
      "detection": true
    },
    "feature": {
      "save": true,
      "load": false
    },
    "vectorize": {
      "save": true,
      "load": false
    },
    "model": {
      "save": true,
      "load": false
    },
    "predict": {
      "save": true,
      "load": false
    }
  }
}
```

### After calling `super().__init__()`

- Configuration values are available through dot notation, such as `self.config.file.label` or `self.config.parameter.feature.save`.
- The contents of `config.json` are fully customizable. For example, if you add `"new_dic": { "new_val": 123 }` under `parameter.general`, you can access it via `self.config.parameter.general.new_dic.new_val`.
- The directories listed under `folder` are created automatically. You can also create them by calling `self.mkdir()`.
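To make the dot-notation access and automatic folder creation concrete, here is a minimal standard-library sketch. It is not the package's actual implementation: `load_config` and `make_folders` are hypothetical helpers, and the use of `types.SimpleNamespace` is an assumption chosen only because it gives the same attribute-style access.

```python
import json
import os
import tempfile
from types import SimpleNamespace

# A trimmed-down version of the default config.json shown above.
CONFIG_TEXT = """
{
  "file": {"label": "./dataset/label.csv"},
  "folder": {"log": "./output/log/", "model": "./output/model/"},
  "parameter": {"feature": {"save": true, "load": false}}
}
"""


def load_config(text):
    """Parse JSON so that every nested object becomes an attribute namespace."""
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))


def make_folders(config, root):
    """Create every directory listed under `folder`, relative to `root`."""
    created = []
    for path in vars(config.folder).values():
        full_path = os.path.normpath(os.path.join(root, path))
        os.makedirs(full_path, exist_ok=True)  # no error if it already exists
        created.append(full_path)
    return created


config = load_config(CONFIG_TEXT)
print(config.file.label)              # ./dataset/label.csv
print(config.parameter.feature.save)  # True

# Use a temporary root so the sketch leaves nothing behind on disk.
with tempfile.TemporaryDirectory() as root:
    created = make_folders(config, root)
    print(all(os.path.isdir(p) for p in created))  # True
```

Any key added to the JSON becomes reachable the same way, which is the behavior the `new_dic` example above describes.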

---

## Logging Usage

The logging system is defined in `src/MalwareClassifier/Logging.py`.

### Available functions

* `setup_logging(config=None, reset_handlers=True, log_dir=None)`
  Initialize logging with optional config overrides.
  The recommended call is `setup_logging(log_dir=self.config.folder.log)`.
* `get_logger(name=None)`
  Retrieve a logger for any module.

### Default behavior

* Logs are written to both the **console** and a **file**.
* Log files are named automatically using the pattern:
  `malware_classifier-YYYYMMDD-HHMMSS.log`
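The naming pattern above corresponds to a timestamp formatted with `%Y%m%d-%H%M%S`. The sketch below reproduces it with the standard library. `log_file_name` is a hypothetical helper, and the use of local time is an assumption, since the package may build the name differently.

```python
from datetime import datetime


def log_file_name(now=None):
    """Build a name of the form malware_classifier-YYYYMMDD-HHMMSS.log."""
    now = now or datetime.now()  # assumption: local time, not UTC
    return f"malware_classifier-{now:%Y%m%d-%H%M%S}.log"


print(log_file_name(datetime(2024, 1, 31, 9, 5, 7)))
# malware_classifier-20240131-090507.log
```

Because the timestamp has one-second resolution, each run started in a different second gets its own log file.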

### Environment variables

| Variable                 | Description                                  | Example         |
| ------------------------ | -------------------------------------------- | --------------- |
| `MALCLASS_LOG_LEVEL`     | Set log level                                | `DEBUG`, `INFO` |
| `MALCLASS_LOG_FILE`      | Full path for the log file                   | `/tmp/log.txt`  |
| `MALCLASS_LOG_DIR`       | Directory for log files                      | `./output/log`  |
| `MALCLASS_LOG_FORMATTER` | Choose formatter: `basic`, `verbose`, `json` | `verbose`       |

**Note:** JSON logging requires installing [`python-json-logger`](https://pypi.org/project/python-json-logger/).
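As an illustration of how an override such as `MALCLASS_LOG_LEVEL` can be turned into a logging level, here is a standard-library sketch. `level_from_env` is a hypothetical helper, and the fallback to `INFO` for unset or unrecognized values is an assumption. The package's own parsing may differ.

```python
import logging
import os


def level_from_env(default=logging.INFO):
    """Map MALCLASS_LOG_LEVEL (for example "DEBUG") to a numeric logging level."""
    name = os.environ.get("MALCLASS_LOG_LEVEL", "").upper()
    level = getattr(logging, name, None)
    # Fall back to the default when the variable is unset or not a level name.
    return level if isinstance(level, int) else default


# Setting the variable in-process; exporting it in the shell has the same effect.
os.environ["MALCLASS_LOG_LEVEL"] = "debug"
print(level_from_env() == logging.DEBUG)  # True
```

If you set these variables from Python rather than from the shell, it is safest to do so before logging is initialized, on the assumption that they are read at that point.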

### Example usage in modules

```python
from MalwareClassifier import setup_logging, get_logger

setup_logging()
logger = get_logger(__name__)
logger.info("This is an info message")
logger.debug("This is a debug message")
```

---

## Publishing

### Check version

```bash
pip install -e .
python -c "import MalwareClassifier as MC; print(MC.__version__)"
```

### Publish to PyPI

```bash
git tag v0.1.0  # Replace with the version you are releasing
git push origin v0.1.0
# A GitHub Actions workflow then publishes the release automatically.
```

### Publish to PyPI (Manual)

```bash
pip install build twine
python -m build
twine upload dist/*
```

---

## License

This project is licensed under the terms of the [MIT License](LICENSE).

---

## Contact

* **Homepage:** [https://github.com/cchunhuang/MalwareClassifier](https://github.com/cchunhuang/MalwareClassifier)
* **Issues:** [https://github.com/cchunhuang/MalwareClassifier/issues](https://github.com/cchunhuang/MalwareClassifier/issues)
