########################
Hotpep-Python Edition

Author: Peter Busk
Busk PK, Pilgaard B, Lezyk MJ, Meyer AS, Lange L. Homology to peptide
pattern for annotation of carbohydrate-active enzymes and prediction
of function. BMC Bioinformatics. 2017;18:214. doi:10.1186/s12859-017-1625-9.

Modified for dbCAN_meta by Tanner Yohe under
the supervision of Dr. Yin in the Yin Lab at
Northern Illinois University

Last updated 4/23/18
#########################

REQUIREMENTS:
Hotpep-Python requires Python (versions 2 and 3 are supported) and the natsort
Python package (available from PyPI, e.g. pip install natsort).

SET UP:
Create a directory inside the Hotpep directory and copy your input file
into it. Split the input file into as many files as the number of threads
you would like to use. Name the files orfs#.txt, where # is the file's
index, starting at 1 (e.g. orfs1.txt, orfs2.txt, orfs3.txt and orfs4.txt
are the input files for a 4-threaded run).
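
The split described above can be scripted. The following is a minimal sketch,
not part of Hotpep itself. It assumes the input is a FASTA file in which every
record starts with a ">" header line; the function name split_fasta and the
round-robin distribution of records are illustrative choices.

```python
import os


def split_fasta(input_path, out_dir, n_threads):
    """Write FASTA records round-robin into orfs1.txt .. orfsN.txt in out_dir.

    Assumption: each record begins with a '>' header line.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    # File names follow the orfs#.txt convention, with the index starting at 1.
    paths = [os.path.join(out_dir, "orfs%d.txt" % (i + 1))
             for i in range(n_threads)]
    handles = [open(p, "w") for p in paths]
    try:
        record = -1  # index of the current record; -1 until the first header
        with open(input_path) as src:
            for line in src:
                if line.startswith(">"):
                    record += 1
                if record >= 0:  # ignore anything before the first header
                    handles[record % n_threads].write(line)
    finally:
        for handle in handles:
            handle.close()
    return paths
```

For example, split_fasta("proteins.faa", "Ecoli", 8), run from inside the
Hotpep directory, would create Ecoli/orfs1.txt through Ecoli/orfs8.txt
(proteins.faa is a placeholder file name).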

RUNNING:
python train_many_organisms_many_families.py [inputDir] [#threads] [hits] [freq]
	[inputDir] = the directory you created earlier holding the input files
	[#threads] = The number of threads to use. Must match the number of orfs#.txt
				files that you created
	[hits] = integer > 0. Represents the minimum number of unique camers
			that must be found for a gene to be annotated.
	[freq] = float > 0. Represents the minimum frequency required for a
			gene to be annotated.
	example:
	python train_many_organisms_many_families.py Ecoli 8 5 5
		This will run Hotpep on 8 threads against the input files found in the
		Ecoli directory, with a minimum of 5 hits and a minimum frequency of 5.0.
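
Because [#threads] must match the number of orfs#.txt files, it can help to
check the input directory before starting a run. The helper below is a
hypothetical sketch, not part of Hotpep. It only assumes the orfs#.txt naming
scheme described under SET UP.

```python
import os
import re


def count_orf_files(input_dir):
    """Return N if input_dir holds orfs1.txt .. orfsN.txt with no gaps."""
    indices = []
    for name in os.listdir(input_dir):
        match = re.match(r"^orfs(\d+)\.txt$", name)
        if match:
            indices.append(int(match.group(1)))
    indices.sort()
    # The indices must be exactly 1, 2, ..., N.
    if not indices or indices != list(range(1, len(indices) + 1)):
        raise ValueError(
            "expected orfs1.txt..orfsN.txt with no gaps, found indices %s"
            % indices)
    return len(indices)
```

The returned value is the number to pass as [#threads].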

OUTPUT:
The output is written to a file named output.txt in a directory called
Results inside the input directory. This file has the following column headings:
CAZyFam	PPRsubFam	geneID	freq	hits	camers
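
For downstream processing, output.txt can be loaded with the standard csv
module. This is a minimal sketch under two assumptions that are not stated
above: the columns are tab-separated, and a header row may or may not be
present in the file. All values are kept as strings because the exact value
formats are not specified here. The function name read_hotpep_output is
illustrative.

```python
import csv

# Column headings as listed in the OUTPUT section.
COLUMNS = ["CAZyFam", "PPRsubFam", "geneID", "freq", "hits", "camers"]


def read_hotpep_output(path):
    """Return one dict per data row, keyed by the column headings."""
    rows = []
    with open(path) as handle:
        # Assumption: fields are separated by tabs.
        for fields in csv.reader(handle, delimiter="\t"):
            if not fields or fields == COLUMNS:
                continue  # skip blank lines and a header row, if present
            rows.append(dict(zip(COLUMNS, fields)))
    return rows
```

For example, sorted(set(row["CAZyFam"] for row in read_hotpep_output(path)))
lists the distinct families that received at least one annotation.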
