infernal
- integron_finder.infernal.expand(replicon, window_beg, window_end, max_elt, circular, dist_threshold, model_attc_path, max_attc_size=200, min_attc_size=40, evalue_attc=1.0, search_left=False, search_right=False, out_dir='.', cpu=1, cmsearch_bin='cmsearch')[source]
For a given element, attC sites can be searched for on the left-hand side (for instance, if the integrase is on the right), on the right-hand side (the opposite situation), or on both sides (when only an integrase or only attC sites were detected).
- Parameters:
replicon (a Bio.Seq.SeqRecord object) – The replicon to annotate
window_beg (int) – start of window to search for attC (position of protein)
window_end (int) – end of window to search for attC (position of protein)
max_elt (pandas.DataFrame object) – DataFrame with columns:
Accession_number cm_attC cm_debut cm_fin pos_beg pos_end sens evalue
and each row is an occurrence of an attC site
df_max (pandas.DataFrame object) – DataFrame with columns:
Accession_number cm_attC cm_debut cm_fin pos_beg pos_end sens evalue
and each row is an occurrence of an attC site
circular (bool) – True if replicon topology is circular otherwise False.
dist_threshold (int) – Two elements are aggregated if they are separated by dist_threshold [4kb] or less
max_attc_size (int) – The maximum value for the attC size
min_attc_size (int) – The minimum value for the attC size
model_attc_path (str) – the path to the attc model file
evalue_attc (float) – evalue threshold to filter out hits above it
search_left (bool) – trigger the local_max search on the left of the already detected element
search_right (bool) – trigger the local_max search on the right of the already detected element
out_dir (str) – The path to directory where to write results
cpu (int) – the number of CPUs used by expand
cmsearch_bin (str) – the path to the cmsearch binary to use
- Returns:
a copy of max_elt with attC hits
- Return type:
pandas.DataFrame object
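The rule for choosing between a left-hand, right-hand, or two-sided search can be sketched as below. This is only an illustration of the description above: `choose_search_sides` and its arguments are hypothetical names, not part of integron_finder, and the sketch ignores circular replicons and strand orientation.

```python
def choose_search_sides(integrase_pos, attc_positions):
    """Return (search_left, search_right) for an already detected element.

    integrase_pos: start coordinate of the integrase, or None if absent.
    attc_positions: list of attC start coordinates (may be empty).
    Hypothetical helper for illustration; not part of integron_finder.
    """
    if integrase_pos is None or not attc_positions:
        # Only an integrase or only attC sites: look on both sides.
        return True, True
    if integrase_pos > max(attc_positions):
        # Integrase lies to the right of every attC: extend leftwards.
        return True, False
    if integrase_pos < min(attc_positions):
        # Integrase lies to the left of every attC: extend rightwards.
        return False, True
    # Integrase sits among the attC sites: this sketch searches both sides.
    return True, True
```

The returned pair could then be passed as the `search_left` and `search_right` arguments of `expand`.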
- integron_finder.infernal.find_attc(replicon_path, replicon_id, cmsearch_path, out_dir, model_attc, incE=1.0, cpu=1)[source]
Call cmsearch to find attC sites in a single replicon.
- Parameters:
replicon_path (str) – the path of the fasta file representing the replicon to analyse.
replicon_id (str) – the id of the replicon to analyse.
cmsearch_path (str) – the path to the cmsearch executable.
out_dir (str) – the path to the directory where cmsearch outputs will be stored.
model_attc (str) – path to the attC model (covariance model).
incE (float) – consider sequences <= this E-value threshold as significant (to get the alignment with -A)
cpu (int) – the number of cpu used by cmsearch.
- Returns:
None, the results are written on the disk.
- Raises:
RuntimeError – when the cmsearch run fails.
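As a rough sketch of what such a call might look like, the code below assembles a cmsearch command line and turns a failed run into a RuntimeError. The options used (`--cpu`, `-A`, `--tblout`, `--incE`, `-o`) are standard cmsearch options, but the output file names and the two helper functions are assumptions made for illustration; they are not taken from the integron_finder source.

```python
import os
import subprocess


def build_cmsearch_cmd(replicon_path, replicon_id, cmsearch_path, out_dir,
                       model_attc, incE=1.0, cpu=1):
    """Assemble a cmsearch command line (output file names are illustrative)."""
    base = os.path.join(out_dir, replicon_id + "_attc")
    return [
        cmsearch_path,
        "--cpu", str(cpu),
        "-A", base + ".sto",               # alignment of significant hits
        "--tblout", base + "_table.res",   # tabular summary of hits
        "--incE", str(incE),               # inclusion E-value threshold
        "-o", base + ".res",               # main human-readable output
        model_attc,                        # covariance model file
        replicon_path,                     # target sequence file
    ]


def run_cmsearch(cmd):
    """Run cmsearch; raise RuntimeError if it cannot start or exits non-zero."""
    try:
        completed = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as err:
        raise RuntimeError(f"could not execute {cmd[0]}: {err}") from err
    if completed.returncode != 0:
        raise RuntimeError(
            f"cmsearch failed (exit code {completed.returncode}): "
            f"{completed.stderr}"
        )
```

Keeping command construction separate from execution makes the command easy to inspect or log without having Infernal installed.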
- integron_finder.infernal.local_max(replicon, window_beg, window_end, model_attc_path, strand_search='both', evalue_attc=1.0, max_attc_size=200, min_attc_size=40, cmsearch_bin='cmsearch', out_dir='.', cpu=1)[source]
- Parameters:
replicon (Bio.Seq.SeqRecord object) – The replicon to analyse
window_beg (int) – Start of window to search for attC (position of protein).
window_end (int) – End of window to search for attC (position of protein).
model_attc_path (str) – The path to the covariance model for attc (eg: attc_4.cm) used by cmsearch to find attC sites
strand_search (str) –
The strand on which to look for attC sites. Available values:
'top': Only search the top (Watson) strand of target sequences.
'bottom': Only search the bottom (Crick) strand of target sequences.
'both': Search both strands.
evalue_attc (float) – evalue threshold to filter out hits above it
max_attc_size (int) – The maximum value for the attC size
min_attc_size (int) – The minimum value for the attC size
cmsearch_bin (str) – The path to cmsearch
out_dir (str) – The path to directory where to write results
cpu (int) – The number of cpu used by cmsearch
- Returns:
DataFrame with the same structure as the DataFrame returned by read_infernal(), where positions are converted to positions on the replicon and attC sites are filtered by evalue, min_attc_size and max_attc_size. The function also writes a file with intermediate results, <replicon_id>_subseq_attc_table_end.res, which stores the local_max results before filtering by max_attc_size and min_attc_size.
- Return type:
pandas.DataFrame object
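Two steps described above lend themselves to a short sketch: mapping `strand_search` to cmsearch strand options, and converting window-relative hit positions back to replicon coordinates before size filtering. `--toponly` and `--bottomonly` are real cmsearch options; everything else here (the helper names, plain dicts instead of a DataFrame, the offset convention, inclusive size bounds) is an assumption for illustration, and circular replicons are not handled.

```python
# cmsearch searches both strands unless told otherwise.
STRAND_FLAGS = {
    "top": ["--toponly"],
    "bottom": ["--bottomonly"],
    "both": [],
}


def strand_options(strand_search):
    """Translate a strand_search value into cmsearch options."""
    try:
        return STRAND_FLAGS[strand_search]
    except KeyError:
        raise ValueError(
            f"strand_search must be one of {sorted(STRAND_FLAGS)}"
        ) from None


def to_replicon_coords(hits, window_beg, min_attc_size=40, max_attc_size=200):
    """Shift window-relative hits onto the replicon and filter by size.

    hits: list of dicts with 'pos_beg' and 'pos_end' relative to the window.
    Assumes a simple offset by window_beg (an illustrative convention).
    """
    converted = []
    for hit in hits:
        size = abs(hit["pos_end"] - hit["pos_beg"])
        if min_attc_size <= size <= max_attc_size:
            converted.append(dict(hit,
                                  pos_beg=hit["pos_beg"] + window_beg,
                                  pos_end=hit["pos_end"] + window_beg))
    return converted
```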
- integron_finder.infernal.read_infernal(infile, replicon_id, replicon_size, len_model_attc, evalue=1, size_max_attc=200, size_min_attc=40)[source]
Parse the cmsearch --tblout output and return a pandas DataFrame.
- Parameters:
infile (str) – the path to the output of cmsearch in tabulated format (--tblout)
replicon_id (str) – the id of the replicon where the integrons were found.
len_model_attc (int) – the length of the attc model
evalue (float) – evalue threshold to filter out hits above it
size_max_attc (int) – The maximum value for the attC size
size_min_attc (int) – The minimum value for the attC size
- Returns:
table with columns:
"Accession_number", "cm_attC", "cm_debut", "cm_fin", "pos_beg", "pos_end", "sens", "evalue"
and each row is a hit that matches the attC covariance model.
- Return type:
pandas.DataFrame object
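To make the column mapping concrete, here is a simplified, standard-library-only sketch of parsing `--tblout` lines into rows with the columns listed above. The field indices follow the Infernal 1.1 tabular layout (query name, mdl from/to, seq from/to, strand, E-value). `parse_tblout` is a hypothetical name; unlike read_infernal it returns a list of dicts rather than a DataFrame, it does not use `replicon_size` or `len_model_attc`, and ordering `pos_beg` before `pos_end` for minus-strand hits is an assumption of this sketch.

```python
COLUMNS = ["Accession_number", "cm_attC", "cm_debut", "cm_fin",
           "pos_beg", "pos_end", "sens", "evalue"]


def parse_tblout(lines, replicon_id, evalue=1.0,
                 size_max_attc=200, size_min_attc=40):
    """Parse cmsearch --tblout lines into a list of row dicts."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # On the minus strand cmsearch reports seq from > seq to.
        pos_beg, pos_end = sorted((int(fields[7]), int(fields[8])))
        hit_evalue = float(fields[15])
        size = pos_end - pos_beg
        if hit_evalue > evalue:
            continue  # hit is above the E-value threshold
        if not size_min_attc <= size <= size_max_attc:
            continue  # hit is outside the accepted attC size range
        values = [replicon_id, fields[2], int(fields[5]), int(fields[6]),
                  pos_beg, pos_end, fields[9], hit_evalue]
        rows.append(dict(zip(COLUMNS, values)))
    return rows
```

The resulting rows can be handed to `pandas.DataFrame(rows, columns=COLUMNS)` to obtain a table shaped like the one described above.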