attc
- integron_finder.attc.find_attc_max(integrons, replicon, distance_threshold, model_attc_path, max_attc_size, min_attc_size, evalue_attc=1.0, circular=True, out_dir='.', cmsearch_bin='cmsearch', cpu=1)
Look for attC sites with the cmsearch --max option, which removes all heuristic filters. Because this option makes the algorithm much slower, it is only run in the region around a hit. This mode is called local_max or eagle_eyes.
Default hit

                     attC
    __________________-->____-->_________-->_____________
    ______<--------______________________________________
             intI
                  ^-------------------------------------^
                  Search-space with --local_max

Updated hit

                     attC          ***         ***
    __________________-->____-->___-->___-->___-->_______
    ______<--------______________________________________
             intI

- Parameters:
integrons (list of Integron objects) – the integrons, which may or may not contain attC or intI.
replicon (Bio.Seq.SeqRecord object) – replicon where the integrons were found (genomic fasta file).
distance_threshold (int) – the maximal distance between 2 elements to aggregate them.
evalue_attc (float) – E-value threshold; hits with an E-value above it are filtered out.
model_attc_path (str) – path to the attC model (covariance model).
max_attc_size (int) – maximum value for the attC size.
min_attc_size (int) – minimum value for the attC size.
circular (bool) – True if replicon is circular, False otherwise.
out_dir (str) – the directory where results are written; used indirectly by some called functions such as infernal.local_max() or infernal.expand().
cmsearch_bin (str) – the path to the cmsearch binary to use.
cpu (int) – the number of CPUs to use when calling local_max.
- Returns:
a table of attC sites
- Return type:
pd.DataFrame object with a monotonic index
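The idea of restricting the expensive --max search to a region around an existing hit can be illustrated with a small sketch. It is not the integron_finder implementation: the function name `search_window`, the 0-based end-exclusive coordinates, and the choice to extend the hit by `distance_threshold` on each side are all assumptions made for illustration.

```python
def search_window(hit_start, hit_end, distance_threshold, replicon_size, circular=True):
    """Return the intervals (0-based, end-exclusive) to re-scan around one hit.

    Hypothetical helper: the window is the hit extended by
    distance_threshold on both sides (an assumption, not the library's rule).
    """
    left = hit_start - distance_threshold
    right = hit_end + distance_threshold

    # The window covers the whole replicon: scan everything once.
    if right - left >= replicon_size:
        return [(0, replicon_size)]

    # Linear replicon: clip the window at both ends.
    if not circular:
        return [(max(0, left), min(replicon_size, right))]

    # Circular replicon: a window crossing the origin is split in two pieces.
    if left < 0:
        return [(left % replicon_size, replicon_size), (0, right)]
    if right > replicon_size:
        return [(left, replicon_size), (0, right % replicon_size)]
    return [(left, right)]
```

On a circular replicon, a hit close to the origin yields two intervals, one at each end of the sequence. On a linear replicon the same hit yields a single clipped interval.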
- integron_finder.attc.search_attc(attc_df, keep_palindromes, dist_threshold, replicon_size, rep_topology)
Parse the attC data set (sorted by start position) for the given replicon and return a list of arrays. Each array is composed of attC sites that are on the same strand and separated by a distance less than dist_threshold.
- Parameters:
attc_df (pandas.DataFrame)
keep_palindromes (bool) – True if the palindromes must be kept in the attC result, False otherwise
dist_threshold (int) – the maximal distance between 2 elements to aggregate them
replicon_size (int) – the size of the replicon in base pairs
rep_topology (str) – the replicon topology; should be ‘lin’ or ‘circ’
- Returns:
a list of attC sites found on the replicon
- Return type:
list of pandas.DataFrame objects
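The grouping described for search_attc (same strand, neighbours closer than dist_threshold) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library code: `Site` and `cluster_sites` are hypothetical names, the distance is measured from the end of one site to the start of the next, palindrome handling is left out, and plain tuples stand in for the pandas.DataFrame objects the real function works with.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    """Hypothetical stand-in for one row of the attC table."""
    start: int
    end: int
    strand: int  # +1 or -1


def cluster_sites(sites, dist_threshold, replicon_size, circular=False):
    """Group sites by strand, then chain neighbours closer than dist_threshold."""
    clusters: List[List[Site]] = []
    for strand in (1, -1):
        same = sorted((s for s in sites if s.strand == strand), key=lambda s: s.start)
        if not same:
            continue
        groups = [[same[0]]]
        for prev, cur in zip(same, same[1:]):
            # Assumed distance: end of the previous site to start of the current one.
            if cur.start - prev.end < dist_threshold:
                groups[-1].append(cur)
            else:
                groups.append([cur])
        # On a circular replicon the last and first groups may be neighbours
        # across the origin; merge them if the wrap-around gap is small enough.
        if circular and len(groups) > 1:
            gap = groups[0][0].start + replicon_size - groups[-1][-1].end
            if gap < dist_threshold:
                groups[0] = groups.pop() + groups[0]
        clusters.extend(groups)
    return clusters


sites = [Site(100, 160, 1), Site(300, 360, 1), Site(5000, 5060, 1), Site(400, 460, -1)]
linear_clusters = cluster_sites(sites, dist_threshold=1000, replicon_size=10000)
```

With these example sites, the two forward-strand sites near position 100 to 360 end up in one cluster, while the distant forward-strand site and the reverse-strand site each form their own.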