Scripts: Exploration

Description:

#TODO

compare_padmet

Description:

#Compare 1-n padmet and create a folder output with files: genes.csv:

fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc] line = [gene-a, ‘present’ (if in padmet_a), ‘present’ (if in padmet_b), rxn-1;rxn-2 (names of reactions associated to gene-a in padmet_a), rxn-2]
reactions.csv:
fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula] line = [rxn-1, ‘present’ (if in padmet_a), ‘present’ (if in padmet_b), ‘gene-a;gene-b; gene-a, ‘cpd-1 + cpd-2 => cpd-3’, ‘cpd-1 + cpd-2 => cpd-3’]
pathways.csv:
fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc] line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b; rxn-a]
compounds.csv:
fieldnames = [‘metabolite’, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_rxn_produce] line = [cpd-1, rxn-1,’‘,rxn-1,’‘]
usage:
    compare_padmet.py --padmet=FILES/DIR --output=DIR [--padmetRef=FILE] [-v]

option:
    -h --help    Show help.
    --padmet=FILES/DIR    pathname of the padmet files, sep all files by ',', ex: /path/padmet1.padmet;/path/padmet2.padmet OR a folder
    --output=DIR    pathname of the output folder
    --padmetRef=FILE    pathanme of the database ref in padmet
padmet_utils.exploration.compare_padmet.compare_padmet(padmet_path, output, padmetRef=None, verbose=False)[source]

#Compare 1-n padmet and create a folder output with files: genes.csv:

fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc] line = [gene-a, ‘present’ (if in padmet_a), ‘present’ (if in padmet_b), rxn-1;rxn-2 (names of reactions associated to gene-a in padmet_a), rxn-2]
reactions.csv:
fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula] line = [rxn-1, ‘present’ (if in padmet_a), ‘present’ (if in padmet_b), ‘gene-a;gene-b; gene-a, ‘cpd-1 + cpd-2 => cpd-3’, ‘cpd-1 + cpd-2 => cpd-3’]
pathways.csv:
fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc] line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b; rxn-a]
compounds.csv:
fieldnames = [‘metabolite’, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_rxn_produce] line = [cpd-1, rxn-1,’‘,rxn-1,’‘]
Parameters:
  • padmet_path (str) – pathname of the padmet files, sep all files by ‘,’, ex: /path/padmet1.padmet;/path/padmet2.padmet OR a folder
  • output (str) – pathname of the output folder
  • padmetRef (padmet.classes.PadmetRef) – padmet containing the database of reference, need to calculat pathway completion rate
  • verbose (bool) – if True print information
padmet_utils.exploration.compare_padmet.main()[source]

compare_sbml

Description:

compare reactions in two sbml.

Returns if a reaction is missing

And if a reaction with the same id is using different species or different reversibility

usage:
    compare_sbml.py --sbml1=FILE --sbml2=FILE

option:
    -h --help    Show help.
    --sbml1=FILE    path of the first sbml file
    --sbml2=FILE    path of the second sbml file

compare_sbml_padmet

Description:
compare reactions in sbml and padmet file
usage:
    compare_sbml_padmet.py --padmet=FILE --sbml=FILE

option:
    -h --help    Show help.
    --padmet=FILE    path of the padmet file
    --sbml=FILE    path of the sbml file
padmet_utils.exploration.compare_sbml_padmet.compare_sbml_padmet(sbml_document, padmet)[source]

compare reactions ids in sbml vs padmet, return nb of reactions in both and reactions id not in sbml or not in padmet

Parameters:
  • padmet (padmet.classes.PadmetSpec) – padmet to udpate
  • sbml_file (libsbml.document) – sbml document
padmet_utils.exploration.compare_sbml_padmet.main()[source]

convert_sbml_db

Description:

This tool is use the MetaNetX database to check or convert a sbml. Flat files from MetaNetx are required to run this tool. They can be found in the aureme workflow or from the MetaNetx website. To use the tool set:

mnx_folder= the path to a folder containing MetaNetx flat files. the files must be named as ‘reac_xref.tsv’ and ‘chem_xref.tsv’ or set manually the different path of the flat files with:

mnx_reac= path to the flat file for reactions

mnx_chem= path to the flat file for chemical compounds (species)

To check the database used in a sbml:
to check all element of sbml (reaction and species) set:
to–map=all
to check only reaction of sbml set:
to–map=reaction
to check only species of sbml set:
to–map=species
To map a sbml and obtain a file of mapping ids to a given database set:
to-map:
as previously explained
db_out:
the name of the database target: [‘metacyc’, ‘bigg’, ‘kegg’] only
output:
the path to the output file

For a given sbml using a specific database.

Return a dictionnary of mapping.

the output is a file with line = reaction_id/or species in sbml, reaction_id/species in db_out database

ex:
For a sbml based on kegg database, db_out=metacyc: the output file will contains for ex:

R02283 ACETYLORNTRANSAM-RXN

usage:
    convert_sbml_db.py --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --to-map=STR [-v]
    convert_sbml_db.py --mnx_folder=DIR --sbml=FILE --to-map=STR [-v]
    convert_sbml_db.py --mnx_folder=DIR --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]
    convert_sbml_db.py --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]

options:
    -h --help     Show help.
    --to-map=STR     select the part of the sbml to check or convert, must be in ['all', 'reaction', 'species']
    --mnx_reac=FILE     path to the MetaNetX file for reactions
    --mnx_chem=FILE     path to the MetaNetX file for compounds
    --sbml=FILE     path to the sbml file to convert
    --output=FILE     path to the file containing the mapping, sep = "      "
    --db_out=FILE     id of the output database in ["BIGG","METACYC","KEGG"]
    -v     verbose.
padmet_utils.exploration.convert_sbml_db.check_sbml_db(sbml_file, to_map, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]

Check sbml database of a given sbml.

Parameters:
  • sbml_file (str) – path to the sbml file to convert
  • to_map (str) – select the part of the sbml to check must be in [‘all’, ‘reaction’, ‘species’]
  • verbose (bool) – if true: more info during process
  • mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
  • mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
  • mnx_folder (str) – the path to a folder containing MetaNetx flat files
Returns:

(name of the best matching database, dict of matching)

Return type:

tuple

padmet_utils.exploration.convert_sbml_db.get_from_mnx(mnx_dict, element_id, db_out)[source]
padmet_utils.exploration.convert_sbml_db.intern_mapping(id_to_map, db_out, _type)[source]
padmet_utils.exploration.convert_sbml_db.main()[source]
padmet_utils.exploration.convert_sbml_db.map_sbml(sbml_file, to_map, db_out, output, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]

map a sbml and obtain a file of mapping ids to a given database.

Parameters:
  • sbml_file (str) – path to the sbml file to convert
  • to_map (str) – select the part of the sbml to check must be in [‘all’, ‘reaction’, ‘species’]
  • db_out (str) – the name of the database target: [‘metacyc’, ‘bigg’, ‘kegg’] only
  • output (str) – path to the file containing the mapping, sep = ” “
  • verbose (bool) – if true: more info during process
  • mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
  • mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
  • mnx_folder (str) – the path to a folder containing MetaNetx flat files
Returns:

(name of the best matching database, dict of matching)

Return type:

tuple

padmet_utils.exploration.convert_sbml_db.mnx_reader(input_file, db_out)[source]

dendrogram_reactions_distance

Description:

Use reactions.csv file from compare_padmet.py to create a dendrogram using a Jaccard distance.

From the matrix absence/presence of reactions in different species computes a Jaccard distance between these species. Apply a hierarchical clustering on these data with a complete linkage. Then create a dendrogram. Apply also intervene to create an upset graph on the data.

usage:
    dendrogram_reactions_distance.py --reactions=FILE --output=FILE [--padmetRef=STR] [--pvclust] [--upset=INT] [-v]

option:
    -h --help    Show help.
    -r --reactions=FILE    pathname of the file containing reactions in each species of the comparison.
    -o --output=FOLDER    path to the output folder.
    --pvclust    launch pvclust dendrogram using R
    --padmetRef=STR    path to the padmet Ref file
    -u --upset=INT    number of cluster in the upset graph.
    -v    verbose mode.

flux_analysis

Description:
Run flux balance analyse with cobra package. If the flux is >0. Run also FVA and return result in standard output
usage:
    flux_analysis.py --sbml=FILE
    flux_analysis.py --sbml=FILE --seeds=FILE --targets=FILE
    flux_analysis.py --sbml=FILE --all_species


option:
    -h --help    Show help.
    --sbml=FILE    pathname to the sbml file to test for fba and fva.
    --seeds=FILE    pathname to the sbml file containing the seeds (medium).
    --targets=FILE    pathname to the sbml file containing the targets.
    --all_species    allow to make FBA on all the metabolites of the given model.

get_pwy_from_rxn

Description:
From a file containing a list of reaction, return the pathways where these reactions are involved. ex: if rxn-a in pwy-x => return, pwy-x; all rxn ids in pwy-x; all rxn ids in pwy-x FROM the list; ratio
usage:
    get_pwy_from_rxn.py --reaction_file=FILE --padmetRef=FILE  --output=FILE

options:
    -h --help     Show help.
    --reaction_file=FILE    pathname of the file containing the reactions id, 1/line
    --padmetRef=FILE    pathname of the padmet representing the database.
    --output=FILE    pathname of the file with line = pathway id, all reactions id, reactions ids from reaction file, ratio. sep = "        "
padmet_utils.exploration.get_pwy_from_rxn.dict_pwys_to_file(dict_pwy, output)[source]

Create csv file from dict_pwy. dict_pwy is obtained with extract_pwys()

Parameters:
  • dict_pwy (dict) – dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio ex: {pwy-x:{‘total_rxn’:[a,b,c], rxn_from_list:[a], ratio:1/3}}
  • output (str) – path to output file
padmet_utils.exploration.get_pwy_from_rxn.extract_pwys(padmet, reactions)[source]

#extract from padmet pathways containing 1-n reactions from a set of reactions ‘reactions’ Return a dict of data. dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio ex: {pwy-x:{‘total_rxn’:[a,b,c], rxn_from_list:[a], ratio:1/3}}

Parameters:
  • padmet (padmet.classes.PadmetSpec) – padmet to udpate
  • reactions (set) – set of reactions to match with pathways
Returns:

dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio ex: {pwy-x:{‘total_rxn’:[a,b,c], rxn_from_list:[a], ratio:1/3}}

Return type:

dict

padmet_utils.exploration.get_pwy_from_rxn.main()[source]

padmet_stats

Description:
From a file containing a list of reaction, return the pathways where these reactions are involved. ex: if rxn-a in pwy-x => return, pwy-x; all rxn ids in pwy-x; all rxn ids in pwy-x FROM the list; ratio
usage:
    get_pwy_from_rxn.py --reaction_file=FILE --padmetRef=FILE  --output=FILE

options:
    -h --help     Show help.
    --reaction_file=FILE    pathname of the file containing the reactions id, 1/line
    --padmetRef=FILE    pathname of the padmet representing the database.
    --output=FILE    pathname of the file with line = pathway id, all reactions id, reactions ids from reaction file, ratio. sep = "        "
padmet_utils.exploration.get_pwy_from_rxn.dict_pwys_to_file(dict_pwy, output)[source]

Create csv file from dict_pwy. dict_pwy is obtained with extract_pwys()

Parameters:
  • dict_pwy (dict) – dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio ex: {pwy-x:{‘total_rxn’:[a,b,c], rxn_from_list:[a], ratio:1/3}}
  • output (str) – path to output file
padmet_utils.exploration.get_pwy_from_rxn.extract_pwys(padmet, reactions)[source]

#extract from padmet pathways containing 1-n reactions from a set of reactions ‘reactions’ Return a dict of data. dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio ex: {pwy-x:{‘total_rxn’:[a,b,c], rxn_from_list:[a], ratio:1/3}}

Parameters:
  • padmet (padmet.classes.PadmetSpec) – padmet to udpate
  • reactions (set) – set of reactions to match with pathways
Returns:

dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio ex: {pwy-x:{‘total_rxn’:[a,b,c], rxn_from_list:[a], ratio:1/3}}

Return type:

dict

padmet_utils.exploration.get_pwy_from_rxn.main()[source]

report_network

Description:

Create reports of a padmet file.

all_pathways.tsv: header = [“dbRef_id”, “Common name”, “Number of reaction found”, “Total number of reaction”, “Ratio (Reaction found / Total)”]

all_reactions.tsv: header = [“dbRef_id”, “Common name”, “formula (with id)”, “formula (with common name)”, “in pathways”, “associated genes”]

all_metabolites.tsv: header = [“dbRef_id”, “Common name”, “Produced (p), Consumed (c), Both (cp)”]

usage:
    report_network.py --padmetSpec=FILE --output_dir=dir [--padmetRef=FILE] [-v]

options:
    -h --help     Show help.
    --padmetSpec=FILE    pathname of the padmet file.
    --padmetRef=FILE    pathname of the padmet file used as database
    --output_dir=dir    directory for the results.
    -v   print info.
padmet_utils.exploration.report_network.main()[source]

visu_path