Source code for padmet_utils.connection.extract_orthofinder

# -*- coding: utf-8 -*-
"""
Description:
    After running OrthoFinder on n FASTA files, read the output file 'Orthogroups.csv'
    
    Requires a folder 'orthology_based_folder' with this layout:
    
        |-- model_a
            -- model_a.sbml
        |-- model_b
            -- model_b.sbml

    And the name of the studied organism 'study_id'

    1. Read the orthogroups file, extract the orthogroups into the dict 'all_orthogroups', and collect all organism names

    2. In the orthology folder, search for sbml files (extension '.sbml')

    3. For each model, gather all information in a dict dict_data:
        
        {'study_id': study_id,
        'model_id' : model_id,
        'sbml_template': path to the sbml of the model,
        'output': path to the output sbml,
        'verbose': bool, if true print information
        }

        The output is by default:
            \output_orthofinder_from_'model_id'.sbml

    4. Store all previous dict_data in a list all_dict_data

    5. Iterate over the dicts in all_dict_data and call the function dict_data_to_sbml on each

    Use a dict of data dict_data and dict of orthogroups dict_orthogroup to create sbml files.
    
    dict_data and dict_orthogroup are obtained with the function orthofinder_to_sbml
    
    6. Read dict_orthogroups and check whether the model associated to dict_data and the study organism share orthologues
    
    7. Read the sbml of the model, parse all reactions and get the genes associated to each reaction.
    
    8. For each reaction:
        
        Parse the genes associated to each sub part (ex: (gene-a and gene-b) or gene-c) = [(gene-a,gene-b), gene-c]
        
        Check if the study organism has orthologues for at least one sub part: (gene-a, gene-b) or gene-c
        
        If yes: add the reaction to the new sbml and replace the gene ids with the study organism gene ids
        
        Create the new sbml file.
    
::
   
    usage:
        extract_orthofinder --sbml=FILE/DIR --orthologues=DIR --study_id=STR --output=DIR [--workflow=STR] [-v]
        extract_orthofinder --sbml=DIR --orthogroups=FILE --study_id=STR --output=DIR [--workflow=STR] [-v]

    option:
        -h --help    Show help.
        --sbml=DIR   Folder with one sub-folder per model, named after the model and containing the sbml file model_name.sbml
        --orthogroups=FILE   Output file of an Orthofinder run: Orthogroups.tsv
        --orthologues=DIR   Output directory of an Orthofinder run: Orthologues
        --study_id=ID   name of the studied organism
        --workflow=ID   workflow id in ['aureme','aucome']. Specific run architecture in which to search for sbml files
        --output=DIR   folder where to create all sbml output files
        -v   print info

"""
import docopt
from padmet.utils.connection import extract_orthofinder


def main():
    args = docopt.docopt(__doc__)
    verbose = args["-v"]
    sbml = args["--sbml"]
    orthogroups_file = args["--orthogroups"]
    orthologue_folder = args["--orthologues"]
    output_folder = args["--output"]
    study_id = args["--study_id"]
    workflow = args["--workflow"]
    all_model_sbml = extract_orthofinder.get_sbml_files(sbml, workflow, verbose)
    if orthogroups_file:
        extract_orthofinder.orthogroups_to_sbml(orthogroups_file, all_model_sbml, output_folder, study_id, verbose)
    elif orthologue_folder:
        extract_orthofinder.orthologue_to_sbml(orthologue_folder, all_model_sbml, output_folder, study_id, verbose)


if __name__ == "__main__":
    main()
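
Step 8 of the module docstring (splitting a gene association into sub parts and checking them against the orthologues) can be illustrated with a minimal, standalone sketch. It does not reproduce padmet's implementation. The function names `parse_gene_association` and `translate_reaction_genes` and the `orthologues` mapping (model gene id to a set of study-organism gene ids) are hypothetical. The sketch assumes that the rule is a flat OR of AND groups, that gene ids contain no parentheses, and that a sub part counts as matched only when every gene in it has at least one orthologue. Every sub part is returned as a tuple, so a single gene becomes a one-element tuple.

```python
import re


def parse_gene_association(rule):
    """Split '(a and b) or c' into [('a', 'b'), ('c',)].

    Assumption: the rule is a flat OR of AND groups (no deeper nesting).
    """
    sub_parts = []
    for part in re.split(r"\s+or\s+", rule.strip(), flags=re.IGNORECASE):
        # Drop the parentheses surrounding an AND group.
        part = part.strip().strip("()")
        genes = tuple(
            gene.strip()
            for gene in re.split(r"\s+and\s+", part, flags=re.IGNORECASE)
            if gene.strip()
        )
        if genes:
            sub_parts.append(genes)
    return sub_parts


def translate_reaction_genes(rule, orthologues):
    """Return the study-organism genes for each fully matched sub part.

    Assumption: a sub part is kept only if every gene in it has at least
    one orthologue in the (hypothetical) 'orthologues' mapping.
    """
    kept = []
    for genes in parse_gene_association(rule):
        if all(orthologues.get(gene) for gene in genes):
            study_genes = set().union(*(orthologues[gene] for gene in genes))
            kept.append(tuple(sorted(study_genes)))
    return kept


# Hypothetical mapping: model gene id -> set of study-organism gene ids.
orthologues = {
    "gene-a": {"study-1"},
    "gene-c": {"study-3", "study-4"},
}

matched = translate_reaction_genes("(gene-a and gene-b) or gene-c", orthologues)
# gene-b has no orthologue, so only the 'gene-c' sub part is kept.
print(matched)
```

Under these assumptions, a reaction would be carried over to the new sbml whenever the returned list is non-empty.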