CrocoBLAST:Job management

491 bytes added, 05:22, 25 July 2016
__NOTOC__
CrocoBLAST is built to help you plan your BLAST jobs and run them efficiently. CrocoBLAST works with the concept of a queue, which is a list of BLAST jobs scheduled to run. You can therefore plan several BLAST jobs and let CrocoBLAST manage their execution for you.
<div class="toccolours mw-collapsible mw-collapsed">
=Create BLAST jobs=
<div class="mw-collapsible-content">
</div>
<div class="mw-collapsible mw-collapsed">
=Manage databases=
<div class="mw-collapsible-content">
To submit a BLAST job, you must specify which database you wish to align against. The first time you specify a database for a BLAST job, CrocoBLAST adds it to its index, so that the database is easier to access in the future. You can see which databases are already indexed in CrocoBLAST:
There are two ways to add a new database to the CrocoBLAST index. In both cases, you should provide a simple name for each new database, so that you can easily refer to it whenever you need to run a BLAST job.
</div>
</div>
<div class="mw-collapsible mw-collapsed">===Retrieve database from the NCBI servers===<div class="mw-collapsible-content">
In the most typical scenario, you will use the [ftp://ftp.ncbi.nlm.nih.gov/refseq/release/ established reference sequence databases maintained by NCBI]. CrocoBLAST allows you to specify the name of such a database, and will download or update the database for you:
When adding or updating a database in this manner, you need not worry about the format of the database, as NCBI provides pre-formatted database files.
</div>
</div>
<div class="mw-collapsible mw-collapsed">===Add database from your computer===<div class="mw-collapsible-content">
If you have already downloaded the databases from NCBI, or if you do not have an internet connection, you can add database files stored on your computer to the CrocoBLAST index. Remember to provide a unique and representative name for each database you add, so that the databases are easy to refer to later. If the database files are appropriately formatted (e.g., psq or nsq):
<code>CrocoBLAST -add_database --sequence_file '''protein''' <span style="color:green">fastq_file</span> <span style="color:orange">database_name</span> <span style="color:green">output_folder</span>
</code>
</div>
</div>
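If you need to register several sequence files, the call can be scripted. The following is a minimal Python sketch that only assembles the argument list in the same order as the <code>-add_database --sequence_file</code> command shown above. The function name, the example file, database, and folder names are hypothetical and are not part of CrocoBLAST.

```python
import shlex


def add_database_command(sequence_file, database_name, output_folder, db_type="protein"):
    """Build the argument list for the -add_database call shown in this manual.

    The argument order mirrors the documented command; nothing is executed here.
    """
    return [
        "CrocoBLAST", "-add_database", "--sequence_file", db_type,
        sequence_file, database_name, output_folder,
    ]


# Hypothetical file, database, and folder names, used only for illustration.
command = add_database_command("my_sequences.fasta", "my_proteins", "databases/my_proteins")
print(shlex.join(command))
```

To actually run the command, the list can be passed to <code>subprocess.run(command, check=True)</code>, assuming the CrocoBLAST executable is on your path.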
===Manage CrocoBLAST queue===
CrocoBLAST is efficient because it parallelizes the execution of your BLAST jobs: it breaks each large calculation into smaller pieces and then organizes the execution of those pieces. Each smaller piece needs less memory to run, and analyzing several pieces at once shortens the total calculation time. CrocoBLAST takes care of these things for you.
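To make the idea concrete, here is a small Python sketch of the general split, process, and assemble pattern. It is not CrocoBLAST's implementation: the function names, the piece size, and the placeholder "alignment" step (which only reports the length of each sequence) are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text, records_per_piece):
    """Fragment FASTA text into pieces of at most records_per_piece records."""
    records = [">" + r for r in text.split(">") if r.strip()]
    return [records[i:i + records_per_piece]
            for i in range(0, len(records), records_per_piece)]


def align_piece(piece):
    """Placeholder for the alignment step: return (sequence id, sequence length)."""
    results = []
    for record in piece:
        header, *seq_lines = record.strip().splitlines()
        seq_id = header[1:].split()[0]
        results.append((seq_id, sum(len(line.strip()) for line in seq_lines)))
    return results


def run_in_pieces(text, records_per_piece=2, workers=2):
    """Fragment the input, process the pieces concurrently, assemble the results."""
    pieces = split_fasta(text, records_per_piece)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_piece = list(pool.map(align_piece, pieces))  # map keeps the input order
    return [hit for piece in per_piece for hit in piece]


example = ">a\nACGT\n>b\nAC\nGT\nA\n>c example\nA\n"
print(run_in_pieces(example))
```

Because each worker only holds one piece at a time, the memory needed per task depends on the piece size rather than on the size of the whole input, which is the benefit described above.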
<div class="mw-collapsible mw-collapsed">====Execution====<div class="mw-collapsible-content">
Say you have ''created one or more BLAST jobs'' and are ready to start munching some sequences. It's easy:
This will provide you with information about which jobs are queued, with full details of the job ID and BLAST setup, as well as a description of the progress of the alignment. The progress of each job is reported in three main stages: fragmentation of the input file, alignment, and assembly of results.
====Administration====
If you want to change anything about the queue (say, pause one job and start another, or change the order of the jobs in a queue), you need to first pause or stop the current run. Subsequently, you may perform operations like adding, removing, or reordering jobs in the queue:
<code>CrocoBLAST -status
</code>
</div>
</div>
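The rule described under Administration (pause or stop the run first, then edit the queue) can be pictured with a small Python sketch. The class and method names below are hypothetical and only model the behaviour described in this manual; they do not correspond to CrocoBLAST commands or internals.

```python
class JobQueue:
    """Toy model of a job queue that may only be edited while no run is active."""

    def __init__(self):
        self.jobs = []        # job IDs in execution order
        self.running = False  # True while a run is in progress

    def _require_paused(self):
        if self.running:
            raise RuntimeError("pause or stop the current run before editing the queue")

    def add(self, job_id):
        self._require_paused()
        self.jobs.append(job_id)

    def remove(self, job_id):
        self._require_paused()
        self.jobs.remove(job_id)

    def move(self, job_id, new_position):
        """Reorder the queue by moving one job to a new position."""
        self._require_paused()
        self.jobs.remove(job_id)
        self.jobs.insert(new_position, job_id)


queue = JobQueue()
for job in ("job1", "job2", "job3"):
    queue.add(job)
queue.move("job3", 0)  # job3 now runs first
print(queue.jobs)
```

Refusing edits while a run is active keeps the order of the queue unambiguous: the jobs only change position when nothing is being executed.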
<div class="mw-collapsible mw-collapsed">===Generate report===<div class="mw-collapsible-content">
CrocoBLAST logs all your activity, so that you may always review your work. To output this information:
<code>CrocoBLAST -generate_report
</code>
</div></div>
'''Return to the [[CrocoBLAST:UserManual | Table of contents]]'''
