CrocoBLAST:Terminology

From WebChem Wiki

There are a few basic terms you need to keep in mind when running BLAST within CrocoBLAST.

Input file and Database

In essence, BLAST takes an unknown nucleotide or protein sequence, tries to align it against a set of reference sequences, and reports the score of each alignment to help you identify the unknown sequence. In practice, this means taking an input file with many query sequences and aligning each query sequence against a database of known sequences. Such databases are typically available from public repositories such as NCBI, or may be produced in-house.

Therefore, in order to run BLAST, you will need to specify an input file containing the query sequences, and a database file containing the reference sequences. CrocoBLAST accepts input files in FASTA and FASTQ format. BLAST uses a specific format for the database file. You may supply the database file either in this database format or in FASTA or FASTQ format, in which case it will be converted to the database format before BLAST is run. You may also download databases from the NCBI server directly within CrocoBLAST.
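To make the input format concrete, here is a minimal sketch of how query sequences in a FASTA file can be read. It is not CrocoBLAST code; the function name `parse_fasta` and the two example records are made up for illustration, and FASTQ input is not covered.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records as {header: sequence}.

    A header line starts with '>'; every following line up to the next
    header belongs to the same sequence.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Hypothetical input file content with two query sequences.
example = """>query_1 hypothetical read
ACGTACGT
ACGT
>query_2
TTGACA
""".splitlines()

queries = parse_fasta(example)
for name, sequence in queries.items():
    print(name, len(sequence))
```

Each entry in `queries` corresponds to one query sequence that BLAST would align against the database.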

BLAST program

Depending on the nature of the query and reference sequences, there are several BLAST programs you may use within CrocoBLAST:

  • blastp - compares an amino acid query sequence against a protein sequence database
  • blastn - compares a nucleotide query sequence against a nucleotide sequence database
  • blastx - compares a nucleotide query sequence translated in all reading frames against a protein sequence database
  • tblastn - compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
  • tblastx - compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database

Therefore, in order to run BLAST, you will need to indicate which BLAST program you intend to use.
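The list above can be summarized as a lookup from the type of the query and the type of the database to the programs that apply. The sketch below only restates that list; the names `PROGRAMS` and `candidate_programs` are illustrative and not part of CrocoBLAST.

```python
# (query type, database type) -> BLAST programs that accept this combination.
# Nucleotide against nucleotide has two candidates: blastn compares the
# sequences directly, while tblastx compares their six-frame translations.
PROGRAMS = {
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}


def candidate_programs(query_type: str, database_type: str) -> list:
    """Return the BLAST programs suitable for the given sequence types."""
    key = (query_type, database_type)
    if key not in PROGRAMS:
        raise ValueError(f"unsupported combination: {key}")
    return PROGRAMS[key]


print(candidate_programs("nucleotide", "protein"))
```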

BLAST options

The BLAST algorithm for sequence alignment is relatively complex, and the default settings are not always optimal for identifying suitable hits in a database or for retrieving only the relevant results. You may wish to adjust the default BLAST settings by changing the general BLAST options, as well as the options specific to each BLAST program. Please see the NCBI web pages for a full description of the available BLAST options.
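As an illustration of what non-default options look like, the sketch below assembles a command line for the standalone NCBI BLAST+ programs, where `-evalue` sets the expectation value threshold and `-outfmt 6` requests tabular output. This is an assumption about plain BLAST+ usage, not a description of how CrocoBLAST itself accepts options; the helper `build_blast_command` and all file names are hypothetical.

```python
def build_blast_command(program, query, database, output, options=None):
    """Assemble an NCBI BLAST+ style argument list (nothing is executed)."""
    command = [program, "-query", query, "-db", database, "-out", output]
    # Each non-default option becomes a "-name value" pair.
    for name, value in (options or {}).items():
        command += [f"-{name}", str(value)]
    return command


command = build_blast_command(
    "blastn",
    "queries.fasta",   # hypothetical input file
    "reference_db",    # hypothetical database name
    "hits.tsv",        # hypothetical output file
    {"evalue": 1e-5, "outfmt": 6},
)
print(" ".join(command))
```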

Job

Within CrocoBLAST, a job is defined by the BLAST program (with or without non-default options), the database, the input file, and the output location (folder). When created, each job receives a unique job ID that can be referenced whenever you wish to perform an operation on that job.
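The definition of a job can be pictured as a small record. The sketch below is a conceptual model only, assuming integer job IDs handed out in order; the class `Job`, its field names, and the example file names are hypothetical and do not reflect how CrocoBLAST stores jobs internally.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict

_next_id = count(1)  # assumed ID scheme: 1, 2, 3, ...


@dataclass
class Job:
    program: str        # e.g. "blastp"
    database: str       # database file
    input_file: str     # file with the query sequences
    output_folder: str  # where the results are written
    options: Dict[str, str] = field(default_factory=dict)  # non-default options, if any
    job_id: int = field(default_factory=lambda: next(_next_id))


first = Job("blastp", "proteins_db", "queries.fasta", "results/run1")
second = Job("blastn", "genomes_db", "reads.fastq", "results/run2",
             options={"evalue": "1e-5"})
print(first.job_id, second.job_id)
```

Because every job carries its own ID, later operations can refer to a job by that ID alone.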

Queue

All BLAST jobs created within the CrocoBLAST environment are included in a list, which we refer to as the queue. The queue is useful because it allows you to plan your work in advance and manage your jobs as you need. CrocoBLAST runs only one job at a time, and all your interaction with the created jobs is via the queue. For example, you may pause one job to obtain the partial alignment results, and start another job while you analyze the partial results of the original job. The settings and progress of the original job are retained, so you may later choose to resume it.
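The pause-and-switch workflow described above can be modeled with a few lines of code. This is a toy model under the assumption that each job is in one of three states ("queued", "running", "paused"); the class `JobQueue` and its methods are hypothetical and are not CrocoBLAST commands.

```python
class JobQueue:
    """Toy queue that allows at most one running job at a time."""

    def __init__(self):
        self.status = {}  # job_id -> "queued" | "running" | "paused"

    def add(self, job_id):
        self.status[job_id] = "queued"

    def running(self):
        """Return the ID of the running job, or None if nothing is running."""
        return next((j for j, s in self.status.items() if s == "running"), None)

    def start(self, job_id):
        """Start a queued job or resume a paused one."""
        if job_id not in self.status:
            raise KeyError(job_id)
        if self.running() is not None:
            raise RuntimeError("another job is already running")
        self.status[job_id] = "running"

    def pause(self, job_id):
        if self.status.get(job_id) != "running":
            raise RuntimeError("only a running job can be paused")
        self.status[job_id] = "paused"  # the job stays in the queue


queue = JobQueue()
queue.add(1)
queue.add(2)
queue.start(1)
queue.pause(1)   # inspect the partial results of job 1 ...
queue.start(2)   # ... while job 2 runs
print(queue.status)
```

Job 1 remains in the queue as paused, so it can be started again once job 2 is no longer running.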



Return to the Table of contents.