BLAST Parameters: Ten Things Worth Considering Before You Hit the BLAST Button
- Choose the Right Program and the Right Database
- there isn't just one BLAST program; for starters, you should at least know about the "big five" (blastn, blastp, blastx, tblastn, tblastx)
- there are also some interesting variations: megablast, psi-blast (iteratively rebuilds a position-specific scoring matrix from the "cousins" found in the previous round), phi-blast (lets you supply a regular-expression-like pattern)
- here is a small summary to help you choose
- and here is a more complete summary for deciding
- Know Where You Are
- pay attention to the defaults, which differ from one installation to another (see the brief help on the options)
- Look Out for Really Short or Really Long Sequences
- for short (length 30 or less) sequences, blastn and blastp automatically adjust parameters, but translating programs (e.g. tblastx) do not
- some notes on dealing with short sequences
- really long sequences can also make the search run out of memory
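
For standalone BLAST+, the short-query adjustment corresponds roughly to choosing the `blastn-short` or `blastp-short` task by hand. The sketch below assumes the 30-residue cutoff mentioned in these notes (the BLAST+ documentation describes `blastn-short` as tuned for queries under 50 bases, so the cutoff is adjustable). The helper names `choose_task` and `build_command` are hypothetical.

```python
from typing import List, Optional

# Programs that offer a dedicated short-query task in BLAST+.
SHORT_TASKS = {"blastn": "blastn-short", "blastp": "blastp-short"}


def choose_task(program: str, query: str, cutoff: int = 30) -> Optional[str]:
    """Return a -task value for short queries, or None to keep the defaults.

    cutoff=30 follows the notes above; it is an assumption, not an NCBI constant.
    """
    if len(query) <= cutoff:
        # Translating programs (e.g. tblastx) are absent from SHORT_TASKS,
        # so they fall through to None and need manual tuning.
        return SHORT_TASKS.get(program)
    return None


def build_command(program: str, query_file: str, db: str, query: str) -> List[str]:
    """Assemble a BLAST+ argument list, adding -task only when one applies."""
    cmd = [program, "-query", query_file, "-db", db]
    task = choose_task(program, query)
    if task is not None:
        cmd += ["-task", task]
    return cmd
```

With a translating program the function returns no task at all, which is the reminder that word size and E-value threshold have to be adjusted by hand in that case.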
- Brush Up on Your Statistics
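
As a refresher on the statistics, the Karlin-Altschul relations behind the E-value can be sketched in a few lines. This is a simplified illustration: real BLAST uses effective (edge-corrected) lengths rather than the raw query and database lengths, and the values of `lam` and `k` depend on the scoring matrix and gap penalties, so they are left as inputs here.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S to a bit score S' = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)
```

Two consequences are worth remembering when reading a report: every extra bit of score halves the E-value, and doubling the database size doubles it, so the same alignment looks less significant against a larger database.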
- You Can Limit Search by Organism
- options exist to include only specific organisms or taxonomic groups (on the left side, under Database, choose other; the Organism text box then appears)
- can also exclude one or more of these via the Organism box
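
Outside the web form, the same include/exclude restriction can be expressed as an Entrez query and passed to a remote BLAST+ search with `-entrez_query` (which only works together with `-remote`). A minimal sketch, where the helper name `organism_query` is hypothetical and the organism names are just examples:

```python
from typing import Iterable


def organism_query(include: Iterable[str] = (), exclude: Iterable[str] = ()) -> str:
    """Build an Entrez query that keeps some organisms and drops others."""
    inc = " OR ".join(f'"{name}"[Organism]' for name in include)
    exc = " OR ".join(f'"{name}"[Organism]' for name in exclude)
    if inc and exc:
        return f"({inc}) NOT ({exc})"
    if exc:
        # With nothing to include, start from every record and subtract.
        return f"all[filter] NOT ({exc})"
    return inc


# Example: search mammals but skip human sequences (remote search against nr).
query = organism_query(include=["Mammalia"], exclude=["Homo sapiens"])
command = ["blastp", "-remote", "-db", "nr", "-query", "query.fa",
           "-entrez_query", query]
```

The file name `query.fa` is a placeholder. Building the argument list separately makes it easy to inspect the exact query string before submitting the search.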
- Too Few or Too Many Hits
- "No significant similarity found"
- the sequenced genome might have gaps
- the query might be too short (change the default parameters)
- read the footer to see thresholds
- Know When to Cut Filters Off
- by default, BLAST filters out repeats and low-complexity regions, but a divergent pair may only be detectable through exactly those regions, so try turning the filter off
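
To get a feel for what "low-complexity" means, the sketch below flags windows whose Shannon entropy falls under a threshold. It is not the SEG or DUST algorithm that BLAST actually uses; the window size and threshold are illustrative assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, size: int = 12, threshold: float = 2.2) -> List[int]:
    """Return the start positions of windows with entropy below the threshold."""
    return [
        i
        for i in range(len(seq) - size + 1)
        if shannon_entropy(seq[i:i + size]) < threshold
    ]
```

A run such as a poly-A tail or a proline-rich stretch scores near zero and would be masked, which is why a query consisting mostly of such regions can come back with no hits. In standalone BLAST+ the filters are switched off with `-seg no` (protein) or `-dust no` (nucleotide).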
- If You are in the Twilight Zone, Try a Shuffle to Check Result
- the twilight zone starts at roughly 25 percent identity for amino acids and 70 percent identity for nucleotides; below that, true homologs and chance matches are hard to tell apart
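
The shuffle check can be sketched as follows: shuffle the query so that its composition stays the same but its order is destroyed, rerun the search for each shuffled copy, and see how often a shuffled copy scores as well as the real one. Running BLAST itself is left out; the scores passed to `empirical_p` are assumed to come from your own searches, and both function names are hypothetical.

```python
import random
from typing import List, Sequence


def shuffled_controls(seq: str, n: int = 100, seed: int = 0) -> List[str]:
    """Return n shuffled copies of seq with identical residue composition."""
    rng = random.Random(seed)  # fixed seed keeps the controls reproducible
    controls = []
    for _ in range(n):
        residues = list(seq)
        rng.shuffle(residues)
        controls.append("".join(residues))
    return controls


def empirical_p(real_score: float, shuffled_scores: Sequence[float]) -> float:
    """Fraction of shuffled runs scoring at least as well as the real query.

    The +1 terms keep the estimate away from zero for small sample sizes.
    """
    hits = sum(1 for s in shuffled_scores if s >= real_score)
    return (hits + 1) / (len(shuffled_scores) + 1)
```

If the shuffled copies routinely score about as well as the real sequence, the hit is probably driven by composition rather than by homology.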
- Be Skeptical of Hypothetical Proteins
- Consider Ungapped Alignment for blastx, tblastn, tblastx
The NCBI site has plenty of tutorials, including this example-driven tutorial. You can also grab the BLAST book once it's put on reserve in the library.