
BLAST word-size

Length of the exact sequence match that serves as the starting region for the final alignment

 blastn  -query genes.ffn  -subject genome.fna  -word_size 11

A BLAST search starts by finding a perfect sequence match of the length given by -word_size. This initial exact match is then extended in both directions, allowing gaps and substitutions according to the scoring thresholds.
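The seeding step can be illustrated with a small Python sketch. It is not how BLAST is implemented internally; it only assumes that a seed is an exact word shared by query and subject. The function name find_seeds and the two toy sequences are made up for this example. The subject contains the query with one substitution.

```python
def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where an exact word of
    length word_size occurs in both sequences (toy model of seeding)."""
    # Index every word of the subject by its start positions.
    index = {}
    for s in range(len(subject) - word_size + 1):
        index.setdefault(subject[s:s + word_size], []).append(s)
    # Look up every word of the query in that index.
    seeds = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            seeds.append((q, s))
    return seeds


query = "ACGTACGTTAGC"          # hypothetical 12-base query
subject = "GGACGTACCTTAGCAA"    # query with one substitution, plus flanks

for w in (7, 6, 4):
    print(w, find_seeds(query, subject, w))
```

With these sequences, a word size of 7 yields no seed at all, a word size of 6 yields a single seed, and a word size of 4 yields several seeds, one of which lies off the true alignment diagonal. This mirrors the trade-off between sensitivity and specificity.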

Changing the initial word-size can help to find more, but less accurate, hits, or to limit the results to almost perfect hits.
  • Decreasing the word-size will increase the number of detected homologous sequences, but the hits can include more fragmented alignments with more gaps and substitutions (example: search for homologous genes between distant species, see also: -task blastn)
  • Increasing the word-size will give fewer hits, as it requires a longer continuous region of exact match. If the word-size is chosen to be almost the size of the query, BLAST will search for almost exact matches (example: search for the location of gene sequences in the genome they originate from)
For short sequences, the word-size must be less than half the query length; otherwise reliable hits can be missed.
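The half-length rule can be reasoned through with a short Python sketch. It assumes an ungapped alignment over the full query with a single mismatch, and the helper name longest_exact_run is hypothetical. A mismatch near the middle of the query splits it into two exact stretches of about half the query length each, so a larger word size can no longer find a seed.

```python
def longest_exact_run(length, mismatch_positions):
    """Longest stretch of identical positions in an ungapped alignment
    of `length` columns with mismatches at the given 0-based positions."""
    longest = 0
    start = 0
    # Treat the end of the alignment as a final boundary.
    for pos in sorted(mismatch_positions) + [length]:
        longest = max(longest, pos - start)
        start = pos + 1
    return longest


query_length = 20  # hypothetical short query
# Worst case over all positions of a single mismatch.
worst = min(longest_exact_run(query_length, [p]) for p in range(query_length))
print(worst)
```

For a 20-base query the worst case leaves an exact stretch of 10 bases, so a word size of 11 (the blastn default for -task blastn) could miss a hit that differs from the query by only one base.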

default word-sizes
nucleotide sequence search, blastn with default megablast (blastn):  -word_size 28
nucleotide sequence search, blastn only (blastn -task blastn):  -word_size 11
amino acid search (blastp):  -word_size 3

 → BLAST command-line options

Setting the word-size to a very low value (e.g. -word_size 5) makes a blastn search very slow, because many more short exact matches have to be extended.