BLAST word-size

BLAST software tool

Length of the exact sequence match that serves as the starting region (seed) for the final alignment

 blastn  -query genes.fasta  -subject genome.fasta  -word_size 11

A BLAST search starts by finding a perfect sequence match of the length given by -word_size. This initial exact match is then extended in both directions, allowing gaps and substitutions as far as the scoring thresholds permit.
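The seeding step can be illustrated with a minimal Python sketch. This is not how BLAST is implemented internally; it only mimics the exact-match stage for nucleotide sequences and leaves out the extension. The function name find_seeds and the toy sequences are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where an exact word of
    length word_size occurs in both sequences (0-based positions)."""
    # Index every word of the subject by its start position.
    index = defaultdict(list)
    for j in range(len(subject) - word_size + 1):
        index[subject[j:j + word_size]].append(j)

    # Look up every word of the query in that index.
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


# Toy data (hypothetical): the query equals subject[3:20] except for one base.
subject = "TTGACCATGGCTAAGCTTGGA"
query = "ACCATGGCTTAGCTTGG"

print(find_seeds(query, subject, 8))
print(find_seeds(query, subject, 11))
```

With this toy data, a word-size of 8 yields two seeds and a word-size of 11 yields none, so the larger setting would never start an alignment for this query.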

Changing the word-size trades sensitivity against strictness: a smaller value finds more hits, including less accurate ones, while a larger value limits the results to almost perfect hits.

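For short sequences, word-size must be less than half the query length; otherwise reliable hits can be missed, because a single mismatch near the middle of the query can leave no exact match long enough to start an alignment.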
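The half-length rule can be checked with a small Python sketch. It assumes that seeds are exact matches and that the positions of the mismatches between query and target are known; the function names longest_exact_run and can_seed are hypothetical.

```python
def longest_exact_run(query_length, mismatch_positions):
    """Length of the longest stretch of the query without a mismatch.
    mismatch_positions are 0-based positions in the query."""
    longest = 0
    run_start = 0
    for pos in sorted(mismatch_positions):
        longest = max(longest, pos - run_start)
        run_start = pos + 1
    # Account for the stretch after the last mismatch.
    return max(longest, query_length - run_start)


def can_seed(query_length, mismatch_positions, word_size):
    """True if at least one exact word of length word_size survives."""
    return longest_exact_run(query_length, mismatch_positions) >= word_size


# A 20-base query with a single mismatch at position 9 (hypothetical example).
print(can_seed(20, [9], 11))
print(can_seed(20, [9], 7))
```

In this example the mismatch splits the query into exact stretches of 9 and 10 bases, so a word-size of 11 finds no seed although 19 of 20 bases match, while a word-size of 7 still does.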

Default word-sizes

nucleotide sequence search with the default task megablast (blastn):   -word_size 28

nucleotide sequence search with the blastn task (blastn -task blastn):   -word_size 11

amino acid search (blastp):   -word_size 3 

 → BLAST command-line options

Setting the word-size to a very low value (for example -word_size 5) makes a blastn search very slow, because far more initial matches are found and each of them has to be extended.
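How strongly the number of initial matches grows can be estimated with a rough model, sketched here in Python. It assumes uniformly random nucleotide sequences and a single strand, which real genomes do not satisfy, so the numbers are only an order-of-magnitude guide. The function name expected_random_seeds and the sequence lengths are hypothetical.

```python
def expected_random_seeds(query_length, subject_length, word_size, alphabet_size=4):
    """Expected number of chance exact word matches between two random
    sequences: (number of word position pairs) / alphabet_size ** word_size."""
    query_words = max(query_length - word_size + 1, 0)
    subject_words = max(subject_length - word_size + 1, 0)
    return query_words * subject_words / alphabet_size ** word_size


# Illustrative lengths: a 1,000-base query against a 5,000,000-base subject.
for w in (5, 11, 28):
    print(w, expected_random_seeds(1000, 5_000_000, w))
```

Under these assumptions, a word-size of 5 gives several million chance seeds, a word-size of 11 gives about a thousand, and a word-size of 28 gives practically none.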