
NCBI ftp genome download

How to download all reference genomes of a selected species from NCBI (Ubuntu/Linux)

1) Download list of all available reference genomes

Download the complete list of manually reviewed genomes (RefSeq database, a subset of GenBank):
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt
Or download the list of all available genomes (GenBank), which may include low-quality genomes:
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_genbank.txt

→ read more at NCBI
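
The later steps pick columns by number, so it helps to see which number belongs to which column. A minimal sketch, assuming the column names sit on the last '#'-prefixed line before the data rows (the function name list_columns is made up for this example):

```shell
# Print "number<TAB>name" for every column of an assembly summary file.
# Assumption: the column names are on the last '#'-prefixed line before the data.
list_columns() {
    awk -F'\t' '
        /^#/ { header = $0; next }    # remember the most recent comment line
        { exit }                      # first data row reached: jump to END
        END {
            sub(/^#+ */, "", header)  # drop the leading "#" marker
            n = split(header, name, "\t")
            for (i = 1; i <= n; i++) print i "\t" name[i]
        }
    ' "$1"
}

# Usage (after step 1):
# list_columns assembly_summary_refseq.txt
```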

2) Search for available genomes of a species

Example: Eubacterium rectale (RefSeq database; columns 8, 9, 14, 15, 16 = organism name, infraspecific name, genome representation, release date, assembly name)

grep -E 'Eubacterium.*rectale' assembly_summary_refseq.txt | cut -f 8,9,14,15,16
[Eubacterium rectale] ATCC 33656 strain=ATCC 33656 Full 2009/06/04 ASM2060v1
[Eubacterium] rectale    strain=2789STDY5608860    Full 2015/10/02 13414_6#44
[Eubacterium] rectale    strain=2789STDY5834884    Full 2015/10/02 14207_7#7
[Eubacterium] rectale    strain=2789STDY5834968    Full 2015/10/02 14207_7#91
[Eubacterium] rectale    strain=T1-815             Full 2015/10/08 T1815
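
Note that grep matches the pattern anywhere in a line, including the submitter column. A stricter variant, sketched here with awk, tests only the organism name (column 8) and prints the same five columns. The function name find_genomes is hypothetical:

```shell
# Print columns 8, 9, 14, 15, 16 of every row whose organism name matches.
# $1 = regular expression for the organism name, $2 = assembly summary file
find_genomes() {
    awk -F'\t' -v pat="$1" '
        BEGIN { OFS = "\t" }
        /^#/ { next }                              # skip header/comment lines
        $8 ~ pat { print $8, $9, $14, $15, $16 }   # match on column 8 only
    ' "$2"
}

# Usage:
# find_genomes 'Eubacterium.*rectale' assembly_summary_refseq.txt
```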

3) Get FTP download link

# for selected genomes (Eubacterium rectale), get NCBI ftp download folder (column 20)
grep -E 'Eubacterium.*rectale' assembly_summary_refseq.txt | cut -f 20 > ftp_folder.txt

head ftp_folder.txt
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/020/605/GCF_000020605.1_ASM2060v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/404/855/GCF_001404855.1_13414_6_44
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/405/295/GCF_001405295.1_14207_7_7
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/406/375/GCF_001406375.1_14207_7_91
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/406/835/GCF_001406835.1_T1815
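
If plain FTP is blocked on your network, the folder URLs can be rewritten to HTTPS. This is only a sketch, assuming the paths are identical under both schemes (the manual download link at the end of this page uses the same host over https). The function name to_https and the output file name are made up:

```shell
# Rewrite the scheme of every folder URL from ftp:// to https://.
# $1 = file with one folder URL per line
to_https() {
    sed 's|^ftp://|https://|' "$1"
}

# Usage:
# to_https ftp_folder.txt > https_folder.txt
```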



# extend download folder: create an exact genome (fna or gbff) download link

awk 'BEGIN{FS=OFS="/";filesuffix="genomic.fna.gz"}{ftpdir=$0;asm=$10;file=asm"_"filesuffix;print "wget "ftpdir,file}' ftp_folder.txt > download_fna_files.sh

awk 'BEGIN{FS=OFS="/";filesuffix="genomic.gbff.gz"}{ftpdir=$0;asm=$10;file=asm"_"filesuffix;print "wget "ftpdir,file}' ftp_folder.txt > download_gbff_files.sh

head download_fna_files.sh
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/020/605/GCF_000020605.1_ASM2060v1/GCF_000020605.1_ASM2060v1_genomic.fna.gz
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/404/855/GCF_001404855.1_13414_6_44/GCF_001404855.1_13414_6_44_genomic.fna.gz
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/405/295/GCF_001405295.1_14207_7_7/GCF_001405295.1_14207_7_7_genomic.fna.gz
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/406/375/GCF_001406375.1_14207_7_91/GCF_001406375.1_14207_7_91_genomic.fna.gz
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/406/835/GCF_001406835.1_T1815/GCF_001406835.1_T1815_genomic.fna.gz
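
A possible variant: write a plain list of URLs instead of a shell script and hand it to wget directly. The sketch takes the assembly name from the last path component ($NF) rather than a fixed field number, so it does not depend on the folder depth. The function name make_urls and the file fna_urls.txt are made up for this example:

```shell
# Build one file URL per folder URL: <folder>/<assembly>_<suffix>
# $1 = file suffix (e.g. genomic.fna.gz), $2 = file with one folder URL per line
make_urls() {
    awk -F'/' -v suffix="$1" 'NF { print $0 "/" $NF "_" suffix }' "$2"
}

# Usage:
# make_urls genomic.fna.gz ftp_folder.txt > fna_urls.txt
# wget --continue --input-file=fna_urls.txt
```

wget reads the URLs from the list one by one; --continue resumes partially downloaded files if the transfer was interrupted.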


4) Run download

Download the .fna genome files (FASTA format):

source download_fna_files.sh

ls
GCF_000020605.1_ASM2060v1_genomic.fna.gz
GCF_001404855.1_13414_6_44_genomic.fna.gz
GCF_001405295.1_14207_7_7_genomic.fna.gz
GCF_001406375.1_14207_7_91_genomic.fna.gz
GCF_001406835.1_T1815_genomic.fna.gz
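
Before decompressing, it can be worth checking that no download was truncated. A minimal sketch using gzip's built-in integrity test (the function name check_gz is made up); it prints only the files that fail:

```shell
# Print the name of every file that fails gzip's integrity test.
check_gz() {
    for f in "$@"; do
        gzip -t "$f" 2>/dev/null || printf '%s\n' "$f"
    done
}

# Usage (no output means all files passed):
# check_gz *.gz
```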



# decompress genome files
gzip -d *.gz

ls
GCF_000020605.1_ASM2060v1_genomic.fna
GCF_001404855.1_13414_6_44_genomic.fna
GCF_001405295.1_14207_7_7_genomic.fna
GCF_001406375.1_14207_7_91_genomic.fna
GCF_001406835.1_T1815_genomic.fna



# get description (top line) of each genome .fna file (more metadata is in assembly_summary_refseq.txt)

head -1 *.fna
==> GCF_000020605.1_ASM2060v1_genomic.fna <==
>NC_012781.1 Eubacterium rectale ATCC 33656, complete genome

==> GCF_001404855.1_13414_6_44_genomic.fna <==
>NZ_CYYW01000001.1 [Eubacterium] rectale strain 2789STDY5608860, whole genome shotgun sequence

==> GCF_001405295.1_14207_7_7_genomic.fna <==
>NZ_CZAJ01000001.1 [Eubacterium] rectale strain 2789STDY5834884, whole genome shotgun sequence

==> GCF_001406375.1_14207_7_91_genomic.fna <==
>NZ_CYXM01000001.1 [Eubacterium] rectale strain 2789STDY5834968, whole genome shotgun sequence

==> GCF_001406835.1_T1815_genomic.fna <==
>NZ_CVRQ01000001.1 [Eubacterium] rectale strain T1-815 genome assembly, contig: T1815_10, whole genome shotgun sequence

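The first header line alone does not show whether a file holds one complete sequence or many contigs. A rough sketch that counts sequences and total bases per file (the function name fasta_stats is made up; every non-header line is counted as sequence):

```shell
# Print "file<TAB>number of sequences<TAB>total bases" for each FASTA file.
fasta_stats() {
    for f in "$@"; do
        awk -v name="$f" '
            /^>/ { nseq++; next }          # header line: one more sequence
            { bases += length($0) }        # sequence line: add its length
            END { printf "%s\t%d\t%d\n", name, nseq, bases }
        ' "$f"
    done
}

# Usage:
# fasta_stats *.fna
```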


Alternatively: manual download

Download genome .fna files manually from the NCBI website:
https://ftp.ncbi.nlm.nih.gov/genomes/refseq/