"MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph"
Microbial community assembly (metagenomics)
http://www.ncbi.nlm.nih.gov/pubmed/25609793
Install
https://github.com/voutcn/megahit
Example
Input: a metagenomic sample as paired-end FASTQ files (SAMPLE_R1.fastq.gz and SAMPLE_R2.fastq.gz)
megahit -1 SAMPLE_R1.fastq.gz -2 SAMPLE_R2.fastq.gz -t 12 -o megahit_result
-t 12 use 12 CPU threads (default: all available CPU threads)
-m 0.5 optional, not used in the command above: limit memory to 50% of the machine's total memory (default: 90%, -m 0.9)
Result: the assembled contigs are written to a FASTA file:
megahit_result/final.contigs.fa
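
A quick sanity check after the run is to count the contigs and their total length. The sketch below assumes a standard FASTA layout (header lines starting with `>`); `contig_stats` is a hypothetical helper name, not part of MEGAHIT.

```shell
# Hypothetical helper: report the number of contigs and total bases in a FASTA file.
contig_stats() {
    awk '/^>/ { n++; next }            # header line: one more contig
         { bases += length($0) }       # sequence line: add its length
         END { printf "%d contigs, %d bp\n", n, bases }' "$1"
}

# Only run on the example output if it exists (path taken from the example above).
if [ -f megahit_result/final.contigs.fa ]; then
    contig_stats megahit_result/final.contigs.fa
fi
```

Because the default --min-contig-len is 200, contigs shorter than that do not appear in the file and are not counted.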
Intro & Tutorial
https://github.com/voutcn/megahit
https://github.com/voutcn/megahit/wiki/An-example-of-real-assembly
Memory settings
https://github.com/voutcn/megahit/wiki/MEGAHIT-Memory-setting
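
According to the help text below, -m takes either a fraction between 0 and 1 or an absolute value in bytes. A minimal sketch of the second form, assuming a budget given in GiB; `gib_to_bytes` is a hypothetical helper, and the command is only printed, not executed.

```shell
# Hypothetical helper: convert a memory budget in GiB to bytes for -m.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

MEM_BYTES=$(gib_to_bytes 32)

# Print the resulting command for inspection instead of running it.
echo "megahit -1 SAMPLE_R1.fastq.gz -2 SAMPLE_R2.fastq.gz -m $MEM_BYTES -o megahit_result"
```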
Help
megahit -h
MEGAHIT v1.0.2
Copyright (c) The University of Hong Kong & L3 Bioinformatics Limited
contact: Dinghua Li <dhli@cs.hku.hk>
Usage:
  megahit [options] {-1 <pe1> -2 <pe2> | --12 <pe12> | -r <se>} [-o <out_dir>]
  Input options that can be specified for multiple times (supporting
  plain text and gz/bz2 extensions)
    -1 <pe1>                    comma-separated list of fasta/q paired-end #1 files,
                                paired with files in <pe2>
    -2 <pe2>                    comma-separated list of fasta/q paired-end #2 files,
                                paired with files in <pe1>
    --12 <pe12>                 comma-separated list of interleaved fasta/q
                                paired-end files
    -r/--read <se>              comma-separated list of fasta/q single-end files
  Input options that can be specified for at most ONE time (not recommended):
    --input-cmd <cmd>           command that outputs fasta/q reads to stdout;
                                taken by MEGAHIT as SE reads
Optional Arguments:
  Basic assembly options:
    --min-count <int>           minimum multiplicity for filtering (k_min+1)-mers, default 2
    --k-min <int>               minimum kmer size (<= 127), must be odd number, default 21
    --k-max <int>               maximum kmer size (<= 127), must be odd number, default 99
    --k-step <int>              increment of kmer size of each iteration (<= 28),
                                must be even number, default 20
    --k-list <int,int,..>       comma-separated list of kmer size (all must be odd,
                                in the range 15-127, increment <= 28);
                                override `--k-min', `--k-max' and `--k-step'
  Advanced assembly options:
    --no-mercy                  do not add mercy kmers
    --no-bubble                 do not merge bubbles
    --merge-level <l,s>         merge complex bubbles of length <= l*kmer_size
                                and similarity >= s, default 20,0.98
    --prune-level <int>         strength of local low depth pruning (0-2), default 2
    --low-local-ratio <float>   ratio threshold to define low local coverage
                                contigs, default 0.2
    --max-tip-len <int>         remove tips less than this value; default 2*k for
                                iteration of kmer_size=k
    --no-local                  disable local assembly
    --kmin-1pass                use 1pass mode to build SdBG of k_min
  Presets parameters:
    --presets <str>             override a group of parameters;
                                possible values:
                                meta            '--min-count 2 --k-list 21,41,61,81,99'
                                                (generic metagenomes, default)
                                meta-sensitive  '--min-count 2 --k-list 21,31,41,51,61,71,81,91,99'
                                                (more sensitive but slower)
                                meta-large      '--min-count 2 --k-list 27,37,47,57,67,77,87'
                                                (large & complex metagenomes, like soil)
                                bulk            '--min-count 3 --k-list 31,51,71,91,99 --no-mercy'
                                                (experimental, standard bulk sequencing with >= 30x depth)
                                single-cell     '--min-count 3 --k-list 21,33,55,77,99,121 --merge_level 20,0.96'
                                                (experimental, single cell data)
  Hardware options:
    -m/--memory <float>         max memory in byte to be used in SdBG construction;
                                default 0.9 (if set between 0-1, fraction of the
                                machine's total memory)
    --mem-flag <int>            SdBG builder memory mode, default 1
                                0: minimum; 1: moderate; others: use all memory
                                specified by '-m/--memory'.
    --use-gpu                   use GPU
    --gpu-mem <float>           GPU memory in byte to be used. Default: auto detect
                                to use up all free GPU memory.
    -t/--num-cpu-threads <int>  number of CPU threads, at least 2.
                                Default: auto detect to use all CPU threads.
  Output options:
    -o/--out-dir <string>       output directory, default ./megahit_out
    --out-prefix <string>       output prefix (the contig file will be
                                OUT_DIR/OUT_PREFIX.contigs.fa)
    --min-contig-len <int>      minimum length of contigs to output, default 200
    --keep-tmp-files            keep all temporary files
Other Arguments:
    --continue                  continue a MEGAHIT run from its last available check point.
                                please set the output directory correctly when using this option.
    -h/--help                   print the usage message
    -v/--version                print version
    --verbose                   verbose mode
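
The help text states that -1 and -2 take comma-separated file lists, so several libraries can be assembled together. The following sketch builds such lists and prints two command lines: a co-assembly using the meta-sensitive preset, and a restart of the same run with --continue. The library file names and the `join_by_comma` helper are hypothetical; the commands are echoed rather than executed so they can be checked first.

```shell
# Hypothetical helper: join all arguments with commas (runs in a subshell so IFS is not leaked).
join_by_comma() (
    IFS=,
    echo "$*"
)

# Placeholder file names for two paired-end libraries; files in -1 and -2 must be in matching order.
R1=$(join_by_comma libA_R1.fastq.gz libB_R1.fastq.gz)
R2=$(join_by_comma libA_R2.fastq.gz libB_R2.fastq.gz)

# Co-assembly with the more sensitive (slower) preset, keeping contigs of at least 500 bp.
echo megahit -1 "$R1" -2 "$R2" --presets meta-sensitive --min-contig-len 500 -t 12 -o megahit_out

# Resume an interrupted run from its last checkpoint; -o must point at the same output directory.
echo megahit --continue -o megahit_out
```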