MEGAHIT

MEGAHIT is an assembler for large and complex metagenomics data.
Paper: "MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph"
http://www.ncbi.nlm.nih.gov/pubmed/25609793

Download

https://github.com/voutcn/megahit

Example

Input: a metagenomic sample as paired-end FASTQ files, SAMPLE_1.fastq and SAMPLE_2.fastq
./megahit -1 SAMPLE_1.fastq -2 SAMPLE_2.fastq -m 0.5 -t 12 -o megahit_result

  -m 0.5   use 50% of the machine's total memory (default: 90%, -m 0.9)
  -t 12    use 12 CPU threads

Result: the assembled contigs are written to the FASTA file:
megahit_result/final.contigs.fa
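
The usage text on this page says that -1 and -2 each take a comma-separated list of files (plain text or gz/bz2), with the files in the two lists paired by position. The following is a minimal sketch of building such lists for two libraries. The file names and the join_by_comma helper are hypothetical, and the command is only printed so it can be reviewed before running.

```shell
# Join all arguments with commas, e.g. "a.fq b.fq" -> "a.fq,b.fq".
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# Hypothetical file names; keep the same order in both lists so that
# the mates stay paired.
pe1=$(join_by_comma A_1.fastq.gz B_1.fastq.gz)
pe2=$(join_by_comma A_2.fastq.gz B_2.fastq.gz)

# Print the command instead of executing it.
echo megahit -1 "$pe1" -2 "$pe2" -t 12 -o megahit_result
```

Remove the leading echo once the printed command looks right.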

Memory settings

https://github.com/voutcn/megahit/wiki/MEGAHIT-Memory-setting
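
According to the help text on this page, -m is treated as a fraction of the machine's total memory when it lies between 0 and 1, and is otherwise given in bytes. As a small sketch of the second form, the snippet below converts a limit in GiB to bytes. The gib_to_bytes helper and the 32 GiB figure are illustrative assumptions, not recommendations from the MEGAHIT authors.

```shell
# Convert a whole number of GiB to bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Hypothetical limit of 32 GiB for SdBG construction.
mem_bytes=$(gib_to_bytes 32)

# Printed rather than executed so the value can be checked first.
echo megahit -1 SAMPLE_1.fastq -2 SAMPLE_2.fastq -m "$mem_bytes" -t 12 -o megahit_result
```

An absolute byte count can be easier to reason about than a fraction on shared machines where the total memory is not yours alone.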


Help

megahit -h

MEGAHIT v1.0.2

Copyright (c) The University of Hong Kong & L3 Bioinformatics Limited
contact: Dinghua Li <dhli@cs.hku.hk>

Usage:
 megahit [options] {-1 <pe1> -2 <pe2> | --12 <pe12> | -r <se>} [-o <out_dir>]

 Input options that can be specified for multiple times (supporting
 plain text and gz/bz2 extensions)
  -1        <pe1>   comma-separated list of fasta/q paired-end #1 files,
                    paired with files in <pe2>
  -2        <pe2>   comma-separated list of fasta/q paired-end #2 files,
                    paired with files in <pe1>
  --12      <pe12>  comma-separated list of interleaved fasta/q
                    paired-end files
  -r/--read  <se>   comma-separated list of fasta/q single-end files

 Input options that can be specified for at most ONE time (not recommended):
  --input-cmd  <cmd>   command that outputs fasta/q reads to stdout;
                       taken by MEGAHIT as SE reads

Optional Arguments:
 Basic assembly options:
  --min-count <int> minimum multiplicity for filtering (k_min+1)-mers, default 2
  --k-min     <int> minimum kmer size (<= 127), must be odd number, default 21
  --k-max     <int> maximum kmer size (<= 127), must be odd number, default 99
  --k-step    <int> increment of kmer size of each iteration (<= 28),
                    must be even number, default 20
  --k-list    <int,int,..> comma-separated list of kmer size (all must be odd,
                           in the range 15-127, increment <= 28);
                           override `--k-min', `--k-max' and `--k-step'

 Advanced assembly options:
  --no-mercy                do not add mercy kmers
  --no-bubble               do not merge bubbles
  --merge-level     <l,s>   merge complex bubbles of length <= l*kmer_size
                            and similarity >= s, default 20,0.98
  --prune-level     <int>   strength of local low depth pruning (0-2), default 2
  --low-local-ratio <float> ratio threshold to define low local coverage
                            contigs, default 0.2
  --max-tip-len     <int>   remove tips less than this value; default 2*k for
                            iteration of kmer_size=k
  --no-local                disable local assembly
  --kmin-1pass              use 1pass mode to build SdBG of k_min

 Presets parameters:
  --presets <str>  override a group of parameters;
    possible values:
    meta           '--min-count 2 --k-list 21,41,61,81,99'
          (generic metagenomes, default)
    meta-sensitive '--min-count 2 --k-list 21,31,41,51,61,71,81,91,99'
          (more sensitive but slower)
    meta-large     '--min-count 2 --k-list 27,37,47,57,67,77,87'      
          (large & complex metagenomes, like soil)
    bulk           '--min-count 3 --k-list 31,51,71,91,99 --no-mercy' 
          (experimental, standard bulk sequencing with >= 30x depth)
    single-cell '--min-count 3 --k-list 21,33,55,77,99,121 --merge-level 20,0.96'
          (experimental, single cell data)

 Hardware options:
  -m/--memory  <float> max memory in byte to be used in SdBG construction;
                       default 0.9 (if set between 0-1, fraction of the
                       machine's total memory)
  --mem-flag   <int>   SdBG builder memory mode, default 1
                       0: minimum; 1: moderate; others: use all memory
                       specified by '-m/--memory'.
  --use-gpu            use GPU
  --gpu-mem   <float>  GPU memory in byte to be used. Default: auto detect
                       to use up all free GPU memory.
  -t/--num-cpu-threads  <int>  number of CPU threads, at least 2.
                               Default: auto detect to use all CPU threads.

 Output options:
  -o/--out-dir   <string>  output directory, default ./megahit_out
  --out-prefix   <string>  output prefix (the contig file will be
                           OUT_DIR/OUT_PREFIX.contigs.fa)
  --min-contig-len  <int>  minimum length of contigs to output, default 200
  --keep-tmp-files         keep all temporary files


 Other Arguments:
  --continue    continue a MEGAHIT run from its last available check point.
                please set the output directory correctly when using this option.

  -h/--help     print the usage message
  -v/--version  print version
  --verbose     verbose mode
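
The help text above places three limits on --k-list: every size must be odd, lie in the range 15-127, and differ from the previous one by at most 28. Checking a custom list before launching a long assembly can be sketched as below. The check_k_list helper is hypothetical, and the requirement that sizes are strictly increasing is an assumption read into the word "increment", not something the help text states outright.

```shell
# Succeed (exit status 0) only if the comma-separated list passes every check.
check_k_list() {
    local list="$1"
    local prev=""
    local k
    local IFS=,
    for k in $list; do
        # Reject empty fields, leading zeros, non-digits and overlong values.
        case "$k" in
            ''|0*|*[!0-9]*|????*) echo "not a valid k-mer size: '$k'" >&2; return 1 ;;
        esac
        if [ "$k" -lt 15 ] || [ "$k" -gt 127 ]; then
            echo "out of range 15-127: $k" >&2; return 1
        fi
        if [ "$((k % 2))" -eq 0 ]; then
            echo "even k-mer size: $k" >&2; return 1
        fi
        if [ -n "$prev" ]; then
            # Assumption: sizes are listed in strictly increasing order.
            if [ "$k" -le "$prev" ] || [ "$((k - prev))" -gt 28 ]; then
                echo "bad increment from $prev to $k" >&2; return 1
            fi
        fi
        prev="$k"
    done
    # An empty list is rejected.
    [ -n "$prev" ]
}

k_list=21,41,61,81,99
if check_k_list "$k_list"; then
    # Printed rather than executed; remove the echo to run the assembly.
    echo megahit -1 SAMPLE_1.fastq -2 SAMPLE_2.fastq --k-list "$k_list" -o megahit_result
fi
```

All the preset lists shown in the help text pass this check, while a list such as 21,61 is rejected because the step of 40 exceeds 28.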