QIIME pre-processing

Raw data processing

1) Join forward and reverse reads


# Sequencing .fastq files are in folder RawData/ (two files, R1 and R2, per sample)
ls RawData
 SampleA_L001_R1_001.fastq.gz
 SampleA_L001_R2_001.fastq.gz
 SampleB_L001_R1_001.fastq.gz
 SampleB_L001_R2_001.fastq.gz
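multiple_join_paired_ends.py pairs the files by name, so a quick check that no mate file is missing can save a failed run. This is a minimal POSIX shell sketch; the function name `check_pairs` is made up for this example, and it assumes the two mates differ only in `_R1_` vs `_R2_`, as in the listing above.

```shell
#!/bin/sh
# Check that every R1 file in a folder has a matching R2 file, and vice versa.
# Assumption: mates differ only in the last "_R1_" / "_R2_" of the file name.
check_pairs() {
  dir=$1
  status=0
  found=0
  for r1 in "$dir"/*_R1_*.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    found=1
    r2="${r1%_R1_*}_R2_${r1##*_R1_}"         # swap the last _R1_ for _R2_
    [ -e "$r2" ] || { echo "missing R2 for: $r1"; status=1; }
  done
  for r2 in "$dir"/*_R2_*.fastq.gz; do
    [ -e "$r2" ] || continue
    r1="${r2%_R2_*}_R1_${r2##*_R2_}"         # swap the last _R2_ for _R1_
    [ -e "$r1" ] || { echo "missing R1 for: $r2"; status=1; }
  done
  if [ "$found" -eq 0 ]; then
    echo "no *_R1_*.fastq.gz files in $dir"
    return 1
  fi
  return "$status"
}

check_pairs RawData && echo "all R1/R2 files are paired"
```

It prints one line per unpaired file and returns a non-zero status, so it can also guard the join step, e.g. `check_pairs RawData && multiple_join_paired_ends.py -i RawData -o JoinedReads`.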

# join forward and reverse reads (multiple samples)
multiple_join_paired_ends.py -i RawData -o JoinedReads

# Result: one folder per sample (each containing a file: fastqjoin.join.fastq)
ls JoinedReads
 SampleA_L001_R1_001
 SampleB_L001_R1_001
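To see how many reads were joined per sample, and what share of the bases already reaches Q20 before quality filtering, the joined files can be summarised with awk. This is a sketch that assumes 4-line FASTQ records (no wrapped lines) and Phred+33 quality encoding (Illumina 1.8+); `fastq_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "<reads> <% of bases with Phred >= 20>" for one FASTQ file.
# Assumptions: 4-line records and Phred+33 encoding (quality = ASCII code - 33).
fastq_summary() {
  awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    NR % 4 == 0 {                                    # every 4th line is a quality string
      reads++
      for (j = 1; j <= length($0); j++) {
        total++
        if (ord[substr($0, j, 1)] - 33 >= 20) good++
      }
    }
    END { printf "%d %.1f\n", reads, (total ? 100 * good / total : 0) }
  ' "$1"
}

# One line per sample folder: folder name, joined reads, % bases >= Q20
for f in JoinedReads/*/fastqjoin.join.fastq; do
  [ -e "$f" ] || continue                           # no joined files found
  echo "$(basename "$(dirname "$f")") $(fastq_summary "$f")"
done
```

Comparing the read counts here with the per-sample counts after step 2 shows how many reads the quality filter removed.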



2) Quality filter

Quality-filter the joined reads (reads are truncated at low-quality bases and discarded if they become too short) and rename the samples

split_libraries_fastq.py -i sequence-files --sample_ids new-sample-names -o SEQ/ -q 19 --barcode_type 'not-barcoded'

# Example (both lists are comma-separated, with no spaces after the commas)
split_libraries_fastq.py -i JoinedReads/SampleA_L001_R1_001/fastqjoin.join.fastq,JoinedReads/SampleB_L001_R1_001/fastqjoin.join.fastq --sample_ids SampleA,SampleB -o SEQ/ -q 19 --barcode_type 'not-barcoded'

  -o SEQ/  - output: save results to folder "SEQ"
  -q 19    - maximum unacceptable Phred quality score, i.e. accept bases with Phred >= Q20
  --barcode_type 'not-barcoded'  - no barcode present in the sequences (already removed)
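Typing the comma-separated lists by hand gets error-prone with many samples. The sketch below builds both lists from the JoinedReads/ folders and prints the command for review instead of running it. The helper name and the naming rule (sample ID = folder name up to the first `_`) are assumptions that fit the SampleA/SampleB example above.

```shell
#!/bin/sh
# Print the split_libraries_fastq.py command for every sample folder in a JoinedReads dir.
# Hypothetical naming rule: sample ID = folder name up to the first "_"
# (SampleA_L001_R1_001 -> SampleA), so sample names must not contain "_".
build_split_libraries_cmd() {
  joined=$1
  inputs=""
  ids=""
  for sub in "$joined"/*/; do
    f="${sub}fastqjoin.join.fastq"
    [ -e "$f" ] || continue                  # skip folders without a joined file
    name=$(basename "$sub")                  # e.g. SampleA_L001_R1_001
    inputs="${inputs:+$inputs,}$f"           # comma-separated, no spaces
    ids="${ids:+$ids,}${name%%_*}"           # e.g. SampleA, same order as inputs
  done
  if [ -z "$inputs" ]; then
    echo "no fastqjoin.join.fastq files under $joined" >&2
    return 1
  fi
  echo "split_libraries_fastq.py -i $inputs --sample_ids $ids -o SEQ/ -q 19 --barcode_type not-barcoded"
}

build_split_libraries_cmd JoinedReads
```

Both lists come from the same loop, so the order of the files and the sample IDs always matches. If the printed command looks right, it can be run with `build_split_libraries_cmd JoinedReads | sh` (this only works if the paths contain no spaces).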

# Result: all sequences in a single file, seqs.fna
SEQ/seqs.fna
 >SampleA_1 
 CCTACGGGAG...
 >SampleA_2
 CCTACGGGAG...


# check total number of sequences in file seqs.fna
grep -c '^>' SEQ/seqs.fna
12517932
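A per-sample breakdown of the same file helps spot samples that lost most of their reads during filtering. This sketch assumes headers of the form `>SampleA_1 ...` as shown above, i.e. the sample ID followed by `_` and a running number; `count_per_sample` is a hypothetical helper name.

```shell
#!/bin/sh
# Count sequences per sample in a seqs.fna file.
# Assumption: the first word of each header is "<SampleID>_<number>".
count_per_sample() {
  awk '/^>/ {
         id = substr($1, 2)          # drop the leading ">"
         sub(/_[0-9]+$/, "", id)     # drop the trailing "_<number>"
         n[id]++
       }
       END { for (s in n) print s, n[s] }' "$1" | sort
}

[ -e SEQ/seqs.fna ] && count_per_sample SEQ/seqs.fna
```

The per-sample counts should add up to the total reported by the grep count.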