split_libraries_fastq.py – This script performs demultiplexing of Fastq sequence data where barcodes and sequences are contained in two separate fastq files (common on Illumina runs).
Description:
Usage: split_libraries_fastq.py [options]
Input Arguments:
[REQUIRED]
- -i, --sequence_read_fps
- The sequence read fastq files (comma-separated if more than one)
- -o, --output_dir
- Directory to store output files
- -m, --mapping_fps
- Metadata mapping files (comma-separated if more than one)
[OPTIONAL]
- -b, --barcode_read_fps
- The barcode read fastq files (comma-separated if more than one) [default: None]
- --store_qual_scores
- Store qual strings in .qual files [default: False]
- --sample_ids
- Comma-separated list of sample IDs to be applied to all sequences; there must be exactly one per input file path (used when data is not multiplexed) [default: None]
- --store_demultiplexed_fastq
- Write demultiplexed fastq files [default: False]
- --retain_unassigned_reads
- Retain sequences which don’t map to a barcode in the mapping file (sample ID will be “Unassigned”) [default: False]
- -r, --max_bad_run_length
- Max number of consecutive low quality base calls allowed before truncating a read [default: 3]
- -p, --min_per_read_length_fraction
- Min number of consecutive high quality base calls to include a read (per single end read) as a fraction of the input read length [default: 0.75]
- -n, --sequence_max_n
- Maximum number of N characters allowed in a sequence to retain it – this is applied after quality trimming, and is total over combined paired end reads if applicable [default: 0]
- -s, --start_seq_id
- Start seq_ids as ascending integers beginning with start_seq_id [default: 0]
- --rev_comp_barcode
- Reverse complement barcode reads before lookup [default: False]
- --rev_comp_mapping_barcodes
- Reverse complement barcode in mapping before lookup (useful if barcodes in mapping file are reverse complements of golay codes) [default: False]
- --rev_comp
- Reverse complement sequence before writing to output file (useful for reverse-orientation reads) [default: False]
- -q, --phred_quality_threshold
- The maximum unacceptable Phred quality score (e.g., for Q20 and better, specify -q 19) [default: 3]
- --last_bad_quality_char
- DEPRECATED: use -q instead. This method of setting is not robust to different versions of CASAVA.
- --barcode_type
- The type of barcode used. This can be an integer (e.g., 6 for length 6 barcodes) or golay_12 for golay error-correcting barcodes. Error correction will only be applied for golay_12 barcodes. [default: golay_12]
- --max_barcode_errors
- Maximum number of errors in barcode [default: 1.5]
- --phred_offset
- The ascii offset to use when decoding phred scores - warning: in most cases you don’t need to pass this value [default: determined automatically]
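The quality-related options above (-q, -r, -p, -n and --phred_offset) can be pictured together with a small Python sketch. This is an illustration of how the option descriptions read, not the script's actual implementation: the function names are hypothetical, the order of the checks is an assumption, and the offset of 33 is assumed here rather than detected automatically.

```python
def decode_phred(quality_string, phred_offset=33):
    """Convert an ASCII quality string to integer Phred scores.

    The offset of 33 is an assumption for this sketch; the script itself
    determines the offset automatically unless --phred_offset is passed.
    """
    return [ord(ch) - phred_offset for ch in quality_string]


def quality_filter(sequence, quality_string,
                   phred_quality_threshold=3,          # -q
                   max_bad_run_length=3,               # -r
                   min_per_read_length_fraction=0.75,  # -p
                   sequence_max_n=0,                   # -n
                   phred_offset=33):
    """Return the trimmed read, or None if the read would be discarded.

    Hypothetical helper: a simplified reading of the option descriptions.
    """
    scores = decode_phred(quality_string, phred_offset)
    keep = len(sequence)
    run_start = 0
    run_length = 0
    for i, score in enumerate(scores):
        if score <= phred_quality_threshold:
            # A base at or below the threshold counts as low quality.
            if run_length == 0:
                run_start = i
            run_length += 1
            if run_length > max_bad_run_length:
                # Too many consecutive bad calls: truncate where the run began.
                keep = run_start
                break
        else:
            run_length = 0
    trimmed = sequence[:keep]
    # Discard reads that became too short relative to the input read length.
    if len(trimmed) < min_per_read_length_fraction * len(sequence):
        return None
    # The N check is applied after quality trimming.
    if trimmed.count("N") > sequence_max_n:
        return None
    return trimmed
```

With -q 19, a base needs a score of 20 or higher to count as high quality, which is why the examples below pass 19 to filter at Q20.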
Output:
Demultiplex and quality filter (at Phred >= Q20) one lane of Illumina fastq data and write results to ./slout_q20:
split_libraries_fastq.py -i lane1_read1.fastq.gz -b lane1_barcode.fastq.gz --rev_comp_mapping_barcodes -o slout_q20/ -m map.txt -q 19
Demultiplex and quality filter (at Phred >= Q20) one lane of Illumina fastq data and write results to ./slout_q20, storing trimmed quality scores in addition to sequence data:
split_libraries_fastq.py -i lane1_read1.fastq.gz -b lane1_barcode.fastq.gz --rev_comp_mapping_barcodes -o slout_q20/ -m map.txt --store_qual_scores -q 19
Demultiplex and quality filter (at Phred >= Q20) two lanes of Illumina fastq data and write results to ./slout_q20:
split_libraries_fastq.py -i lane1_read1.fastq.gz,lane2_read1.fastq.gz -b lane1_barcode.fastq.gz,lane2_barcode.fastq.gz --rev_comp_mapping_barcodes -o slout_q20/ -m map.txt,map.txt --store_qual_scores -q 19
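The examples above pass --rev_comp_mapping_barcodes, which reverse complements the barcodes from the mapping file before lookup. As a rough illustration of what that transformation means, here is a minimal Python sketch; the helper name and the sample barcode table are hypothetical and are not taken from the script's source.

```python
# Translation table for DNA bases; N is left as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(barcode):
    """Return the reverse complement of a DNA barcode (hypothetical helper)."""
    return barcode.translate(_COMPLEMENT)[::-1]


# Hypothetical mapping-file barcodes, keyed by barcode sequence.
mapping_barcodes = {"CGTT": "sample1", "GGAA": "sample2"}

# Rough equivalent of --rev_comp_mapping_barcodes: flip the mapping-file
# barcodes so they match the orientation of the barcode reads.
flipped = {reverse_complement(bc): sid for bc, sid in mapping_barcodes.items()}
```

A barcode read of AACG would then be found under sample1 in the flipped table, while --rev_comp_barcode would instead apply the same transformation to each barcode read.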
Quality filter (at Phred >= Q20) one non-multiplexed lane of Illumina fastq data and write results to ./slout_single_sample_q20:
split_libraries_fastq.py -i lane1_read1.fastq.gz --sample_ids my.sample -o slout_single_sample_q20/ -m map_not_multiplexed.txt -q 19 --barcode_type 'not-barcoded'
Quality filter (at Phred >= Q20) one non-multiplexed lane of Illumina fastq data, labeling the reads with the sample ID my.sample.1, and write results to ./slout_single_sample_q20:
split_libraries_fastq.py -i lane1_read1.fastq.gz --sample_ids my.sample.1 -o slout_single_sample_q20/ -m map_not_multiplexed.txt -q 19 --barcode_type 'not-barcoded'
Quality filter (at Phred >= Q20) two non-multiplexed lanes of Illumina fastq data with different samples in each and write results to ./slout_not_multiplexed_q20:
split_libraries_fastq.py -i lane1_read1.fastq.gz,lane2_read1.fastq.gz --sample_ids my.sample.1,my.sample.2 -o slout_not_multiplexed_q20/ -m map_not_multiplexed.txt -q 19 --barcode_type 'not-barcoded'
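When many lanes are processed, the comma-separated arguments can be assembled programmatically. The following Python sketch only builds the argument list for a multiplexed run using the flags documented above; the function name is hypothetical, and the check that read, barcode and mapping files come in equal numbers is an assumption inferred from the two-lane example, not a documented rule.

```python
def build_split_libraries_command(read_fps, barcode_fps, mapping_fps,
                                  output_dir, phred_quality_threshold=19):
    """Build an argv list for a multiplexed split_libraries_fastq.py run.

    Hypothetical helper: it only assembles arguments and does not run anything.
    """
    # Assumption: one barcode file and one mapping file per read file,
    # as in the two-lane example (map.txt is repeated once per lane).
    if not (len(read_fps) == len(barcode_fps) == len(mapping_fps)):
        raise ValueError("expected the same number of read, barcode and mapping files")
    return [
        "split_libraries_fastq.py",
        "-i", ",".join(read_fps),
        "-b", ",".join(barcode_fps),
        "-m", ",".join(mapping_fps),
        "-o", output_dir,
        "-q", str(phred_quality_threshold),
    ]


command = build_split_libraries_command(
    ["lane1_read1.fastq.gz", "lane2_read1.fastq.gz"],
    ["lane1_barcode.fastq.gz", "lane2_barcode.fastq.gz"],
    ["map.txt", "map.txt"],
    "slout_q20/",
)
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where the script is installed; passing a list rather than a single string avoids shell quoting problems with file names.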