Hi All,
I have a bunch of paired-end reads and I aligned them against a custom database (viruses) using LASTAL:
I then used the "last-pair-probs" script to combine the two alignments into a single MAF file containing the paired entries:
Followed by conversion into SAM format (to get the reads):
So far, so good, except I think the SAM file doesn't correctly assign the two mates. Here's what the resultant SAM file looks like:
As you can see, both mates are in there, but when I run Picard to convert to FASTQ format (required for downstream analyses), I only get the first-mate file ("G771_2.sub.sam.reads1.fastq", which contains all the /1 and /2 reads); the second-mate file ("G771_2.sub.sam.reads.reads2.fastq") is empty:
Any thoughts on how to resolve this? In other words, how can I take a set of paired-end reads, align them against a database using LASTAL, and then extract the aligned (or, for other purposes, unaligned) reads as correctly matched mate pairs?
Thanks very much in advance.
Code:
lastal -Q1 /idi/sabetilab/kandersen/references/blast/arena G771_2.reads1.sub.fastq | maf-sort.sh -n2 > G771_2.reads1.sub.maf
lastal -Q1 /idi/sabetilab/kandersen/references/blast/arena G771_2.reads2.sub.fastq | maf-sort.sh -n2 > G771_2.reads2.sub.maf
Code:
last-pair-probs.py G771_2.reads1.sub.maf G771_2.reads2.sub.maf > G771_2.sub.maf
Code:
maf-convert.py sam G771_2.sub.maf > G771_2.sub.sam
Code:
@HD VN:1.3 SO:unknown
D0N2CACXX120229:2:1101:21311:18913/1 0 LASV-G771-S-Sierra_Leone-2010H 14 24 101M * 0 0 CCTAGGCATTTTTGGTTGCGCAATTCAAGTGTCCTATTTAAAATGGGACAGATAGTGACATTCTTCCAGGAAGTGCCTCATGTAATAGAAGAGGTGATGAA @@@FFFDEHHHHFEGIJGBFHIIGG>DD>GIGGIDDFEEF@>DG>DC3BF;=<@FC@F@FGGC;D@=>AEA;?)7?.;@3>CE;>>;(353559@A##### NM:i:0 AS:i:584
D0N2CACXX120229:2:1101:21311:18913/2 16 LASV-G771-S-Sierra_Leone-2010H 646 24 101M * 0 0 TGGTATTTACATTGCTCTTGACTCAGGCCGTGACCGGTGGGACTGTATTATGACTAGTTATCAATATCTGATAATCCAAAATACGACCTGGGAAGATCACT ################@;@>5==A?DFEECA/@E@BF@=B0?4B4DFDB<??>HCFB<<D<>G?9D?ED?GFEE@GEFJIEEGGBAHF
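The FLAG column (second field) of the two records above is 0 and 16. To see what that means for pairing, here is a minimal sketch that decodes a FLAG value into its named bits. The bit meanings are taken from the SAM specification; the helper name decode_flag and the label strings are my own and not part of LAST or Picard.

```python
# SAM FLAG bits as defined in the SAM specification.
# The label strings are informal names chosen for this sketch.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # 0 and 16 are the FLAG values in the SAM records shown above;
    # 99 and 147 are typical values for a properly flagged pair, for comparison.
    for flag in (0, 16, 99, 147):
        print(flag, decode_flag(flag))
```

Neither 0 nor 16 has the paired, first_in_pair, or second_in_pair bits set, so as far as the FLAG field is concerned these are two independent single-end reads. That would be consistent with a converter that relies on the flags writing everything to one file.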
Code:
SamToFastq.jar INPUT=G771_2.sub.sam FASTQ=G771_2.sub.sam.reads1.fastq SECOND_END_FASTQ=G771_2.sub.sam.reads.reads2.fastq VALIDATION_STRINGENCY=SILENT
Code:
@D0N2CACXX120229:2:1101:21311:18913/1
CCTAGGCATTTTTGGTTGCGCAATTCAAGTGTCCTATTTAAAATGGGACAGATAGTGACATTCTTCCAGGAAGTGCCTCATGTAATAGAAGAGGTGATGAA
+
@@@FFFDEHHHHFEGIJGBFHIIGG>DD>GIGGIDDFEEF@>DG>DC3BF;=<@FC@F@FGGC;D@=>AEA;?)7?.;@3>CE;>>;(353559@A#####
@D0N2CACXX120229:2:1101:21311:18913/2
AGTGATCTTCCCAGGTCGTATTTTGGATTATCAGATATTGATAACTAGTCATAATACAGTCCCACCGGTCACGGCCTGAGTCAAGAGCAATGTAAATACCA
+
@?@DDDDDHDCDFFHABGGEEIJFEG@EEFG?DE?D9?G><D<<BFCH>??<BDFD4B4?0B=@FB@E@/ACEEFD?A==5>@;@################
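One possible workaround, sketched here rather than tested on real data, is to skip the FLAG field and split the SAM records by the /1 and /2 suffix on the read names. The sketch assumes a tab-separated SAM file, read names that end in /1 or /2, and SEQ and QUAL columns that hold the actual sequence and qualities. Reads with the reverse-strand bit (0x10) set are reverse-complemented back to their original orientation, and only reads whose mate is also present are kept. The function names split_sam_by_mate and reverse_complement are hypothetical, not part of LAST or Picard.

```python
import sys

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_sam_by_mate(sam_lines):
    """Return (mate1_records, mate2_records) as FASTQ text for complete pairs.

    Assumes read names end in /1 or /2; names without either suffix are skipped.
    If a read has several alignments, only the first one seen is kept.
    """
    pairs = {}  # read name without suffix -> {1: fastq_text, 2: fastq_text}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if name.endswith("/1"):
            mate = 1
        elif name.endswith("/2"):
            mate = 2
        else:
            continue
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented; undo that.
            seq = reverse_complement(seq)
            qual = qual[::-1]
        record = "@%s\n%s\n+\n%s\n" % (name, seq, qual)
        pairs.setdefault(name[:-2], {}).setdefault(mate, record)
    complete = [p for p in pairs.values() if 1 in p and 2 in p]
    return [p[1] for p in complete], [p[2] for p in complete]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical script name): python split_sam.py in.sam out_1.fastq out_2.fastq
    with open(sys.argv[1]) as sam:
        mate1, mate2 = split_sam_by_mate(sam)
    with open(sys.argv[2], "w") as out1:
        out1.writelines(mate1)
    with open(sys.argv[3], "w") as out2:
        out2.writelines(mate2)
```

This holds every record in memory, so it suits a subsample like the one here better than a full run. Both output files list the pairs in the same order, which most downstream tools expect.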