Hi everyone!
I have to convert some RNA-Seq BAM files into the corresponding paired-end FASTQ files.
I tried to use "samtools view" followed by Picard "SamToFastq":
Code:
samtools view -h -o sample.sam sample.bam
Code:
java -jar SamToFastq.jar INPUT=sample.sam FASTQ=sample_1.fastq SECOND_END_FASTQ=sample_2.fastq
It resulted in this error:
Code:
Error parsing text SAM file. MRNM not specified but flags indicate mate mapped
and both output FASTQ files were empty.
This is the sample.sam
Code:
@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr2 LN:243199373
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chrM_rCRS LN:16569
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@RG ID:110624_UNC14-SN744_0134_AD0CVTABXX_8_ PL:illumina PU:barcode LB:TruSeq SM:110624_UNC14-SN744_0134_AD0CVTABXX_8_
UNC14-SN744_134:8:2102:15138:99673/2 147 chr7 99998918 69 42M2357N8M = 99998683 -2642 CCAAGGCCTTGCTCTGGGGAGCTTTAAATTTTTTCTTAGGGCTGTTTTCT IIIGHGGGIJIIIGHAHGHH@JIGJJJIHEJJIJJJJHHHHHFFFFFCCC XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:-
UNC14-SN744_134:8:2101:1447:161692/2 147 chr7 99998918 69 42M2357N8M = 99998797 -2528 CCAAGGCCTTGCTCTGGGGAGCTTTAAATTTTTTCTTAGGGCTGTTTTCT HGHGHDCHGGGIIDGIIHIHEIHGGJIGGHIIIJJJJHHGHHFFFFF@C@ XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:-
UNC14-SN744_134:8:2207:13624:39322/2 147 chr7 99998920 69 40M2357N10M = 99998689 -2638 AAGGCCTTGCTCTGGGGAGCTTTAAATTTTTTCTTAGGGCTGTTTTCTCT @HF<JIGIJIHCCGD9CIGIHGGJIGDJIGJJJJJJJHHGHHEDDDFCB@ XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:-
UNC14-SN744_134:8:2108:11461:118679/2 163 chr7 99998929 60 31M2357N19M = 100001809 2930 CTTTGGGGAGCTTTAAATTTTTTCTTAGGGCTGTTTTCTCTCCTTCCTCC CCCFFFFFFHHHHJJJJJJJJJIJJIJJJJJJIHIJJIJJIJJJJIJDIH XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:1 XS:A:-
UNC14-SN744_134:8:1107:2904:31086/1 99 chr7 99998929 60 31M2357N19M = 100001809 2930 CTTTGGGGAGCTTTAAATTTTTTCTTAGGGCTGTTTTCTCTCCTTCCTCC BCCFFFFFFHHHHJJJJJJJJJJJJJJIJIJJGHHJJJJJJJJJJJJJJJ XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:1 XS:A:-
UNC14-SN744_134:8:2107:8382:2405/1 83 chr7 99998936 69 24M2357N26M = 99998696 -2647 GAGCTTTAAATTTTTTCTTAGGGCTGTTTTCTCTCCTTCCTCCTTTTCCA JJJIIJJJJIGGJJJJJIJJJJJJJIJJJIHHJIJJJHHHFHFFFFDCCB XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:-
UNC14-SN744_134:8:2106:3457:77846/1 83 chr7 99999623 69 42M474N8M = 99998870 -1277 TCCTGCCTCGGCCATCTGCTGTGCCTGCATCACCCCCAAGCCCTCTTGGC DDDDDFHJJJJJJJJJJJJJJIJJJJJIGGD?JJJJJHHHHHFFFFFCCC XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:-
UNC14-SN744_134:8:2107:3652:145199/2 163 chr7 99999624 69 41M474N9M = 100001398 2216 CCTGCCTCGGCCATCTGCTGTGCCTGCATCACCCCCAAGCCCTCTTGGCT CCCFFFFFFGHHHJJJIJJJIJJJJJJJJGHIJGIEIGHIJJJIJIJGIG XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:-
UNC14-SN744_134:8:1201:13771:91534/2 163 chr7 99999624 69 41M474N9M = 100001333 1759 CCTGCCTCGGCCATCTGCTGTGCCTGCATCACCCCCAAGCCCTCTTGGCT BCCFFFFFHHGHHJHJFIJIGHIIHGIIHIIEIHHHIIJJIJIIGCGIIG XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:-
UNC14-SN744_134:8:2103:11276:160481/1 83 chr7 99999642 69 23M474N27M = 99998948 -1218 TGTGCCTGCATCACCCCCAAGCCCTCTTGGCTTGGTTTTTTGGGTCTGTA DEBFFFFHFHEB;IIIIIJJIJGGEIIJJIJIJJJIFHHHGHFFFFFCCC XF:Z:CTAC, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:-
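I understand that some lines have no MRNM (mate reference name) specified, such as:
Code:
UNC14-SN744_134:8:2206:10660:87358/2 145 chr7 100001077 60 50M * 0 0 ATCCGCTTCCCTCGGCCTCCCAAAGTGCTGGGATCACAGGCGTGAGCCAC 9:BBAF@5'HEAIJGIGEHF<HEBA;D@?HHGGBCA@AD<?4;FFFF@BB RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:1
but I don't understand why I cannot retrieve the other, correct reads in the output FASTQ files.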
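To make the complaint about that record concrete, here is a minimal sketch that decodes the pairing-related bits of a SAM FLAG. The function name `decode_flag` and the label strings are my own, not part of samtools or Picard. For flag 145 the "mate unmapped" bit (8) is not set, so the flag claims the mate is mapped, while column 7 (MRNM, called RNEXT in newer versions of the SAM spec) is `*`. That mismatch appears to be what the Picard error message is describing.

```shell
# Decode the pairing-related bits of a SAM FLAG value.
# decode_flag and the label names are illustrative, not from any tool.
decode_flag() {
    df_flag=$1
    df_out=""
    for df_pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                   16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair; do
        df_bit=${df_pair%%:*}     # numeric bit value before the colon
        df_name=${df_pair#*:}     # label after the colon
        if [ $(( $df_flag & $df_bit )) -ne 0 ]; then
            df_out="$df_out $df_name"
        fi
    done
    # Trim the leading space before printing.
    echo "${df_out# }"
}

decode_flag 145   # prints: paired reverse second_in_pair
```

Since `mate_unmapped` is absent from the output, a strict parser would expect a real mate reference name in column 7 for this record.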
I also tried to include these two options in Picard SamToFastq
Code:
INCLUDE_NON_PF_READS=TRUE VALIDATION_STRINGENCY=SILENT
and it resulted in all reads being reported as unpaired and, again, empty FASTQ files.
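Before relaxing validation, it may help to see how many records are affected. The sketch below is only a diagnostic, under the assumption that the problem records are exactly those whose flag says "paired, mate mapped" while column 7 is `*`. The function name `find_mate_inconsistent` is hypothetical. It reads SAM text on standard input and prints the read name and flag of each such record.

```shell
# List records whose FLAG says "paired, mate mapped" but whose
# mate reference (column 7) is "*". Reads SAM text on stdin.
# find_mate_inconsistent is an illustrative name, not a samtools command.
find_mate_inconsistent() {
    awk '
        /^@/ { next }                          # skip header lines
        {
            flag = $2
            paired        = int(flag / 1) % 2  # bit 1: read is paired
            mate_unmapped = int(flag / 8) % 2  # bit 8: mate is unmapped
            if (paired == 1 && mate_unmapped == 0 && $7 == "*")
                print $1, flag
        }
    '
}

# Example use (assumes samtools is on the PATH):
#   samtools view -h sample.bam | find_mate_inconsistent | wc -l
```

Integer division is used instead of bitwise operators so the awk program does not depend on gawk extensions.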
Then I tried with another tool, TopHat2 bam2fastx.
First, I sorted sample.bam by read name
Code:
samtools sort -n sample.bam sample_sn
The name-sorted file begins with:
Code:
@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr2 LN:243199373
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chrM_rCRS LN:16569
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@RG ID:110624_UNC14-SN744_0134_AD0CVTABXX_8_ PL:illumina PU:barcode LB:TruSeq SM:110624_UNC14-SN744_0134_AD0CVTABXX_8_
UNC14-SN744_134:8:1101:1284:144798/1 83 chr7 100276741 21 50M = 100276627 -164 ATTTTTATTATATTTTCAGTTTTTCCATAAAGGAGCCAATTCCAACNCTG ###############################################CC@ RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8IH:i:1 HI:i:1 NM:i:1
UNC14-SN744_134:8:1101:1284:144798/2 163 chr7 100276627 59 50M = 100276741 164 CAGGAGGCCCTCATCCTTCTGCTGCCCTGGCGTTGGGGCCTCACCCCTCT BCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJIJJHHIIHIJJJJJJJJJ RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8IH:i:1 HI:i:1 NM:i:1
UNC14-SN744_134:8:1101:1295:171825/1 99 chr7 100210452 69 50M = 100210588 383 GTCCGGGGCCCCCTGGGCGGGGGTCCCGGGGCGCCCCTCCTCCCTTGGGA @@BFF>DFHHHGHIJJIJJJJDD7@BBDDBBDBBBDDDDDDDDD8@CCD8 RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8IH:i:1 HI:i:1 NM:i:0
UNC14-SN744_134:8:1101:1295:171825/2 147 chr7 100210588 69 32M197N18M = 100210452 -383 TAACCCCACAGGAACTGCGCTTCGCTTCCGAGTCCTGTGCACAGCACCTG AHGIIHHFGIIJJIJJJIIFCAJGHGGJJIGGGIIGGAHHHHDDD=F@@B XF:Z:GTAG, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:+
UNC14-SN744_134:8:1101:1296:110092/1 65 chr7 100417813 52 50M * 0 0 CGGCACTGGCAGACGGCTGATCCAATGGTGTTAGAGTGGCTAATAGCTGG @@@DDDDDHHHHFGADG@AGCBH*?:9D*::B>DHGBFHD9?B####### RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:2
UNC14-SN744_134:8:1101:1296:110092/2 129 chr7 100417873 57 50M * 0 0 CAGGACCCTTCTCCTGACAGGGGCTTGAAGGTGCCCTGGGCACTGGCAGG CCCFFFFFHHHHHJJJGHIJJJJJJIA>GDH?BBHHBDGGB>B98B#### RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:3
UNC14-SN744_134:8:1101:1298:165228/1 83 chr7 100463356 69 50M = 100459519 -3887 ACACGTTGGTCCTAGGTTTCTACGATGACGCTCCACCGCAGGACCATTTC IGGJJJJIJJJIJIJJJJGJJIIJJJJJJJJIIHEIIHHHHHFFFFF@@B RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8IH:i:1 HI:i:1 NM:i:0
UNC14-SN744_134:8:1101:1298:165228/2 163 chr7 100459519 69 15M769N35M = 100463356 3887 CCCTGGGAGACCTCGACTCCCTGCCCTCGGACCCTGTACAGCCGCAGTAT CCCFFFFFHHHHHJIIJJJJIIJJJIJJJJJJJJJJHIHIJCHJIHIHHE XF:Z:GTAG, RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8_ IH:i:1 HI:i:1 NM:i:0 XS:A:+
UNC14-SN744_134:8:1101:1306:60600/1 99 chr7 100417799 69 50M = 100419893 2144 GGAAGTACCCGACGCGGCACTGGCAGACGGCTGATCCAATGGTGTTAGAG BCCFFDDFHHHFHJJJJJJJGIJIIJ;F@FA@B=ACH;B;@C);.;;>C> RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8IH:i:1 HI:i:1 NM:i:0
UNC14-SN744_134:8:1101:1306:60600/2 147 chr7 100419893 69 50M = 100417799 -2144 CTCGGCACTTGGTGTTCCCCTCAGCTGCCTCGAACCCCGGAGCACAGCTG <B>HHECHFIIIHCHGIIIGGEIIJIIJJIJIHFJJJHHHHHFDFFFCCC RG:Z:110624_UNC14-SN744_0134_AD0CVTABXX_8IH:i:1 HI:i:1 NM:i:0
Then I ran TopHat2 bam2fastx on the name-sorted file:
Code:
bam2fastx -q -A -o sample.fastq -P -N sample_sn.bam
which failed with this error:
Code:
Error: couldn't retrieve both reads for pair UNC14-SN744_134:8:1101:1284:144798/1. Perhaps the input file is not sorted by name? (using 'samtools sort -n' might fix this)
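One thing that might be worth ruling out is whether every read name really has exactly two records once the trailing `/1` or `/2` is ignored. I do not know how bam2fastx matches mates internally, so the sketch below is just a sanity check on the input, not an explanation of the error. The function name `unpaired_names` is made up for illustration. It prints each base name that does not occur exactly twice, together with its count.

```shell
# Report read names that do not occur exactly twice after stripping
# a trailing /1 or /2 from the QNAME. Reads SAM text on stdin.
# unpaired_names is an illustrative name, not part of TopHat or samtools.
unpaired_names() {
    awk '
        /^@/ { next }                      # skip header lines
        {
            name = $1
            sub(/\/[12]$/, "", name)       # drop a trailing /1 or /2
            seen[name]++
        }
        END {
            for (n in seen)
                if (seen[n] != 2)
                    print n, seen[n]
        }
    ' | sort
}

# Example use (assumes samtools is on the PATH):
#   samtools view sample_sn.bam | unpaired_names | head
```

An empty result would suggest both mates are present for every name, in which case the `/1` and `/2` suffixes themselves would be the next thing to look at.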
Could someone explain this issue? Do you have any suggestions?
Thanks!