Hi everybody,
I used AB SOLiD to sequence small RNAs, mainly miRNAs, so I expect to obtain sequences of about 18-25 nt. To that end, I have tried several programs to map the color-space reads against a reference genome (hundreds of thousands of contigs).
I tried SHRiMP, but it can’t map reads shorter than 24 nt. Here are the commands I used:
./utils/splitreads.py 250000 reads1_bcSample1_F3.csfasta
./bin/rmapper-cs -P 0_to_249999.csfasta genome.fasta > 0_to_249999.csfasta.lib1.out
In the end, only 3.5% of the reads mapped to the reference genome, which is expected since the large majority of my small RNAs are shorter than 24 nt.
I also tried MAQ, but the results were poor. I used these commands:
perl solid2fastq.pl reads1_bcSample1_ lib1;
gunzip lib1.single.fastq.gz;
maq fastq2bfq -n 1000000 lib1.single.fastq lib1.bfq;
maq fasta2csfa genome.fasta > genome.csfa;
maq fasta2bfa genome.csfa genome.csbfa;
for f in *.bfq;do maq map -d Primer.lib1.txt -c $f.aln.cs.map genome.csbfa $f 2> aln.log;done;
maq mapmerge lib1.map $(ls *aln.cs.map);
maq mapview lib1.map > lib1.map.view.txt
The results are odd: all the mapped sequences I get are 35 nt long, and only 0.5% of the reads are mapped.
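In case it helps anyone check these numbers, here is a minimal Python sketch of how the mapped fraction and the read-length distribution could be tallied from the maq mapview output. It assumes the tab-separated mapview layout described in the MAQ manual (column 1 is the read name, column 14 is the read length) and that every read in the .csfasta file has one header line starting with ">". The function names are made up for illustration and are not part of MAQ.

```python
from collections import Counter


def count_csfasta_reads(lines):
    """Count reads in a .csfasta file: one '>' header line per read.

    Comment lines starting with '#' and the color-sequence lines are ignored.
    """
    return sum(1 for line in lines if line.startswith(">"))


def summarize_mapview(lines, total_reads):
    """Summarize `maq mapview` text output.

    Assumption: tab-separated columns as in the MAQ manual, with the read
    name in column 1 and the read length in column 14.
    Returns (mapped_fraction, {read_length: number_of_mapped_reads}).
    """
    seen = set()
    lengths = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        name = fields[0]
        if name in seen:
            continue  # count each read only once
        seen.add(name)
        lengths[int(fields[13])] += 1
    fraction = len(seen) / total_reads if total_reads else 0.0
    return fraction, dict(lengths)
```

To use it, open reads1_bcSample1_F3.csfasta and lib1.map.view.txt, pass the first file object to count_csfasta_reads and the second, together with that count, to summarize_mapview. If every mapped read shows up with length 35, the reads went into the mapper at their full sequenced length.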
With both SHRiMP and MAQ, I never see any trace of adapters, and I shouldn’t, since adapters are not supposed to map to the genome. Using shorter seeds didn’t improve the results either.
Other tools exist and have been recommended by people on this forum, such as BFAST and ABySS, but both are limited and can’t map reads shorter than 25 nt. There are also BLAT, SOAP, BWA, etc., but I haven’t tried them. The SOAP article claims it can map reads of 18 to 26 bp.
I have read that many people consider MAQ the reference tool, so I don’t understand why I get such poor results with it. Also, has anyone here tried SOAP?
Thank you all
Mike