**rbagnall** · 06-16-2013, 02:23 AM

Hi YOLO69SWAG,

I assume you are starting with single-end reads, not paired-end as stated. Trimmomatic may be able to split 100 bp single-end reads into two 50 bp read files.

USADELLAB.org - Trimmomatic: A flexible read trimming tool for Illumina NGS data

http://www.usadellab.org/cms/index.php?page=trimmomatic

First run 'crop' with 50 for the 'forward' reads:

CROP:<length>
length: The number of bases to keep, from the start of the read.

Then run headcrop with 50 for the reverse reads:

HEADCROP:<length>
length: The number of bases to remove from the start of the read.
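For illustration only, the effect of CROP:50 and HEADCROP:50 on a single FASTQ record can be sketched in plain Python. This is not Trimmomatic's own code; the function name `split_record` and the sample read are invented for the example.

```python
def split_record(header, seq, qual, n=50):
    """Split one FASTQ record into its first n bases and the remainder.

    Mirrors what CROP:n (keep the first n bases) and HEADCROP:n
    (remove the first n bases) would each leave behind.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    cropped = (header, seq[:n], qual[:n])      # like CROP:n
    headcropped = (header, seq[n:], qual[n:])  # like HEADCROP:n
    return cropped, headcropped


# Hypothetical 100 bp read, made up for this example.
header = "@read1/1"
seq = "A" * 50 + "C" * 50
qual = "I" * 50 + "5" * 50
first, second = split_record(header, seq, qual)
print(len(first[1]), len(second[1]))
```

Both halves keep the original header here, so the read-name suffix still has to be handled separately.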

What I am not so sure about is turning the /1 into /2 in the 'headcrop' reads. Perhaps try:

sed 's/\/1$/\/2/g' headcrop.fastq >ending_in_2_headcrop.fastq

which should turn all lines ending in /1 into /2

One other consideration: does the headcropped sequence need to be reverse complemented? Another comment is that finding 'real' indels of hundreds of kilobases will be difficult. Perhaps look into pindel or breakdancer, or both.

**mastal** · 06-16-2013, 03:01 AM

Splitting Sequencing Files

Yes, you would need to reverse complement the sequences for R2, and reverse the string with the base qualities.
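A minimal Python sketch of that step, assuming plain DNA letters (A, C, G, T, N) and a quality string of the same length as the sequence. The function name `reverse_complement_record` is made up for the example.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(seq, qual):
    """Reverse-complement the bases and reverse the quality string.

    The qualities are only reversed, not altered, so each score stays
    attached to the base it was called for.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq.translate(COMPLEMENT)[::-1], qual[::-1]


new_seq, new_qual = reverse_complement_record("AACGT", "ABCDE")
print(new_seq, new_qual)  # ACGTT EDCBA
```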

**YOLO69SWAG** · 06-16-2013, 09:05 AM

Thanks,

I did start with a paired end experiment, but I'm just showing one of the output files (the one from the first reaction). I've got another file where it is the second reaction and the names end in /2. I hope I'm not missing something.....

I will try out those programs now and try to write something to reverse complement and rearrange the confidence scores. If I can't do it, there's always Fiverr.

Thanks again.

If anyone really fluent in code sees this, feel free to whip out something for me to try.

**rbagnall** · 06-16-2013, 01:02 PM

Gaaaaarrrrrhhhh, STOP!!!

You already have paired end read data. The first file contains the forward reads and the second file contains the reverse reads.

**Jeremy** · 06-16-2013, 07:30 PM

Splitting the files like that should not help much and may even result in less accurate mapping. If you do expect an indel in the middle of a read, you should be able to identify it easily by mapping the 100 bp paired-end reads anyway: one read of the pair will partially map and its mate will map a large distance away.

**YOLO69SWAG** · 06-17-2013, 09:17 AM

Yes, I do have the paired-end data, but I need to find the junction reads to finish quantification of this phenomenon. The reads containing the real indels are high enough quality to map correctly, since I can BLAST them manually on my own.

**Jeremy** · 06-17-2013, 06:02 PM

Maybe you should look at a viewer program such as IGV.

**JamieHeather** · 06-20-2013, 03:42 AM

Also that sed command above will change some of the quality scores, as "/1" is a legitimate quality string.
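One way to avoid that, sketched in Python, is to touch only the header line of each record. This assumes strict four-line FASTQ records (no wrapped sequence lines); the function name `rename_mate_suffix` and the sample record are invented for the example.

```python
def rename_mate_suffix(lines, old="/1", new="/2"):
    """Change the read-name suffix on FASTQ header lines only.

    Assumes strict four-line records, so the header is every fourth
    line starting from the first; sequence, '+' and quality lines are
    passed through untouched.
    """
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        if i % 4 == 0 and text.endswith(old):
            text = text[: -len(old)] + new
        yield text + "\n"


# Made-up record whose quality string also ends in "/1".
record = ["@read1/1\n", "ACGT\n", "+\n", "II/1\n"]
print("".join(rename_mate_suffix(record)), end="")
```

To run it on a file, the generator can be fed an open file handle and its output passed to `writelines` on the output handle.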

**YOLO69SWAG** · 08-01-2013, 01:22 AM

Maybe I should just write my own programs to do what I need since none of these suggestions or structural variant programs are any use. Oh wait, I did. Message me if you want it.
