I have two files: file 1 has three columns (SNP, Chromosome, position) and file 2 has three columns (Chromosome, peak_start, peak_end). All columns are numeric except the SNP column.
If the chromosome in file 1 matches the chromosome in file 2, I want to check whether the position of the SNP lies between peak_start and peak_end. If it does, I want to show which SNP falls in which peak (preferably writing the output to a tab-delimited file). I would prefer to split the files and use hashes with the chromosome as the key. I have found only a few questions even remotely similar to this, and I could not fully understand the suggested solutions.
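A minimal sketch of the lookup described above, assuming both files have already been read into hashes keyed by chromosome. Because one chromosome usually carries many SNPs and many peaks, each hash value here is a reference to a list of records; storing a single start/end per chromosome would keep only the last peak read. The sub name `snps_in_peaks` and the example values are made up for illustration, and the bounds are strict (a SNP sitting exactly on peak_start or peak_end is not counted).

```perl
use strict;
use warnings;

# Return every (SNP, peak) pair where the SNP lies strictly inside a peak
# on the same chromosome.
#   $snps:  { chromosome => [ [snp_id, position], ... ] }
#   $peaks: { chromosome => [ [peak_start, peak_end], ... ] }
sub snps_in_peaks {
    my ($snps, $peaks) = @_;
    my @hits;
    for my $chr (sort keys %$snps) {
        next unless exists $peaks->{$chr};    # chromosome has no peaks
        for my $snp (@{ $snps->{$chr} }) {
            my ($id, $pos) = @$snp;
            for my $peak (@{ $peaks->{$chr} }) {
                my ($start, $end) = @$peak;
                push @hits, [ $id, $chr, $pos, $start, $end ]
                    if $pos > $start && $pos < $end;
            }
        }
    }
    return @hits;
}

# Hypothetical example data, for illustration only.
my %snps  = ( 1 => [ [ 'rs_a', 150 ], [ 'rs_b', 900 ] ], 2 => [ [ 'rs_c', 40 ] ] );
my %peaks = ( 1 => [ [ 100, 200 ], [ 800, 1000 ] ], 3 => [ [ 10, 50 ] ] );

# Print each match as a tab-delimited line.
print join("\t", @$_), "\n" for snps_in_peaks(\%snps, \%peaks);
```

Only chromosomes present in both hashes are compared, so `rs_c` on chromosome 2 is skipped without touching the peaks on chromosome 3.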
Here is an example of my code. It is only meant to illustrate my question, so think of it as "pseudocode". Also, please forgive the formatting: this is my first time asking a question here, and I am a biologist trying to learn Perl programming!
#!/usr/bin/perl
use strict;
use warnings;
my (%peaks, %X81_05);

# %X81_05: chromosome => [ [SNP, position], ... ]
# %peaks:  chromosome => [ [peak_start, peak_end], ... ]
# Each value is a list because one chromosome can hold many SNPs and many peaks;
# assigning a single value per chromosome would keep only the last line read.

# open file or die (three-argument open with a lexical filehandle)
open(my $first_sample, '<', 'X81_05.txt') or die "could not open X81_05.txt: $!";

# split the tab-delimited file into its fields: SNP, Chromosome, position
while (my $line = <$first_sample>) {
    chomp $line;
    next if $line =~ m/Chromosome/;    # skip the header
    my ($snp, $chr, $pos) = split /\t/, $line;
    push @{ $X81_05{$chr} }, [ $snp, $pos ];
}
close($first_sample);

# open the peaks file: Chromosome, peak_start, peak_end
open(my $peaks_fh, '<', 'peaks.txt') or die "could not open peaks.txt: $!";
while (my $line = <$peaks_fh>) {
    chomp $line;
    next if $line =~ m/Chromosome/;    # skip header
    my ($chr, $peak_start, $peak_end) = split /\t/, $line;
    push @{ $peaks{$chr} }, [ $peak_start, $peak_end ];
}
close($peaks_fh);

# compare each SNP only with the peaks on its own chromosome and print matches
# as tab-delimited lines: SNP, Chromosome, position, peak_start, peak_end
for my $chr (sort keys %X81_05) {
    next unless exists $peaks{$chr};    # no peaks on this chromosome
    for my $snp (@{ $X81_05{$chr} }) {
        my ($snp_id, $val) = @$snp;
        for my $peak (@{ $peaks{$chr} }) {
            my ($min, $max) = @$peak;
            # strict bounds: a SNP exactly at peak_start or peak_end is not counted
            if ( ($val > $min) and ($val < $max) ) {
                print join("\t", $snp_id, $chr, $val, $min, $max), "\n";
            }
        }
    }
}
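Since the goal is a tab-delimited output file, the matching rows can be written to disk with a small helper instead of being printed to the terminal. This is a sketch under stated assumptions: the sub `write_tsv`, the file name `snps_in_peaks.txt`, the column names and the example rows are all hypothetical.

```perl
use strict;
use warnings;

# Write a header line and then one tab-delimited line per row.
# $path and the column names passed in are hypothetical examples.
sub write_tsv {
    my ($path, $header, @rows) = @_;
    open(my $fh, '>', $path) or die "could not open $path: $!";
    print {$fh} join("\t", @$header), "\n";
    print {$fh} join("\t", @$_), "\n" for @rows;
    close($fh) or die "could not close $path: $!";
}

# Hypothetical matches: SNP, Chromosome, position, peak_start, peak_end
my @hits = ( [ 'rs_a', 1, 150, 100, 200 ], [ 'rs_b', 1, 900, 800, 1000 ] );
write_tsv('snps_in_peaks.txt',
          [ qw(SNP Chromosome position peak_start peak_end) ], @hits);
```

Writing a header line keeps the file easy to load into a spreadsheet or R later; dropping the `$header` argument would give a bare table instead.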