Monday, May 14, 2012

Using Perl hashes to handle tab-delimited files

I have two files: file 1 has 3 columns (SNP, Chromosome, position) and file 2 has 3 columns (Chromosome, peak_start, peak_end). All columns are numeric except for the SNP column.



If the chromosome in file 1 matches the chromosome in file 2, I want to check whether the position of the SNP lies between peak_start and peak_end. If it does, I want to show which SNP falls in which peak (preferably writing the output to a tab-delimited file). I would prefer to split each line into fields and use hashes where the chromosome is the key. I have found only a few questions remotely similar to this, but I could not understand the suggested solutions well.
Here is an example of my code. It is only meant to illustrate my question, so think of it as "pseudocode". Also, please forgive the formatting: this is my first time asking a question here, and I am a biologist trying to learn Perl programming!



#!/usr/bin/perl

use strict;
use warnings;

my (%peaks, %X81_05);

# open the SNP file or die
open(my $first_sample, '<', 'X81_05.txt')
    or die "could not open X81_05.txt: $!";

# split the tab-delimited file into its fields (SNP, Chromosome, position)
while (<$first_sample>) {
    chomp;
    next if /Chromosome/;    # skip the header

    my ($snp, $chr, $pos) = split /\t/;

    # a chromosome can carry many SNPs, so keep a list per chromosome
    push @{ $X81_05{$chr} }, { snp => $snp, position => $pos };
}

close($first_sample);

# open the peaks file or die
open(my $peaks_fh, '<', 'peaks.txt')
    or die "could not open peaks.txt: $!";

while (<$peaks_fh>) {
    chomp;
    next if /Chromosome/;    # skip the header

    my ($chr, $peak_start, $peak_end) = split /\t/;

    # a chromosome can also carry many peaks
    push @{ $peaks{$chr} }, { peak_start => $peak_start, peak_end => $peak_end };
}

close($peaks_fh);

# compare each SNP only with the peaks on the same chromosome
for my $chr (sort keys %X81_05) {
    next unless exists $peaks{$chr};

    for my $snp (@{ $X81_05{$chr} }) {
        my $val = $snp->{position};

        for my $peak (@{ $peaks{$chr} }) {
            my $min = $peak->{peak_start};
            my $max = $peak->{peak_end};

            # report the SNP if it lies strictly between the peak boundaries
            if ($val > $min and $val < $max) {
                print join("\t", $snp->{snp}, $chr, $val, $min, $max), "\n";
            }
        }
    }
}
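
To make the idea easier to try without the real input files, here is a minimal sketch of the same lookup on in-memory lines. The sample SNP names, positions and peaks are invented for illustration, and the helper names group_by_chromosome and snps_in_peaks are hypothetical, not part of any module. It assumes a chromosome may carry several SNPs and several peaks, so each hash value is a list of rows rather than a single row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, invented for illustration only.
my @snp_lines = (
    "SNP\tChromosome\tposition",
    "rs1\t1\t150",
    "rs2\t1\t900",
    "rs3\t2\t50",
);
my @peak_lines = (
    "Chromosome\tpeak_start\tpeak_end",
    "1\t100\t200",
    "1\t800\t850",
    "2\t10\t60",
);

# Group the data rows by the chromosome found in column $chr_col.
sub group_by_chromosome {
    my ($lines, $chr_col) = @_;
    my %by_chr;
    for my $line (@$lines) {
        next if $line =~ /Chromosome/;    # skip the header
        my @fields = split /\t/, $line;
        push @{ $by_chr{ $fields[$chr_col] } }, \@fields;
    }
    return \%by_chr;
}

# Return one [SNP, chromosome, position, peak_start, peak_end] row
# for every SNP that lies strictly inside a peak on its chromosome.
sub snps_in_peaks {
    my ($snps, $peaks) = @_;
    my @rows;
    for my $chr (sort keys %$snps) {
        for my $snp (@{ $snps->{$chr} }) {
            my ($name, undef, $pos) = @$snp;
            for my $peak (@{ $peaks->{$chr} || [] }) {
                my (undef, $start, $end) = @$peak;
                push @rows, [$name, $chr, $pos, $start, $end]
                    if $pos > $start and $pos < $end;
            }
        }
    }
    return @rows;
}

my @rows = snps_in_peaks(
    group_by_chromosome(\@snp_lines, 1),    # chromosome is column 1 in the SNP file
    group_by_chromosome(\@peak_lines, 0),   # chromosome is column 0 in the peaks file
);

# Tab-delimited report: a header line, then one line per match.
print join("\t", qw(SNP Chromosome position peak_start peak_end)), "\n";
print join("\t", @$_), "\n" for @rows;
```

With the invented data, rs1 and rs3 are reported and rs2 is not, because position 900 falls outside both peaks on chromosome 1. To get a tab-delimited file instead of screen output, the two print statements could write to a handle opened for writing, or the output could simply be redirected on the command line.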






