I have a set of MEDLINE files from NCBI. Each file is associated with one journal title and contains zero or more GenBank identification numbers (GBIDs). I can already count the GBIDs in each file and associate that count with the journal name. My problem is that several files may belong to the same journal, and I don't know how to add the per-file counts into a total number of GBIDs per journal.
My current code: 
 jt stands for journal title, which is pulled out of the file correctly. GBIDs are added to the count as they are encountered.
 ... up to this point the first search has been performed. You can think of each "pmid"
    as a single file, so each "fetch" goes through the files one at a time ...
  pmid_list.each do |pmid|
 
    ncbi_fetch.pubmed(pmid, "medline").each do |pmid_line|
 
     if pmid_line =~ /JT.+- (.+)\n/
         jt = $1
         jt_count = 0
         jt_hash[jt] = jt_count
 
         ncbi_fetch.pubmed(pmid, "medline").each do |pmid_line_2|
 
             if pmid_line_2 =~ /SI.+- GENBANK\/(.+)\n/
                 gbid = $1
                 jt_count += 1
                 gbid_hash["#{gbid}\n"] = nil
             end 
         end 
 
         if jt_count > 0
             puts "#{jt} = #{jt_count}"
 
         end
     end
   end
 end
 My result:
 Your search returned 192 results.
  Virology journal = 8
  Archives of virology = 9
  Virus research = 1
  Archives of virology = 6
  Virology = 1
 Basically, how do I get it to say Archives of virology = 15, and do the same for any journal title? I tried a hash, but the second Archives of virology entry just overwrote the first. Is there a way to make a repeated key add to its existing value in a hash instead of replacing it?
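One way to get a running total per journal is to create the hash with a default value of 0 and use += rather than = when storing a count. The sketch below is only an illustration: the per_file_counts array is made up, with the per-file numbers hard-coded from the output above instead of being fetched from NCBI.

```ruby
# Per-file results, hard-coded from the output above (illustration only).
per_file_counts = [
  ["Virology journal", 8],
  ["Archives of virology", 9],
  ["Virus research", 1],
  ["Archives of virology", 6],
  ["Virology", 1]
]

# Hash.new(0) returns 0 for a missing key, so += works the first time
# a journal title is seen.
jt_hash = Hash.new(0)

per_file_counts.each do |jt, jt_count|
  # Add to the running total instead of overwriting it.
  jt_hash[jt] += jt_count
end

jt_hash.each do |jt, total|
  puts "#{jt} = #{total}"
end
```

In the loop from the question, this would presumably mean creating jt_hash with Hash.new(0), dropping the jt_hash[jt] = jt_count assignment made when the title is matched, and doing jt_hash[jt] += jt_count once the inner loop has finished counting.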
Full code:
 #!/usr/local/bin/ruby
 
  require 'rubygems'
  require 'bio'
 
 
 Bio::NCBI.default_email = 'kepresto@uvm.edu'
 
 ncbi_search = Bio::NCBI::REST::ESearch.new
 ncbi_fetch = Bio::NCBI::REST::EFetch.new
 
 
 print "\nQuery?\s" 
 
 query_phrase = gets.chomp
 
 puts "\nYou said \"#{query_phrase}\". Searching, please wait..."
 
 pmid_list = ncbi_search.search("pubmed", "#{query_phrase}", 0)
 
 puts "\nYour search returned #{pmid_list.count} results."
 
 if pmid_list.count > 200
 puts "\nToo big."
 exit
 end
 
 gbid_hash = Hash.new
 jt_hash = Hash.new(0)
 
 
 pmid_list.each do |pmid|
 
 ncbi_fetch.pubmed(pmid, "medline").each do |pmid_line|
 
     if pmid_line =~ /JT.+- (.+)\n/
         jt = $1
         jt_count = 0
         jt_hash[jt] = jt_count
 
         ncbi_fetch.pubmed(pmid, "medline").each do |pmid_line_2|
 
             if pmid_line_2 =~ /SI.+- GENBANK\/(.+)\n/
                 gbid = $1
                 jt_count += 1
                 gbid_hash["#{gbid}\n"] = nil
             end 
         end 
 
         if jt_count > 0
             puts "#{jt} = #{jt_count}"
 
         end
         jt_hash[jt] += jt_count
     end
 end
 end
 
 
 jt_hash.each do |key,value|
 # if value > 0
     puts "Journal: #{key} has #{value} entries associated with it."
 # end
 end
 
 # gbid_file = File.open("temp_*.txt","r").each do |gbid_count|
 #   puts gbid_count
 # end
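
The loop in the full code fetches each record a second time just to look for the SI lines. A rough sketch of doing both jobs in one pass follows, assuming each record is available as a single string. The count_gbids helper and the sample records are hypothetical and made up for illustration; they are not real PubMed data, and continuation lines in MEDLINE fields are ignored.

```ruby
# Hypothetical helper: scan one MEDLINE record (given as a String) once,
# returning the journal title and the GenBank IDs found in it.
def count_gbids(record)
  jt = nil
  gbids = []
  record.each_line do |line|
    case line
    when /^JT\s*- (.+)$/
      jt = $1.strip
    when /^SI\s*- GENBANK\/(.+)$/
      gbids << $1.strip
    end
  end
  [jt, gbids]
end

# Made-up records, for illustration only.
records = [
  "JT  - Archives of virology\nSI  - GENBANK/XX000001\nSI  - GENBANK/XX000002\n",
  "JT  - Virology\n",
  "JT  - Archives of virology\nSI  - GENBANK/XX000003\n"
]

# Default of 0 so that repeated journal titles accumulate.
jt_hash = Hash.new(0)

records.each do |record|
  jt, gbids = count_gbids(record)
  jt_hash[jt] += gbids.size if jt
end

jt_hash.each { |jt, total| puts "#{jt} = #{total}" }
```

Because the title and the IDs are collected in the same pass, the total can be added to the hash after the whole record has been read, wherever the JT line happens to sit in the record.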
  