Tuesday, August 20, 2013

Extracting and displaying the Astronomy Picture of the Day on a server.


I am having problems presenting this article. Please be patient.

The most astounding photos in astronomy are posted daily on the NASA site
http://apod.nasa.gov. The URL of the photo, however, does not have the same
value every day, say "apod.jpg"; instead the filename looks like
"http://apod.nasa.gov/apod/image/1308/sunvenusuv3_dove_960.jpg".
This filename is embedded in the HTML of the page the site redirects to,
"http://apod.nasa.gov/apod/astropix.html".
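
To illustrate how the daily URL is put together, here is a minimal sketch that joins the page address with a relative image path using the standard library's urljoin. The function name build_image_url is made up for this example, and the assumption that the page stores the path in relative form ("image/1308/...") comes from how the script further down prepends "http://apod.nasa.gov/apod/".

```python
from urllib.parse import urljoin

# Page that always shows today's picture.
PAGE_URL = "http://apod.nasa.gov/apod/astropix.html"


def build_image_url(relative_src, page_url=PAGE_URL):
    """Resolve a relative image path against the page URL (hypothetical helper)."""
    return urljoin(page_url, relative_src)


if __name__ == "__main__":
    # Example filename taken from the text above.
    print(build_image_url("image/1308/sunvenusuv3_dove_960.jpg"))
```

urljoin also leaves an already absolute URL untouched, so the helper should keep working if the page ever switches to absolute image paths.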


The image filename is enclosed in an IMG tag. Our hope is that the page for
every other day follows the same HTML template.
So we read in this page, extract the filename and build up the image URL.
We then use the Linux utility wget to download the image file, then copy it
to a fixed destination on our server under the fixed filename apod.jpg.
This way we only "disturb" the NASA site once each day instead of linking
to it each time a browser visits our weather page, which hosts the image.
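
One way to make the extraction step less dependent on the exact spelling of the tag is to let the standard library's HTML parser find the IMG tag for us. The following is only a sketch under assumptions: the class FirstImageParser, the function first_image_url and the SAMPLE_HTML snippet are invented for illustration, and we assume the picture of the day is the first IMG tag on the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstImageParser(HTMLParser):
    """Remember the src attribute of the first <img> tag seen."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...>
        # and <img src=...> are handled the same way.
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def first_image_url(html, base_url):
    """Return the absolute URL of the first image, or None if there is none."""
    parser = FirstImageParser()
    parser.feed(html)
    if parser.src is None:
        return None
    return urljoin(base_url, parser.src)


# Made-up sample markup; the real page may look different.
SAMPLE_HTML = '<center><IMG SRC="image/1308/sunvenusuv3_dove_960.jpg" alt="sample"></center>'

if __name__ == "__main__":
    print(first_image_url(SAMPLE_HTML, "http://apod.nasa.gov/apod/astropix.html"))
```

If the page template changes so that another image comes first, this sketch would pick the wrong file, so the result should still be checked by eye from time to time.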

This is our first version. Python has changed a lot, with newly recommended
libraries, and we will update the following code to make it more readable and
"Pythonic". We are open to experimenting with the regular expression library
and to using the Requests module.
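
As a rough idea of what such an update might look like, here is a sketch that uses the re module for the extraction and Requests for the download of the page. It is not the code we run: the names IMG_SRC_RE, find_image_url and fetch_page are hypothetical, the 30 second timeout is an arbitrary choice, and Requests is a third-party package that has to be installed separately.

```python
import re
from urllib.parse import urljoin

PAGE_URL = "http://apod.nasa.gov/apod/astropix.html"

# First <img ... src="..."> tag, ignoring case; group 1 is the src value.
IMG_SRC_RE = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*"([^"]+)"', re.IGNORECASE)


def find_image_url(html, page_url=PAGE_URL):
    """Return the absolute URL of the first image in html, or None."""
    match = IMG_SRC_RE.search(html)
    if match is None:
        return None
    return urljoin(page_url, match.group(1))


def fetch_page(page_url=PAGE_URL):
    """Download the page text with Requests."""
    # Imported here so the extraction part works even without Requests.
    import requests

    response = requests.get(page_url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    return response.text


if __name__ == "__main__":
    print(find_image_url(fetch_page()))
```

A regular expression is still a fragile way to read HTML, but it is short and tolerates differences in case and spacing that a plain find() would miss.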


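#!/usr/bin/env python3

import os
import urllib.request

src = "http://apod.nasa.gov/apod/astropix.html"
# Markers around the image path; find() is case-sensitive, so they must
# match the page markup exactly.
strstart = "<img SRC=\""
strend = ".jpg\""

# urlopen() returns bytes in Python 3, so decode the page to text.
contents = urllib.request.urlopen(src).read().decode("utf-8", "replace")

print("parsing and determining graphics file name")
startpos = contents.find(strstart)
# find() returns -1 when nothing is found; 0 is a valid position.
endpos = contents.find(strend, startpos) if startpos != -1 else -1
if endpos != -1:
    # Path between the opening quote and the end of ".jpg".
    imgpath = contents[startpos + len(strstart):endpos + len(".jpg")]
    imgurl = "http://apod.nasa.gov/apod/" + imgpath

    print("copying to destination directory")
    # wget saves the file into the current directory under its own name.
    os.system("sudo wget -ct 0 %s" % imgurl)
    srcfile = imgurl[imgurl.rfind("/") + 1:]

    destfile = "/var/www/xxxxx/xxxx/xxx/apod.jpg"

    # Store the image file under the fixed name apod.jpg.
    os.system("sudo cp  %s %s" % (srcfile, destfile))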
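
The download and copy steps could also be done without wget, cp and sudo, provided the script runs as a user who may write to the destination directory. The sketch below is one possible way, not the method used above: save_as_fixed_name and its opener parameter are hypothetical names, and the image is written to a temporary file first so that the web server never serves a half-written apod.jpg.

```python
import os
import shutil
import tempfile
import urllib.request


def save_as_fixed_name(image_url, dest_path, opener=urllib.request.urlopen):
    """Download image_url and store it at dest_path, replacing any old copy."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Temporary file in the same directory, so the final rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp_file, opener(image_url) as response:
            shutil.copyfileobj(response, tmp_file)
        # mkstemp creates the file readable by its owner only; open it up
        # so the web server can read the image.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, dest_path)
    except Exception:
        os.remove(tmp_path)
        raise
    return dest_path
```

It would be called with the image URL found earlier and the fixed destination, for example save_as_fixed_name(imgurl, destfile).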

To see how it works out, please visit our weather page:
http://adorio-research.org/wordpress/?page_id=5833
