Wednesday, August 31, 2011

Python: Sampling from a normal population

Here is code to create N samples of size n from a normal population with mean mu and standard deviation sigma.
"""
File             normalsim.py
Author           Dr. Ernesto P. Adorio
Desc             Sampling from the normal distribution.
Version          0.0.1 Sep. 1, 2011
"""



import numpy as np
import scipy.stats as stat
from math import sqrt


def  normalsim(mu, sigma, n, N, statistic="mean"):
    out = [0] * N
    for i in range(N):
        sample = stat.norm.rvs(mu, sigma, size=n)
        #print sample, 
        if statistic=="mean":
           out[i] = np.mean(sample)
        elif statistic=="svar":
           out[i] = np.var(sample, ddof=1)
        elif statistic=="pvar":
           out[i] = np.var(sample)
        elif statistic=="ssdev":
           out[i] = sqrt(np.var(sample, ddof=1))
        elif statistic=="psdev":
           out[i] = np.var(sample)
        elif statistic== "min":
           out[i] = min(sample)
        elif statistic == "max":
           out[i] = max(sample)
        else:
           raise ArgumentError,"bad statistic."
        #print out[i]
    return out

v = normalsim(10, 4, 10, 1000, statistic="svar")
print np.mean(v), np.var(v)
When I run the program under the time command, time python normalsim.py, the shell outputted
15.7770857263 55.2458134956

real    0m2.681s
user    0m1.080s
sys     0m0.070s
Problem: Can you rewrite the code so it will be faster? HINT: a comparison is always used inside the loop!