Friday, December 28, 2012

If He Exists, He Must Be a Cruel Joker

In Romantic art, Transcendentalist musings, hippie fantasies and New Age faux spiritualism, nature is painted as benign, its contemplation both nurturing and redemptive. Not so, according to party-pooping Darwinists. These hard-nosed realists decipher nature to be fundamentally violent, "red in tooth and claw", where the danger of violent death, or, worse, being eaten alive, lurks everywhere, as this freshwater crab found out in a no-holds-barred duel with a flesh-sucking tiger leech:

httpv://www.youtube.com/watch?v=kkw6WMuFsL8

Friday, September 14, 2012

Terrorist or Toast?

[caption id="attachment_1432" align="aligncenter" width="575"] The Dream (1910) by Henri Rousseau.[/caption]

What do you do when the fire alarm in your office building goes off and everyone heads for the exit? You know for a fact that this is a well-rehearsed ritual, triggered, for as long as anyone cares to remember, not by a ticking bomb planted by a terrorist but by the charred remains of a slice of toast left unattended by a careless co-worker somewhere in the building.

Of course, you join the exodus and head for the exit. This is because the cost of what statisticians call Type I error in situations like this one is “asymmetrical”. In case the fire alarm was triggered by a ticking bomb and you ignore it, you risk losing life and limb.

A Type I error is committed when a true null hypothesis is rejected. When an office fire alarm goes off, the null hypothesis is that the alarm is caused by a grave security threat, perhaps a bomb planted by a terrorist. The alternative hypothesis is that a burnt toast occasioned the fire alarm. You would rather drop everything and head for the designated assembly area than risk the consequences of a security breach even if the probability of such an event is tiny.

When our ancestors were still living in the wild at the mercy of the beasts of prey, they had to make similar calculations. Was the rustle in the bush caused by a gust of wind or a crouching tiger? A Type I error, which in this case meant ascribing the rustle to a gust of wind when, in reality, a hungry tiger was prowling nearby, could be quite unpleasant.

Conversely, false null hypotheses can be accepted, giving rise to Type II errors. If a superstitious person is boarding in an old house, then he may take fright at nocturnal sounds, accepting his own null hypothesis that nocturnal sounds in an old house are made by ghosts. The alternative hypothesis is that aging manmade structures can creak and groan, producing sounds which seem amplified at night.

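The costs of Type II errors can be prohibitive, too. For example, if the null hypothesis at an airport security checkpoint is that a passenger is harmless, then failing to catch an armed terrorist means accepting a false null hypothesis, and the consequences can be catastrophic.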

For a given screening procedure, Type I errors cannot be decreased without increasing Type II errors and vice versa. If, to minimize Type II errors, security screenings at airports are made too stringent, many innocent passengers would be caught up in the dragnet, increasing Type I errors. However, minimising Type I errors means relaxing security measures, risking the possibility of waving armed hijackers and bombers through security checkpoints, raising Type II errors and security breaches.
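The trade-off can be made concrete with a minimal sketch. It assumes, purely for illustration, that a screening procedure gives every passenger a "suspicion score", that the scores of harmless passengers are normally distributed around 0 and those of armed passengers around 2 (both with a standard deviation of 1), and that anyone scoring above a threshold is stopped. The null hypothesis here is that the passenger is harmless. None of these numbers come from real screening data.

```python
from statistics import NormalDist

# Hypothetical score distributions (illustrative assumptions only)
harmless = NormalDist(mu=0.0, sigma=1.0)   # null hypothesis is true
armed = NormalDist(mu=2.0, sigma=1.0)      # null hypothesis is false

def error_rates(threshold):
    """Return (Type I rate, Type II rate) for a given alarm threshold."""
    type_1 = 1.0 - harmless.cdf(threshold)  # harmless passenger stopped
    type_2 = armed.cdf(threshold)           # armed passenger waved through
    return type_1, type_2

for threshold in (0.5, 1.0, 1.5, 2.0):
    t1, t2 = error_rates(threshold)
    print(f"threshold={threshold:.1f}  Type I={t1:.3f}  Type II={t2:.3f}")
```

Raising the threshold lowers the Type I rate and raises the Type II rate. Under these assumptions, both can only be reduced together by making the two score distributions overlap less, that is, by a better screening procedure.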

Tuesday, September 11, 2012

Buddha Died After Eating ...

[caption id="attachment_1421" align="aligncenter" width="665"] The cause of the Buddha's death may be a subject of theological dispute but the event itself has inspired countless artists and sculptors down the ages. This 19th century Japanese print shows women and a cat mourning the suicide of actor Ichikawa Danjuro VIII in a scene reminiscent of traditional artistic conception of the Buddha's deathbed, in which animals are often depicted grieving alongside humans.[/caption]

“I forgot today is the Buddha’s anniversary and ate meat!” the Buddhist lady lamented from the back of the car as we cruised along the Princes Highway from Sydney to a South Coast destination.

“Why?” I replied. “The Buddha himself is said to have died after eating pork!”

This was a revelation to the devout lady, who, after a shocked reaction, retreated into silence.

The cause of the Buddha’s death - mahaparinirvana in the Buddhist canonical jargon – is a matter of theological dispute between the two major Buddhist traditions of Theravada and Mahayana.

It is an article of faith in the Mahayana tradition, which has traditionally held sway in Nepal, India, Tibet, China, Korea and Japan, that the Enlightened One died after eating a meal of mushrooms. For the adherents of Mahayana, it seems that anything less than the Buddha’s imagined vegetarianism detracts from his divinity and precept of karuna (compassion) and ahimsa (non-violence).

On the other hand, the more orthodox Theravada tradition, which took root in Sri Lanka, Myanmar, Thailand and Cambodia, takes a comparatively more relaxed view on this delicate matter, holding that the Buddha died after eating a meal of pork near Kusinagar, India. For the doctrinal purists of Theravada, the Buddha’s system seems to rest on a more substantive foundation than his dietary habits.

Almost half a millennium passed before the Buddha’s life and teaching were committed to writing. Needless to say, myth and legend were pressed into service to embellish his actions and utterances and also to promote the agendas of rival claimants to his “true” teaching.

As the Roman statesman and philosopher Cicero wrote, “There are many questions in philosophy to which no satisfactory answer has yet been given. But the question of the nature of the gods is the darkest and most difficult of all…. So various and so contradictory are the opinions of the most learned men on this matter as to persuade one of the truth of the saying that philosophy is the child of ignorance…”

Thursday, September 6, 2012

Eating Rice and Mutton Curry from Brass Plates While Thinking About Home



“Come, eat rice and mutton curry from brass plates – just like back home!” read an advertisement for a Nepalese restaurant here in Sydney, playing on the homesickness and ingrained culinary habits of local Nepalese residents.

I habitually pick up free Nepalese newsletters from various Indo-Bengali-Nepalese grocery stores that dot Sydney’s “ethnic enclaves” to read not so much the news and articles but the advertisements for various products and services.

These advertisements paint an accurate picture of the aspirations, hopes, triumphs, heartaches, struggles and yearnings of the Nepalese people here and elsewhere in Australia.

In the world of print media, there used to be two competing views about the place of the "fourth estate" in the body politic. (Admittedly, free community newsletters may not be able to claim membership of the fourth estate, as their voice is hardly audible in the cacophony of a national discourse still dominated by a few media corporations.)

One of these views proclaimed newspapers to be the raw material of history, or even history itself in the making, while cynics disparaged them as little more than vehicles for selling advertisements and for peddling and perpetuating ignorance, falsehood and bigotry.

The undercurrents of these dichotomous views of the print media can be detected even in humble community newsletters. They definitely convey a sense of history in the making (albeit a peripheral one) and they peddle products and services as well as news, gossip and opinions. They editorialize as if they have the power to shape opinion and guide the hand of history - if not here, then at least back home.

A recent newsletter that I grabbed from a local Nepalese grocery store had an advertisement for a “Roaming pest control”, reflecting growing home ownership in the Nepalese community.

Another advertisement for a Nepalese consultancy promised an “entire gamut of IT and Telecommunications services to your organization”. A bulleted list spelled out what an “entire gamut” meant: e-commerce solution, IT application for enterprises, project management, data migration services, quality assurance and testing, and so on.

Interestingly, the same consultancy also offered “cheapest international flight tickets, low domestic flight tickets in Nepal and India”, and, best of all, “... a lifetime journey to the magistical (sic) Himalayan adobe (sic)”, including Bhutan.

Advertisements like this one signify the presence of a viable local Nepalese market, and the pleasing fact that the diaspora is acquiring wealth and entrepreneurial flair.

In keeping with the zeitgeist, another colour advertisement posted by a “business consultant” asked: “Looking for an Exciting Business Opportunity?”, offering to “make it easier to find you a perfect business with finance”. The business consultant also promised to find finance for car loans, even for students, which told me that the fortunes of Nepalese students have improved dramatically since my own days as an impecunious overseas student.

However, the vast majority of the advertisements are still for overseas student services that range from finding colleges, changing courses and education providers to advice for permanent residency, which still remains the ultimate goal of most Nepalese students.

Perhaps inevitably, even the afore-mentioned IT consultancy-cum-travel agency offered services for international students, revealing the full extent of its “entire gamut”. In the full-page colour advertisement, it posed what it hoped to be a rhetorical question: “Finding it hard to select right Collages (sic) / Universities?”

Sunday, September 2, 2012

"The System Failed Us"

Too often, the “system” is blamed for all sorts of failures and tragedies, but especially for misfortunes arising from refusal to take personal responsibility.

Some time back, I remember watching an Australian man on TV blaming the “system” for the death of his child from starvation. By transferring blame to an abstract entity, he was trying to absolve himself of his own role in the tragedy and whitewash his conscience.

To be sure, a robust social safety net is a hallmark of civilised societies – there is no social safety net to speak of in third world dictatorships, theocracies and kleptocracies, which are often one and the same thing.

However, you know things have gone too far when people who shirk personal responsibility blame their misfortunes on the “system”, and when politicians of all stripes try to outdo each other in pandering to an overriding sense of entitlement.

It was no wonder, then, that many took umbrage at Gina Rinehart's comment last week, when she stated the obvious: "Do something to make more money yourself - spend less time drinking or smoking and socialising, and more time working."

The fact that Rinehart herself did not work her ass off for her billions - she inherited it from her father - does not disqualify her from making such comments nor does it invalidate the thrust of her comment.

Wednesday, August 29, 2012

Dead Worshipers Tell No Tales

[caption id="attachment_1371" align="aligncenter" width="355"]Cicero, Roman philosopher and statesman.[/caption]

Cicero recounted the following story:

A non-believer named Diagoras was shown painted tablets depicting some worshipers who had prayed and then survived a shipwreck.

Diagoras was not impressed. He wondered about the portraits of those worshipers who had prayed but still drowned.

Apparently, drowned worshipers told no tales.

Think of this story the next time

  • you think the supermarket queue you join always moves slowly. Is it because you tend not to remember the times when queues move fast?

  • someone tells you that dropping out of university is a smart career move ever since Bill Gates left Harvard to found Microsoft. People and the media do not seem to talk a lot about university dropouts who end up with mediocre careers

  • an astrological prediction seems to come true

  • you meet a rude person

  • you read the story of how the self-exiled Newcastle mining magnate Nathan Tinkler literally punted his house on a risky mining venture and amassed a fortune within the span of a few years

Tuesday, August 28, 2012

Meaning of Life and Color of Jealousy

[caption id="attachment_1350" align="aligncenter" width="500"]Richard Dawkins, biologist, author and atheist.[/caption]

Who has not quested and pondered on the meaning of life?

My own misguided, adolescent quest for the meaning of life led me to the ashram of a yogi, the sermons of J. Krishnamurti, the glib utterances of the charlatan Osho Rajneesh, a university course in Shada Darshana (the Six Schools of Indian Philosophy), and the popular works of Bertrand Russell and other Western thinkers.

One of these thinkers was an eccentric Austrian named Ludwig Wittgenstein, who had become a cult figure by the time he died in Cambridge, England at the age of 62 in 1951.

When Wittgenstein first came to Cambridge in 1911 to study the foundations of mathematics with Russell, his lordship could not decide if Wittgenstein was a crank or a genius but eventually settled for the latter.

In 1929, Wittgenstein returned to Cambridge for a PhD, occasioning the economist Keynes’ letter to his wife in which he noted: “Well, God has arrived. I met him on the 5:15 train.”

Wittgenstein’s early philosophy inspired the logical positivism of a group of thinkers called the Vienna Circle. Its central tenet was distilled in the aphorism “The meaning of a sentence is its method of verification.”

According to this "method of verification", a sentence had to satisfy one of the following conditions to be valid:

1. True by definition: “A triangle has three sides.”
2. Empirically verifiable: “Mt. Everest is taller than Mt. Druitt.”

Conversely, the following sentences are not valid as they are neither true by definition nor empirically verifiable:

1. A thing of beauty is a joy forever
2. Jesus was born of immaculate conception
3. God is great
4. In my End is my Beginning

Logical positivism had a grand ambition: To smash metaphysics and, with it, all the “ultimate” questions. It ran into a familiar epistemological hurdle: itself.

By the yardstick of logical positivism itself, the sentence “The meaning of a sentence is its method of verification” is nonsense, being neither true by definition nor empirically verifiable. Perhaps this is why, despite Wittgenstein’s modest belief that he had solved all philosophical problems by analysing language, we keep asking the ultimate questions.

Fast forward to 2012 and an epiphany of sorts!

A recent panel discussion on ABC TV’s Q & A pitted Cardinal George Pell, Archbishop of Sydney, against the British evolutionary biologist and outspoken atheist Richard Dawkins.

At one point, Dawkins, arguing that life has no meaning beyond itself, said that just because you can ask a question does not mean it is a valid question.

“What is the color of jealousy?” is one such invalid question, according to the renowned biologist and author of The Selfish Gene and The God Delusion.

Dawkins’ argument resonated deeply with me. I wish I had come across such an insight during my adolescent meanderings. Then, perchance, even if I had not abandoned my futile search for the miraculous, I might at least not have given up the study of calculus in favor of canards.

Monday, August 27, 2012

More Dog Whistling TV Programs, Please!

Who says prime time current affairs programs on free-to-air TV channels are rubbish? I propose that they can be edifying.

In one such recently aired program, a colleague apparently heard the phrase "ethnic enclaves" for the first time. In a low, conspiratorial whisper to another colleague, my colleague gingerly pronounced the word "enclaves" a few times like a jittery rugby fullback juggling a high ball before gathering it safely into his arms.

It was apparent that my thirty-something colleague had never come across this phrase before, but thanks to the current affairs program, my colleague had finally learned a new phrase and, with it, a new category of evils lurking in the suburbs.

This got me thinking. If my colleague had never heard the phrase "ethnic enclaves" before, what about associated words and phrases such as "ethnic cleansing", "pogroms", "dog whistling", "xenophobia", "yellow peril", "Asian horde", "concentration camps", "Lebensraum" and so on? Surely, the Balkan wars took place within the living memory of my colleague’s generation?

Please, explain!

To sum up, we need more dog-whistling current affairs TV programs advocating ethnic cleansings, pogroms and concentration camps so that the suburbs can be rid of undesirable ethnic enclaves and this fair Continent of carefree surfers and cricketers shielded from the imminent dangers posed by the yellow peril and swarthy hordes.

Sunday, August 26, 2012

Ambushed by Outlier



On Saturday morning, I reached a radiology in a neighbouring suburb for a dental x-ray at 9:30, hoping to wrap up the visit in 15 minutes. I ended up waiting for an hour - twice punctuated by my inquiries about expected waiting time  - before an amiable, bespectacled radiologist materialised and led me to the x-ray room.

When asked why I had to wait so long when this type of x-ray should be a fairly short affair, he only murmured, "I don't know. It was a misunderstanding."

Back at the main waiting room, I handed the slip of paper that the radiologist had asked me to hand in to the reception, repeating to one of the receptionists what the radiologist had told me, that I had waited an hour as a result of a "misunderstanding".

The receptionist conferred with a colleague to her right and said something like, "You had to wait as long as you did because the x-ray room has another machine that was being used by another patient for a procedure that takes time."

"You're unlucky," she added as she apologized and wished me a great day.

***


To begin with, what we lay folk call being "unlucky", experts from quantitative disciplines such as economics, social sciences and statistics may call being victims of "outliers", which are nothing but rare, out-of-the-ordinary events.

Examples of outliers include winning a lottery, the volcanic eruption of Mt. Vesuvius in 79 AD that buried Pompeii, air crashes in developed nations, Australia's national rugby team Wallabies posting a win against New Zealand's All Blacks ... and, apparently, waiting for an hour to get one's "missing/crowded" teeth x-rayed.

Since my dentist indicated that I will have to make a number of visits to the radiology, I am mainly concerned with two questions.

First, what is the probability of my being "unlucky" in the same radiology's waiting room in my next visit?

For the sake of argument, let's assume that, on average, 1 in 100 dental x-ray patients on a Saturday morning gets "unlucky" in that particular radiology, ending up waiting an hour. This means the probability of getting unlucky is 1 percent or 0.01.

From this, assuming that two visits to the radiology for dental x-rays are independent, i.e. the first visit does not influence the waiting time of the second visit, the probability that I will again be "unlucky" on the second visit is still 0.01.

A slightly different question is this. What is the probability that a patient like myself will be "unlucky" in two consecutive visits? Intuition tells us that it has to be lower than 1 in 100. In fact, it is 0.01 x 0.01 = 0.0001 or 1 in 10,000.

On the other hand, I may reason that since I was already unlucky in my last visit to the radiology, the probability of getting unlucky in the next visit is lower than 1 in 100. If I reason like this, assuming that the previous visit has no effect on the waiting time of my next visit, I have just fallen victim to the gambler's fallacy.
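The arithmetic behind these statements can be checked with a short sketch. The 1-in-100 figure is the assumption made above for the sake of argument, and independence between visits is likewise assumed rather than known.

```python
from fractions import Fraction

# Assumed probability of an "unlucky" visit (the 1-in-100 figure above)
p_unlucky = Fraction(1, 100)

# Independent visits: the joint probability is the product
p_two_in_a_row = p_unlucky * p_unlucky

# Conditional probability of being unlucky next time,
# given that the last visit was already unlucky
p_next_given_last = p_two_in_a_row / p_unlucky

print(p_two_in_a_row)      # 1/10000
print(p_next_given_last)   # 1/100
```

The conditional probability comes out at 1 in 100 again, which is exactly what the gambler's fallacy denies: under independence, past bad luck does not make future bad luck any less likely.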

The next question I am interested in is this. If I am not unlucky in my next visit to the radiology, what should I expect the waiting time to be? To put it another way, what is the average waiting time for a dental x-ray on a Saturday morning?

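Assuming that waiting times are normally distributed and a waiting time of 1 hour lies in the upper 1 percent of the distribution, i.e. 99 percent of waiting times are less than 1 hour, the 1-hour mark sits about 2.33 standard deviations above the mean. This single fact cannot pin down both the mean and the standard deviation, so suppose further that the average waiting time is around 35 minutes. The standard deviation then works out to around 11 minutes, since 35 + 2.33 × 11 ≈ 60.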

This means that my initial hope of wrapping up the visit in 15 minutes was a forlorn one. From the properties of the normal distribution, it can be estimated that there is only about a 3 percent likelihood of a waiting time of 15 minutes or less.
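These figures can be reproduced with Python's standard library. This is a sketch only: the normality of waiting times, the 35-minute average and the reading of 1 hour as the 99th percentile are all assumptions carried over from the argument above, not measurements.

```python
from statistics import NormalDist

standard_normal = NormalDist()          # mean 0, standard deviation 1
z_99 = standard_normal.inv_cdf(0.99)    # z-score of the 99th percentile

assumed_mean = 35.0                     # minutes (an assumption)
sigma = (60.0 - assumed_mean) / z_99    # SD implied by "1 hour = 99th percentile"

waiting = NormalDist(mu=assumed_mean, sigma=sigma)
p_quick = waiting.cdf(15.0)             # chance of waiting 15 minutes or less

print(f"z_99 = {z_99:.3f}")
print(f"implied standard deviation = {sigma:.1f} minutes")
print(f"P(wait <= 15 minutes) = {p_quick:.3f}")
```

Changing `assumed_mean` shows how sensitive the conclusion is to that one assumption: a lower assumed average implies a larger standard deviation and a better chance of a 15-minute visit.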

Tuesday, July 17, 2012

Luhn Algorithm in Teradata SQL

The Luhn algorithm is used, among other things, to calculate the check digit of credit card numbers and mobile handset IMEIs. The following is my attempt to implement this algorithm in Teradata SQL. It flags each IMEI as valid or not. Needless to say, IMEIs would typically be read from a table rather than hard-coded as in this example.
-- Outer query: an IMEI is valid if the adjusted digits sum to a multiple of 10
SELECT dt3.IMEI
      ,CASE
           WHEN (dt3.dig1 + dt3.dig2 + dt3.dig3 + dt3.dig4
               + dt3.dig5 + dt3.dig6 + dt3.dig7
               + dt3.dig8 + dt3.dig9 + dt3.dig10 + dt3.dig11
               + dt3.dig12 + dt3.dig13
               + dt3.dig14 + dt3.dig15) MOD 10 = 0 THEN 'Y'
           ELSE 'N'
       END AS VALID_IMEI
FROM
(
    -- Middle query: reduce each doubled digit to the sum of its own digits
    -- (e.g. 14 becomes 1 + 4 = 5, 18 becomes 1 + 8 = 9)
    SELECT dt2.IMEI
          ,dt2.dig1
          ,CASE
               WHEN dt2.dig2 = 0 THEN 0
               WHEN dt2.dig2 MOD 9 = 0 THEN 9
               ELSE dt2.dig2 MOD 9
           END AS dig2
          ,dt2.dig3
          ,CASE
               WHEN dt2.dig4 = 0 THEN 0
               WHEN dt2.dig4 MOD 9 = 0 THEN 9
               ELSE dt2.dig4 MOD 9
           END AS dig4
          ,dt2.dig5
          ,CASE
               WHEN dt2.dig6 = 0 THEN 0
               WHEN dt2.dig6 MOD 9 = 0 THEN 9
               ELSE dt2.dig6 MOD 9
           END AS dig6
          ,dt2.dig7
          ,CASE
               WHEN dt2.dig8 = 0 THEN 0
               WHEN dt2.dig8 MOD 9 = 0 THEN 9
               ELSE dt2.dig8 MOD 9
           END AS dig8
          ,dt2.dig9
          ,CASE
               WHEN dt2.dig10 = 0 THEN 0
               WHEN dt2.dig10 MOD 9 = 0 THEN 9
               ELSE dt2.dig10 MOD 9
           END AS dig10
          ,dt2.dig11
          ,CASE
               WHEN dt2.dig12 = 0 THEN 0
               WHEN dt2.dig12 MOD 9 = 0 THEN 9
               ELSE dt2.dig12 MOD 9
           END AS dig12
          ,dt2.dig13
          ,CASE
               WHEN dt2.dig14 = 0 THEN 0
               WHEN dt2.dig14 MOD 9 = 0 THEN 9
               ELSE dt2.dig14 MOD 9
           END AS dig14
          ,dt2.dig15
    FROM
    (
        -- Inner query: split the IMEI into digits, doubling every second one
        SELECT dt1.IMEI
              ,SUBSTR(dt1.IMEI, 1, 1) AS dig1
              ,SUBSTR(dt1.IMEI, 2, 1) * 2 AS dig2
              ,SUBSTR(dt1.IMEI, 3, 1) AS dig3
              ,SUBSTR(dt1.IMEI, 4, 1) * 2 AS dig4
              ,SUBSTR(dt1.IMEI, 5, 1) AS dig5
              ,SUBSTR(dt1.IMEI, 6, 1) * 2 AS dig6
              ,SUBSTR(dt1.IMEI, 7, 1) AS dig7
              ,SUBSTR(dt1.IMEI, 8, 1) * 2 AS dig8
              ,SUBSTR(dt1.IMEI, 9, 1) AS dig9
              ,SUBSTR(dt1.IMEI, 10, 1) * 2 AS dig10
              ,SUBSTR(dt1.IMEI, 11, 1) AS dig11
              ,SUBSTR(dt1.IMEI, 12, 1) * 2 AS dig12
              ,SUBSTR(dt1.IMEI, 13, 1) AS dig13
              ,SUBSTR(dt1.IMEI, 14, 1) * 2 AS dig14
              ,SUBSTR(dt1.IMEI, 15, 1) AS dig15
        FROM
        (
            -- Hard-coded sample IMEIs; normally these would come from a table
            SELECT '999999999999999' AS IMEI
            FROM SYS_CALENDAR.CALENDAR
            WHERE calendar_date = CURRENT_DATE

            UNION

            SELECT '352651010278244' AS IMEI
            FROM SYS_CALENDAR.CALENDAR
            WHERE calendar_date = CURRENT_DATE

        ) AS dt1
    ) AS dt2
) AS dt3
;

It returns an answer set in which 999999999999999 is flagged 'N' and 352651010278244 is flagged 'Y'.
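As a cross-check on the query's logic, the same Luhn test can be sketched in a few lines of Python. This is not part of the Teradata solution; the function name `luhn_valid` is my own, and the two IMEIs are the sample values hard-coded in the query.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits from right to left; double every second one
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9      # same as adding the two digits of the product
        total += digit
    return total % 10 == 0

for imei in ("999999999999999", "352651010278244"):
    print(imei, "Y" if luhn_valid(imei) else "N")
```

Subtracting 9 from a doubled digit greater than 9 does the same job as the `MOD 9` expressions in the SQL: both replace a two-digit product with the sum of its digits.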

Saturday, July 7, 2012

Some Parameters And Their Estimators And Standard Errors

[table class = "table-bordered"] PARAMETER ($\Theta$), ESTIMATOR ($\hat{\Theta}$), STD ERR ($\sigma_{\hat{\Theta}}$), ESTIMATE OF STD ERR ($s_{\hat{\Theta}}$)

$\mu$, $\bar{y}$, $\frac{\sigma}{\sqrt{n}}$, $\frac{s}{\sqrt{n}}$

$\mu_1 - \mu_2$, $\bar{y}_1 - \bar{y}_2$, $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$,"$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \; n_1 \ge 30, \; n_2 \ge 30$"

$\mu_1 - \mu_2$, $\bar{y}_1 - \bar{y}_2$, $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$,"$\sqrt{s_p^{2*}(\frac{1}{n_1} + \frac{1}{n_2})}, \; n_1 < 30 \text{ or } n_2 < 30$"

$\frac{\sigma_1^2}{\sigma_2^2}$,$\frac{s_1^2}{s_2^2}$," " ," " [/table]

*$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

Confidence Intervals for a Population Parameter $\Theta$ and Test Statistics for $H_0: \Theta = \Theta_0$, where    $\Theta = \mu$  or  $(\mu_1 - \mu_2)$:

[table class = "table-bordered"] SAMPLE SIZE, CONFIDENCE INTERVAL, TEST STATISTIC

Large, $\hat{\Theta} \pm z_{\alpha/2}s_{\hat{\Theta}}$, $z = \frac{\hat{\Theta} - \Theta_0}{s_{\hat{\Theta}}}$

Small,$\hat{\Theta} \pm t_{\alpha/2}s_{\hat{\Theta}}$,$t = \frac{\hat{\Theta} - \Theta_0}{s_{\hat{\Theta}}}$ [/table]

The test statistic for testing the null hypothesis $(H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1)$ is $F = \frac{s_1^2}{s_2^2}$
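The formulas in the tables above translate directly into code. The following is a minimal sketch, with function names of my own choosing and made-up summary statistics used purely for illustration.

```python
import math

def se_mean(s, n):
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_diff_large(s1, n1, s2, n2):
    """Estimated standard error of a difference of means, large samples."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def pooled_variance(s1, n1, s2, n2):
    """Pooled sample variance s_p^2."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def se_diff_small(s1, n1, s2, n2):
    """Estimated standard error of a difference of means, small samples."""
    return math.sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))

def standardised_statistic(estimate, hypothesised, std_err):
    """z or t statistic: (estimate - hypothesised value) / standard error."""
    return (estimate - hypothesised) / std_err

# Hypothetical summary statistics, for illustration only
y_bar, s, n = 52.0, 10.0, 100
se = se_mean(s, n)
z = standardised_statistic(y_bar, 50.0, se)
z_crit = 1.96   # z_{alpha/2} for a 95% confidence interval
print("standard error:", se)
print("z statistic:", z)
print("95% CI:", (y_bar - z_crit * se, y_bar + z_crit * se))
```

For small samples the same `standardised_statistic` function gives the t statistic. Only the standard error (`se_diff_small`) and the critical value, taken from the t distribution instead of the normal, change.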

Tuesday, July 3, 2012

SQL Query Tuning

SQL query tuning is a dark art but here are some simple tips that anyone can use.

Sunday, July 1, 2012

Summation Rules

Here is a list of summation rules, with 'k' denoting a constant:

  1. \[\sum_{i=1}^{n}{(x_i + y_i)} = \sum_{i={1}}^{n}{x_i} + \sum_{i=1}^{n}{y_i}\]

  2. \[\sum_{i=1}^{n}{(x_i - y_i)} = \sum_{i=1}^{n}{x_i}  - \sum_{i=1}^{n}{y_i}\]

  3. \[\sum_{i=1}^{n}{x_iy_i} \neq \sum_{i=1}^{n}{x_i} \times \sum_{i=1}^{n}{y_i}\]

  4. \[\sum_{i=1}^{n}{x_i}^2 \neq (\sum_{i=1}^{n}{x_i})^2\]

  5. \[\sum_{i=1}^{n}{k} = nk\]

  6. \[\sum_{i=1}^n{(x_i + k)} = \sum_{i=1}^{n}{x_i} + \sum_{i=1}^{n}{k} = \sum_{i=1}^{n}{x_i}+nk\]

  7. \[\sum_{i=1}^{n}{(x_i-k)} = \sum_{i=1}^{n}{x_i} - nk\]
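A quick numerical check of these rules, using small made-up lists for x and y and an arbitrary constant k:

```python
x = [1, 2, 3, 4]
y = [5, 6, 7, 8]
k = 10
n = len(x)

# Rules 1 and 2: sums and differences split term by term
assert sum(a + b for a, b in zip(x, y)) == sum(x) + sum(y)
assert sum(a - b for a, b in zip(x, y)) == sum(x) - sum(y)

# Rules 3 and 4: products and squares do not split (in general)
print(sum(a * b for a, b in zip(x, y)), "vs", sum(x) * sum(y))   # 70 vs 260
print(sum(a ** 2 for a in x), "vs", sum(x) ** 2)                 # 30 vs 100

# Rules 5 to 7: a constant contributes n * k
assert sum(k for _ in range(n)) == n * k
assert sum(a + k for a in x) == sum(x) + n * k
assert sum(a - k for a in x) == sum(x) - n * k
```

A single example cannot prove rules 1, 2 and 5 to 7, but one counterexample is enough to show why rules 3 and 4 are written with "not equal to".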

Simulating Central Limit Theorem

In this post, the Central Limit Theorem (CLT) will be simulated using Python, SciPy and matplotlib. The CLT gives the following two theorems:

Theorem 1: If the sampled population is normally distributed with population mean = $\mu$ and standard deviation = $\sigma$, then for any sample size n, the sampling distribution of the mean for simple random samples is normally distributed, with mean ($\mu_{\overline{x}}$) = $\mu$ and standard deviation ($\sigma_{\overline{x}}$) = $\frac{\sigma}{\sqrt {n}}$.

Theorem 2: For large sample sizes $(n\geq 30)$, even if the sampled population is not normally distributed, the sampling distribution of the mean for simple random samples is approximately normally distributed, with mean ($\mu_{\overline{x}}$) = $\mu$ and standard deviation ($\sigma_{\overline{x}}$) = $\frac{\sigma}{\sqrt {n}}$.

The standard deviation of the sampling distribution of the mean ($\sigma_{\overline{x}}$) is also known as the standard error of the mean, standard error of estimate or simply the standard error, because it measures the typical deviation of the sample means from the actual population mean.

The following Python script simulates Theorem 1.
#----------------------------------------------------------------------------
# By Ram Limbu @ ramlimbu.com
# Copyright 2012 Ram Limbu
# License: GNU GPLv3 http://www.gnu.org/licenses/gpl.html
#----------------------------------------------------------------------------

# import required packages
import random
import matplotlib.pylab as pylb

def plotDist(t, val='Values'):
    '''plot histogram of distribution'''
    pylb.hist(t, bins=50, color='r')
    pylb.title('Central Limit Theorem Simulation')
    pylb.ylabel('frequency')
    pylb.xlabel(val)
    pylb.show()

def simulateSampDist(t_pop):
    '''simulate sampling distributions'''
    samp_sizes = (5, 15, 25)
    for n in samp_sizes:
        # collect the means of 1,000 samples of size n,
        # starting from an empty list for each sample size
        t_samp_mean = []
        for j in range(0, 1000):
            t_samp_mean.append(pylb.mean(random.sample(t_pop, n)))

        # plot the sampling distribution of the mean for this sample size
        samp_mean = round(pylb.mean(t_samp_mean), 2)
        samp_stddev = round(pylb.std(t_samp_mean), 2)
        val = ('mean = ' + str(samp_mean) + ' stddev = ' + str(samp_stddev)
               + ' n=' + str(n))
        plotDist(t_samp_mean, val)

def main():
    '''simulate central limit theorem'''

    # generate a population of 10,000 normally distributed random numbers
    # with mean = 50 and standard deviation = 10
    t_pop = []
    mu = 50
    sigma = 10
    pop_size = 10000

    for i in range(0, pop_size):
        t_pop.append(random.gauss(mu, sigma))

    # plot a histogram of the population
    plotDist(t_pop)

    # simulate sampling distributions by repeatedly drawing samples
    # of various sizes from this population
    simulateSampDist(t_pop)

if __name__ == '__main__':
    main()

First, it creates a normally distributed population of 10,000 pseudo-random numbers with $\mu$ = 50 and $\sigma$ = 10. Then, it takes a sample of size 5, calculates its mean and appends it to a list, repeating this process 1,000 times. Finally, it plots the histogram of the sample means. Then, it repeats the whole sampling process with samples of size 15 and 25.

Histogram of normally distributed population of 10,000 random numbers.


The following histogram shows the distribution of the means of samples of size 5. It has a mean of 49.92, which is close to the population mean of 50, and a standard error of 4.51, which is close to the theoretical value of $\frac{10}{\sqrt{5}} \approx 4.47$. The latter figure decreases as the sample size increases.


The next two figures show the distribution of means of samples of size 15 and 25. Note that in each case, the distribution has a mean close to 50, with the standard error decreasing as the sample size increases.

histogram of sample means of size 15


 


The following Python script simulates Theorem 2, generating means, standard errors and histograms of samples of size 30, 50 and 100 from a population of exponentially distributed pseudo-random numbers with $\mu$ = 50.
#----------------------------------------------------------------------------
# By Ram Limbu @ ramlimbu.com
# Copyright 2012 Ram Limbu
# License: GNU GPLv3 http://www.gnu.org/licenses/gpl.html
#----------------------------------------------------------------------------

# import required packages
import random
import matplotlib.pylab as pylb

def plotDist(t, val='Values'):
    '''plot histogram of distribution'''
    pylb.hist(t, bins=30, color='r')
    pylb.title('Central Limit Theorem Simulation')
    pylb.ylabel('frequency')
    pylb.xlabel(val)
    pylb.show()

def simulateSampDist(t_pop):
    '''simulate sampling distributions'''
    samp_sizes = (30, 50, 100)
    for n in samp_sizes:
        # collect the means of 1,000 samples of size n,
        # starting from an empty list for each sample size
        t_samp_mean = []
        for j in range(0, 1000):
            t_samp_mean.append(pylb.mean(random.sample(t_pop, n)))

        # plot the sampling distribution of the mean for this sample size
        samp_mean = round(pylb.mean(t_samp_mean), 2)
        samp_stddev = round(pylb.std(t_samp_mean), 2)
        val = ('mean = ' + str(samp_mean) + ' stddev = ' + str(samp_stddev)
               + ' n=' + str(n))
        plotDist(t_samp_mean, val)

def main():
    '''simulate central limit theorem'''

    # generate a population of 10,000 exponentially distributed random numbers
    # with mean = 50
    t_pop = []
    mu = 50.00
    pop_size = 10000

    for i in range(0, pop_size):
        # expovariate takes the rate parameter, i.e. 1 / mean
        t_pop.append(random.expovariate(1 / mu))

    # plot a histogram of the population
    plotDist(t_pop)

    # simulate sampling distributions by repeatedly drawing samples
    # of various sizes from this population
    simulateSampDist(t_pop)

if __name__ == '__main__':
    main()

The following figure shows the histogram of a population of 10,000 exponentially distributed pseudo-random numbers. The distribution has a mean of about 50 and is positively skewed.



The next three figures show the distributions of means of samples of size 30, 50 and 100. Even though the samples were drawn from a non-normal distribution, the sampling distributions of the mean approximate the normal distribution as the sample size increases.







The importance of the CLT lies in the fact that, given normally distributed populations or sufficiently large sample sizes ($n\geq 30$), it shows that (a) the mean of the sampling distribution ($\mu_{\overline{x}}$) equals the population parameter ($\mu$), so that sample means cluster around it, and (b) sampling distributions approximate the normal distribution. Once a distribution approximates normality, the properties of the normal distribution can be used to make inferences about the sampled population.

 

Tuesday, June 19, 2012

How Many Rooms Should This Hotel Overbook?

The following example is taken from A Second Course in Business Statistics: Regression Analysis (4th edn) by William Mendenhall and Terry Sincich:
Often, travellers who have no intention of showing up fail to cancel their hotel reservations in a timely manner. These travellers are known, in the parlance of the hospitality trade, as "no-shows".

The no-shows for a 500-room hotel for a sample of 30 days are as follows:

18, 16, 16, 16, 14, 18, 16, 18, 14, 19, 15, 19, 9, 20, 10, 10, 12, 14, 18, 12, 14, 14, 17, 12, 18, 13, 15, 13, 15, 19

Based on this sample, what is the minimum number of rooms that the hotel should overbook?

The mean number of no-shows for the sample =  15.133

The standard deviation of no-shows for the sample = 2.945

When the sample size is 30 or more, as is the case in this example, the distribution of sample means is approximately normal as per the Central Limit Theorem irrespective of the distribution of the sampled population. The daily no-show counts themselves are a different matter: here we additionally assume that they are roughly mound-shaped, so that, as in the normal distribution, about 95% of data points lie within 2 standard deviations of the mean. For our sample,

mean ± 2 * standard deviation = 15.133 ± 2 * 2.945 = 15.133 ± 5.890


In other words, about 95% of the time, the number of no-shows falls between 9.243 and 21.023 (the red region in the figure above). Hence, the hotel can overbook at least 9.243 rooms, or 10 in whole numbers, each day and still be highly confident of honouring all reservations.

Here is my Python script to calculate the mean and standard deviation of the example dataset:
#-------------------------------------------------------------------------------
# By Ram Limbu @ ramlimbu.com
# Copyright 2012 Ram Limbu
# License: GNU GPLv3 http://www.gnu.org/licenses/gpl.html
#-------------------------------------------------------------------------------

from __future__ import print_function  # keeps the script runnable on Python 2 and 3

import math

def mean(t):
    ''' Returns the mean of the measurements

    args:
    t: list of measurements
    '''
    return float(sum(t)) / len(t)

def var(t):
    ''' Returns sample variance

    args:
    t: list of measurements
    '''
    mu = mean(t)
    # squared deviations from the mean
    devsq = [(x - mu) ** 2 for x in t]
    # dividing by n - 1 gives the unbiased sample variance
    sample_var = sum(devsq) / (len(t) - 1)
    return sample_var

def stddev(t):
    ''' Returns the standard deviation of the sample

    args:
    t: list of measurements
    '''
    return math.sqrt(var(t))

def main():
    noshows = [18, 16, 16, 16, 14, 18, 16, 18, 14, 19,
               15, 19, 9, 20, 10, 10, 12, 14, 18, 12,
               14, 14, 17, 12, 18, 13, 15, 13, 15, 19]
    mu = mean(noshows)
    sigma = stddev(noshows)
    print('mean of no-shows is', mu)
    print('sample variance of no-shows is', var(noshows))
    print('standard deviation of no-shows is', sigma)
    print('mean - 2 * standard deviations is', mu - 2 * sigma)
    print('mean + 2 * standard deviations is', mu + 2 * sigma)

if __name__ == '__main__':
    main()
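
As a cross-check on the script above, the same figures can be reproduced with the `statistics` module from the Python 3 standard library, and the two-standard-deviation interval can be compared against the sample itself. This is a minimal sketch under the assumption that Python 3.4 or later is available. The names `inside` and `covered` are my own, and the cut-off of 10 rooms is the overbooking figure from the example.

```python
import statistics

noshows = [18, 16, 16, 16, 14, 18, 16, 18, 14, 19,
           15, 19, 9, 20, 10, 10, 12, 14, 18, 12,
           14, 14, 17, 12, 18, 13, 15, 13, 15, 19]

mu = statistics.mean(noshows)
sigma = statistics.stdev(noshows)  # sample standard deviation (n - 1 denominator)
lower, upper = mu - 2 * sigma, mu + 2 * sigma

# Days whose no-show count falls inside mean +/- 2 standard deviations
inside = [x for x in noshows if lower <= x <= upper]
# Days on which overbooking 10 rooms would not have caused a shortfall
covered = [x for x in noshows if x >= 10]

print('interval: %.3f to %.3f' % (lower, upper))
print('days inside the interval: %d of %d' % (len(inside), len(noshows)))
print('days with at least 10 no-shows: %d of %d' % (len(covered), len(noshows)))
```

On this sample, 29 of the 30 days fall inside the interval and the same 29 days have at least 10 no-shows. The single exception is the day with 9 no-shows. The interval bounds may differ from the figures quoted earlier in the third decimal place, because those were computed from the rounded mean and standard deviation.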

Tuesday, February 14, 2012

A maggot-minded, starved, fanatic crew

Omar Khayyam wrote:
And do you think that unto such as you;
A maggot-minded, starved, fanatic crew:
God gave the secret, and denied it me?--
Well, well, what matters it! Believe that, too.

Sunday, February 12, 2012

Bobby's Dream

Bobby was his name. A swarthy fella with slick hair, big hooked nose, gaunt features and tall, reedy figure. His brooding expression and gnarled hands betrayed a lifetime spent cleaning toilets in pubs and hotels.

If you ran into him in a toilet in the wee hours of a Saturday morning with a stub of cigarette in his mouth and a wet mop in his hands, you could not help thinking about the villainous sorcerers in Walt Disney cartoons.

And yet, Bobby had a dream.

Every week, he would half-seriously announce that he was quitting his job and moving to the Sunshine State to retire and fish.

At the time, Bobby's weekly retirement announcements were taken as something of a light-hearted joke, but looking back, I cannot help thinking that, perhaps, he had bought a lottery ticket every week of his working life with the hope of claiming a life-changing prize.

Once I commented to a hotel patron that Bobby, when not working, always seemed to be drinking and playing pokies in the hotel where he toiled. Bobby took my comment as a compliment.

"That sounds alright to me," he said without taking his eyes off the one-armed bandit that he was battling.

Born and brought up in Australia, Bobby, who was approaching retirement age, traced his lineage to Fijian Indians.  This meant that, whether he realised it or not, one of his ancestors was probably sold into bonded labour by his own impoverished family in a rural Indian village.

Apart from the heartache of having to leave behind for good his family and village, Bobby’s unfortunate ancestor had to endure the horror of crossing Kaala Paani, literally “black water”, that haunted the imagination of illiterate villagers like a nightmare.

Perhaps a century had passed since his fateful crossing of Kaala Paani and subsequent disgorgement onto a Fijian sugarcane plantation, but one of his descendants was still cleaning toilets in a sahib’s hotel.

Admittedly, compared to his indentured ancestor, Bobby’s lot was much better. He did not have to cower in fear of abusive foremen who bullied and beat him. He could drink in the same bar where semi-retired rich white fellas drank, and he did. Everyone treated him nicely.

I started this post with the working title of “Lottery Approach to Life”, with Bobby’s life held up and dissected as a prime example. However, I lost the plot …

I have not seen Bobby in almost a decade. I left the posh peninsula with its touristy vibe, and the hotel where Bobby and I used to work was sold after the witless owner lost his own lottery and the resulting arm wrestling with his bank. I wonder if the new owner ‘relieved’ Bobby of his duties.

Whatever happened, I just hope that Bobby finally won his lottery and retired to a life of ease and fishing in his beloved Queensland.

His Excellency Saluted by Red Army

Groucho Marx once resigned from a club with the explanation that “I don’t want to belong to any club that will accept people like me as a member”.

This famous quip partly explains why I myself do not care to join any club. This bothers and annoys some of my Sydneysider Nepalese friends, who conclude that, since I do not show up at the movable feast of their club barbeques and song-and-dance fests, I must lie low all the time in my dimly-lit ‘cave’ like a mythical slumbering monster. I call this non sequitur “social solipsism”.

To be sure, I do not remember ever turning down an invitation to a private function. I recently had the pleasure of attending one such function at a friend’s place. I joined a throng of people in the drawing room, and sat down on the floor to watch Australian Open Tennis on Channel 7.

An elderly visitor from Nepal whom I did not know was holding court, surrounded by some well-known stalwarts of the local Nepalese community scene, some of them sitting on the floor just like myself but with their backs to the TV in deference to the elderly visitor.

Even though I was focusing on tennis, I could not help listening to snatches of their conversation, which was, in reality, more like a monologue delivered by the elderly visitor as he regaled his audience by recounting how he had been saluted by Red Army guards and addressed as "Your Excellency" by Foreign Ministry mandarins during a visit to China.

“A Nepalese tour operator with a talent for self-promotion. Elementary, my dear Watson,” I ratiocinated subconsciously.

Inevitably, their conversation turned to the social and economic problems in Nepal and the demands for self-determination by various ethnic groups. Here, the visitor and his listeners politely agreed to disagree, which was not surprising given that the visitor, unlike his audience, belonged to the ruling caste in Nepal.

Finally, the discussion converged on a root cause analysis of the problems besetting the beautiful Himalayan republic. At last, all parties could reach some sort of consensus. Yes, all agreed, it was not the domination of the ruling caste or the ‘machinations’ of New Delhi, Beijing or Washington that was holding back Nepal’s destiny but a lack of developed institutions.

One local community stalwart clinched the argument by holding up the example of North Korea, pointing out the obvious that the peaceful transfer of power in that glorious nation in the aftermath of Dear Leader Kim Jong-il’s sudden death demonstrated its institutional maturity.

Soon thereafter, the elderly visitor left amidst a flourish of parting ‘Namastes’, and his erstwhile interlocutors started to swap notes and conduct a postmortem of their robust intellectual joust with the visitor. They remonstrated among themselves that the elderly visitor, who seemed to command a lot of respect even in absentia, had not offered any ‘guidance’ on the question of the ethnic issues.

Curiosity got the better of me and I inquired about the departed visitor. It transpired that he was the Attorney General in a former Nepali Congress government.

Such an August Personage publicly boasting about being saluted by Red Army guards as if it was the highpoint of his public career … and his ethnic audience expecting to be given a prescription for a political panacea by a distinguished buffoon from their masterly class … supposing such a panacea exists …

Perhaps, I should, after all, join a community group to enliven my mirthless existence.

Saturday, February 11, 2012

Auspicious Moment for Cogitation

Lately, I have been having a lot of fun at work drafting emails to various internal ‘stakeholders’. As a data analyst who spends the bulk of his time crafting and running SQL queries against a ponderous leviathan of a data warehouse, I face frequent downtimes due to competing queries running simultaneously, insolent IT cretins performing in broad daylight what are intended to be nocturnal ‘cron’ jobs, or my own queries scanning and processing gigantic datasets such as call record details.

Since I refuse to ascend to the sunny uplands of my non-existent Facebook to update my status every nanosecond, I often descend with glee and gusto, as I wait for my queries to fetch desired records from the netherworld of Teradata ‘amps’, to the corporate banality of email writing.

While not compromising or clouding the messages, one of my aims in drafting emails to the mythical stakeholders who rely on data analysts for reports and analyses is to parody the imagined diction of an educated foreigner who learned English by reading Gibbon with the aid of nothing more than a hefty, well-thumbed dictionary. For good measure, I often intersperse my turgid, highfalutin prose with Latin phrases. Quidquid Latine dictum sit, altum videtur.

“My final obiter dictum on the … report …”, announced one of my recent emails. Another began: “Now is a most auspicious moment to cogitate on …”. “Do you wish to circumscribe the report with a temporal boundary by prescribing an arbitrary baseline date? If yes, did the madam have a date in mind?” another inquired politely of a young marketing ‘exec’. Another finished by lavishing “most sincere thanks on the honorable gentlemen” who were implementing an IT change request.

Far be it from me to mock my stakeholders, who are really my colleagues, even though my partner warns that is how my playfulness, designed partly to alleviate ennui, could be misconstrued. In reality, I am also partly playing to the stereotype of data analysts, who inhabit, in my team’s case anyway, that crepuscular no-man’s land between the IT and marketing departments.

With their Masters of the Universe mindset, some IT managers, the vast majority of whose roles furnish the modern equivalents of overseers of indentured labor in the far-flung sugarcane plantations of a benighted age, look down on data analysts as little more than middling marketing mediocrities uninitiated in the runes and rituals of information technology. Some marketing execs and product managers, on the other hand, suspect data analysts of being nothing more than number-crunching numb nuts devoid of humanizing creative impulses.

Actually, just like any other profession, “marketing analytics” attracts people from varied and storied backgrounds. My own group has, at various times, counted in its ranks analysts with degrees and backgrounds in mathematics, linguistics, literature, statistics, IT, computer science, software engineering, robotics, business, marketing, hospitality, customer service, etc.

All data analysts perform three key tasks: scouring, sourcing and cleaning data, called “data munging” in the trade, followed by analysis and/or modeling, which can range from pivoting data in Excel to implementing sophisticated machine learning algorithms, and, finally, presenting the results to stakeholders, an art that has spawned its own sub-discipline of “visualizing beautiful data”.

The profession, which is red-hot at the moment due to the exponential growth and availability of “big data”, has its share of quackery, but is there one that does not?

But I have strayed far from the topic. Ipso facto, now is a most auspicious moment to shut up.