Sunday 19 December 2010

Rough stats: Death Penalty

With Christmas just round the corner, what better time to look at death penalty statistics? From these lists I thought I'd compare the GDP of countries that have banned the death penalty with those where it is still permitted (so the first list and the last list on that page). For GDP I've used this, primarily the IMF numbers, but also the CIA World Factbook ones when a country wasn't in the IMF list. After excluding countries that don't have easily obtainable GDP data, our dataset features 69 countries that still permit the death penalty and 91 that have banned it.

First up, let's compare mean GDPs of countries that have banned the death penalty with those that permit it:

Mean GDP of countries permitting the death penalty: 441,481 million USD
Mean GDP of countries that have banned the death penalty: 258,376 million USD

Looking at the means, the countries that still have the death penalty have a much higher GDP. However, mightn't a few countries (mostly the USA, but also Japan and China) be dragging the death penalty mean up? Let's compute the median instead:

Median GDP of countries permitting the death penalty: 21,308 million USD
Median GDP of countries that have banned the death penalty: 31,511 million USD

and now it's the countries that have banned the death penalty that have the higher GDP. (Of course, using mean for something as skewed as GDP was silly to begin with, but it's always good to illustrate this sort of thing.)
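
To see how a single very large economy can flip the comparison, here is a minimal Python sketch. The figures are invented for illustration (they are not the IMF or CIA numbers used above), and the names `permit` and `banned` are just labels for this example.

```python
from statistics import mean, median

# Hypothetical GDPs in millions of USD -- invented for illustration,
# NOT the real figures behind the numbers quoted above.
permit = [5_000, 12_000, 20_000, 35_000, 14_000_000]  # one giant economy
banned = [8_000, 25_000, 30_000, 60_000, 400_000]

for label, gdps in [("permit", permit), ("banned", banned)]:
    print(f"{label}: mean = {mean(gdps):,.0f}, median = {median(gdps):,.0f}")
```

With these toy numbers the "permit" group has by far the larger mean but the smaller median, which is the same reversal seen in the real data.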

Interestingly, the median GDP of all the countries in the world is 21,749 million USD, a figure incredibly close to that of those permitting the death penalty. You can read what you like into that.

Thursday 16 December 2010

Rough stats: What is everyone studying?

With the recent news surrounding the proposed rise in tuition fees in the UK, I thought I'd see how many more students we have these days. The claim is that with more than ever before going to university it's impractical for every student to be funded by the taxpayer, but how many more are there?

I've rifled through the Higher Education Statistics Agency for figures, and my first graph is of total student numbers since 1996/97:

A pretty clear trend, then, although there are a couple of caveats. The main one is that I have plotted total undergraduates, without separating international students from UK ones, which obviously muddies the waters a bit. The other is that my y-axis starts at 1 million, so the increase looks a little more dramatic than it is (although a 25% increase is still pretty big).

What I find most interesting about this graph, however, are the two places where the trend gets interrupted. They represent when key changes were made to how much a degree would cost - the first being when tuition fees were initially introduced, and the second when top-up fees (trebling the cost of most courses) came in. It doesn't seem that these two measures have had long-term impacts on the increasing number of students, but eyeballing it is obviously pretty dangerous, and since we don't have data prior to 1996 we can't really say much about earlier trends either.

So we know there are more students, but what are they studying? Again, I've looked at undergraduates only, and plotted the numbers of students studying various subjects from 2002/03 to 2008/09. There are subject-by-subject data, but to try and make it close to comprehensible I've used HESA's 19 subject 'areas'. For reference, I've listed which subjects each subject area includes at the bottom, as it can explain a lot of the relative popularity of each.

It's not a particularly good graph, I know (and you probably need to click on it to see it properly), but I just wanted an overview to see if anything leapt out. (Don't worry about some of the colours being quite similar - the legend is arranged to match up with the order the lines appear on the far right of the plot, so it should be just about decipherable.)

First of all, why only from 2002/03? The answer is that there seems to be a dramatic change in how degrees were classified in the datasets I found. Prior to 2002 around 100,000 students were categorised as doing a 'combined' subject, but this suddenly dropped to just 10,000. At the same time various other subjects saw massive jumps in numbers - clearly most of the combined subjects were now being counted amongst other categories, and so it's easiest to just look at the data from this point onwards.

What are the main trends? Most subjects seem to be gradually increasing, as we might expect, but there are some that stay roughly constant, and some that drop considerably. In particular, computer science is having a terrible time of it, with a huge drop over the last seven years.

A better way to look at these data, however, is to consider what proportion of undergraduates are studying which subjects, rather than their absolute numbers. This gives us a clearer picture of the changes in the makeup of our student population, and should highlight which subjects are just increasing in line with the overall surge in student numbers, and which are losing out or doing better still.

It might not look like too much has changed, but you can see plenty of subjects' lines aren't quite as steep as they were. One can now identify with slightly more confidence which subjects are getting more than their fair share of new students.
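
The switch from absolute numbers to proportions is just a division by each year's total. A rough sketch in Python, using invented counts for three placeholder subject areas (A, B and C are hypothetical, not HESA categories or figures):

```python
# Hypothetical undergraduate counts by subject area -- invented for
# illustration, not HESA figures.
counts = {
    "2002/03": {"A": 100_000, "B": 50_000, "C": 50_000},
    "2008/09": {"A": 110_000, "B": 80_000, "C": 60_000},
}

def shares(year_counts):
    """Return each subject's share of that year's total, as a fraction."""
    total = sum(year_counts.values())
    return {subject: n / total for subject, n in year_counts.items()}

for year, year_counts in counts.items():
    line = ", ".join(f"{s}: {p:.1%}" for s, p in shares(year_counts).items())
    print(year, line)
```

In this toy example subject A gains 10,000 students yet its share of the total falls, which is exactly the kind of thing the proportion plot is meant to expose.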

As you've probably noticed, this 'analysis' is pretty rough and not particularly scientific. The main warning I should probably provide is that you shouldn't read too much into the subject area headings. For instance, something like biological sciences seems to be getting a bigger share of the pie, but this doesn't mean biology is. From 2002/03 to 2008/09 biology increases from 17,390 undergraduates to 18,885, whereas sports science goes from 15,755 to 31,370.
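
For a sense of scale, the two pairs of figures just quoted can be turned into percentage changes. A small sketch (the helper name `pct_change` is my own):

```python
# Undergraduate counts quoted in the post (2002/03 -> 2008/09).
quoted = {
    "Biology": (17_390, 18_885),
    "Sports science": (15_755, 31_370),
}

def pct_change(start, end):
    """Percentage change from start to end."""
    return 100 * (end - start) / start

for subject, (start, end) in quoted.items():
    print(f"{subject}: {pct_change(start, end):+.1f}%")
```

Biology grows by a little under 9%, while sports science very nearly doubles.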

A subject-by-subject analysis might be forthcoming, should I get bored enough over the festive period, but for now I must resist. All the data are available for free here, though. So if you're super keen you could poke around it yourself. You never know, you might be able to work out why statistics has dropped from 1,680 undergraduates to 1,325. It seems totally inexplicable to me...

--------------------------

Those subject areas in full...

Business & administrative studies: Broadly-based programmes within business & administrative studies; Business studies; Management studies; Finance; Accounting; Marketing; Human resource management; Office skills; Hospitality, leisure, tourism & transport; Others in business & administrative studies

Subjects allied to medicine: Broadly-based programmes within subjects allied to medicine; Anatomy, physiology & pathology; Pharmacology, toxicology & pharmacy; Complementary medicine; Nutrition; Ophthalmics; Aural & oral sciences; Nursing; Medical technology; Others in subjects allied to medicine

Creative arts & design: Broadly-based programmes within creative arts & design; Fine art; Design studies; Music; Drama; Dance; Cinematics & photography; Crafts; Imaginative writing; Others in creative arts & design

Social studies: Broadly-based programmes within social studies; Economics; Politics; Sociology; Social policy; Social work; Anthropology; Human & social geography; Others in social studies

Biological sciences: Broadly-based programmes within biological sciences; Biology; Botany; Zoology; Genetics; Microbiology; Sports science; Molecular biology, biophysics & biochemistry; Psychology; Others in biological sciences

Engineering & technology: Broadly-based programmes within engineering & technology; General engineering; Civil engineering; Mechanical engineering; Aerospace engineering; Naval architecture; Electronic & electrical engineering; Production & manufacturing engineering; Chemical, process & energy engineering; Others in engineering; Minerals technology; Metallurgy; Ceramics & glasses; Polymers & textiles; Materials technology not otherwise specified; Maritime technology; Biotechnology; Others in technology

Languages: Broadly-based programmes within languages; Linguistics; Comparative literary studies; English studies; Ancient language studies; Celtic studies; Latin studies; Classical Greek studies; Classical studies; Others in linguistics, classics & related subjects; French studies; German studies; Italian studies; Spanish studies; Portuguese studies; Scandinavian studies; Russian & East European studies; European studies; Others in European languages, literature & related subjects; Chinese studies; Japanese studies; South Asian studies; Other Asian studies; African studies; Modern Middle Eastern studies; American studies; Australasian studies; Others in Eastern, Asiatic, African, American & Australasian languages, literature & related subjects

Law: Broadly-based programmes within law; Law by area; Law by topic; Others in law

Computer science: Broadly-based programmes within computer science; Computer science; Information systems; Software engineering; Artificial intelligence; Others in computing sciences

Physical sciences: Broadly-based programmes within physical sciences; Chemistry; Materials science; Physics; Forensic & archaeological science; Astronomy; Geology; Science of aquatic & terrestrial environments; Physical geographical sciences; Others in physical sciences

Education: Broadly-based programmes within education; Training teachers; Research & study skills in education; Academic studies in education; Others in education

Historical and philosophical studies: Broadly-based programmes within historical & philosophical studies; History by period; History by area; History by topic; Archaeology; Philosophy; Theology & religious studies; Others in historical & philosophical studies

Medicine & dentistry: Broadly-based programmes within medicine & dentistry; Pre-clinical medicine; Pre-clinical dentistry; Clinical medicine; Clinical dentistry; Others in medicine & dentistry

Mass communications & documentation: Broadly-based programmes within mass communications & documentation; Information services; Publicity studies; Media studies; Publishing; Journalism; Others in mass communications & documentation

Architecture, building & planning: Broadly-based programmes within architecture, building & planning; Architecture; Building; Landscape design; Planning (urban, rural & regional); Others in architecture, building & planning

Mathematical sciences: Broadly-based programmes within mathematical sciences; Mathematics; Operational research; Statistics; Others in mathematical sciences

Agriculture & related subjects: Broadly-based programmes within agriculture & related subjects; Animal science; Agriculture; Forestry; Food & beverage studies; Agricultural sciences; Others in veterinary sciences, agriculture & related subjects

Combined

Veterinary science: Pre-clinical veterinary medicine; Clinical veterinary medicine & dentistry

(There were occasional changes to these lists throughout the years in our dataset, with these being the details of the 2008/09 data. Probably an unimportant detail, but worth bearing in mind.)

Tuesday 7 December 2010

Statistics - not always black and white

I was a little startled by the front page of the Guardian this morning. It featured an article claiming that David Lammy, MP for Tottenham, had uncovered shocking evidence of racism in the admissions procedures of Oxford and Cambridge - Britain's two most prestigious universities.

Some of the figures are certainly cause to raise an eyebrow - just one black Briton of Caribbean descent accepted by Oxford last year? One college hasn't admitted a black student in five years? Surely this is evidence of institutionalised racism at its worst! Or is it? That one black Briton of Caribbean descent was one of just 35 applicants, and a spokeswoman for Oxford points out that "black students apply disproportionately for the most oversubscribed subjects". This is before you start thinking about how many people don't disclose their ethnicity (on all the forms you're sent when you apply), and so on.

Clearly this is somewhere where some better statistical thinking would help, but it does not seem to be forthcoming. There are plenty of points that can be dissected and discussed, but I'm just going to pick on one quote (posted on this blog) from the honourable member, which I think highlights the quality of the analysis:

"Why is it that 25 of 84 Black applicants received offers from Keble College but just 5 of 64 Black applicants received offers from Jesus College over the same 11 year period?"

On face value, this seems quite a big difference. 25 out of 84 is 30%, whilst 5 out of 64 is just 8%. Surely that's not down to chance? That is presumably what we're supposed to think, but let's dig a little deeper. The Guardian have published the admissions data for each college, and it's from these that these two figures come. A quick look reveals that Lammy has picked out the two colleges with the highest and lowest rate of admission for black applicants, and this is when alarm bells should start ringing.

The sharpshooter fallacy is a classic. An old Texan (not a great shot) fires at the broad side of a barn, and then draws a big target whose centre is where his biggest cluster of shots happened to land. He points to this as proof of his superb marksmanship. This is an essential aspect of statistics - if you don't decide what you're looking for before you collect your data, it's easy to find results that look too striking to be chance. If Lammy had had some reason to choose Keble and Jesus before he'd looked at the data, then the difference he highlights might mean something, but as it is, could it just be down to chance?

Fortunately, it's a pretty easy thing to check. Let's assume that a black applicant has the same chance of being admitted to any Oxford college. On average, 22% of black applicants over the last 11 years were admitted, so I'll use this as my baseline. I'm going to simulate new versions of the real dataset, where I take the (real) total number of black applicants to each college, and then see how many get accepted by giving everyone a 22% chance. If we do this, then on average the college with the highest success rate admits 35% of its black applicants, whilst the average lowest success rate is 11% - a spread of 24 percentage points. David Lammy highlights a difference of 22 percentage points between best and worst as indicative of... something, when in fact it's pretty much what you'd expect.
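
For anyone who wants to try this sort of check themselves, here is a rough Python sketch of the simulation described above. It is not the code behind the 35% and 11% figures. Apart from the 84 and 64 taken from the Lammy quote, the applicant totals are placeholders I have made up, and the function name, the number of colleges and the number of simulated datasets are all assumptions, so the output will not match those figures.

```python
import random

def simulate_extremes(applicants, p=0.22, n_sims=1000, seed=0):
    """Average highest and lowest per-college offer rate under equal odds."""
    rng = random.Random(seed)
    total_high = total_low = 0.0
    for _ in range(n_sims):
        rates = []
        for n in applicants:
            # Each applicant independently gets an offer with probability p.
            offers = sum(rng.random() < p for _ in range(n))
            rates.append(offers / n)
        total_high += max(rates)
        total_low += min(rates)
    return total_high / n_sims, total_low / n_sims

# Hypothetical applicant totals per college. Only 84 and 64 come from the
# quote above; the rest are placeholders, not the Guardian's published data.
applicants = [84, 64, 70, 55, 90, 48, 75, 60, 66, 52,
              80, 58, 72, 45, 68, 62, 77, 50, 85, 57]

high, low = simulate_extremes(applicants)
print(f"average highest rate: {high:.0%}, average lowest rate: {low:.0%}")
```

Swapping in the real per-college totals is the only change needed to repeat the check properly. Even with placeholder numbers, the best and worst colleges end up well either side of 22% purely by chance.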

There are doubtless plenty of valid issues - some of which Lammy does try to raise - that these data could highlight, had they been analysed properly and not obfuscated by a cloud of sensationalism. Lammy says that "the variations between colleges in their admissions statistics is a pertinent point and Oxbridge should be doing more to find out why such variations exist". Perhaps if he'd employed a statistician he would be able to answer this one for himself.