With advancing technology, scientists no longer have to rely solely on capturing animals or gathering data manually in the field. Bioinformatics allows a whole genome to be analysed on a computer. Once the initial DNA sequencing has been done, a great deal of research can be carried out using that data alone. For example, the DNA, mRNA or amino acid sequences of two individuals or species can be compared.
From this short sequence of amino acids in the haemoglobin of these different species, we can infer several things. Let’s do humans and chimps first! How many differences are there? Lys, Glu, His, Ile and… Lys, Glu, His, Ile. Right: no differences at all. Humans and gorillas differ at one position, zebras and horses differ at one position, and zebras and humans differ at three!
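Counting differences like this is easy to automate once the sequences are lined up. Below is a minimal sketch, assuming the sequences are already aligned and of equal length. It writes isoleucine with its standard three-letter code, Ile. The helper name `count_differences` and the one-substitution "variant" sequence are hypothetical and exist only for illustration.

```python
# Count position-by-position differences between two aligned amino acid
# sequences written as three-letter codes. Assumes the sequences are
# already aligned to the same length, as in a simple comparison table.

def count_differences(seq_a: list[str], seq_b: list[str]) -> int:
    """Return the number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# The four residues read out in the text for humans and chimps.
human = ["Lys", "Glu", "His", "Ile"]
chimp = ["Lys", "Glu", "His", "Ile"]

# A hypothetical sequence with one substitution, for illustration only.
variant = ["Lys", "Asp", "His", "Ile"]

print(count_differences(human, chimp))    # 0
print(count_differences(human, variant))  # 1
```

Real sequence comparisons first need an alignment step, because insertions and deletions shift positions. This sketch skips that step because the table already lines the residues up.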
Even this small table, covering a short stretch of one protein in just five species, tells us a lot. The potential of molecular biology tools for investigating diversity is astounding.
DNA can be studied in a similar way, and there are many creative ways to sift through large amounts of genetic data and pull out interesting information. This example is a fairly straightforward, run-of-the-mill comparison between the DNA sequence of a mouse gene and that of a fly gene.
We can see that the two DNA sequences are 76.66% identical, whereas the protein products encoded by their exons are identical along their entire length (100%, highlighted in green).
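One reason DNA identity can be lower than protein identity is that several codons code for the same amino acid. A change in the DNA, often at the third base of a codon, can therefore leave the protein unchanged. The sketch below shows this under stated assumptions. The two 9-base DNA sequences are hypothetical and are not the mouse and fly genes. The codon table contains only the six codons the example needs, and the helper names are illustrative.

```python
# Percent identity of two aligned, equal-length sequences, plus a tiny
# translation step showing how DNA can differ while the protein does not.
# The DNA sequences below are hypothetical, not the mouse/fly genes.

# Only the six codons used in this example; a full table has 64 codons.
CODONS = {
    "AAA": "Lys", "AAG": "Lys",
    "GAA": "Glu", "GAG": "Glu",
    "CAT": "His", "CAC": "His",
}


def percent_identity(seq_a, seq_b) -> float:
    """Percentage of aligned positions at which the two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100 * matches / len(seq_a)


def translate(dna: str) -> list[str]:
    """Translate a coding sequence codon by codon using CODONS."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


dna_a = "AAAGAACAT"
dna_b = "AAGGAGCAC"  # differs only at the third base of each codon

print(round(percent_identity(dna_a, dna_b), 2))              # 66.67
print(percent_identity(translate(dna_a), translate(dna_b)))  # 100.0
```

Both sequences translate to Lys-Glu-His. The DNA is only 6/9 identical, but the protein is fully identical.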
Where do you even begin when counting the organisms in a field, for example? How can quantitative data be obtained for a rocky shore? It isn’t feasible to examine every single plant, count every crab you can find, or estimate the abundance of every type of grass along a shingle ridge. Even if that were possible, the data would only describe the population at that one moment, not its make-up at other times.
The answer is to take a sample. If you were sampling crabs on a shore, where would you look? In the spot where you can already see three of them, or in the spot where there are none? For the sample to be representative, you, as the experimenter, must not be involved in deciding the sample locations; they should be chosen at random instead. Choosing them yourself would introduce experimenter bias and make your hard-earned data invalid.
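A common way to remove yourself from the decision is to lay two tape measures at right angles and use random numbers as coordinates for each quadrat. Below is a minimal sketch of generating those coordinates. It assumes a hypothetical 20 m × 10 m area divided into 1 m grid squares, with 10 quadrats. The function name and the fixed seed are illustrative choices; the seed is there only so the example repeats exactly.

```python
import random


def random_quadrat_positions(width_m, length_m, n, seed=None):
    """Pick n distinct 1 m x 1 m grid squares at random.

    Each (x, y) pair is the corner of a square, measured in metres along
    two tape measures laid at right angles.
    """
    squares = [(x, y) for x in range(width_m) for y in range(length_m)]
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(squares, n)  # sampling without replacement


# Hypothetical survey area: 20 m along the shore, 10 m up the shore.
positions = random_quadrat_positions(20, 10, n=10, seed=1)
for x, y in positions:
    print(f"Place quadrat at {x} m along, {y} m up")
```

Using `sample` rather than picking coordinates independently means the same square is never chosen twice.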
Recording and analysing data
Depending on the size and type of organism, data can be collected by counting the individuals present in each quadrat (abundance), or by estimating the percentage of each quadrat’s area that a species occupies (percentage cover). The results can then be scaled up to the whole area investigated by multiplying by the number of quadrat-sized areas it contains.
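The scaling step is simple arithmetic. Take the mean result per quadrat and multiply it up to the full area. The sketch below uses entirely hypothetical numbers: a 50 cm × 50 cm quadrat, a 100 m² survey area, and made-up counts and cover values. Percentage cover scales to an estimated area covered, not to a number of individuals.

```python
from statistics import mean

# Scale quadrat results up to the whole area investigated.
# All numbers here are hypothetical, chosen only to show the arithmetic.

quadrat_area_m2 = 0.25        # a 50 cm x 50 cm quadrat
total_area_m2 = 100.0         # area being surveyed
counts = [3, 5, 2, 6, 4]      # individuals counted in each quadrat
cover = [40, 55, 30, 60, 45]  # percentage cover in each quadrat

# Number of quadrat-sized pieces that fit into the whole area.
scale = total_area_m2 / quadrat_area_m2

# Mean individuals per quadrat, multiplied up to the whole area.
estimated_population = mean(counts) * scale

# Mean percentage cover applied to the whole area, giving m^2 covered.
estimated_cover_m2 = mean(cover) * total_area_m2 / 100

print(estimated_population)  # 1600.0
print(estimated_cover_m2)    # 46.0
```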
The data are analysed by calculating basic statistics: the mean and the standard deviation. The mean gives an overall picture of the data set. It is obtained by adding all the values together and dividing by how many there are. For example, the mean of 3, 4 and 2 is 3 (3 + 4 + 2 = 9; 9 / 3 = 3), and the mean of 5, 3 and 7 is 5.
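The same calculation in a few lines of Python reproduces both worked examples. `arithmetic_mean` is a hypothetical helper name. Python’s built-in `statistics.mean` gives the same results.

```python
# The mean as described above: add the values, divide by how many there are.

def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


print(arithmetic_mean([3, 4, 2]))  # 3.0
print(arithmetic_mean([5, 3, 7]))  # 5.0
```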
In this example, the data set with a mean of 3 has, on average, lower values than the data set with a mean of 5. However, the value 3 appears in both sets, so the mean alone does not give an accurate picture of the data.
That’s why the standard deviation (SD) is calculated alongside it. The standard deviation measures how spread out the data are around the mean. Its formula is a little convoluted, and you don’t need to know it.
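For the curious, the sketch below assumes the sample standard deviation is the version intended. This is the form usually used for biological samples: square each value’s distance from the mean, add these up, divide by n − 1, and take the square root. It applies this to the two small data sets above and checks the result against Python’s `statistics.stdev`, which uses the same n − 1 formula. `sample_sd` is a hypothetical helper name.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: square root of the summed squared
    distances from the mean, divided by (n - 1)."""
    m = sum(values) / len(values)
    squared_distances = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_distances / (len(values) - 1))


for data in ([3, 4, 2], [5, 3, 7]):
    # statistics.stdev uses the same n - 1 formula, so the two should agree.
    print(data, sample_sd(data), statistics.stdev(data))
```

The set 3, 4, 2 has an SD of 1.0, and 5, 3, 7 has an SD of 2.0. The second set is more spread out, even though both contain a 3.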
Sometimes the SD is given in brackets after the mean, as in 46.2 (± 4.2). This means the mean is 46.2 and the SD is 4.2, so one SD either side of the mean spans 42.0 to 50.4. If the data are roughly normally distributed, about 68% of the values fall within this range, and about 95% fall within two SDs of the mean (37.8 to 54.6).
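The 68% and 95% figures come from the normal distribution. The sketch below checks them for the example values using `NormalDist` from Python’s `statistics` module (Python 3.8+). It assumes the underlying data are roughly normal; for skewed data, these percentages do not hold.

```python
from statistics import NormalDist

# Ranges for the example in the text: mean 46.2, SD 4.2.
mean_value, sd = 46.2, 4.2

one_sd = (round(mean_value - sd, 1), round(mean_value + sd, 1))
two_sd = (round(mean_value - 2 * sd, 1), round(mean_value + 2 * sd, 1))

# Share of a normal distribution lying within 1 and 2 SDs of the mean.
dist = NormalDist(mu=mean_value, sigma=sd)
within_one = dist.cdf(mean_value + sd) - dist.cdf(mean_value - sd)
within_two = dist.cdf(mean_value + 2 * sd) - dist.cdf(mean_value - 2 * sd)

print(one_sd, round(within_one, 3))  # (42.0, 50.4) 0.683
print(two_sd, round(within_two, 3))  # (37.8, 54.6) 0.954
```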
In graphs, the mean is usually plotted as a point or as the top of a bar, and the SD is shown either as a box around the mean or as error bars extending above and below it.
In the graph, TGUG and ETGUG stand for “Timed Get Up and Go” and “Expanded Timed Get Up and Go”, two mobility tests used to assess general health for various purposes. The bars show the mean value for each subgroup in each test, while the T-shaped line above each bar marks one SD above the mean.
For example, the young group’s mean TGUG time is 7.5 seconds, with a small SD of less than 1 second. The at-risk group’s mean ETGUG time, on the other hand, is 34 seconds, with an SD of a whole 10 seconds!
A large SD indicates data scattered far from the mean. In that case, the mean may say little about a typical value and may not even match any of the actual data values. A small SD indicates data clustered closely around the mean. Depending on what the data represent, these different SD scenarios can make drawing conclusions from the data easier or harder.
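To see both scenarios side by side, the sketch below compares two hypothetical data sets, invented purely for illustration, that share the same mean but differ greatly in spread. It uses `statistics.mean` and `statistics.stdev` from Python’s standard library.

```python
import statistics

# Two hypothetical data sets with the same mean but very different spread.
tight = [9, 10, 10, 11]
wide = [1, 4, 16, 19]

for name, data in (("tight", tight), ("wide", wide)):
    m = statistics.mean(data)
    sd = statistics.stdev(data)
    print(f"{name}: mean = {m}, SD = {sd:.2f}, mean is a data value: {m in data}")
```

Both sets have a mean of 10. The tight set has an SD below 1, and the wide set has an SD of roughly 8.8. In the wide set, the value 10 does not appear at all, so the mean describes none of the actual measurements.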
From this graph and the points above, we can say that the young group all performed very similarly to one another and were the fastest of all the subgroups. The at-risk group performed worst and also had the highest SD, meaning the greatest variation in how well individuals performed. This could be because mobility varies widely within the at-risk population, for a range of reasons. Even so, the group’s mean time to get up and go is still clearly higher than that of even the elderly group.