Monday, March 27, 2017

Thoughts on a predator-prey active learning exercise

Today was a very exciting day for me in my BI111 class, as I got to try out a role-playing game that I have been thinking about and planning for a long while. At this point in the semester, my course (Biological Diversity & Evolution) has reached the topic of community ecology. One of the phenomena I talk about this week is how predators and prey can become linked into cyclical patterns. A well-known example of this involves the Canadian lynx, Lynx canadensis, and its prey, the snowshoe hare, Lepus americanus. I've been wanting to let my students see how this pattern arises, so I devised an exercise that could be done in my two sections of BI111.
For context, this class has ~400 students per section, and I run it out of a lecture hall that holds 450 students (see pictures below).




Here is the set of rules I came up with, and shared with the students before today's lecture.

Now before you get too worried, I chose soft foam practice golf balls (I bought 4 sets from amazon) for use by the "lynx".

--------------------
OK. So, flash forward to the exercise in question. Overall, I felt it went well, but there are a few tweaks that I'm going to consider for next year's implementation.
1. Lynx - it was perhaps too easy for the lynx to survive and reproduce. In my 10:30 class, I followed the original plan, and it didn't take long for the lynx population to rise rapidly. Things were a bit better in the second round, where I changed the rule to 4 hits = survival, 5 hits = 1 offspring, and 6 hits = 2 offspring. Lynx numbers still rose quite high, but it took a bit longer (I'll upload some scans of the data sheets later).
2. Hares - it became clear early on that the hares that survived needed bigger litters. Perhaps it was the orientation of the room, or the skill of the lynx at throwing, but predation success was greater than I anticipated. I tried out 3 offspring per litter, and that seemed to work.
3. Time - I had expected that 5-6 rounds of predation plus the instructional preamble would take ~20 minutes (to give time for cycles to become apparent). It ended up taking about 30. Perhaps reducing the number of balls/lynx/round might speed things up.
4. Loudness - this was a loud exercise. I knew it would be, but - wow! Lots of excitement from the students (good), but it was hard to keep focus on the exercise. I'll need to think about what can be done.
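For anyone curious about the dynamics the exercise is meant to mimic, the cycling can be sketched with a minimal discrete-time Lotka-Volterra model in R. The parameter values below are purely illustrative assumptions (not fitted to the classroom data or to real lynx-hare records):

```r
# Illustrative (made-up) parameters: hare growth rate, predation rate,
# lynx conversion efficiency, and lynx mortality
r <- 0.1; a <- 0.005; b <- 0.002; m <- 0.2
n <- 200
hares <- numeric(n); lynx <- numeric(n)
hares[1] <- 150; lynx[1] <- 10   # start away from equilibrium

for (t in 1:(n - 1)) {
  hares[t + 1] <- max(hares[t] + r * hares[t] - a * hares[t] * lynx[t], 0)
  lynx[t + 1]  <- max(lynx[t] + b * hares[t] * lynx[t] - m * lynx[t], 0)
}

plot(hares, type = "l", ylab = "Population size", xlab = "Round")
lines(lynx, lty = 2)  # lynx trajectory lags behind the hares
```

With parameters like these, predator peaks trail prey peaks, which is the same out-of-phase pattern the classroom rounds are designed to generate.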

That's it for now,  but I'll update this post when I get the student feedback on the exercise.
TL


Thursday, May 21, 2015

You are never too young (or too old) to be thinking about data visualization!

Over the last couple of weeks, I have been working with my son on his first science fair project. It has been lots of fun, working with him to develop a question, a prediction, and a meaningful experimental design. Collecting the data was also great - as you can (hopefully) see in the pictures below, we were investigating how rubber balls bounced at different temperatures. When it came time to "write" up his results, we decided that the best approach would be to plot all his data (using stickers to represent the heights of the first bounce of balls dropped from a height of 2m). While that may seem obvious, many scientists have a great aversion to plotting raw data. It would be far more likely to see a "professional" version of this experiment present results using bar plots of mean values (with either SE, SD or 95% CI error bars). This is very unfortunate, as bar plots forsake a great deal of useful visual information about the distribution of the data. By coincidence, this was also (one of) the take-home message(s) of a recent paper:

Weissgerber, T. L., Milic, N. M., Winham, S. J., & Garovic, V. D. (2015). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biology: e1002128.


that we had chosen to read in this week's Long Lab journal club. Overall, I thought the authors did a commendable job, and it is evident from the paper's metadata that their message is reaching a large audience. While I am in favour of anything that turns the tide against bar plots, I do wish they had given boxplots as much publicity as the univariate scatterplots that were heavily featured in the manuscript. I suspect this is because the literature they were surveying (physiology) tended to have small sample sizes. According to the authors, "the minimum and maximum sample sizes for any group shown in a figure ... were 4... and 10 respectively". These results are presented in panel C of supplemental figure S2.*


I have nothing against univariate scatterplots. In fact, for small sample sizes (say <30 elements/group), directly plotting the data reveals a great deal about its distribution. However, after a certain point the usefulness of this approach starts to wane, as there will be more overlap among points. In such cases, a boxplot is a more desirable solution. Not only is it aesthetically pleasing, but it also clearly conveys meaningful visual information to the reader about the centrality, skew and spread of the data. *I suspect that is why, when Weissgerber et al. presented the data from their survey of hundreds of figures, they did so using a boxplot.
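To see the contrast for yourself, here is a quick R sketch using simulated data (the two "temperature" groups and their means are invented purely for illustration) comparing a bar plot of means against a univariate scatterplot and a boxplot of the same values:

```r
set.seed(1)
# simulated first-bounce heights (cm) for two hypothetical temperature groups
cold <- rnorm(10, mean = 60, sd = 8)
warm <- rnorm(10, mean = 75, sd = 8)

par(mfrow = c(1, 3))
# bar plot of means: hides the distribution entirely
barplot(c(Cold = mean(cold), Warm = mean(warm)), ylab = "Bounce height (cm)")
# univariate scatterplot: every observation is visible
stripchart(list(Cold = cold, Warm = warm), vertical = TRUE,
           method = "jitter", pch = 19, ylab = "Bounce height (cm)")
# boxplot: centrality, spread, and skew at a glance
boxplot(list(Cold = cold, Warm = warm), ylab = "Bounce height (cm)")
```

The bars alone cannot tell you whether the groups overlap, whereas both of the other panels can.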

"let me tell you about the wonders of data visualization"

Tuesday, April 14, 2015

Statistically, it's mayhem*

Below is a letter to the editor I recently wrote in response to potential flaws in the analysis that forms the basis of the Waterloo Region Record's recent article "Police call records reveal region's trouble hot spots", which can be read here -> http://goo.gl/DDQEg0

One of the first things that I emphasize to the students in my biostatistics class at Wilfrid Laurier University is that statistics are a powerful tool. Used carefully and properly, statistics can provide valuable insight into the factors that shape the world around us - but used or interpreted incorrectly, statistics can potentially lead to conclusions that are unjustified or altogether incorrect. Your recent "analysis" of police call data seems to fall into the latter category, due to problems both with your data set and with the conclusions drawn from it.

First, let's consider your data set. Of the ~903,000 calls in your initial data set, almost half were excluded from the analysis for a variety of reasons. Whenever data are dropped, there is a strong possibility that what remains is a non-random (and thus biased) set of data. Furthermore, the remaining data points "do not measure crime" (as belatedly stated in the 30th inch of the story) - instead they capture a wide variety of incidents (including "enforcement of traffic laws" and "attend at collisions") that are not necessarily linked to the residents of that region. It should go without saying that if your data do not contain variables that are relevant to the question, then the conclusions drawn from them will be suspect.

Using this questionable data set, you draw the conclusion that "the poorer the zone, the more often people call police and the more time police spend there, responding to distress", without any consideration of potentially confounding effects. There are potentially dozens of factors besides average household income that differ between the patrol zones and may ultimately be responsible for the observed patterns. For instance, a cursory search on Google Maps seems to indicate that the regions with the highest frequencies of calls to the police also have a greater density of Tim Hortons locations - but you would not (hopefully) conclude that their presence is responsible for "where trouble lives".

Generations of statisticians have warned that "correlation does not imply causation", but that message seems to have been ignored in the construction of this article, to the detriment of your readership. 

Sincerely,

Tristan A.F. Long

*The title for this post is taken from one of the hyperbolic statements made in the article. I think that, ironically, this statement is an apt description of the statistics used in the analysis.


Sunday, September 28, 2014

Long Lab (summer fun edition)

L->R: Arnold Mak, David Filice, Katie Plagens, Tristan Long, Thiropa Balasubramaniam, Mireille Golemiec, Emily Martin

BONUS: GIF!

How to (truly) randomly assign mating treatments - an elegant R approach

This week in the lab we'll be setting up some experiments using our 45 inbred lines of flies (lines were inbred by mating single male-female pairs derived from the IV line, and subsequently selecting a single brother and sister from the resulting offspring to found the next generation; this process was repeated for >10 generations).

In the experiment we want to randomly pair males and females from different lines, which in R seems pretty simple, as you can use the code:

inbred.lines <- 1:45
random.mates <- sample(inbred.lines, 45, replace = FALSE)
combos <- cbind(inbred.lines, random.mates)

which (most of the time) will end up randomly pairing males and females from different lines...
...However, as this is a random process, there is a chance that R might, by chance, pair males and females from the same line. This may not seem like a big issue, as you could use a simple logical test such as

inbred.lines==random.mates

to make sure that there are no matches, and re-run the random sampling if any TRUE values are returned, until all pairs are different.
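That check-and-rerun loop can be written out explicitly. Here is a small sketch of the brute-force approach just described:

```r
inbred.lines <- 1:45

# keep resampling until no line is paired with itself
repeat {
  random.mates <- sample(inbred.lines, 45, replace = FALSE)
  if (!any(inbred.lines == random.mates)) break
}

combos <- cbind(inbred.lines, random.mates)  # every row now has two different lines
```

Since roughly 1 in every e (~37%) of random permutations has no self-match, the loop almost always exits after only a handful of tries.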

This "brute force" approach is OK, I guess, but it becomes much less efficient if we want to place two (or more) males, each selected from a different line, in with the females. Now we might get a TRUE value if we had a match between the female and "male 1", a match between the female and "male 2", or a match between "male 2" and "male 1". You can imagine how this problem gets more difficult as the number of combinations increases with the number of males and females in each vial (see the Handshake Problem).
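To put a number on that: with one female and m males per vial, the number of pairwise "no match" constraints is choose(m + 1, 2), which grows quadratically (this is the handshake problem mentioned above):

```r
# pairwise constraints per vial for m = 1..5 males alongside one female
m <- 1:5
constraints <- choose(m + 1, 2)
constraints  # 1, 3, 6, 10, 15
```

Each added male makes a randomly sampled set of lines more likely to violate at least one constraint, so the naive resampling approach needs more and more re-runs.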

So, as I could not find a solution online, I have developed some R code to quickly and elegantly solve this problem.
Let's begin as above

inbred.lines <- 1:45

Here is my solution

treatmentBB <- sample(inbred.lines, 45)
# if any line is paired with itself, start over and resample
while (any(treatmentBB == inbred.lines)) {
  treatmentBB <- sample(inbred.lines, 45)
}

As you can see in this code, I am creating a new column, "treatmentBB", that I populate with 45 numbers randomly sampled (without replacement) from the inbred.lines vector. The next step is to ask R to check whether any of the rows match. If they do, R starts all over again; if there are no matches, treatmentBB is left alone.

Now let's expand this to add a second (random) male into each treatment, by creating a column of values called treatmentE1. As you can see below, I have taken into account potential matches between females and males from treatmentBB, between females and males from treatmentE1, and between males from treatmentE1 and males from treatmentBB.

treatmentE1 <- sample(inbred.lines, 45)
# resample until no match with the female's line or with treatmentBB
while (any(treatmentE1 == inbred.lines | treatmentE1 == treatmentBB)) {
  treatmentE1 <- sample(inbred.lines, 45)
}

and if I wanted to add a third, I would need to make sure that all possible matches are accounted for...

treatmentE2 <- sample(inbred.lines, 45)
# resample until no match with the female's line, treatmentBB, or treatmentE1
while (any(treatmentE2 == inbred.lines | treatmentE2 == treatmentBB | treatmentE2 == treatmentE1)) {
  treatmentE2 <- sample(inbred.lines, 45)
}
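If you find yourself adding many treatment columns, the same idea can be wrapped in a general-purpose function. This is a hypothetical extension (the function name is my own invention, not part of any package) that builds k columns at once and resamples until no row contains a repeated line:

```r
# Sketch: generate k treatment columns, such that within every row no line
# matches the female's line or any other treatment column.
make.treatments <- function(lines, k) {
  repeat {
    cols <- replicate(k, sample(lines))   # k random permutations, as columns
    rows <- cbind(lines, cols)            # prepend the female's line
    # accept only if no row contains a duplicated line number
    if (!any(apply(rows, 1, function(r) any(duplicated(r))))) return(cols)
  }
}

treatments <- make.treatments(1:45, 2)    # two male columns, as above
</```r
```

The rejection rate rises with k (roughly like the handshake problem), but for two or three males per vial the loop still finishes almost instantly.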

Hope you find this useful!
TL


Wednesday, June 25, 2014

Setting up for a hemiclone assay




Several of the assays we are conducting in the lab this summer are examining the genetic basis of female mating behaviours. There are many ways to do these assays, but (in my opinion) one of the best ways to measure standing genetic variation is using hemiclonal analysis. Both Mireille and Arnold (above) have been learning the ropes of this technique, and this week set up the first (of many) assays.

The first step involves mating clone males (which carry a randomly-sampled haploid genome and a set of translocated autosomal chromosomes) with many wild-type virgin females collected from one of our outbred populations. These mated flies are placed into half-pint collectors outfitted with 35mm (diameter) petri dishes containing a grape-juice/agar medium for no more than 18h.

During that time, eggs are laid on the surface (see above).
As fruit fly development can be strongly influenced by variation in larval density, it is essential that our assay vials are standardized. This is done by counting exact numbers of eggs (under the microscope, using a paintbrush)...
...which are then gently cut from the surface of the juice "cookie"....
...and transferred to vials containing ~10ml of our lab's standard banana Drosophila media!
Due to the nature of the cross between the clone males and the wild-type females, there is 50% mortality (due to chromosomal imbalances). Thus, for every 100 eggs we transfer into a vial, only 50 will hatch into larvae. To further standardize developmental conditions in the vials, we add an additional 50 eggs from one of our other populations, which carries a recessive brown-eyed marker, yielding a total of ~100 larvae per vial (which mimics the typical developmental conditions experienced in our lab populations).

Now that our vials are set up, we'll incubate them, and then collect hemiclonal (red-eyed) females as they eclose starting 9d later. Check back for updates!

Sunday, June 22, 2014

More great news!

I'm a little slow to post this, but last week was also Spring Convocation here at Laurier. It was great to see all the lab alumni (Ashley Guncay, Leah DeJong and Conor Delar) walk across the stage. After the ceremony, I had the honour of presenting Leah DeJong with the 2013-2014 Rick Playle Award for Best Thesis! Congratulations Leah!
Incidentally, Leah is the third student from the lab to win this award (following Erin Sonser in 2012-2013, and Jessie Young in 2011-2012)!