Extra exercises

Although the previous tutorials contained several exercises, it doesn’t hurt to practice some more! The following exercises will test your Python (and Pandas/Matplotlib) skills. Note that they do not include any exercises on NumPy, because NumPy is not directly relevant to the Python/PsychoPy course.

Basic Python

The following ToDos help you practice with basic Python syntax and operations.

ToDo: Add five to the value in the variable x below and then raise the result to the third power. Store the result in a variable named y.
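One thing to watch out for here is operator precedence: in Python, the power operator (**) binds more tightly than addition, so you need parentheses when the addition should happen first. A minimal sketch with a made-up value (value and the results below are hypothetical names, not the exercise's x and y):

```python
# Hypothetical toy value, not the exercise's x
value = 2

# Parentheses force the addition to happen before the power
with_parentheses = (value + 1) ** 2
print(with_parentheses)  # 9

# Without parentheses, 1 ** 2 is evaluated first
without_parentheses = value + 1 ** 2
print(without_parentheses)  # 3
```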

""" Tests the above ToDo. """
assert(y == 3.375)
print("Well done!")

Well done!

ToDo: From the list below (my_list), extract every element at an odd index (at index 1, at index 3, at index 5, etc.) and store it in a new variable named my_list_index_odd. Note: you don't have to use a for-loop for this (but you can, if you want). Remember: Python is zero-indexed (i.e., the first element is at index 0).
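As a hint for the approach without a for-loop: a slice has the form start:stop:step, so starting at index 1 with a step of 2 picks out the odd indices. A minimal sketch on a made-up list (letters is a hypothetical name, not the exercise's my_list):

```python
# Hypothetical toy list, not the exercise's my_list
letters = ['p', 'q', 'r', 's', 't']

# Start at index 1 and take every second element after that
odd_index_letters = letters[1::2]
print(odd_index_letters)  # ['q', 's']
```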

""" Tests the above ToDo. """
assert(my_list_index_odd == ['b', 'd', 'f', 'h'])
print("Well done!")

Well done!

ToDo: From the list below (my_list2), extract the strings with five or more characters and store the result (another list) in a variable named my_list_filtered.
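One way to filter a list is a list comprehension with a condition, using len to get the length of each string. A minimal sketch on made-up data (words and long_words are hypothetical names). Note that len counts every character, including punctuation:

```python
# Hypothetical toy list, not the exercise's my_list2
words = ['hi', 'python', 'loop!', 'abc']

# Keep only the strings whose length is at least 5
long_words = [w for w in words if len(w) >= 5]
print(long_words)  # ['python', 'loop!']
```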

""" Tests the above ToDo. """
assert(my_list_filtered == ['hello', 'student', 'ToDo!'])
print("Well done!")

Well done!

ToDo: From the list below (my_list3), extract all even numbers (e.g., 0, 2, 100, 52) and store the result (another list) in a variable named my_list_even. Note: this ToDo is testing your Google skills!
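In case your search leads you to the modulo operator (%): it returns the remainder of a division, so a number is even when the remainder after dividing by 2 is zero. A minimal sketch on made-up numbers (numbers and evens are hypothetical names):

```python
# Hypothetical toy list, not the exercise's my_list3
numbers = [3, 10, -7, 0, 15, -2]

# n % 2 is 0 for even numbers, including 0 and negative even numbers
evens = [n for n in numbers if n % 2 == 0]
print(evens)  # [10, 0, -2]
```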

""" Tests the above ToDo. """
assert(sorted(my_list_even) == [-4, 0, 8, 1242, 2386, 5820])
print("Well done!")

Well done!

ToDo: The list below (graded_per_day) contains integers that represent the number of essays I graded in the past week (Monday-Friday). Compute the percentage of essays I graded for each day and store the result (another list) in a new variable with the name percentage_per_day. Hint: to get a percentage, you divide a number by the sum of all numbers in the collection and then multiply the result by 100.
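The hint can be sketched on made-up counts as follows (counts, total, and shares are hypothetical names, not the exercise's variables). The built-in sum function gives the total of a list:

```python
# Hypothetical toy counts, not the exercise's graded_per_day
counts = [2, 4, 1, 1]

# Compute the total once, then convert each count into a percentage
total = sum(counts)
shares = [c / total * 100 for c in counts]
print(shares)  # [25.0, 50.0, 12.5, 12.5]
```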

""" Tests the above ToDo. """
assert([float(p) for p in percentage_per_day] == [40.0, 20.0, 16.0, 24.0])
print("Well done!")

Well done!

ToDo (difficult!): The covariance between two variables ($x$, $y$) of length $n$ is computed as follows:

$$\mathrm{cov}_{xy} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})}{n-1} \tag{7}$$

Compute the covariance (a single float!) between the two lists (z1 and z2) below and store it in a variable with the name cov_z1z2. You may use the order of steps below:

1. Compute the mean of each of the two lists
2. Create two new lists in which the list’s mean has been subtracted from each value
3. Write a for-loop (or list comprehension) that multiplies each value in one list with the corresponding value in the other list (resulting in another list)
4. Sum the results from the previous step
5. Divide the result by the length of the list minus 1
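To make the steps above concrete, here is a minimal sketch on two short made-up lists (a, b, and all other names are hypothetical, not the exercise's z1 and z2). It uses zip to pair up corresponding values, which is one of several ways to do this:

```python
# Hypothetical toy lists, not the exercise's z1 and z2
a = [1, 2, 3, 4]
b = [2, 4, 6, 8]

# Step 1: mean of each list
mean_a = sum(a) / len(a)
mean_b = sum(b) / len(b)

# Step 2: subtract the mean from every value
a_centered = [v - mean_a for v in a]
b_centered = [v - mean_b for v in b]

# Step 3: multiply corresponding values; zip pairs them up
products = [va * vb for va, vb in zip(a_centered, b_centered)]

# Steps 4 and 5: sum the products and divide by n - 1
cov_ab = sum(products) / (len(a) - 1)
print(cov_ab)
```

For these toy lists the summed products equal 10, so the covariance is 10 / 3 (about 3.33).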

""" Tests the above ToDo. """
import numpy as np
assert(round(cov_z1z2, 3) == round(np.cov(z1, z2)[0, 1], 3))
print("Well done!")

Well done!

Matplotlib

Two Matplotlib exercises (both relatively difficult).

ToDo: Below, we simulate the resting heart rate (rhr, in beats per minute) for 100 people, who are either male ('M') or female ('F') in this example (stored in the gend variable). Using Matplotlib, plot the histograms of the resting heart rate of males and females separately, but in the same plot! Make sure that the transparency of the histogram bars is set to 50% (Google this!). Add a legend and sensible axis labels.
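If your search points you to the alpha argument: most Matplotlib plotting functions accept alpha, a number between 0 (fully transparent) and 1 (fully opaque). A minimal sketch with two made-up groups (group_a and group_b are hypothetical, not the exercise's rhr and gend variables), assuming the object-oriented plt.subplots interface:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for running as a script; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical simulated data, not the exercise's rhr and gend variables
rng = np.random.default_rng(0)
group_a = rng.normal(60, 5, size=50)
group_b = rng.normal(65, 5, size=50)

fig, ax = plt.subplots()
# alpha=0.5 makes the bars 50% transparent, so overlapping bars stay visible
ax.hist(group_a, alpha=0.5, label='group A')
ax.hist(group_b, alpha=0.5, label='group B')
ax.set_xlabel('Value')
ax.set_ylabel('Count')
ax.legend()  # uses the label given to each hist call
```

Calling hist twice on the same axis is what puts both histograms in one plot.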

""" Tests the above ToDo. """
# Your plot should look like the one below!
from IPython.display import Image
Image('rhr_by_gender.png')

[Image: expected histogram of resting heart rate by gender (rhr_by_gender.png)]

ToDo: Create a bar graph with the average heart rate for three groups:

People with a RHR < 58 ("athletic")
People with 58 ≤ RHR ≤ 65 ("average")
People with a RHR > 65 ("sedentary")

Do this separately for men (“M”) and women (“F”), such that there are six bars (athletic/men, athletic/women, average/men, average/women, sedentary/men, sedentary/women).
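Computing the six averages is the actual exercise, but the plotting side may be unfamiliar: to draw two bars next to each other per group, you can shift the x-positions of each set of bars by half a bar width. A minimal sketch of that layout, using made-up means (all values and names below are hypothetical):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for running as a script; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up means purely for illustration, not computed from the exercise data
labels = ['athletic', 'average', 'sedentary']
means_m = [55.0, 61.0, 68.0]
means_f = [56.0, 62.0, 70.0]

x = np.arange(len(labels))  # one x-position per group: 0, 1, 2
width = 0.4                 # width of a single bar

fig, ax = plt.subplots()
# Shift each set of bars half a bar width to the left or right of the group position
ax.bar(x - width / 2, means_m, width, label='M')
ax.bar(x + width / 2, means_f, width, label='F')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel('Mean RHR (beats per minute)')
ax.legend()
```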

""" Tests the above ToDo. """
# Your plot should look like the one below!
from IPython.display import Image
Image('rhr_stratified.png')

[Image: expected bar graph of average heart rate per group and gender (rhr_stratified.png)]

Pandas

Some Pandas exercises.

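import random
import numpy as np  # repeated here so that this cell runs on its own
import pandas as pd

n = 30  # number of simulated participants
df = pd.DataFrame({
    'participant_id': ['sub-' + str(i).zfill(2) for i in range(1, n + 1)],
    'gender': [random.choice(['M', 'F']) for _ in range(n)],
    'condition': ['A', 'B', 'C'] * (n // 3),
    'prop_correct': np.random.uniform(0.4, 1, size=n),
    'mean_rt': np.random.normal(200, 20, n)
})
# Set mean_rt (the last column) to NaN for two distinct, randomly chosen rows;
# replace=False prevents the same row from being drawn twice
df.iloc[np.random.choice(range(n), size=2, replace=False), -1] = np.nan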

ToDo: Remove all participants with missing reaction time data ("mean_rt" column) and store the result in a new variable named df_clean.
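One option is the dropna method, which returns a new dataframe without the rows that contain missing values; its subset argument restricts the check to specific columns. A minimal sketch on a made-up dataframe (toy and toy_clean are hypothetical names, not the exercise's df):

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe, not the exercise's df
toy = pd.DataFrame({
    'participant_id': ['sub-01', 'sub-02', 'sub-03'],
    'mean_rt': [210.5, np.nan, 195.0],
})

# dropna returns a new dataframe; subset limits the check to the listed columns
toy_clean = toy.dropna(subset=['mean_rt'])
print(toy_clean.shape)  # (2, 2)
```

The original dataframe is left unchanged, which is why the result is stored in a new variable.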

""" Tests the above ToDo. """
assert(df_clean.shape[0] == df.shape[0] - 2)
assert(df_clean.shape[1] == df.shape[1])

print("Well done.")

Well done.

Let’s delete the original df for now:

del df

ToDo: Select from the df_clean dataframe the subset of male participants and save them in a new dataframe with the name df_m.

""" Tests the above ToDo. """
assert(all(df_m.loc[:, 'gender'] == 'M'))
print("Well done!")

Well done!

ToDo: Select from the df_clean dataframe the subset of participants who have more than 65% correct ("prop_correct" larger than 0.65) and a mean reaction time smaller than 200 ms. Store the result in a new variable named df_select.
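Combining two conditions in Pandas works with the & operator (not the keyword and), and each condition needs its own parentheses. A minimal sketch on a made-up dataframe with arbitrary thresholds (toy, mask, toy_select, and the cutoffs 0.6 and 210 are all hypothetical):

```python
import pandas as pd

# Hypothetical toy dataframe, not the exercise's df_clean
toy = pd.DataFrame({
    'prop_correct': [0.50, 0.70, 0.90, 0.80],
    'mean_rt': [180.0, 190.0, 220.0, 150.0],
})

# Each comparison yields a boolean Series; & combines them element-wise
mask = (toy['prop_correct'] > 0.6) & (toy['mean_rt'] < 210)

# Keep only the rows for which both conditions hold
toy_select = toy.loc[mask, :]
print(toy_select.index.tolist())  # [1, 3]
```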

""" Tests the above ToDo. """
assert(all(df_select.loc[:, 'mean_rt'] < 200))
assert(all(df_select.loc[:, 'prop_correct'] > 0.65))

print("Well done!")

Well done!

ToDo: Add a new column to the dataframe, "status", which contains either the string "above_average" or "below_average" depending on whether the participant has a "prop_correct" score higher than the average or lower than the average, respectively. This column should be added to the df_clean dataframe. Note: if you get a "SettingWithCopyWarning", you may ignore this.
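There are several ways to do this; one of them is NumPy's where function, which picks between two values depending on a boolean condition. A minimal sketch on a made-up dataframe (toy_full, toy, "score", and "level" are hypothetical names). Working on an explicit copy, as done here, is one way to avoid the "SettingWithCopyWarning":

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe, not the exercise's df_clean
toy_full = pd.DataFrame({'score': [0.2, 0.4, np.nan, 0.9]})

# .copy() makes the cleaned dataframe independent of the original
toy = toy_full.dropna().copy()

# np.where(condition, value_if_true, value_if_false) works element-wise
toy['level'] = np.where(toy['score'] > toy['score'].mean(), 'high', 'low')
print(toy['level'].tolist())  # ['low', 'low', 'high']
```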

""" Tests the above ToDo. """
assert(all(df_clean.query("status == 'below_average'")['prop_correct'] < df_clean['prop_correct'].mean()))
assert(all(df_clean.query("status == 'above_average'")['prop_correct'] > df_clean['prop_correct'].mean()))
print("Well done!")

Well done!

ToDo: Using Matplotlib, create a scatterplot with the variables "prop_correct" (on the x-axis) and "mean_rt" (on the y-axis).

ToDo: Compute the correlation between proportion correct and mean RT (using, e.g., the pearsonr function from the scipy.stats module) for each group ("A", "B", and "C") separately. Store these three floats (i.e., the correlations) in a list with the variable name corrs_conditions. Hint: take a good look at what the pearsonr function returns exactly!

Note that, in real life, doing such “subgroup” analyses is probably a bad idea ;-)
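Regarding the hint: pearsonr returns two values, the correlation coefficient and the corresponding p-value, so you need to unpack the result (or index it) to get the correlation on its own. A minimal sketch on made-up data (x_toy and y_toy are hypothetical names):

```python
from scipy.stats import pearsonr

# Hypothetical toy data, not the exercise's columns
x_toy = [1, 2, 3, 4]
y_toy = [1, 3, 2, 4]

# Unpack the result into the correlation (r) and the p-value (p)
r, p = pearsonr(x_toy, y_toy)
print(round(r, 3))  # 0.8
```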

""" Tests the above ToDo. Don't use this implementation ;-)"""
ans = df_clean.groupby('condition')[['mean_rt', 'prop_correct']].corr().iloc[1::2, 0].tolist()
np.testing.assert_array_almost_equal(ans, corrs_conditions, decimal=4)
print("Well done!")

Well done!
