Learn R – Part 4

parvej
11/04/2025

1. Basic Statistical Functions

R provides built-in functions for statistical analysis:

summary(): Summary statistics (minimum, quartiles, median, mean, maximum).

sum(): Total of values.

range(): Minimum and maximum.

var(): Sample variance (divides by n - 1).

sd(): Sample standard deviation (square root of the sample variance).
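
To see what var() and sd() compute, here is a minimal sketch that rebuilds both by hand and compares them with the built-ins. The vector x is a hypothetical sample used only for this illustration.

```r
# Hypothetical sample vector, used only for this illustration
x <- c(2, 4, 6, 8, 10)

n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

all.equal(manual_var, var(x))  # TRUE
all.equal(manual_sd, sd(x))    # TRUE
range(x)                       # 2 10
```

Dividing by n - 1 rather than n is what makes these the sample (not population) versions.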

Demo Data

# Basic dataset
data_basic <- c(2, 4, 6, 8, 10)
# Advanced dataset (mtcars)
data(mtcars)
mpg <- mtcars$mpg

Practice

# BASIC TASKS  
# HW1: Calculate the sum of data_basic  
# HW2: Find the range (min and max) of data_basic  
# HW3: Compute the variance of data_basic  

# ADVANCED TASKS  
# HW4: Calculate the standard deviation of mtcars$mpg  

# HW5: Generate a summary of mtcars$hp (horsepower)

Solution

# BASIC SOLUTIONS  
sum_basic <- sum(data_basic)  
range_basic <- range(data_basic)  
var_basic <- var(data_basic)  

# ADVANCED SOLUTIONS  
sd_mpg <- sd(mpg)  
summary_hp <- summary(mtcars$hp)

2. Mean, Median, Mode

Mean : Average (mean()).

Median : Middle value (median()).

Mode : Most frequent value. R has no built-in function for the statistical mode (the base mode() function returns an object's storage type), so custom code is required.
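
The difference between mean and median shows up most clearly when an extreme value is present. The sketch below uses a hypothetical vector, values, and adds one made-up outlier to it; neither is part of the exercises.

```r
# Hypothetical data, used only for this illustration
values <- c(1, 2, 2, 3, 4, 5, 5, 5)
with_outlier <- c(values, 100)  # same data plus one extreme value

mean(values)          # 3.375
median(values)        # 3.5

# The mean is pulled strongly toward the outlier; the median barely moves
mean(with_outlier)    # about 14.11
median(with_outlier)  # 4
```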

Demo Data

# Basic dataset  
data_numbers <- c(1, 2, 2, 3, 4, 5, 5, 5)  
# Advanced dataset (iris)  
data(iris)  
sepal_length <- iris$Sepal.Length

Practice

# BASIC TASKS  
# HW1: Calculate the mean of data_numbers  
# HW2: Find the median of data_numbers  
# HW3: Write a function to compute the mode  

# ADVANCED TASKS  
# HW4: Compute the mean of iris$Sepal.Length  

# HW5: Find the median of iris$Petal.Length grouped by Species

Solution

# BASIC SOLUTIONS  
mean_val <- mean(data_numbers)  
median_val <- median(data_numbers)  
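mode_func <- function(x) {
  ux <- unique(x)  # distinct values, in order of first appearance
  # Count how often each distinct value occurs and return the most frequent one.
  # If several values tie, the one that appears first in x is returned.
  ux[which.max(tabulate(match(x, ux)))]
}
mode_val <- mode_func(data_numbers)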

# ADVANCED SOLUTIONS  
mean_sepal <- mean(sepal_length)  
median_petal <- aggregate(Petal.Length ~ Species, iris, median)

3. Max, Min, and Percentiles

max()/min(): Extreme values.

quantile(): Percentiles (e.g., 25th, 50th).

IQR(): Interquartile range.
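
quantile() accepts several probabilities at once, and IQR() is simply the 75th percentile minus the 25th. A minimal sketch, assuming a hypothetical vector called scores:

```r
# Hypothetical scores, used only for this illustration
scores <- c(45, 67, 89, 34, 56, 78, 90, 23)

# Request the quartiles in a single call; the result is a named vector
q <- quantile(scores, probs = c(0.25, 0.50, 0.75))
q

# IQR is the distance between the third and first quartiles
manual_iqr <- unname(q["75%"] - q["25%"])
all.equal(manual_iqr, IQR(scores))  # TRUE
```

Note that quantile() interpolates between data points by default, so a percentile is not always one of the observed values.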

Demo Data

# Basic dataset  
data_scores <- c(45, 67, 89, 34, 56, 78, 90, 23)  
# Advanced dataset (airquality)  
data(airquality)  

temp <- airquality$Temp

Practice

# BASIC TASKS  
# HW1: Find the max and min of data_scores  
# HW2: Calculate the 75th percentile of data_scores  
# HW3: Compute the IQR of data_scores  

# ADVANCED TASKS  
# HW4: Find the 90th percentile of airquality$Temp  

# HW5: Identify outliers in airquality$Ozone using IQR

Solution

# BASIC SOLUTIONS  
max_score <- max(data_scores)  
min_score <- min(data_scores)  
percentile_75 <- quantile(data_scores, 0.75)  
iqr_score <- IQR(data_scores)  

# ADVANCED SOLUTIONS  
percentile_90 <- quantile(temp, 0.90)  
# Outlier detection (IQR method)
# Ozone contains NA values, which make quantile() and IQR() fail, so drop them first
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]
q1 <- quantile(ozone, 0.25)
q3 <- quantile(ozone, 0.75)
iqr <- IQR(ozone)
outliers <- ozone[ozone < (q1 - 1.5 * iqr) | ozone > (q3 + 1.5 * iqr)]

4. Hypothesis Testing (T-Test/ANOVA)

Use t.test() to compare the means of two groups, aov() to compare means across three or more groups, and chisq.test() to test for association between categorical variables.
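
The exercises below cover t.test() and aov(), so here is a short sketch of chisq.test() for completeness. The counts are made up for illustration; in practice the table would usually come from table() applied to two categorical columns.

```r
# Hypothetical 2x2 table of counts (rows: group, columns: outcome)
counts <- matrix(
  c(30, 20, 15, 35),
  nrow = 2,
  dimnames = list(group = c("A", "B"), outcome = c("yes", "no"))
)

# Test whether outcome is associated with group
# (for 2x2 tables, chisq.test() applies a continuity correction by default)
chi_result <- chisq.test(counts)

chi_result$p.value   # a small p-value suggests an association
chi_result$expected  # expected counts under independence
```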

Demo Data

# Create sample data  
group_a <- c(20, 22, 19, 18, 24)  

group_b <- c(25, 24, 22, 23, 20)

Practice

# HW1: Perform an independent t-test between group_a and group_b  

# HW2: Run a one-way ANOVA on `mtcars` to compare `mpg` across cylinder groups

Solution

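# HW1: Welch two-sample t-test (R's default; does not assume equal variances)
t.test(group_a, group_b)
# HW2: cyl is stored as a number, so wrap it in factor() to treat it as a grouping variable
anova_result <- aov(mpg ~ factor(cyl), data = mtcars)
summary(anova_result)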

5. Regression Analysis (Linear Regression)

Fit linear models with lm() and logistic regression models with glm(family = binomial). Use summary() to interpret coefficients and p-values.
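
The exercises in this section use lm() only, so here is a minimal sketch of a logistic regression for comparison. The choice of am (transmission type) as the outcome and wt as the predictor is an assumption made for illustration, not part of the exercises.

```r
data(mtcars)

# am is coded 0 = automatic, 1 = manual, so it can serve as a binary outcome
logit_model <- glm(am ~ wt, data = mtcars, family = binomial)

# Coefficients are on the log-odds scale
summary(logit_model)$coefficients

# Predicted probability of a manual transmission for a hypothetical car with wt = 3
prob_manual <- predict(logit_model, newdata = data.frame(wt = 3), type = "response")
prob_manual
```

A negative coefficient for wt means heavier cars are less likely to have a manual transmission in this dataset.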

Demo Data

# Use `mtcars` for linear regression

Practice

# HW1: Fit a linear model predicting `mpg` from `wt` and `hp`  

# HW2: Check the R-squared value of the model

Solution

# HW1  
model <- lm(mpg ~ wt + hp, data=mtcars)  
# HW2  
summary(model)$r.squared

6. Data Transformation

Recode variables with dplyr::mutate() and case_when(). Create new variables using arithmetic/logical operations.
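
Creating variables from arithmetic and logical operations does not strictly need dplyr. As a point of comparison, here is a base R sketch; the data frame people and the new column names are hypothetical.

```r
# Hypothetical data frame, used only for this illustration
people <- data.frame(
  age = c(18, 25, 30, 35, 40),
  income = c(50000, 60000, 75000, 90000, 120000)
)

# Arithmetic: income expressed in thousands
people$income_k <- people$income / 1000

# Logical: TRUE where income is at least 70k
people$high_earner <- people$income >= 70000

# Combining conditions with & gives another logical column
people$young_high <- people$age < 35 & people$high_earner

people
```

With dplyr, the same three columns could be created in a single mutate() call.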

Demo Data

# Create sample data
df <- data.frame(
  age = c(18, 25, 30, 35, 40),
  income = c(50000, 60000, 75000, 90000, 120000)
)

Practice

# HW1: Recode `age` into categories: "<25", "25-35", ">35"  

# HW2: Create a new variable `income_group` (Low: <70k, High: >=70k)

Solution

# HW1  
library(dplyr)  
df <- df %>%  
  mutate(age_group = case_when(  
    age < 25 ~ "<25",  
    age >= 25 & age <= 35 ~ "25-35",  
    age > 35 ~ ">35"  
  ))  
# HW2  
df <- df %>%  
  mutate(income_group = ifelse(income >= 70000, "High", "Low"))

7. Exporting Results

Export tables with write.csv(), or with the stargazer or flextable packages for formatted output. Save ggplot2 plots with ggsave().
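
CSV export tends to work best on a plain data frame with one row per item. The sketch below builds a small table of means and standard deviations, writes it to a temporary file, and reads it back; the object names and the choice of columns are assumptions for illustration.

```r
data(mtcars)

# Hypothetical selection of columns to summarise
vars <- c("mpg", "hp", "wt")

# One row per variable, one column per statistic
stats_table <- data.frame(
  variable = vars,
  mean = unname(sapply(mtcars[vars], mean)),
  sd = unname(sapply(mtcars[vars], sd))
)

# Write to a temporary file so nothing is left in the working directory
out_file <- tempfile(fileext = ".csv")
write.csv(stats_table, out_file, row.names = FALSE)

# Read the file back to confirm what was written
read_back <- read.csv(out_file)
read_back
```

Setting row.names = FALSE keeps R from adding an extra unnamed column of row numbers to the file.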

Practice

# HW1: Save `mtcars` summary to a CSV  

# HW2: Export a ggplot to PNG

Solution

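# HW1: summary() of a data frame returns a table of text, which write.csv() saves as-is
write.csv(summary(mtcars), "mtcars_summary.csv")
# HW2: ggsave() comes from ggplot2 and needs an existing plot to save
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()  # example plot
ggsave("plot.png", plot = p)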

8. Additional Statistical Concepts

Skewness : Measure of asymmetry (moments package).

Kurtosis : Tailedness of the distribution (moments package).

Covariance : cov().

Correlation : cor().
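
Correlation is covariance rescaled by the two standard deviations, and skewness can be written in terms of central moments. The sketch below checks the first relationship against cor() and computes a moment-based skewness in base R. The skewness formula (third central moment divided by the second central moment to the power 3/2, both dividing by n) is the one I understand the moments package to use, so treat that part as an assumption.

```r
data(cars)
x <- cars$speed
y <- cars$dist

# Correlation = covariance / (sd of x * sd of y)
manual_cor <- cov(x, y) / (sd(x) * sd(y))
all.equal(manual_cor, cor(x, y))  # TRUE

# Moment-based skewness (assumed to match moments::skewness)
m2 <- mean((x - mean(x))^2)  # second central moment
m3 <- mean((x - mean(x))^3)  # third central moment
manual_skew <- m3 / m2^(3 / 2)
manual_skew
```

Because correlation is unit-free and always lies between -1 and 1, it is easier to compare across variable pairs than covariance.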

Demo Data

# Advanced dataset (cars)  
data(cars)  
speed <- cars$speed  

dist <- cars$dist

Practice

# HW1: Calculate covariance between speed and distance  
# HW2: Compute correlation between speed and distance  

# HW3: Install the `moments` package and calculate skewness of speed

Solution

# HW1  
covariance <- cov(speed, dist)  
# HW2  
correlation <- cor(speed, dist)  
# HW3
# install.packages("moments")  # run once if the package is not installed
library(moments)
skewness_speed <- skewness(speed)

9. Handling Missing Values

Use na.rm = TRUE to ignore NA values in calculations.
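
A short sketch of why na.rm matters: without it, a single NA makes the result NA. The vector x is hypothetical and used only here.

```r
# Hypothetical vector with one missing value
x <- c(1, 2, NA, 4, 5)

mean(x)                # NA, because one value is missing
sum(is.na(x))          # 1 missing value
mean(x, na.rm = TRUE)  # 3

# Alternatively, drop the missing values explicitly before calculating
clean <- x[!is.na(x)]
mean(clean)            # 3
```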

Demo Data

data_missing <- c(1, 2, NA, 4, 5)

Practice

# HW1: Calculate the mean of data_missing (ignore NA)  

# HW2: Check if data_missing contains any NA values

Solution

mean_missing <- mean(data_missing, na.rm = TRUE)  
has_na <- anyNA(data_missing)
