Navigate to Tools > Global Options > Appearance (or RStudio > Preferences > Appearance on macOS) and select a desired theme from the dropdown menu.
Load the tidyverse, a collection of packages for importing, wrangling, and plotting data.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Load data (assign it to an object with a useful, descriptive name):
dat_kentucky <- read_csv("/Users/mariacuellar/Github/crim_data_analysis/data/kentucky-derby-2018.csv")
## Rows: 144 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Date, Winner
## dbl (7): Year, Year_no, Mins, Secs, Time.in.Sec, Distance (mi), Speed (mph)
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dat_students <- read_csv("/Users/mariacuellar/Github/crim_data_analysis/data/students.csv")
## Rows: 10497 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): name, gender, year_in_college, favorite_color
## dbl (2): age, grade
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
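The messages above mention `show_col_types = FALSE` and `spec()`. A minimal sketch of both, assuming a tiny made-up CSV written to a temporary file (the file, `dat_toy`, and its values are hypothetical and only there so the example runs on its own):

```r
library(tidyverse)

# Write a tiny CSV to a temporary file so the example is self-contained
# (hypothetical data, not the course files)
tmp <- tempfile(fileext = ".csv")
write_csv(tibble(name = c("Ana", "Ben"), age = c(19, 21)), tmp)

# show_col_types = FALSE quiets the column-specification message
dat_toy <- read_csv(tmp, show_col_types = FALSE)

# spec() retrieves the full column specification when you want to see it
spec(dat_toy)
```

Quieting the message is convenient in a knitted document, but it is still worth checking `spec()` once to confirm each column was read with the type you expect.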
How to select an observation:
year1986 <- dat_kentucky %>% filter(Year==1986)
year1986
## # A tibble: 1 × 9
## Year Year_no Date Winner Mins Secs Time.in.Sec `Distance (mi)`
## <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1986 112 3-May-86 Ferdinand 2 2.8 123. 1.25
## # ℹ 1 more variable: `Speed (mph)` <dbl>
Note: understand the pipe operator (%>% or |>): The main function of the pipe operator is to take the output of the expression or function on its left-hand side (LHS) and pass it as the first argument to the function on its right-hand side (RHS). This allows you to chain multiple operations together in a clear, sequential manner.
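To make the note concrete, here is a small sketch showing that the piped and unpiped calls give the same result. The `races` tibble is toy data made up for illustration, not the Kentucky Derby file:

```r
library(tidyverse)

# Toy data (hypothetical values)
races <- tibble(year = c(1984, 1985, 1986), secs = c(122.4, 120.2, 122.8))

# Without the pipe: the data frame is written as the first argument of filter()
nested <- filter(races, year == 1986)

# With the pipe: the left-hand side is passed as the first argument
piped <- races %>% filter(year == 1986)

# The native pipe (R 4.1 or later) behaves the same way in this case
native <- races |> filter(year == 1986)

identical(nested, piped)
identical(nested, native)
```

The pipe does not change what the code computes. It changes only how it reads: top to bottom, one step at a time.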
How to select a variable:
dat_kentucky$Winner
## [1] "Aristides" "Vagrant" "Baden-Baden"
## [4] "Day Star" "Lord Murphy" "Fonso"
## [7] "Hindoo" "Apollo" "Leonatus"
## [10] "Buchanan" "Joe Cotton" "Ben Ali"
## [13] "Montrose" "Macbeth II" "Spokane"
## [16] "Riley" "Kingman" "Azra"
## [19] "Lookout" "Chant" "Halma"
## [22] "Ben Brush" "Typhoon II" "Plaudit"
## [25] "Manuel" "Lieut. Gibson" "His Eminence"
## [28] "Alan-a-Dale" "Judge Himes" "Elwood"
## [31] "Agile" "Sir Huon" "Pink Star"
## [34] "Stone Street" "Wintergreen" "Donau"
## [37] "Meridian" "Worth" "Donerail"
## [40] "Old Rosebud" "Regret" "George Smith"
## [43] "*Omar Khayyam" "Exterminator" "Sir Barton"
## [46] "Paul Jones" "Behave Yourself" "Morvich"
## [49] "Zev" "Black Gold" "Flying Ebony"
## [52] "Bubbling Over" "Whiskery" "Reigh Count"
## [55] "Clyde Van Dusen" "Gallant Fox" "Twenty Grand"
## [58] "Burgoo King" "Brokers Tip" "Cavalcade"
## [61] "Omaha" "Bold Venture" "War Admiral"
## [64] "Lawrin" "Johnstown" "Gallahadion"
## [67] "Whirlaway" "Shut Out" "Count Fleet"
## [70] "Pensive" "Hoop Jr." "Assault"
## [73] "Jet Pilot" "Citation" "Ponder"
## [76] "Middleground" "Count Turf" "Hill Gail"
## [79] "Dark Star" "Determine" "Swaps"
## [82] "Needles" "Iron Liege" "Tim Tam"
## [85] "*Tomy Lee" "Venetian Way" "Carry Back"
## [88] "Decidedly" "Chateaugay" "Northern Dancer"
## [91] "Lucky Debonair" "Kauai King" "Proud Clarion"
## [94] "Forward Pass**" "Majestic Prince" "Dust Commander"
## [97] "Canonero II" "Riva Ridge" "Secretariat"
## [100] "Cannonade" "Foolish Pleasure" "Bold Forbes"
## [103] "Seattle Slew" "Affirmed" "Spectacular Bid"
## [106] "Genuine Risk" "Pleasant Colony" "Gato Del Sol"
## [109] "SunnyÕs Halo" "Swale" "Spend a Buck"
## [112] "Ferdinand" "Alysheba" "Winning Colors"
## [115] "Sunday Silence" "Unbridled" "Strike the Gold"
## [118] "Lil E. Tee" "Sea Hero" "Go for Gin"
## [121] "Thunder Gulch" "Grindstone" "Silver Charm"
## [124] "Real Quiet" "Charismatic" "Fusaichi Pegasus"
## [127] "Monarchos" "War Emblem" "Funny Cide"
## [130] "Smarty Jones" "Giacomo" "Barbaro"
## [133] "Street Sense" "Big Brown" "Mine That Bird"
## [136] "Super Saver" "Animal Kingdom" "I'll Have Another"
## [139] "Orb" "California Chrome2" "American Pharoah"
## [142] "Nyquist" "Always Dreaming" "Justify"
thewinner <- dat_kentucky %>% select(Winner)
thewinner
## # A tibble: 144 × 1
## Winner
## <chr>
## 1 Aristides
## 2 Vagrant
## 3 Baden-Baden
## 4 Day Star
## 5 Lord Murphy
## 6 Fonso
## 7 Hindoo
## 8 Apollo
## 9 Leonatus
## 10 Buchanan
## # ℹ 134 more rows
How to select both (observation and variable):
thewinnerin1986 <- dat_kentucky %>%
filter(Year==1986) %>%
select(Winner)
thewinnerin1986
## # A tibble: 1 × 1
## Winner
## <chr>
## 1 Ferdinand
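Note that `select()` returns a one-column tibble. If you want the value itself as a plain vector, one option is `pull()`. A minimal sketch, assuming a small stand-in tibble (`toy_derby` is hypothetical and holds only three rows):

```r
library(tidyverse)

# Small stand-in for dat_kentucky (hypothetical three-row subset)
toy_derby <- tibble(
  Year   = c(1985, 1986, 1987),
  Winner = c("Spend a Buck", "Ferdinand", "Alysheba")
)

# filter() picks the observation, pull() extracts the column as a vector
winner_name <- toy_derby %>%
  filter(Year == 1986) %>%
  pull(Winner)

winner_name
```

Use `select()` when you want to keep working with a table, and `pull()` when you need the value itself, for example to paste it into a sentence.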
Which variables are in the dataset? Look at the Data pane, or use names()
names(dat_kentucky)
## [1] "Year" "Year_no" "Date" "Winner"
## [5] "Mins" "Secs" "Time.in.Sec" "Distance (mi)"
## [9] "Speed (mph)"
Type of an R variable: look at the Data pane, or use class()
class(dat_kentucky$Secs)
## [1] "numeric"
Dimensions of the data (rows, then columns): look at the Data pane, or use dim()
dim(dat_kentucky)
## [1] 144 9
Note: typing commands instead of clicking through the Data pane lets me see what you did.
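If you want the names, types, and dimensions in one command, `glimpse()` is one way to get them. A short sketch on a stand-in tibble (`toy_derby` and its values are illustrative, not the full dataset):

```r
library(tidyverse)

# Small stand-in for dat_kentucky (illustrative values)
toy_derby <- tibble(
  Year   = c(1985, 1986, 1987),
  Winner = c("Spend a Buck", "Ferdinand", "Alysheba"),
  Secs   = c(0.2, 2.8, 3.4)
)

# Prints the number of rows and columns, then each variable with its type
glimpse(toy_derby)

# Base R alternative: the class of every column at once
col_classes <- sapply(toy_derby, class)
col_classes
```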
How to make a table to summarize a categorical variable:
names(dat_students)
## [1] "name" "gender" "age" "year_in_college"
## [5] "favorite_color" "grade"
table(dat_students$year_in_college) # Base R
##
## First year Fourth year Second year Third year
## 2129 2108 3144 3116
dat_students %>% count(year_in_college) # tidyverse
## # A tibble: 4 × 2
## year_in_college n
## <chr> <int>
## 1 First year 2129
## 2 Fourth year 2108
## 3 Second year 3144
## 4 Third year 3116
How to add proportions:
dat_students %>%
count(year_in_college) %>%
mutate(prop = n / sum(n))
## # A tibble: 4 × 3
## year_in_college n prop
## <chr> <int> <dbl>
## 1 First year 2129 0.203
## 2 Fourth year 2108 0.201
## 3 Second year 3144 0.300
## 4 Third year 3116 0.297
How to make a table to summarize two categorical variables:
table(dat_students$favorite_color, dat_students$year_in_college) # It's trickier in tidyverse, but check out the janitor package
##
## First year Fourth year Second year Third year
## blue 214 205 301 274
## green 417 413 616 658
## red 1498 1490 2227 2184
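For reference, one possible tidyverse version of the two-way table: count both variables, then spread one of them into columns with `pivot_wider()`. This is a sketch on made-up data (`toy_students` is hypothetical), not the only way to do it:

```r
library(tidyverse)

# Toy data standing in for dat_students (hypothetical values)
toy_students <- tibble(
  favorite_color  = c("red", "red", "blue", "green", "red", "blue"),
  year_in_college = c("First year", "Second year", "First year",
                      "First year", "Second year", "Second year")
)

two_way <- toy_students %>%
  count(favorite_color, year_in_college) %>%  # one row per observed combination
  pivot_wider(
    names_from  = year_in_college,            # each year becomes a column
    values_from = n,                          # cells hold the counts
    values_fill = 0                           # combinations never observed get 0
  )

two_way
```

Without `values_fill = 0`, a combination that never occurs (here, green and Second year) would show up as `NA` rather than 0.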
How to draw a histogram:
dat_kentucky %>% ggplot(aes(x=Time.in.Sec)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
dat_kentucky %>% ggplot(aes(x=Time.in.Sec)) + geom_histogram(bins=60) # Changes number of bins
dat_kentucky_lowtime <- dat_kentucky %>% filter(Time.in.Sec<140)
dat_kentucky_lowtime %>% ggplot(aes(x=Time.in.Sec)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
dat_kentucky %>%
filter(Time.in.Sec < 140) %>%
ggplot(aes(x=Time.in.Sec)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
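The `stat_bin()` message suggests choosing a `binwidth` yourself. A minimal sketch of that, using the `faithful` dataset that ships with R so the example runs without the course files (the width of 0.25 is an arbitrary choice for illustration):

```r
library(tidyverse)

# faithful is built into R; eruptions is the eruption duration in minutes
p_binwidth <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(binwidth = 0.25)  # each bar spans 0.25 minutes

p_binwidth
```

`bins` sets how many bars there are, while `binwidth` sets how wide each bar is in the units of the variable. Setting either one silences the message.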
You can save the plot as an object
p <- dat_kentucky %>% ggplot(aes(x=Time.in.Sec)) + geom_histogram()
library(ggthemes)
p + theme_economist()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p + labs(title="Histogram") + theme_economist()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
How to change the aesthetics (bar fill and outline color):
dat_kentucky %>% ggplot(aes(x=Time.in.Sec)) + geom_histogram(bins = 30, fill = "steelblue", color = "white")
How to change the labels:
dat_kentucky %>% ggplot(aes(x=Time.in.Sec)) + geom_histogram() +
labs(
x = "Time in seconds", # new label for x-axis
y = "Count", # new label for y-axis
title = "Histogram of Time in Seconds"
)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This is equivalent to adding the labels to the saved plot object p:
p +
labs(
x = "Time in seconds", # new label for x-axis
y = "Count", # new label for y-axis
title = "Histogram of Time in Seconds"
)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
You can change the theme too
p + theme_economist()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p + theme_stata()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p + theme_fivethirtyeight()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p + theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
How to draw a barplot for year_in_college:
# using base R, it's just barplot()
barplot(table(dat_students$year_in_college))
dat_students %>% ggplot(aes(x = year_in_college)) + geom_bar() # tidyverse (ggplot2)
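Because year_in_college is a character variable, the categories are ordered alphabetically, so "Fourth year" comes before "Second year" (as in the tables above). One way to put the bars in a natural order is to convert the variable to a factor with explicit levels. A sketch on made-up data (`toy_students` and `year_levels` are hypothetical names):

```r
library(tidyverse)

# Toy data standing in for dat_students (hypothetical values)
toy_students <- tibble(
  year_in_college = c("First year", "Second year", "Second year",
                      "Third year", "Fourth year", "First year")
)

# The order we want on the x-axis
year_levels <- c("First year", "Second year", "Third year", "Fourth year")

# Convert to a factor so the bars follow year_levels rather than the alphabet
students_ordered <- toy_students %>%
  mutate(year_in_college = factor(year_in_college, levels = year_levels))

p_ordered <- students_ordered %>%
  ggplot(aes(x = year_in_college)) +
  geom_bar()

p_ordered
```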