Using R for Introductory Statistics

Data

The c function is a quick way to enter small data sets:
> typos = c(2,3,0,3,1,0,0,1)
> typos
[1] 2 3 0 3 1 0 0 1

Note:

  1. The assignment operator is =; <- can be used as well
  2. [1] is the index of the first value printed on that line; the value itself is a vector
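
As a minimal sketch of the first note, the two assignment forms below build the same vector. The names a and b are made up for this illustration.

```r
# a and b are hypothetical names used only for this comparison
a = c(2, 3, 0)
b <- c(2, 3, 0)

# identical() checks that the two objects are exactly the same
same <- identical(a, b)
print(same)
```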

The mean function finds the mean, or average, of the data:
> mean(typos)
[1] 1.25

The median function finds the median of the data:
> median(typos)
[1] 1

The var function finds the sample variance of the data:
> var(typos)
[1] 1.642857
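
To see what var computes, the sketch below recomputes the sample variance by hand: the sum of squared deviations from the mean, divided by n - 1. The name manual_var is an assumption made for this example.

```r
typos <- c(2, 3, 0, 3, 1, 0, 0, 1)
n <- length(typos)

# Sample variance: squared deviations from the mean, summed, divided by n - 1
manual_var <- sum((typos - mean(typos))^2) / (n - 1)

# Compare with the built-in function (all.equal allows for rounding error)
agrees <- isTRUE(all.equal(manual_var, var(typos)))
print(agrees)
```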

Data are stored in R as vectors:
> typos.draft1 = c(2,3,0,3,1,0,0,1)
> typos.draft2 = c(0,3,0,3,1,0,0,1)
> typos.draft2 = typos.draft1 #alternatively, make a copy
> typos.draft2[1] = 0 #assign the first page 0 typos
> typos.draft2[2] #print 2nd page's value
> typos.draft2[-4] #print all but 4th page
> typos.draft2[c(1,2,3)] #fancy, print 1st, 2nd and 3rd


Note:

  1. the period in a name such as typos.draft1 is only punctuation
  2. old versions of R did not allow an _ (underscore) in names; current versions do
  3. # is used to make comments
  4. parentheses () are for functions and square brackets [] are for vectors, arrays and lists
  5. the last example is very important. You can take more than one value at a time by using another vector of index numbers. This is called slicing
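
The slicing described in the last note can be sketched with a few index vectors. The name counts and the result names are assumptions for this example; the values are the same as in typos.draft1.

```r
counts <- c(2, 3, 0, 3, 1, 0, 0, 1)

first_three <- counts[c(1, 2, 3)]   # positive indices select elements
without_4th <- counts[-4]           # a negative index drops an element
nonzero     <- counts[counts > 0]   # a logical vector keeps the TRUE positions

print(first_three)
print(without_4th)
print(nonzero)
```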

The max function finds the maximum of the data:
> max(typos.draft2)
[1] 3

== tests each element of the data:
> typos.draft2 == 3 #where are they?
[1] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE

which tests the data and returns the indices:
> which(typos.draft2 == 3)
[1] 2 4


or, equivalently, index the page numbers with the test:
> n = length(typos.draft2)
> pages = 1:n
> pages
[1] 1 2 3 4 5 6 7 8
> pages[typos.draft2==3]
[1] 2 4

seq is a more general function for producing sequences than the : operator:
> seq(1,n,1)
[1] 1 2 3 4 5 6 7 8
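
As a rough illustration of why seq is more general, the sketch below uses its by and length.out arguments, which : cannot express. The result names are made up for this example.

```r
n <- 8

odd_pages <- seq(1, n, by = 2)          # step of 2: 1 3 5 7
quarters  <- seq(0, 1, length.out = 5)  # 5 evenly spaced values from 0 to 1
all_pages <- seq_len(n)                 # same values as 1:n

print(odd_pages)
print(quarters)
print(all_pages)
```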

Sum and subtract vectors:
> sum(typos.draft2)
[1] 8
> sum(typos.draft2>0)
[1] 4
> typos.draft1-typos.draft2
[1] 2 0 0 0 0 0 0 0
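
The call sum(typos.draft2>0) works because R counts TRUE as 1 and FALSE as 0 in arithmetic. A small sketch of the same idea follows; the names has_typo, n_with_typos and share_with_typos are assumptions for this example.

```r
typos.draft2 <- c(0, 3, 0, 3, 1, 0, 0, 1)

has_typo <- typos.draft2 > 0          # logical vector, one entry per page
n_with_typos     <- sum(has_typo)     # number of pages with at least one typo
share_with_typos <- mean(has_typo)    # proportion of such pages

print(n_with_typos)
print(share_with_typos)
```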

Manipulate a vector:
> x = c(45,43,46,48,51,46,50,47,46,45)
> x = c(x,48,49,51,50,49)
> x
[1] 45 43 46 48 51 46 50 47 46 45 48 49 51 50 49
> x[16] = 41
> x
[1] 45 43 46 48 51 46 50 47 46 45 48 49 51 50 49 41
> x[17:20] = c(40,38,35,40)
> x
[1] 45 43 46 48 51 46 50 47 46 45 48 49 51 50 49 41 40 38 35 40

Edit data using a spreadsheet interface:

> data.entry(x) #pops up spreadsheet to edit data
> x = de(x) #same, only doesn't save changes
> x = edit(x) #uses editor to edit x

The variable x must already be defined.

Running maximum and minimum of a set of data:

> cummax(x)
[1] 45 45 46 48 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51
> cummin(x)
[1] 45 43 43 43 43 43 43 43 43 43 43 43 43 43 43 41 40 38 35 35

Define new functions:

> std = function(x) sqrt(var(x))
> std(x)
[1] 4.558393

Built-in standard deviation function:

> sd(x)
[1] 4.558393
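
Longer functions are usually written with braces, one step per line. The sketch below is one possible multi-line version of std; the name std_manual and the short test vector are assumptions for this example.

```r
# std_manual is a hypothetical name; it spells out the steps behind sqrt(var(x))
std_manual <- function(x) {
  n <- length(x)
  deviations <- x - mean(x)
  # the value of the last expression is returned
  sqrt(sum(deviations^2) / (n - 1))
}

x <- c(45, 43, 46, 48, 51)
matches_sd <- isTRUE(all.equal(std_manual(x), sd(x)))
print(matches_sd)
```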

Differences between adjacent elements of the data:

> diff(x)
[1] -2 3 2 3 -5 4 -3 -1 -1 3 1 2 -1 -1 -8 -1 -2 -3 5

range retrieves the min and max elements of the data:

> range(x)
[1] 35 51
> diff(range(x))
[1] 16

Univariate Data

Categorical Data

table tabulates how often each category occurs:

> x = c("Yes","No","No","Yes","Yes")
> table(x)
x
No Yes
2 3

factor stores categorical data with a fixed set of levels, and table works on it the same way:

> x = factor(c("Yes","No","No","Yes","Yes"))
> table(x)
x
No Yes
2 3
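
As a hedged sketch of what a factor adds over a plain character vector, the example below inspects the levels and the integer codes that R stores internally. The name answers is made up for this example.

```r
answers <- factor(c("Yes", "No", "No", "Yes", "Yes"))

lv     <- levels(answers)       # distinct categories, sorted: "No" "Yes"
codes  <- as.integer(answers)   # position of each value within the levels
counts <- table(answers)        # same counts as tabulating the raw strings

print(lv)
print(codes)
print(counts)
```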

Numerical Data
