Data
The quickest way to enter a small data set is with the c function, which combines its arguments into a vector:
> typos = c(2,3,0,3,1,0,0,1)
> typos
[1] 2 3 0 3 1 0 0 1
Note:
- The assignment operator is =; <- can be used instead
- The [1] prefix indicates that the value is a vector; the number is the index of the first element shown on that line
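A minimal sketch of both points in the note, reusing the typos values. The names a and b are made up for illustration.

```r
# Both assignment operators create the same vector.
a = c(2, 3, 0, 3, 1, 0, 0, 1)
b <- c(2, 3, 0, 3, 1, 0, 0, 1)
identical(a, b)   # TRUE

# The bracketed prefix is the index of the first element on each output line.
# It becomes visible when a long vector wraps over several lines.
print(1:40)

is.vector(a)      # TRUE
```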
The mean function finds the mean (average) of the data:
> mean(typos)
[1] 1.25
The median function finds the median of the data:
> median(typos)
[1] 1
The var function finds the sample variance of the data (the divisor is n-1):
> var(typos)
[1] 1.642857
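To make the variance figure above less of a black box, the following sketch recomputes it by hand as the sum of squared deviations divided by n - 1. The name manual_var is hypothetical.

```r
typos <- c(2, 3, 0, 3, 1, 0, 0, 1)
n <- length(typos)

# Sum of squared deviations from the mean, divided by n - 1.
manual_var <- sum((typos - mean(typos))^2) / (n - 1)

manual_var                         # 1.642857
all.equal(manual_var, var(typos))  # TRUE
```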
Data are stored in R as vectors:
> typos.draft1 = c(2,3,0,3,1,0,0,1)
> typos.draft2 = c(0,3,0,3,1,0,0,1)
> typos.draft2 = typos.draft1 #make a copy
> typos.draft2[1] = 0 #assign the first page 0 typos
> typos.draft2[2] #print 2nd page's value
> typos.draft2[-4] #print all but 4th page
> typos.draft2[c(1,2,3)] #fancy, print 1st, 2nd and 3rd
Note:
- the period in a name such as typos.draft1 is only punctuation; it has no special meaning
- very old versions of R did not allow an _ (underscore) in names; current versions do
- # is used to make comments
- parentheses () are for functions and square brackets [] are for vectors, arrays and lists
- the last example is very important. You can take more than one value at a time by using another vector of index numbers. This is called slicing
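The indexing forms mentioned in the note can be sketched as follows, assuming typos.draft2 holds the corrected draft values. The last two lines go slightly beyond the examples above and show a range of indices and several negative indices at once.

```r
typos.draft2 <- c(0, 3, 0, 3, 1, 0, 0, 1)

typos.draft2[2]            # 3              (single element)
typos.draft2[-4]           # 0 3 0 1 0 0 1  (everything but the 4th)
typos.draft2[c(1, 2, 3)]   # 0 3 0          (slice by a vector of indices)
typos.draft2[2:4]          # 3 0 3          (slice by a range)
typos.draft2[-c(1, 2)]     # 0 3 1 0 0 1    (drop the first two)
```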
The max function finds the maximum of the data:
> max(typos.draft2)
[1] 3
== tests each element of the data and returns a logical vector:
> typos.draft2 == 3 #where are they?
[1] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
which tests the data and returns the indices where the test is TRUE:
> which(typos.draft2 == 3)
[1] 2 4
or, equivalently, by indexing a vector of page numbers with the test:
> n = length(typos.draft2)
> pages = 1:n
> pages
[1] 1 2 3 4 5 6 7 8
> pages[typos.draft2==3]
[1] 2 4
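As a quick check that the two approaches above agree, here is a small sketch. The names by_which and by_mask are hypothetical, and which.max is added as a related built-in that returns the position of the first maximum.

```r
typos.draft2 <- c(0, 3, 0, 3, 1, 0, 0, 1)
pages <- seq_along(typos.draft2)        # 1 2 3 4 5 6 7 8

by_which <- which(typos.draft2 == 3)    # indices where the test is TRUE
by_mask  <- pages[typos.draft2 == 3]    # same result via logical indexing

identical(by_which, by_mask)            # TRUE
which.max(typos.draft2)                 # 2 (first page with the most typos)
```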
seq is a more general function for producing sequences than the : operator:
> seq(1,n,1)
[1] 1 2 3 4 5 6 7 8
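A few more seq calls illustrate why it is more general than the colon operator. This is a sketch using the standard by and length.out arguments, with n set to 8 as in the example above.

```r
n <- 8

seq(1, n, 1)               # 1 2 3 4 5 6 7 8   (same as 1:n)
seq(1, n, by = 2)          # 1 3 5 7           (custom step)
seq(n, 1, by = -1)         # 8 7 6 5 4 3 2 1   (descending)
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00 (fixed number of points)
```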
Sum and subtract vectors:
> sum(typos.draft2)
[1] 8
> sum(typos.draft2>0)
[1] 4
> typos.draft1-typos.draft2
[1] 2 0 0 0 0 0 0 0
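The call sum(typos.draft2>0) works because TRUE counts as 1 and FALSE as 0 in arithmetic. The sketch below spells that out. The names has_typos and fixed are hypothetical.

```r
typos.draft1 <- c(2, 3, 0, 3, 1, 0, 0, 1)
typos.draft2 <- c(0, 3, 0, 3, 1, 0, 0, 1)

has_typos <- typos.draft2 > 0   # logical vector, one entry per page
sum(has_typos)                  # 4    (number of pages with typos)
mean(has_typos)                 # 0.5  (proportion of pages with typos)

# Subtraction is element by element, so nonzero entries mark changed pages.
fixed <- typos.draft1 - typos.draft2
which(fixed != 0)               # 1
```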
Manipulate a vector:
> x = c(45,43,46,48,51,46,50,47,46,45)
> x = c(x,48,49,51,50,49)
> x
[1] 45 43 46 48 51 46 50 47 46 45 48 49 51 50 49
> x[16] = 41
> x
[1] 45 43 46 48 51 46 50 47 46 45 48 49 51 50 49 41
> x[17:20] = c(40,38,35,40)
> x
[1] 45 43 46 48 51 46 50 47 46 45 48 49 51 50 49 41 40 38 35 40
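One detail the example above does not show is what happens when an assignment skips past the end of a vector. In this sketch, with shortened data, R fills the gap with NA.

```r
x <- c(45, 43, 46)
x <- c(x, 48)   # append one value; x now has 4 elements
x[6] <- 50      # index 5 is skipped, so R fills it with NA

x               # 45 43 46 48 NA 50
length(x)       # 6
```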
Edit data using a spreadsheet interface:
> data.entry(x) #pops up spreadsheet to edit data
> x = de(x) #same, only it doesn't save changes
> x = edit(x) #uses editor to edit x
The variable x needs to be defined beforehand.
Running maximum and minimum of a set of data:
> cummax(x)
[1] 45 45 46 48 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51
> cummin(x)
[1] 45 43 43 43 43 43 43 43 43 43 43 43 43 43 43 41 40 38 35 35
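A short sketch on the first ten values of x shows how the running extremes relate to max and min. The names running_max and running_min are hypothetical.

```r
x <- c(45, 43, 46, 48, 51, 46, 50, 47, 46, 45)

running_max <- cummax(x)   # 45 45 46 48 51 51 51 51 51 51
running_min <- cummin(x)   # 45 43 43 43 43 43 43 43 43 43

# The last running value equals the overall extreme.
tail(running_max, 1) == max(x)   # TRUE
tail(running_min, 1) == min(x)   # TRUE
```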
Define new functions:
> std = function(x) sqrt(var(x))
> std(x)
[1] 4.558393
The built-in command for the standard deviation is sd:
> sd(x)
[1] 4.558393
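Functions with more than one statement use braces, and the value of the last expression is returned. As a sketch, here is a hand-written version of std that does not call var, compared against the built-in sd on the first ten values of x.

```r
# Standard deviation written out from the definition (divisor n - 1).
std <- function(x) {
  n <- length(x)
  sqrt(sum((x - mean(x))^2) / (n - 1))
}

x <- c(45, 43, 46, 48, 51, 46, 50, 47, 46, 45)
all.equal(std(x), sd(x))   # TRUE
```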
diff gives the differences between adjacent elements of the data:
> diff(x)
[1] -2 3 2 3 -5 4 -3 -1 -1 3 1 2 -1 -1 -8 -1 -2 -3 5
range retrieves the minimum and maximum elements of the data:
> range(x)
[1] 35 51
> diff(range(x))
[1] 16
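The relationship between range, min, max and diff can be checked directly. This is a sketch on the same 20 values of x, with r as a hypothetical name.

```r
x <- c(45, 43, 46, 48, 51, 46, 50, 47, 46, 45,
       48, 49, 51, 50, 49, 41, 40, 38, 35, 40)

r <- range(x)
r                                  # 35 51 (minimum first, then maximum)
identical(r, c(min(x), max(x)))    # TRUE
diff(r) == max(x) - min(x)         # TRUE

# diff returns one value fewer than its input.
length(diff(x)) == length(x) - 1   # TRUE
```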
Univariate Data
Categorical Data
table counts how often each category occurs:
> x = c("Yes","No","No","Yes","Yes")
> table(x)
x
No Yes
2 3
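The result of table can be used like a named vector. The sketch below, with counts as a hypothetical name, pulls out a single count and converts the counts to proportions.

```r
x <- c("Yes", "No", "No", "Yes", "Yes")
counts <- table(x)

names(counts)        # "No" "Yes" (categories, sorted alphabetically)
counts[["Yes"]]      # 3
counts / length(x)   # proportions: No 0.4, Yes 0.6
```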
factor stores categorical data as a factor, a vector with a fixed set of levels:
> x = c("Yes","No","No","Yes","Yes")
> factor(x)
[1] Yes No  No  Yes Yes
Levels: No Yes
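Since this part concerns factors, here is a brief sketch of what factor adds on top of a plain character vector: a set of levels and an integer code for each element. The name f is hypothetical.

```r
x <- c("Yes", "No", "No", "Yes", "Yes")
f <- factor(x)

is.factor(f)     # TRUE
levels(f)        # "No" "Yes" (sorted alphabetically by default)
as.integer(f)    # 2 1 1 2 2  (position of each value in levels(f))
table(f)         # same counts as table(x)
```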