[경경데 Week3] Basic Grammar of R | Matrix, Dataframe

금서_ 2024. 9. 20. 18:22

2024.09.20.FRI Week3

Chung-Ang University, Business and Economics Data Analysis Software (경영경제데이터분석소프트웨어) _ Prof. Kim Hwang (김황)

[last time]

Data Type
- vector : one-dimensional data type
  - A = c(1,2,3,4)
  - R is case-sensitive: a is not A
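
A quick sketch of the case-sensitivity point. The second vector `a` and its values are made up for illustration and are not from the lecture.

```r
# Names that differ only by case refer to two separate objects
A = c(1, 2, 3, 4)   # the vector from the notes
a = c(10, 20)       # hypothetical second vector, for illustration only

length(A)         # 4
length(a)         # 2
identical(A, a)   # FALSE
```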

[week3 start] p.41~

Matrix

multi-dimensional data type
2-dimensional data type
a matrix has rows and columns
matrix function = matrix(XXX)
- byrow=TRUE → the values are filled row by row.
- That is, the values go in row order: 1,2 in the first row, 3,4 in the second row, …
- byrow
  - TRUE : fill values row-wise
  - FALSE : fill values column-wise (the default)

> A = matrix(data=1:6,nrow=3,ncol=2,byrow=FALSE)  # 3x2

> A
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

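The notes only show the byrow=FALSE case, so here is a small sketch that puts both settings side by side. The names `by_col` and `by_row` are my own, not from the lecture.

```r
# Same data, filled two ways
by_col = matrix(data = 1:6, nrow = 3, ncol = 2, byrow = FALSE)  # default
by_row = matrix(data = 1:6, nrow = 3, ncol = 2, byrow = TRUE)

by_col[1, ]   # 1 4  (columns are filled first: 1,2,3 then 4,5,6)
by_row[1, ]   # 1 2  (rows are filled first: 1,2 then 3,4 then 5,6)

by_col[, 1]   # 1 2 3
by_row[, 1]   # 1 3 5
```

The shape (3x2) is the same in both cases; only the order in which the values land in the cells changes.
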
combine columns / rows
- combine columns : the data are joined column-wise, so combining a and b attaches b to the right of a
- combine rows : the data are joined row-wise, so combining a and b attaches b below a

combine matrix

combine two datasets column-wise : cbind

> x = c(1,2,3)
> y = 10:12
> C = cbind(x,y)

> C
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12

combine two datasets row-wise : rbind

> D = rbind(x,y)

> D
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
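
To check the "right of" versus "below" idea, a minimal sketch that looks at the dimensions of the results. The third column `z` and its values are an assumption added for illustration.

```r
x = c(1, 2, 3)
y = 10:12

C = cbind(x, y)   # x and y become columns
D = rbind(x, y)   # x and y become rows

dim(C)   # 3 2
dim(D)   # 2 3

# cbind also accepts a matrix, as long as the number of rows matches
M = cbind(C, z = c(100, 200, 300))   # z is a hypothetical third column
dim(M)        # 3 3
colnames(M)   # "x" "y" "z"
```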

The argument names before '=' can be omitted; the values are then matched by position.

> A = matrix(data=1:6,nrow=3,ncol=2)

> A
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

# omitting the argument names gives the same result
> A = matrix(1:6,3,2)

> A
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

make row names and column names

> rownames(A) = c("JAN", "FEB", "MAR")
> colnames(A) = c("price","prom")

> A
    price prom
JAN     1    4
FEB     2    5
MAR     3    6
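
Once names are set, they can be used inside the brackets in place of numeric positions. A short sketch using the same matrix A:

```r
A = matrix(1:6, 3, 2)
rownames(A) = c("JAN", "FEB", "MAR")
colnames(A) = c("price", "prom")

A["FEB", "prom"]   # 5, the same element as A[2, 2]
A[, "price"]       # the whole price column, returned as a named vector
dimnames(A)        # a list holding the row names and the column names
```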

DataFrame

How is it different from a matrix? By default, a matrix has no labels or column names.
similar to matrix, but it has column titles
a dataframe always has column names.
data.frame(xxx)

> E = data.frame(ID=1:3, AGE=c(20,21,45))

> E
  ID AGE
1  1  20
2  2  21
3  3  45

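A small sketch of the column-name difference. The `unnamed` data frame is an assumption for illustration; it shows that R generates column names by itself when none are given.

```r
M = matrix(1:6, 3, 2)
E = data.frame(ID = 1:3, AGE = c(20, 21, 45))

is.null(colnames(M))   # TRUE: a matrix has no column names until we set them
names(E)               # "ID" "AGE"

# Hypothetical case: build a data frame without naming the columns.
# R still fills in column names automatically.
unnamed = data.frame(1:3, c(20, 21, 45))
length(names(unnamed))   # 2
```
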
summary(x)
- see the summary stats of variables
```
> summary(E)
       ID           AGE       
 Min.   :1.0   Min.   :20.00  
 1st Qu.:1.5   1st Qu.:20.50  
 Median :2.0   Median :21.00  
 Mean   :2.0   Mean   :28.67  
 3rd Qu.:2.5   3rd Qu.:33.00  
 Max.   :3.0   Max.   :45.00  
```
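This kind of information can easily be checked in Excel. But we cannot open the Excel file every time we want to look at it, which is why we use the summary function. It also lets us see some information that Excel does not show.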
str(x)

> str(E)
'data.frame':	3 obs. of  2 variables:
 $ ID : int  1 2 3
 $ AGE: num  20 21 45
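
To see where the summary numbers come from, the same values can be computed one at a time. This is only a sketch; the standalone vector `AGE` is introduced here for convenience.

```r
AGE = c(20, 21, 45)
E = data.frame(ID = 1:3, AGE = AGE)

mean(AGE)             # 28.66667 (summary shows it rounded to 28.67)
median(AGE)           # 21
quantile(AGE, 0.25)   # 20.5, the "1st Qu." row
quantile(AGE, 0.75)   # 33, the "3rd Qu." row
range(AGE)            # 20 45, the Min. and Max. rows

# str() reports the type of each column; sapply with class gives the same types
sapply(E, class)      # ID "integer", AGE "numeric"
```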

Index System for Matrix

want to access an element in the second row and first column

> B = matrix(1:6,3,2)

> B
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

# B[row number, column number]
> B[2,1] 
[1] 2

want to access EVERYTHING in the first row

> B[1,]
[1] 1 4

empty means everything!

want to access EVERYTHING in the second column

> B[,2]
[1] 4 5 6

want to access the first and second rows in the second column

> B[c(1,2),2]
[1] 4 5

> B[1:2,2]    # in R the start of a range cannot be left blank, so write 1:2 (not :2)
[1] 4 5

> A = c(1:6)

> A[2]
[1] 2

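The matrix patterns above, collected in one runnable sketch. The last two lines are my own addition: they show that picking several rows keeps a matrix, while a single row comes back as a plain vector.

```r
B = matrix(1:6, 3, 2)

B[2, 1]      # 2: a single element
B[1, ]       # 1 4: an empty column slot means "all columns"
B[, 2]       # 4 5 6: an empty row slot means "all rows"
B[1:2, 2]    # 4 5

dim(B[c(1, 3), ])      # 2 2: rows 1 and 3 still form a matrix
is.null(dim(B[1, ]))   # TRUE: one row is returned as a vector with no dim
```
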
Index System for Dataframe

E = data.frame(ID=1:3, AGE=c(20,21,45))

choose a column in a dataframe

> E$ID
[1] 1 2 3

> E$ID[2]  # treat it like a vector
[1] 2

If you write just ID, R does not know that it belongs to E. So always write the name of the dataframe first, then $, then the column name.

choose more than one column (variable)

> E[c("ID","AGE")]
  ID AGE
1  1  20
2  2  21
3  3  45

choose some rows in more than one column

> E[c("ID","AGE")][1:2,]  # treat it like a matrix
  ID AGE
1  1  20
2  2  21
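
A sketch comparing a few equivalent ways to index the same dataframe. `E[["AGE"]]` and `E[, "AGE"]` are not in the lecture notes; they are included here as alternatives to `$`, and `first_two` is a name I made up.

```r
E = data.frame(ID = 1:3, AGE = c(20, 21, 45))

# Three ways to pull out one column as a vector
E$AGE        # 20 21 45
E[["AGE"]]   # 20 21 45
E[, "AGE"]   # 20 21 45

# Rows and columns in one step, matrix style
first_two = E[1:2, c("ID", "AGE")]
nrow(first_two)                                 # 2
all(first_two == E[c("ID", "AGE")][1:2, ])      # TRUE
```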
