[MapReduce] Counting Word Frequencies with Python


tellusboutyourself


Hadoop


금서_ 2024. 3. 7. 17:39

Linux environment: Rocky 9

Open a terminal window.

Move into the hadoop directory:

[root@localhost ~]# cd hadoop

1. mapper.py

Create mapper.py with gedit:

[root@localhost hadoop]# gedit mapper.py

< Script editor window >

#!/usr/bin/env python3
import sys

# Read text from standard input and emit "<word><TAB>1" for every word
for line in sys.stdin:
    words = line.strip().split()
    for word in words:
        print(f"{word}\t1")

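Enter the script above in the editor, save it, and close the editor. Because the script is run directly by its path in the next step, give it execute permission first:

[root@localhost hadoop]# chmod +x mapper.py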

Test it by running the following command:

[root@localhost hadoop]# echo "hello world python and hadoop" | ~/hadoop/mapper.py

hello	1
world	1
python	1
and	1
hadoop	1

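The mapper prints each of the five words on its own line, paired with a count of 1. It does not add anything up; aggregation is left to the reducer in step 2.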
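To check the mapper logic without a terminal, the same loop can be wrapped in a function that reads from any text stream. This is a minimal sketch: the function name `map_words` is hypothetical and not part of the original script, and `io.StringIO` stands in for `sys.stdin`.

```python
import io


def map_words(stream):
    # Hypothetical helper (not in the original post): same logic as mapper.py,
    # but yields the "<word>\t1" lines instead of printing them.
    for line in stream:
        for word in line.strip().split():
            yield f"{word}\t1"


if __name__ == "__main__":
    # io.StringIO plays the role of standard input so the example runs on its own
    sample = io.StringIO("hello world python and hadoop\n")
    for pair in map_words(sample):
        print(pair)
```

Returning a generator instead of printing makes the mapper's output easy to compare against expected values.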

2. mapreduce.py

As in step 1, enter the following in the terminal:

[root@localhost hadoop]# gedit mapreduce.py

< Script editor window >

a. The instructor's approach

#!/usr/bin/env python3
import sys

# Accumulate the total count for each word in a dictionary
result = {}
for line in sys.stdin:
    word, count = line.strip().split('\t', 1)
    if word not in result:
        result[word] = int(count)   # first occurrence: create the key
    else:
        result[word] += int(count)  # key already exists: add to it
for key, value in result.items():
    print(f"{key} : {value}")
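Approach a keeps every word in a dictionary, so it gives correct totals even if the input is not sorted. The same idea can be written with `collections.Counter` from the standard library. This is a swapped-in alternative, not the instructor's code, and the function name `reduce_with_counter` is hypothetical.

```python
import io
from collections import Counter


def reduce_with_counter(stream):
    # Hypothetical helper (not in the original post): sums the counts per word
    # from "<word>\t<count>" lines. Like approach a, every distinct word is
    # held in memory, so the input order does not matter.
    totals = Counter()
    for line in stream:
        word, count = line.strip().split("\t", 1)
        totals[word] += int(count)
    return totals


if __name__ == "__main__":
    # Unsorted mapper output: the two "hello" lines are not adjacent
    mapped = io.StringIO("hello\t1\nworld\t1\nhello\t1\n")
    for word, total in reduce_with_counter(mapped).items():
        print(f"{word} : {total}")
```

The trade-off is memory: the dictionary grows with the number of distinct words, whereas a streaming reducer only keeps the current word.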

b. The approach suggested by GPT

#!/usr/bin/env python3
import sys

# State for the word currently being counted
current_word = None
current_count = 0

# The input must be sorted by word so that identical words arrive consecutively
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)  # convert the count to an integer
    except ValueError:
        continue  # skip empty or malformed lines
    if current_word == word:
        current_count += count
    else:
        if current_word is not None:
            print(f"{current_word}/{current_count}")
        current_word = word
        current_count = count

# Print the result for the last word (nothing is printed for empty input)
if current_word is not None:
    print(f"{current_word}/{current_count}")
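Approach b only merges lines that sit next to each other, which is why the input has to be sorted by word first (the `sort -k1,1` step in the command that follows). The same streaming logic can be sketched with `itertools.groupby`, which also groups only adjacent items. This is a swapped-in stdlib technique, not the GPT-suggested code itself; the names `parse` and `reduce_sorted` are hypothetical.

```python
import io
from itertools import groupby


def parse(stream):
    # Hypothetical helper: yield (word, count) pairs, skipping malformed lines
    for line in stream:
        try:
            word, count = line.strip().split("\t", 1)
            yield word, int(count)
        except ValueError:
            continue


def reduce_sorted(stream):
    # Hypothetical helper: sum the counts of each run of identical adjacent
    # words. Assumes the input is already sorted by word, as approach b does.
    for word, group in groupby(parse(stream), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    mapped = io.StringIO("and\t1\nhello\t1\nhello\t1\nworld\t1\n")
    for word, total in reduce_sorted(mapped):
        print(f"{word}/{total}")
```

Feeding it unsorted input shows why the sort matters: `hello` would then be reported twice with a count of 1 each instead of once with a count of 2.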

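Enter the script above in the editor, save it, and close the editor. Give it execute permission as well:

[root@localhost hadoop]# chmod +x mapreduce.py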

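Then run the following command in the terminal. The output of mapper.py is sorted by its first field (sort -k1,1) before it reaches mapreduce.py, so identical words arrive on consecutive lines: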

echo "hello world python and hadoop hello" | /root/hadoop/mapper.py | sort -k1,1 | /root/hadoop/mapreduce.py

This produces the following output (run with approach b, the GPT-suggested version):

and/1
hadoop/1
hello/2
python/1
world/1

hello appears twice in total, and each of the other words appears once.
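The three stages of that command (mapper, sort, reducer) can be mirrored inside a single Python process to see how the data flows from one step to the next. This is a local illustration only, under the assumption that the input is plain lowercase ASCII words; the function names `mapper`, `shuffle`, and `reducer` are hypothetical. On a real Hadoop cluster the framework itself sorts the map output by key before handing it to the reducers.

```python
from itertools import groupby


def mapper(text):
    # Hypothetical map stage: emit (word, 1) for every word, like mapper.py
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    # Hypothetical sort stage: order the pairs by word, like `sort -k1,1`
    return sorted(pairs, key=lambda pair: pair[0])


def reducer(sorted_pairs):
    # Hypothetical reduce stage: sum the counts of adjacent identical words,
    # like approach b
    return [
        (word, sum(count for _, count in group))
        for word, group in groupby(sorted_pairs, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    text = "hello world python and hadoop hello"
    for word, total in reducer(shuffle(mapper(text))):
        print(f"{word}/{total}")
```

For the sentence used above, this prints the same five lines as the shell pipeline.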
