[python, algorithm] Finding the Most Common Word


jmob_blog 2020. 9. 29. 23:25


Source: 파이썬 알고리즘 인터뷰 (Python Algorithm Interview)

Problem: find the word that appears most often, excluding banned words.

Regular expressions are useful for this problem as well.

import re

str01 = 'aaa bbb dfsdfe ds#@fds ff!@!ffd df 234$# #@$ @$##@@'
print(str01)
# Replace every non-word character with a space.
print(re.sub(r'[^\w]', ' ', str01))

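In a regular expression, \w matches a word character (a letter, a digit, or an underscore), so the pattern [^\w] above replaces everything that is not a word character with a space.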

In the printed result below, the special characters are gone.

aaa bbb dfsdfe ds#@fds ff!@!ffd df 234$# #@$ @$##@@
aaa bbb dfsdfe ds  fds ff   ffd df 234
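As a small check on what \w covers, the sketch below shows that digits and the underscore count as word characters, and that [^\w] behaves the same as the shorthand \W. The string `sample` is a made-up input for illustration only.

```python
import re

sample = 'snake_case v2.0 is#ok'  # hypothetical input, not from the original post

# Underscores and digits are word characters, so they survive the substitution.
cleaned = re.sub(r'[^\w]', ' ', sample)
print(cleaned)  # snake_case v2 0 is ok

# \W is shorthand for [^\w], so both patterns produce the same string.
same = re.sub(r'\W', ' ', sample) == cleaned
print(same)  # True
```

If underscores should also count as separators, a pattern such as [^a-zA-Z0-9] could be used instead, at the cost of dropping non-ASCII letters.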

Next come lowercasing and removing the banned words. Printing each step in order gives the following.

import re

str01 = 'aaa bbb dfsdfe ds#@fds ff!@!ffd df 234$# #@$ @$##@@'
print(str01)
print(re.sub(r'[^\w]', ' ', str01))
# Lowercase and split into a list of words.
print([word for word in re.sub(r'[^\w]', ' ', str01).lower().split()])
# Banned words go in a list so that `in` compares whole words, not substrings.
print([word for word in re.sub(r'[^\w]', ' ', str01).lower().split() if word not in ['aaa']])

Output:

aaa bbb dfsdfe ds#@fds ff!@!ffd df 234$# #@$ @$##@@
aaa bbb dfsdfe ds  fds ff   ffd df 234
['aaa', 'bbb', 'dfsdfe', 'ds', 'fds', 'ff', 'ffd', 'df', '234']
['bbb', 'dfsdfe', 'ds', 'fds', 'ff', 'ffd', 'df', '234']
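One thing to watch in the filter step: when the right-hand side of `in` is a str, Python performs a substring test, so a filter written as `word not in 'aaa'` would also drop 'aa' and 'a'. Putting the banned words in a list (or set) makes `in` compare whole words. The short sketch below illustrates the difference. The word list is made up for the example.

```python
banned_str = 'aaa'      # a single string
banned_list = ['aaa']   # a list holding one banned word

# Against a str, `in` is a substring test: 'aa' is found inside 'aaa'.
print('aa' in banned_str)   # True
# Against a list, `in` compares whole elements: 'aa' is not an element.
print('aa' in banned_list)  # False

words = ['aaa', 'aa', 'bbb']  # hypothetical word list
kept_str = [w for w in words if w not in banned_str]
kept_list = [w for w in words if w not in banned_list]
print(kept_str)   # ['bbb']
print(kept_list)  # ['aa', 'bbb']
```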

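Next, the most_common method of the Counter class in the collections module extracts the most frequent word.

The steps, in order, are shown in the code below.

First build a list of words, then use collections.Counter to count the occurrences of each word.

most_common(n) returns a list of the n most frequent (word, count) pairs, so passing 1 returns a one-element list holding the most frequent word and its count. Indexing that result with [0][0] gives the word itself.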

import collections
import re

str01 = 'aaa bbb bbb dfsdfe ds#@fds ff!@!ffd df 234$# #@$ @$##@@'
# Banned words go in a list so that `in` compares whole words, not substrings.
words = [word for word in re.sub(r'[^\w]', ' ', str01).lower().split() if word not in ['aaa']]
counts = collections.Counter(words)
print(counts)
print(counts.most_common(1))
print(counts.most_common(1)[0][0])

Output:

Counter({'bbb': 2, 'dfsdfe': 1, 'ds': 1, 'fds': 1, 'ff': 1, 'ffd': 1, 'df': 1, '234': 1})
[('bbb', 2)]
bbb
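Two edge cases of most_common are worth knowing, sketched below with made-up word lists. When counts are tied, Counter (Python 3.7 and later) orders the tied elements by first appearance. On an empty Counter, most_common(1) returns an empty list, so indexing it with [0][0] would raise an IndexError.

```python
import collections

words = ['df', 'bbb', 'bbb', 'df', 'ff']  # hypothetical list with a tie
counts = collections.Counter(words)

# 'df' and 'bbb' both appear twice; 'df' was seen first, so it comes first.
top_two = counts.most_common(2)
print(top_two)  # [('df', 2), ('bbb', 2)]

top_word = counts.most_common(1)[0][0]
print(top_word)  # df

# An empty Counter yields an empty list rather than raising an error here.
empty_result = collections.Counter().most_common(1)
print(empty_result)  # []
```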

Final code:

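import collections
import re

def find_word(paragraph: str, banned: list[str]) -> str:
    # banned is a list of words, so `in` compares whole words rather than substrings.
    words = [word for word in re.sub(r'[^\w]', ' ', paragraph).lower().split() if word not in banned]
    counts = collections.Counter(words)
    return counts.most_common(1)[0][0]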
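To try the approach end to end, here is a minimal, self-contained sketch. The function name `most_common_word`, the sample sentence, and the conversion of the banned list to a set are assumptions made for this example rather than part of the code above. It assumes Python 3.9 or later for the `list[str]` annotation.

```python
import collections
import re


def most_common_word(paragraph: str, banned: list[str]) -> str:
    """Return the most frequent non-banned word (hypothetical helper)."""
    banned_set = set(banned)  # set membership compares whole words
    # Replace non-word characters with spaces, lowercase, then split into words.
    words = [
        word
        for word in re.sub(r'[^\w]', ' ', paragraph).lower().split()
        if word not in banned_set
    ]
    counts = collections.Counter(words)
    return counts.most_common(1)[0][0]


# Sample input assumed for illustration.
result = most_common_word('Bob hit a ball, the hit BALL flew far after it was hit.', ['hit'])
print(result)  # ball
```

Here 'hit' appears three times but is banned, so 'ball' (twice, after lowercasing) is returned. The sketch assumes at least one non-banned word exists; otherwise the final indexing raises an IndexError.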
