notebook

formatting, split, group by, pivot_table

Python

seoa__ 2025. 1. 10. 13:41

Formatting

f-string, str.format(), and % formatting
x = 10
print(f"The value of variable x is {x}.")  # f-string
x = 10
print("The value of variable x is {}.".format(x))  # str.format()
x = 10
print("The value of variable x is %d." % x)  # %-style formatting
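All three styles above also accept format specifications for things like decimal places and padding. Here is a small sketch using f-strings; the variable names and values are made up purely for illustration.

```python
# Hypothetical values, chosen only to illustrate format specifications.
price = 1234.5678
ratio = 0.4567
name = "x"

two_decimals = f"{price:.2f}"   # fixed-point with 2 digits after the decimal point
with_commas = f"{price:,.1f}"   # thousands separator plus 1 decimal place
as_percent = f"{ratio:.1%}"     # multiplies by 100 and appends '%'
padded = f"[{name:>5}]"         # right-aligns the value in a field of width 5

print(two_decimals)  # Output: 1234.57
print(with_commas)   # Output: 1,234.6
print(as_percent)    # Output: 45.7%
print(padded)        # Output: [    x]
```

The same specifications work with str.format(), for example "{:.2f}".format(price).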

split

String → list (split on whitespace)

sentence = "Hello, how are you doing today?"
words = sentence.split()
print(words)  # Output: ['Hello,', 'how', 'are', 'you', 'doing', 'today?']

(Split on a specific delimiter)

data = "apple,banana,grape,orange"
fruits = data.split(',')
print(fruits)  # Output: ['apple', 'banana', 'grape', 'orange']

Joining the items of a list into a string (just good to know)

words = ['Hello,', 'how', 'are', 'you', 'doing', 'today?']
sentence = ' '.join(words)
print(sentence)  # Output: Hello, how are you doing today?

Joining the items of a list into a string with a specific delimiter

fruits = ['apple', 'banana', 'grape', 'orange']
data = ','.join(fruits)
print(data)  # Output: apple,banana,grape,orange
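One thing to watch out for: join() only accepts strings, so a list of numbers has to be converted first. A minimal sketch, with a made-up numbers list:

```python
# Hypothetical list of integers, used only for illustration.
numbers = [1, 2, 3]

# Joining non-string items directly raises TypeError.
error_message = None
try:
    ','.join(numbers)
except TypeError as exc:
    error_message = str(exc)

# Convert each item to str before joining.
joined = ','.join(str(n) for n in numbers)
print(joined)  # Output: 1,2,3
```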

Splitting a multi-line string into a list of lines

text = """First line
Second line
Third line"""
lines = text.split('\n')
print(lines)  # Output: ['First line', 'Second line', 'Third line']
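As a side note, split('\n') can leave leftovers when the text ends with a newline or uses Windows-style line endings. The built-in splitlines() handles both cases. The sample text below is made up to show the difference.

```python
# Hypothetical text with a Windows line ending and a trailing newline.
text = "First line\nSecond line\r\nThird line\n"

by_newline = text.split('\n')
by_splitlines = text.splitlines()

print(by_newline)     # Output: ['First line', 'Second line\r', 'Third line', '']
print(by_splitlines)  # Output: ['First line', 'Second line', 'Third line']
```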

Splitting a string on whitespace and keeping only the first few items

sentence = "Hello, how are you doing today?"
words = sentence.split()
first_three_words = words[:3]
print(first_three_words)  # Output: ['Hello,', 'how', 'are']
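A related option is the maxsplit argument of split(). Unlike slicing, it stops splitting after the given number of cuts and keeps the rest of the string as the last item. A short sketch:

```python
sentence = "Hello, how are you doing today?"

# Split at most twice; the remainder stays together as the last item.
parts = sentence.split(maxsplit=2)
print(parts)  # Output: ['Hello,', 'how', 'are you doing today?']

# Handy for separating the first word from everything else.
first_word, rest = sentence.split(maxsplit=1)
print(first_word)  # Output: Hello,
print(rest)        # Output: how are you doing today?
```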

Stripping leading and trailing whitespace from a string, then converting it to a list

text = "   Hello   how   are   you   "
cleaned_text = text.strip()
words = cleaned_text.split()
print(words)  # Output: ['Hello', 'how', 'are', 'you']
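It may help to know that split() with no argument already ignores leading, trailing, and repeated whitespace, so strip() is optional in that case. Passing ' ' explicitly behaves differently, as this sketch shows:

```python
text = "   Hello   how   are   you   "

# No argument: runs of whitespace count as one separator, edges are ignored.
no_arg = text.split()
print(no_arg)  # Output: ['Hello', 'how', 'are', 'you']

# Explicit ' ': every single space is a separator, so empty strings appear.
explicit = text.strip().split(' ')
print(explicit)  # Output: ['Hello', '', '', 'how', '', '', 'are', '', '', 'you']

# Filtering out the empty strings gives the same result as split().
filtered = [word for word in explicit if word]
```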

split in practice

# Represent the data path as a string
file_path = "/usr/local/data/sample.txt"

# Use rsplit() to split the path into directory and file name
directory, filename = file_path.rsplit('/', 1)
print("Directory:", directory)  # Output: Directory: /usr/local/data
print("File name:", filename)   # Output: File name: sample.txt

  • The example stores the data path in the string variable file_path and splits it on /.
  • rsplit() is used with a maximum of one split, starting from the right, so the string divides into the directory and the file name.
  • The two parts are assigned to the directory and filename variables and printed.
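For real file paths, the standard library offers alternatives to rsplit(). The sketch below is not part of the original example. It uses os.path.split and pathlib.PurePosixPath, and also shows a case where unpacking the rsplit() result would fail (a path with no / in it). The bare variable is made up for illustration.

```python
import os.path
from pathlib import PurePosixPath

file_path = "/usr/local/data/sample.txt"

# os.path.split always returns a (head, tail) pair.
directory, filename = os.path.split(file_path)
print(directory)  # Output: /usr/local/data
print(filename)   # Output: sample.txt

# A path without '/' gives a one-item list from rsplit, so unpacking
# into two variables would raise ValueError. os.path.split still returns a pair.
bare = "sample.txt"
rsplit_parts = bare.rsplit('/', 1)
head, tail = os.path.split(bare)

# pathlib exposes the same pieces as attributes.
path = PurePosixPath(file_path)
parent = str(path.parent)
name, stem, suffix = path.name, path.stem, path.suffix
```

PurePosixPath is used here so that the result does not depend on the operating system running the code.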

groupby()

  • A method that splits the data into groups by key so that statistics can be computed for each group
  • Handy when preprocessing data according to specific conditions

by : what to group by. Can be a function, a column label, a list of labels, and so on
sort : whether to sort the group keys
dropna : whether to drop rows whose group key is a missing value (NaN)

# df is assumed to be a DataFrame with 'sex', 'pclass', 'survived', and 'age' columns
# groupby on multiple columns
df.groupby(['sex', 'pclass'])[['survived', 'age']].mean()

# Multiple statistics
df.groupby(['sex', 'pclass'])[['survived', 'age']].agg(['mean', 'sum'])
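The groupby calls above assume a df that already exists. To try them without loading a dataset, here is a minimal sketch with a tiny hand-made DataFrame. The rows are invented for illustration and only the column names follow the snippet above. It also shows what dropna does when a group key is missing, using a second made-up frame called keyed.

```python
import pandas as pd

# Hypothetical data; only the column names match the snippet above.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male"],
    "pclass": [3, 1, 3, 1, 1, 3],
    "survived": [0, 1, 1, 0, 1, 0],
    "age": [22.0, 38.0, 26.0, 54.0, 35.0, None],
})

# One key, one statistic.
by_sex = df.groupby("sex")["survived"].mean()

# Multiple keys: the result has a two-level index (sex, pclass).
multi = df.groupby(["sex", "pclass"])[["survived", "age"]].mean()

# Multiple statistics: the columns become two-level as well.
stats = df.groupby(["sex", "pclass"])[["survived", "age"]].agg(["mean", "sum"])

# dropna controls whether rows with a missing group key are kept.
keyed = pd.DataFrame({"key": ["a", None, "a"], "val": [1, 2, 3]})
dropped = keyed.groupby("key")["val"].sum()               # NaN key is dropped
kept = keyed.groupby("key", dropna=False)["val"].sum()    # NaN key gets its own group

print(multi)
print(stats)
```

Note that mean() skips missing values inside a group (the NaN age above), which is separate from what dropna does with missing keys.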

pivot_table()

  • A method that reshapes data into a spreadsheet-style pivot table, similar to a pivot table in Excel

values : the column(s) to aggregate
aggfunc : the function(s) to apply
fill_value : the value used to fill in missing entries

# Put the groups on the index
df.pivot_table(index='who', values='survived')

# Put the groups on the columns
df.pivot_table(columns='who', values='survived')

# Multiple index levels
df.pivot_table(index=['who', 'pclass'], values='survived')

# Multiple aggregation functions
df.pivot_table(index='who',
               columns='pclass',
               values='survived',
               aggfunc=['sum', 'mean'])
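As with groupby, these calls need an existing df. The sketch below uses a small made-up DataFrame (only the column names follow the snippet above) to show fill_value in action, and to check that the default aggregation of pivot_table is the mean.

```python
import pandas as pd

# Hypothetical data; only the column names match the snippet above.
df = pd.DataFrame({
    "who": ["man", "woman", "woman", "man", "child", "man"],
    "pclass": [3, 1, 3, 1, 3, 3],
    "survived": [0, 1, 1, 0, 1, 0],
})

# There is no (child, pclass 1) row, so that cell is NaN without fill_value.
no_fill = df.pivot_table(index="who", columns="pclass",
                         values="survived", aggfunc="mean")

# fill_value replaces the missing cell with the given value.
table = df.pivot_table(index="who", columns="pclass",
                       values="survived", aggfunc="mean", fill_value=0)

# With no aggfunc given, pivot_table computes the mean,
# which matches a groupby on the same key.
by_who = df.pivot_table(index="who", values="survived")
same_as_groupby = df.groupby("who")["survived"].mean()

print(no_fill)
print(table)
```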
