Basic data-handling operations
Libraries
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
Exploratory Data Analysis Demos.ipynb
Basic plotting methods (matplotlib, typically used alongside pandas):
• Bar Plot: plt.bar([10, 20, 30], [5, 8, 2])
• Histogram: plt.hist(y1)
• Boxplot: y_outliers = y1+[-10]
plt.boxplot([y1,y_outliers])
plt.show()
• Line plot: plt.plot(x,y1,'-')
• Scatter plot: plt.plot(x,y1,'.r')
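The plotting calls above can be tried together in one figure. This is a minimal sketch: the notes never define x or y1, so the values below are made-up sample data, and the subplot layout is only one way to arrange the five plots.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data; x and y1 are not defined in the notes.
x = list(range(10))
y1 = [2, 3, 3, 4, 5, 5, 5, 6, 7, 8]
# y1 must be a plain list here: + concatenates, appending one extreme value.
y_outliers = y1 + [-10]

fig, axes = plt.subplots(1, 5, figsize=(18, 3))

axes[0].bar([10, 20, 30], [5, 8, 2])   # bar heights 5, 8, 2 at x = 10, 20, 30
axes[0].set_title("Bar plot")

axes[1].hist(y1)                       # default of 10 bins
axes[1].set_title("Histogram")

axes[2].boxplot([y1, y_outliers])      # two boxes side by side
axes[2].set_title("Boxplot")

axes[3].plot(x, y1, "-")               # solid line
axes[3].set_title("Line plot")

axes[4].plot(x, y1, ".r")              # red point markers, no connecting line
axes[4].set_title("Scatter plot")

fig.tight_layout()
```

In a notebook with %matplotlib inline the figure renders when the cell finishes; in a script, call plt.show() afterwards. plt.scatter(x, y1) is an alternative when marker size or colour should vary per point.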
Unique values: df1['sex'].unique()
Count occurrences of each value: df1['fare'].value_counts()
Count missing values: df1['embarked'].isnull().sum()
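A small sketch of the three summary calls above. The DataFrame is an invented stand-in for df1, which the notes load elsewhere; only the column names sex, fare and embarked come from the notes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1; the real data comes from the notebook.
df1 = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "fare": [7.25, 71.28, 7.25, 8.05],
    "embarked": ["S", "C", np.nan, "S"],
})

unique_sex = df1["sex"].unique()             # distinct values, in order of first appearance
fare_counts = df1["fare"].value_counts()     # Series: value -> number of occurrences
n_missing = df1["embarked"].isnull().sum()   # True counts as 1, so this counts the NaNs

print(unique_sex)
print(fare_counts)
print(n_missing)
```

Note that value_counts() skips NaN by default; passing dropna=False includes missing values in the tally.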
Week 6 tutorial Data Auditing answer.ipynb
Describe the data:
titanic.describe()
titanic.info()
Filter rows by condition: titanic[titanic.title == "Rev"]
titanic[((titanic.who == "man") | (titanic.who == "woman")) & (titanic.age < 18)]
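The two filters above are boolean masks. The sketch below runs them on a made-up four-row frame (the rows are an assumption; the column names title, who and age are from the notes). Each comparison needs its own parentheses because & and | bind more tightly than == and <.

```python
import pandas as pd

# Hypothetical rows for illustration only.
titanic = pd.DataFrame({
    "title": ["Mr", "Rev", "Miss", "Master"],
    "who": ["man", "man", "woman", "child"],
    "age": [30, 45, 16, 4],
})

# Single condition: rows whose title is "Rev".
revs = titanic[titanic["title"] == "Rev"]

# Combined condition: (man OR woman) AND under 18.
is_adult_label = (titanic["who"] == "man") | (titanic["who"] == "woman")
minors_labelled_adult = titanic[is_adult_label & (titanic["age"] < 18)]

print(revs)
print(minors_labelled_adult)
```

Bracket access such as titanic["who"] behaves the same as titanic.who but also works when a column name clashes with a DataFrame attribute or contains spaces.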
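Replace values: titanic["embark_town"] = titanic["embark_town"].replace({"Cherborg": "Cherbourg", "Cherbourge": "Cherbourg"})
titanic["sex"] = titanic["sex"].replace({"F": "female", "M": "male"})
(Assign the result back instead of calling replace(..., inplace=True) on a selected column; recent pandas versions warn about the in-place form on a column and may leave the DataFrame unchanged.)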
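A quick way to confirm that a replacement took effect is to compare the unique values before and after. This is a sketch on invented rows; only the column names and the mapping dictionaries come from the notes.

```python
import pandas as pd

# Hypothetical rows containing the misspellings and abbreviations to clean.
titanic = pd.DataFrame({
    "embark_town": ["Cherborg", "Cherbourge", "Southampton", "Cherbourg"],
    "sex": ["F", "M", "male", "female"],
})

before = sorted(titanic["embark_town"].unique())

# Values absent from the mapping are left untouched.
titanic["embark_town"] = titanic["embark_town"].replace(
    {"Cherborg": "Cherbourg", "Cherbourge": "Cherbourg"}
)
titanic["sex"] = titanic["sex"].replace({"F": "female", "M": "male"})

after = sorted(titanic["embark_town"].unique())
print(before, "->", after)
```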
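Handle duplicates (keep only the first occurrence): titanic = titanic.drop_duplicates(subset=["firstName", "lastName", "age"], keep="first")
Inspect the repeated rows that would be dropped: titanic[titanic.duplicated(["firstName", "lastName", "age"], keep="first")]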
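The difference between duplicated and drop_duplicates is easy to mix up: with keep="first", duplicated marks every repeat after the first occurrence, so indexing with it returns the rows to discard, not the rows to keep. A minimal sketch on invented rows (the key columns firstName, lastName and age are from the notes; the fare column is an assumption added to show that non-key columns may differ):

```python
import pandas as pd

# Hypothetical rows: the first two share the same key columns.
titanic = pd.DataFrame({
    "firstName": ["John", "John", "Mary"],
    "lastName": ["Smith", "Smith", "Jones"],
    "age": [40, 40, 28],
    "fare": [7.25, 8.05, 13.0],
})

key_cols = ["firstName", "lastName", "age"]

# Rows flagged as repeats (the first occurrence is NOT flagged).
repeats = titanic[titanic.duplicated(key_cols, keep="first")]

# Rows that survive de-duplication: first occurrence of each key.
deduped = titanic.drop_duplicates(subset=key_cols, keep="first")

print(repeats)
print(deduped)
```

Passing keep=False to duplicated flags every member of a duplicate group, which is useful for reviewing the conflicting rows side by side before deciding which to keep.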
Split a column into its separate fields