學過sql的童鞋都知道在sql的世界裡面,查詢語句就是一切,不管查詢一列,還是多列,還是條件查詢,加上排序等等,都是很快很高效,但是有一個缺點不可避免的就是,你需要把數據導入到數據庫,然後建立查詢語句,比較複雜的查詢語句,比較複雜,也是很難編寫,如果電腦配置比較低,即便是簡單的查詢語句,在面臨巨大的數據面前,也是會變得很慢。
今天主要是講2點:
- python下的pandas如何篩選數據
- SQL的查詢方法如何在pandas裡面實現
- 相比較你感受一下誰的速度跟快更便捷
![pandas中如何實現了和SQL一樣的查詢方式](http://p2.ttnews.xyz/loading.gif)
查詢幾列數據並只顯示前幾行
SQL實現方法:
SELECT 'Year', 'Month', 'DayofMonth', 'DayOfWeek' FROM tips LIMIT 5
pandas實現方法:
df[['Year', 'Month', 'DayofMonth', 'DayOfWeek']]
![pandas中如何實現了和SQL一樣的查詢方式](http://p2.ttnews.xyz/loading.gif)
Where :查詢指定的值
時間為year=2011的前五行數據:
SQL:
SELECT * FROM tips WHERE tYear = 'Dinner' LIMIT 5;
pandas:
df[df["Year"]==2011].head()
多條件查詢 AND:
通過多個條件限制查詢的值得範圍
SQL:
SELECT * FROM tips WHERE Year = 2011 AND Month > 6;
pandas:
df[(df["Year"]==2011)&(df["Month"]>6)].head()
多條件查詢 OR:
多個條件只要滿足一個就可以
SQL:
SELECT * FROM tips WHERE Year = 2011 OR Month > 6;
pandas:
df[(df["Year"]==2011)|(df["Month"]>6)].head()
篩選是否含有空值:
篩選某列包含空值的數據:
SQL:
SELECT * FROM frame WHERE Month IS NULL;
pandas:
df[df["Month"].isna()].head
篩選不包含空值的列:
SQL:
SELECT * FROM frame WHERE Month IS NOT NULL;
pandas:
df[df["Month"].notna()].head
分組統計:
SQL:
SELECT Year, count(*) FROM tips GROUP BY Year;
pandas:
df.groupby("Year").size()
分組統計應用不容的統計方式:
SQL:
SELECT Year, AVG(TaxiOut), COUNT(*) FROM tips GROUP BY Year;
pandas:
df.groupby('Year').agg({'TaxiOut': np.mean, 'Year': np.size})
數據連接查詢:
INNER JOIN
SQL:
SELECT * FROM df1 INNER JOIN df2 ON df1.key = df2.key;
pandas:
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
'value': np.random.randn(4)})
df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
'value': np.random.randn(4)})
pd.merge(df1, df2, on='key')
LEFT OUTER JOIN
SQL:
SELECT * FROM df1 LEFT OUTER JOIN df2 ON df1.key = df2.key;
pandas:
pd.merge(df1, df2, on='key', how='left')
RIGHT JOIN
SQL:
SELECT * FROM df1 RIGHT OUTER JOIN df2 ON df1.key = df2.key;
pandas:
pd.merge(df1, df2, on='key', how='right')
FULL JOIN
SQL :
SELECT * FROM df1 FULL OUTER JOIN df2 ON df1.key = df2.key;
pandas:
pd.merge(df1, df2, on='key', how='outer')
合併查詢
Union 包括兩種情況,刪除重複和不刪除
不刪除:
SQL:
SELECT city, rank FROM df1 UNION ALL SELECT city, rank FROM df2;
pandas:
pd.concat([df1, df2])
刪除:
SQL:
SELECT city, rank FROM df1 UNION SELECT city, rank FROM df2;
pandas:
pd.concat([df1, df2]).drop_duplicates()
排序
SQL:
SELECT * FROM tips ORDER BY Month DESC
pandas:
df.sort_values(by="Month",ascending=False)
更新
SQL:
UPDATE tips SET TaxiIn = TaxiIn*2 WHERE Month < 5
pandas:
df.loc[df["Month"]<5,"TaxiIn"]*=2
df[df["Month"]<5]["TaxiIn"]
刪除
SQL:
DELETE FROM tips WHERE Month > 9;
pandas:
df.drop(df[df["Month"]>9].index)
閱讀更多 嘩啦圈的夢 的文章