讓Python老司機幫你搞定MySQL數據庫

讓Python幫你搞定MySQL數據庫

讓Python老司機幫你搞定MySQL數據庫


奇貓 51CTO 昨天

Mysql是常用的數據庫之一,也是面試工作必備技能之一。


本文通過一個實戰,將Python與Sql語句結合起來使用,搞定MySQL數據庫。


// 實戰開始 //


我們在github上下載fifa18球員數據,將這些信息存入到mysql。

①數據下載地址:

https://github.com/amanthedorkknight/fifa18-all-player-statistics

②選擇:Complete->basicplayerdata.csv


// 將數據從CSV導入到MySQL //


1)創建對應庫與表












<code>#創建公司數據庫,編碼格式utf-8CREATE DATABASE fifa18_db DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;#選擇數據庫:use fifa18_db;#創建球員表:id,名稱,海報地址,俱樂部,年齡,薪資等,與csv文件對應create table player(id int Primary key auto_increment, player_id int, name char(64), age int, poster char(64),                    flag char(64), overall int, potential int, club char(64) default '', club_Logo char(64),                    value char(16), wage char(16), special int) default charset =utf8;/<code>


2)CSV讀取文件


數據集中某些球員字段為空,插入時候需要補充默認值,使用DictReader讀取。












<code>path = '/home/linux/workdir/data/basicplayerdata.csv'#字段名稱field = ['','ID','Name','Age','Photo','Nationality','Flag','Overall','Potential','Club','Club Logo','Value','Wage','Special']f = open(path)fcsv = csv.DictReader(f, fieldnames = field)#第一行去掉line = next(fcsv)#第一行有效數據line = next(fcsv)print(line)/<code> 


結果:








<code>OrderedDict([('', '0'), ('ID', '158023'), ('Name', 'L. Messi'), ('Age', '30'), ('Photo', 'https://cdn.sofifa.org/players/4/18/158023.png'), ('Nationality', 'Argentina'), ('Flag', 'https://cdn.sofifa.org/flags/52.png'), ('Overall', '94'), ('Potential', '94'), ('Club', 'FC Barcelona'), ('Club Logo', 'https://cdn.sofifa.org/teams/2/18/light/241.png'), ('Value', '€118.5M'), ('Wage', '€565K'), ('Special', '2161')])/<code>


3)讀取一行並整理數據格式


CSV讀取字段與數據庫對應起來,且不要CSV文件種的第一列,整理數據格式為字典,key為數據庫列名,value為CSV對應內容,代碼實現。












<code>sqlfield = ['player_id','name','age','poster','nationality','flag','overall','potential','club','club_logo','value','wage','special']csvfield = ['ID','Name','Age','Photo','Nationality','Flag','Overall','Potential','Club','Club Logo','Value','Wage','Special']#轉成字典:keysinfo = dict(zip(sqlfield, csvfield))#填充數據:data = {}for sfield, cfield in keysinfo.items():    ele = line.get(cfield, '')    data.setdefault(sfield, ele)print(data)/<code>


結果:







<code>{'player_id': '158023', 'name': 'L. Messi', 'age': '30', 'poster': 'https://cdn.sofifa.org/players/4/18/158023.png', 'nationality': 'Argentina', 'flag': 'https://cdn.sofifa.org/flags/52.png', 'overall': '94', 'potential': '94', 'club': 'FC Barcelona', 'club_logo': 'https://cdn.sofifa.org/teams/2/18/light/241.png', 'value': '€118.5M', 'wage': '€565K', 'special': '2161'}/<code>


4)拼接sql語句


嘗試插入一條數據:獲取插入字段與對應數據,拼接成sql語句。








<code>tablename = 'player'keys = data.keys()fields = ','.join(keys)vals = ','.join(["'%s'"% val for val in data.values()])sql = f"INSERT INTO {tablename}({fields}) VALUES({vals})"print(sql)/<code>


結果:




<code>INSERT INTO player(player_id,name,age,poster,nationality,flag,overall,potential,club,club_logo,value,wage,special) VALUES('158023','L. Messi','30','https://cdn.sofifa.org/players/4/18/158023.png','Argentina','https://cdn.sofifa.org/flags/52.png','94','94','FC Barcelona','https://cdn.sofifa.org/teams/2/18/light/241.png','€118.5M','€565K','2161')/<code>


5)連接數據庫,並插入數據












<code>#編碼格式:utf-8db = pymysql.connect("localhost","root","abc123","fifa18_db", charset='utf8')#獲取遊標cursor = db.cursor()cursor.execute(sql)#提交數據db.commit()#斷開連接cursor.close()db.close()/<code>


查看player中信息,插入成功。


讓Python老司機幫你搞定MySQL數據庫


基本功能都已經實現,對代碼進行整理,根據功能定義類及方法。





























<code>import csvimport pymysqlclass LoadDataFromCsvToMysql:    def __init__(self, csvpath, csvfield, mysqlfield, table, sqlconfig):        #初始化參數        pass    def connectSql(self):        #連接數據庫,獲取遊標        pass    def disconnectSql(self):        #斷開數據庫,獲取遊標        pass    def processSql(self, sql):        #處理sql語句        pass    def loadCsv(self):        #打開csv文件,返回csv對象        pass    def closeCsv(self):        #關閉csv文件        pass    def gensql(self, linedata):        #生成sql語句        pass    def process(self):        #對外接口,打開文件並寫數據庫        pass/<code> 


類定義完成之後,我們可以將功能明確的方法實現。














































<code>import csvimport pymysqlclass LoadDataFromCsvToMysql:    def __init__(self, csvpath, csvfield, mysqlfield, table, sqlconfig):        #初始化參數        self.dbconfig = sqlconfig        self.inpath = csvpath        self.sqlfield =  mysqlfield        self.csvfield = csvfield        self.fieldmap = dict(zip(mysqlfield, csvfield[1:]))        self.tablename = table    def connectSql(self):        self.db = pymysql.connect(**self.dbconfig, charset='utf8')        self.cursor = self.db.cursor()    def disconnectSql(self):        self.cursor.close()        self.db.close()    def processSql(self, sql):        ret = self.cursor.execute(sql)        self.db.commit()        return ret    def loadCsv(self):        #打開csv文件        self.f = open(path)        fcsv = csv.DictReader(self.f, fieldnames = self.csvfield)        next(fcsv)        return fcsv    def closeCsv(self):        self.f.close()    def gensql(self, linedata):        pass    def process(self):        #測試數據庫與csv文件打開關閉        self.connectSql()        print('connect sql...')        fcsv = self.loadCsv()        self.disconnectSql()        self.closeCsv()        print('disconnect sql')        path = '/home/linux/workdir/data/basicplayerdata.csv'sqlfield= ['player_id','name','age','poster','nationality','flag','overall','potential','club','club_logo','value','wage','special']csvfield = ['','ID','Name','Age','Photo','Nationality','Flag','Overall','Potential','Club','Club Logo','Value','Wage','Special']dbconfig = {'host':'localhost','port':3306,'user':'root','passwd':'abc123','db':'fifa18_db'}obj = LoadDataFromCsvToMysql(path, csvfield, sqlfield,  'player', dbconfig)obj.process()/<code>


輸出:





<code>connect sql...load csvfiledisconnect sql/<code>


6)完善process與gensql


先完善process方法實現:














<code> def process(self):        #測試數據庫與csv文件打開關閉        self.connectSql()        print('connect sql...')        fcsv = self.loadCsv()        for line in fcsv:            sql = self.gensql(line)            print(sql)            n = self.processSql(sql)        self.disconnectSql()        self.closeCsv()        print('disconnect sql')/<code>


然後完善gensql方法,將前面實現添加到此方法中。














<code>def gensql(self, linedata):        #產生sql語句        data = {}        for sfield, cfield in self.fieldmap.items():            ele = linedata.get(cfield, '')            data.setdefault(sfield, ele)        tablename = self.tablename        keys = data.keys()        fields = ','.join(keys)        vals = ','.join(["'%s'"% val for val in data.values()])        sql = f"INSERT INTO {tablename}({fields}) VALUES({vals})"        return sql/<code>


測試前將player表中內容刪除,然後使用代碼先插入一條,並查看結果;如果測試沒有問題,可以插入所有數據。實際運行出現問題:sql語句錯誤 ,字符串拼接錯誤。


出錯sql語句:



<code>"('158023','L. Messi','30','https://cdn.sofifa.org/players/4/18/158023.png',..)"/<code>


雙引號+單引號,但是文件中,某些字符串帶單引號,所以出現字段錯誤。


修改sql拼接方法:














<code>    def gensql(self, linedata):        #產生sql語句        data = {}        for sfield, cfield in self.fieldmap.items():            ele = linedata.get(cfield, '')            data.setdefault(sfield, ele)        tablename = self.tablename        keys = data.keys()        fields = ','.join(keys)        vals = ','.join(['"%s"'% val for val in data.values()])        sql = f'INSERT INTO {tablename}({fields}) VALUES({vals})'        return sql/<code>

運行結果:







<code>dstram","18","https://cdn.sofifa.org/players/4/18/238813.png","England","https://cdn.sofifa.org/flags/14.png","47","65","Crewe Alexandra","https://cdn.sofifa.org/teams/2/18/light/121.png","€60K","€1K","1305")INSERT INTO player(player_id,name,age,poster,nationality,flag,overall,potential,club,club_logo,value,wage,special) VALUES("238306","A. Conway","19","https://cdn.sofifa.org/players/4/18/238306.png","Republic of Ireland","https://cdn.sofifa.org/flags/25.png","47","63","Galway United","https://cdn.sofifa.org/teams/2/18/light/1571.png","€60K","€1K","1314")disconnect sql1555235373.5942054 1555235392.2122808/<code>


插入18000條數據,花費時間大概為20S左右;後續優化:每次插入500條數據,然後在查看花費時間,這個大家可以參考前面案例自己實現。


// 在SQL中查詢數據 //


需求:查詢player表中阿根廷國家球員姓名,年齡,頭像信息。讀者朋友可以自己嘗試去實現,考慮使用繼承。


①sql中查詢數據我們步驟:連接數據庫,執行sql語句,關閉數據庫;

②查詢與寫入很多方法通用,考慮繼承LoadDataFromCsvToMysql類;

③需要重載init,process,processSql方法;


代碼實現:































<code>class QueryMysql(LoadDataFromCsvToMysql):    def __init__(self, sqlconfig):        #調用父類方法,初始化傳一些無效參數        super(QueryMysql, self).__init__('',[], [], '', dbconfig)    def genSql(self, table, fields, condition=None):        #查詢語句生成        fds = ','.join(fields)        cond = ''        print(condition)        if condition:            cond = f' where {condition}'        sql = f'select {fds} from {table}{cond}'        return sql    def process(self, tablename, fields, condition=None):        #對外接口        #調用父類中的連接數據庫,關閉數據庫方法        self.connectSql()        sql = self.genSql(tablename, fields, condition)        self.processSql(sql)        items = self.cursor.fetchall()        for item in items:            print(item)        print('all Argentina player:', len(items))        self.disconnectSql()        dbconfig= {'host':'localhost','port':3306,'user':'root','passwd':'abc123','db':'fifa18_db'}tablename = 'player'files = ['name','poster', 'age']obj = QueryMysql(dbconfig)cond = 'nationality="Argentina"'obj.process(tablename, files, cond)/<code>


結果:









<code>('L. Messi', 'https://cdn.sofifa.org/players/4/18/158023.png', 30)('G. Higuaín', 'https://cdn.sofifa.org/players/4/18/167664.png', 29)('P. Dybala', 'https://cdn.sofifa.org/players/4/18/211110.png', 23)...('A. Miño', 'https://cdn.sofifa.org/players/4/18/243298.png', 23)('T. Durso', 'https://cdn.sofifa.org/players/4/18/240955.png', 18)('K. Humeler', 'https://cdn.sofifa.org/players/4/18/240291.png', 20)('J. Mendive', 'https://cdn.sofifa.org/players/4/18/241584.png', 20)all Argentina player: 966/<code>

結果:


數據集中一共有966名Argentina球員。

讓Python老司機幫你搞定MySQL數據庫


分享到:


相關文章: