GitHub Acquired by Microsoft? That Won't Stop Us from Building a GitHub Code Leak Monitoring Tool

0x01 Rolling Up Our Sleeves

Life is short, I use Python!

Python's powerful libraries, concise syntax, and fast development cycle have made it a favorite among developers. So let's build the tool in Python!

0x02 Step-by-Step Walkthrough

1. Logging in to GitHub

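There is a catch here: logging in at https://github.com/login sends you on to https://github.com/session , which is where the request body is submitted. The body contains the following parameters: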

“commit=Sign+in&utf8=%E2%9C%93&authenticity_token=sClUkea9k0GJ%2BTVRKRYsvLKPGPfLDknMWVSd%2FyWvyGAR9Zz09bipesvXUo8ND2870Q2FEVsQWFKScyqtV0w1PA%3D%3D&login=YourUsername&password=YourPassword”

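The values of commit, utf8, login, and password are relatively fixed. For the tool to log in by itself, it needs to obtain the authenticity_token value and submit it together with the other parameters via POST. So how do we get that value?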

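Open a browser and log in manually as usual, pressing F12 to open the developer tools at the same time. After you enter the username and password you can see the jump to https://github.com/session , and the authenticity_token value sits at the position shown in the figure below: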

[Figure: the authenticity_token field shown in the browser's developer tools]

Although the field is hidden, we can retrieve it with XPath and submit it together with the other parameters to log in to GitHub. Here is the code:

[Figure: screenshot of the login code]
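As a rough illustration of this step (not the author's code), the sketch below pulls authenticity_token out of a login form and builds the POST body in the same shape as the request shown earlier. It uses only the standard library so it runs without extra packages, whereas the tool itself relies on XPath. SAMPLE_FORM, TokenParser, extract_token, and build_login_body are hypothetical names introduced here, and the sample form is a trimmed-down assumption rather than GitHub's real markup.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class TokenParser(HTMLParser):
    """Collect the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == "authenticity_token":
            if self.token is None:  # keep the first match only
                self.token = attrs.get("value")


def extract_token(html):
    """Return the authenticity_token found in the page, or None."""
    parser = TokenParser()
    parser.feed(html)
    return parser.token


def build_login_body(token, username, password):
    """URL-encode the form fields listed in the article's request body."""
    return urlencode({
        "commit": "Sign in",
        "utf8": "\u2713",
        "authenticity_token": token,
        "login": username,
        "password": password,
    })


# Hypothetical login form, for illustration only.
SAMPLE_FORM = """
<form action="/session" method="post">
  <input type="hidden" name="utf8" value="&#x2713;">
  <input type="hidden" name="authenticity_token" value="abc+/123==">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
"""

token = extract_token(SAMPLE_FORM)
body = build_login_body(token, "YourUsername", "YourPassword")
print(body)
```

In the actual tool the HTML would come from a GET to https://github.com/login and the body would be POSTed to https://github.com/session within the same session, so that the cookies issued with the token are sent back along with it.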

2. Searching for the Keyword and Presenting the Results

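After logging in, request the search URL, fetch the response page, and use XPath to pick out the nodes that hold the information we want. For XPath syntax, see: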

http://www.runoob.com/xpath/xpath-tutorial.html
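To make the XPath step concrete, here is a small sketch that runs a selector of the kind used in this article against a hand-written snippet. It uses the standard library's xml.etree.ElementTree, which supports only a limited XPath subset and needs well-formed markup, so it stands in for lxml purely for illustration. SAMPLE_PAGE and parse_results are hypothetical, and the markup is an assumption modeled on the selectors in this article, not GitHub's actual search page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a search result page.
SAMPLE_PAGE = """<html><body>
<div class="d-inline-block col-10">
  <a class="text-bold" href="/alice">alice</a>
  <a href="/alice/demo/blob/master/config.py">config.py</a>
</div>
<div class="d-inline-block col-10">
  <a class="text-bold" href="/bob">bob</a>
  <a href="/bob/tool/blob/master/settings.ini">settings.ini</a>
</div>
</body></html>"""


def parse_results(markup):
    """Return (href, link text) for the second <a> in each result block."""
    root = ET.fromstring(markup)
    links = root.findall(".//div[@class='d-inline-block col-10']/a[2]")
    return [(a.get("href"), a.text) for a in links]


for href, name in parse_results(SAMPLE_PAGE):
    print(href, name)
```

With lxml the attribute and text can be selected directly in the expression (for example with a trailing /@href or /text()), which is why the tool's selectors are shorter than this sketch.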

We also write the extracted information to a CSV file so it can be reviewed later. The details are as follows:

import csv
from time import sleep

import requests
from lxml import etree
from tqdm import tqdm


def hunter(gUser, gPass, keyword, payloads):
    global sensitive_list
    global tUrls
    sensitive_list = []
    tUrls = []
    try:
        # Create the CSV file and write the header row
        with open('leak.csv', 'w', encoding='utf-8', newline='') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(['URL', 'Username', 'Upload Time', 'Filename'])

            # Log in, then search
            s = login_github(gUser, gPass)
            print('Login succeeded, searching for leaked information......')
            sleep(1)
            for page in tqdm(range(1, 6)):  # result pages 1 through 5 for keyword
                search_code = 'https://github.com/search?p=' + str(page) + '&q=' + keyword + '&type=Code'
                resp = s.get(search_code)
                dom_tree_code = etree.HTML(resp.text)  # parse the page with lxml's etree
                Urls = dom_tree_code.xpath('//div[@class="d-inline-block col-10"]/a[2]/@href')  # relative link to the matched file
                users = dom_tree_code.xpath('//a[@class="text-bold"]/text()')  # username
                upload_times = dom_tree_code.xpath('//relative-time/text()')  # upload time
                filenames = dom_tree_code.xpath('//div[@class="d-inline-block col-10"]/a[2]/text()')  # uploaded file name
                for path, user, uploaded, name in zip(Urls, users, upload_times, filenames):
                    Url = 'https://github.com' + path  # the href is relative, so add the site prefix
                    tUrls.append(Url)
                    writer.writerow([Url, user, uploaded, name])  # write one row per result

                # Fetch the raw source of each hit and search it for the
                # user-defined payloads (password, username, IP, and so on).
                # URLs whose code contains a payload go into sensitive_list,
                # which is used later for the email alert.
                for raw_url in Urls:
                    url = 'https://raw.githubusercontent.com' + raw_url.replace('/blob', '')
                    code = requests.get(url).text
                    for payload in payloads:
                        if payload in code:
                            leak_url = ('Matched payload: ' + payload + '\r\n'
                                        + 'https://github.com' + raw_url + '\r\n\r\n\r\n'
                                        + 'Code: \r\n' + code + '\r\n\r\n')
                            sensitive_list.append(leak_url)
        return sensitive_list
    except Exception as e:
        print(e)

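The core of the code above is parsing the DOM tree with XPath, extracting each required field in turn, and writing it to the CSV file. Finally, it requests raw.githubusercontent.com to fetch the source code and checks it against each payload supplied by the user. On a match it records the payload, the URL, and the code, which are then sent out as an email alert.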
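The matching step can be isolated into two small helpers, sketched below under assumptions. blob_to_raw and find_hits are hypothetical names that do not appear in the tool, and the sketch assumes paths of the form /user/repo/blob/branch/file. Unlike the tool's replace('/blob', ''), it replaces only the first /blob/ segment, so a file path that itself contains "blob" further down is left intact.

```python
def blob_to_raw(path):
    """Map '/user/repo/blob/branch/file' to its raw.githubusercontent.com URL."""
    # Replace only the first '/blob/' segment of the path.
    return "https://raw.githubusercontent.com" + path.replace("/blob/", "/", 1)


def find_hits(code, payloads):
    """Return the payloads that occur in the code, in the order given."""
    return [p for p in payloads if p in code]


sample_code = "password = 'hunter2'\nhost = '10.0.0.1'\n"
print(blob_to_raw("/alice/demo/blob/master/config.py"))
print(find_hits(sample_code, ["password", "secret", "10.0.0.1"]))
```

This is plain substring matching, as in the tool, so a short payload such as "key" will also hit longer identifiers that contain it.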

3. Email Alerts

Sending email is not really the focus of the tool, but the code is still worth including. Here it is:

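import smtplib
from email import encoders
from email.header import Header
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formataddr, parseaddr


def send_warning(host, username, password, sender, receivers, content):
    def _format_addr(s):
        # Encode the display name so non-ASCII names survive in the header
        name, addr = parseaddr(s)
        return formataddr((Header(name, 'utf-8').encode(), addr))

    msg = MIMEMultipart()
    msg['From'] = _format_addr('GitHub Security Monitor <%s>' % sender)
    msg['To'] = ','.join(receivers)
    Subject = 'GitHub sensitive information leak notice'
    msg['Subject'] = Header(Subject, 'utf-8').encode()
    msg.attach(MIMEText('Dear all \r\n\r\nPlease note: sensitive information may have been uploaded to GitHub! The following repositories may contain it.\r\n\r\n' + content + '\r\n\r\n', 'plain', 'utf-8'))

    # Attach the CSV report
    with open('leak.csv', 'rb') as f:
        m = MIMEBase('text', 'csv', filename='leak.csv')
        m.add_header('Content-Disposition', 'attachment', filename='leak.csv')
        m.add_header('Content-ID', '<0>')
        m.add_header('X-Attachment-ID', '0')
        m.set_payload(f.read())
        encoders.encode_base64(m)
        msg.attach(m)

    try:
        # The with block closes the SMTP connection even if sending fails
        with smtplib.SMTP(host, 25) as server:
            server.login(username, password)
            server.sendmail(sender, receivers, msg.as_string())
        print('Email sent successfully!')
    except Exception as err:
        print(err)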

4. Reading the Configuration File

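We create an .ini file so the tool can read the keyword, usernames, passwords, payloads, and other values we want to pass in. The configuration file is laid out as follows: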

[KEYWORD]
keyword = your main keyword here

[EMAIL]
host = Email server
user = Email User
password = Email password

[SENDER]
sender = The email sender

[RECEIVER]
receiver1 = Email receiver No.1
receiver2 = Email receiver No.2

[Github]
user = Github Username
password = Github Password

[PAYLOADS]
p1 = Payload 1
p2 = Payload 2
p3 = Payload 3
p4 = Payload 4
p5 = Payload 5
p6 = Payload 6

We then read these values in the main block and pass them into the tool.

import configparser

if __name__ == '__main__':
    config = configparser.ConfigParser()
    config.read('info.ini')

    g_User = config['Github']['user']
    g_Pass = config['Github']['password']
    host = config['EMAIL']['host']
    m_User = config['EMAIL']['user']
    m_Pass = config['EMAIL']['password']
    m_sender = config['SENDER']['sender']
    # Every key in [RECEIVER] and [PAYLOADS] contributes one entry
    receivers = [config['RECEIVER'][k] for k in config['RECEIVER']]
    keyword = config['KEYWORD']['keyword']
    payloads = [config['PAYLOADS'][key] for key in config['PAYLOADS']]

    sensitive_list = hunter(g_User, g_Pass, keyword, payloads)
    if sensitive_list:
        print('\033[1;31;0mWarning: sensitive information found!\r\n\033[0m')
        print('Sending alert email......')
        content = ''.join(sensitive_list)
        send_warning(host, m_User, m_Pass, m_sender, receivers, content)
    else:
        print('Congratulations: no sensitive information found!\r\n')
        print('All checks finished, report generated!\r\n')
        print('Sending report......\r\n')
        send_mail(host, m_User, m_Pass, m_sender, receivers)

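The code above calls a second function, send_mail. It also sends email and works the same way as send_warning; only the content differs, so it is not repeated here. That completes the core of the tool. Pretty simple for a seasoned hand, isn't it?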
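Since send_mail is not shown, the following is one possible sketch of it, not the author's version. It keeps the call signature used in the main block but builds the message with the standard library's email.message.EmailMessage rather than the MIME classes used in send_warning. The subject, the body text, the report_path parameter, and the build_report helper are all assumptions made for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_report(sender, receivers, csv_bytes):
    """Build a report email with the CSV attached (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(receivers)
    msg["Subject"] = "GitHub leak monitoring report"  # assumed subject line
    msg.set_content("No sensitive information was found. The report is attached.")
    msg.add_attachment(csv_bytes, maintype="text", subtype="csv",
                       filename="leak.csv")
    return msg


def send_mail(host, username, password, sender, receivers,
              report_path="leak.csv"):
    """Send the report; same first five parameters as the article's call."""
    with open(report_path, "rb") as f:
        msg = build_report(sender, receivers, f.read())
    # Port 25 mirrors send_warning; the with block closes the connection.
    with smtplib.SMTP(host, 25) as server:
        server.login(username, password)
        server.send_message(msg)
```

Keeping message construction in its own function means the headers and attachment can be checked without contacting a mail server.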

0x03 Monitoring Results

1. Running the Tool

[Figure: screenshot of the tool running]

2. Email Alert

[Figure: screenshot of the alert email]

