When Users Slip Up, I "Pick Up the Tab": What to Do When User Input Is Wrong

2020-04-30 12:25:00 讀芯術

Full text: 2,523 characters; estimated reading time: 15 minutes


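Problems come from everyday life. Last week, while working on a side project, I ran into a very interesting design question: "What should happen when the user enters the wrong input?" With wrong input, you run into situations like the one below.

Example: Python Dict

A Python dictionary maps keys to values. For example: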

<code>student_grades = {'John': 'A', 'Mary': 'C', 'Rob': 'B'}
# To check John's grade, we look up his name
print(student_grades['John'])
# Output: A</code>

What happens when you try to access a key that does not exist?

<code>print(student_grades['Maple'])
# Output:
KeyError                                  Traceback (most recent call last)
<ipython-input-6-51fec14f477a> in <module>
----> print(student_grades['Maple'])

KeyError: 'Maple'</code>

You get a KeyError.

A KeyError is raised whenever a dict is asked for a key that is not in the dictionary. This error is very common when the key comes from user input. For example:

<code>student_name = input("Please enter student name: ")
print(student_grades[student_name])</code>

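This article walks through several ways to handle a Python dictionary KeyError, working toward a "smart" Python dictionary that can cope with users' input mistakes.

Setting a Default Value

A very simple approach is to return a default value when the requested key does not exist. The get() method does exactly this: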

<code>default_grade = 'Not Available'
print(student_grades.get('Maple', default_grade))
# Output:
# Not Available</code>
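As a hedged aside, get() is not the only way to avoid the exception. The sketch below reuses the student_grades example to compare a few common options: get() with and without a default, a membership test with in, and catching KeyError. The helper name grade_or_default is hypothetical and exists only for illustration.

```python
student_grades = {'John': 'A', 'Mary': 'C', 'Rob': 'B'}


def grade_or_default(name, default='Not Available'):
    """Hypothetical helper: look up a grade, falling back to a default."""
    try:
        return student_grades[name]
    except KeyError:
        # The key is missing, so return the fallback instead of crashing.
        return default


# get() with an explicit default
print(student_grades.get('Maple', 'Not Available'))  # Not Available

# get() without a default returns None
print(student_grades.get('Maple'))  # None

# Membership test before indexing
print('Maple' in student_grades)  # False

# Catching the KeyError inside a helper
print(grade_or_default('Maple'))  # Not Available
print(grade_or_default('John'))   # A
```

Which option fits best depends on whether a missing key is an expected case (get() or in) or a genuine error that should be handled explicitly (try/except).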

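Handling Letter Case

Suppose you build a Python dictionary holding population data for a handful of countries. The code asks the user for a country name and prints that country's population.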

<code># population in millions. (Source: https://www.worldometers.info/world-population/population-by-country/)
population_dict = {'China': 1439, 'India': 1380, 'USA': 331, 'France': 65, 'Germany': 83, 'Spain': 46}
# getting user input
Country_Name = input('Please enter Country Name: ')
# access population using country name from dict
print(population_dict[Country_Name])</code>

<code># Output
Please enter Country Name: France
65</code>

Now suppose the user types 'france' instead. At the moment every key in our dictionary starts with a capital letter. What will the output be?

<code>Please enter Country Name: france
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-51fec14f477a> in <module>
      2 Country_Name = input('Please enter Country Name: ')
      3
----> 4 print(population_dict[Country_Name])

KeyError: 'france'</code>

Because 'france' is not a key in the dictionary, we get an error.

A simple fix: store all country names in lowercase, and convert whatever the user enters to lowercase as well.

<code># keys (Country Names) are now all lowercase
population_dict = {'china': 1439, 'india': 1380, 'usa': 331, 'france': 65, 'germany': 83, 'spain': 46}
Country_Name = input('Please enter Country Name: ').lower()  # lowercase input
print(population_dict[Country_Name])</code>

<code>Please enter Country Name: france
65</code>
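Rewriting every key in lowercase by hand loses the original spelling of each name. One possible sketch, assuming we want to keep the properly cased names for display, builds a lowercase index on top of the original dictionary. The names normalize, index, and lookup_population are hypothetical, and the strip() call is an added assumption that users may type stray spaces.

```python
population_dict = {'China': 1439, 'India': 1380, 'USA': 331,
                   'France': 65, 'Germany': 83, 'Spain': 46}


def normalize(name):
    """Hypothetical helper: trim surrounding whitespace and lowercase."""
    return name.strip().lower()


# Map each normalized key back to the original, properly cased key.
index = {normalize(key): key for key in population_dict}


def lookup_population(user_input):
    """Return (display_name, population), or None if the country is unknown."""
    key = index.get(normalize(user_input))
    if key is None:
        return None
    return key, population_dict[key]


print(lookup_population('france'))   # ('France', 65)
print(lookup_population('  USA '))   # ('USA', 331)
print(lookup_population('Frrance'))  # None
```

Normalization handles case and whitespace only; a misspelling such as 'Frrance' still comes back as None.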

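Handling Typos

But suppose the user types 'Frrance' instead of 'France'. How do we deal with that?

One option is a conditional statement.

We check whether the user's input exists as a key. If it does not, we print a message. It is best to put this in a loop that breaks on a special sentinel input (such as exit).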

<code>population_dict = {'china': 1439, 'india': 1380, 'usa': 331, 'france': 65, 'germany': 83, 'spain': 46}
while True:
    Country_Name = input('Please enter Country Name (type exit to close): ').lower()
    # break out of the loop if the user enters exit
    if Country_Name == 'exit':
        break
    if Country_Name in population_dict:
        print(population_dict[Country_Name])
    else:
        print("Please check for any typos. Data not Available for", Country_Name)</code>

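The loop keeps running until the user types exit.

A Better Approach

The approach above "works", but it is not very "smart". We want the program to be more capable and to catch simple typos such as frrance and chhina (much as Google search does).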


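I found several libraries suited to this problem. My favorite is part of the Python standard library: difflib.

difflib can compare files, strings, lists, and more, and report the differences in various formats. The module provides classes and functions for comparing sequences. We will use two of them: SequenceMatcher and get_close_matches. Let's take a quick look at each.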

# SequenceMatcher

SequenceMatcher is a class in difflib for comparing two sequences. Its constructor looks like this:

<code>difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)</code>

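· isjunk: either None or a one-argument function that returns true for "junk" elements (blanks, newlines, and so on) that should be ignored when comparing two blocks of text.

· a and b: the sequences (here, strings) to compare.

· autojunk: a heuristic that automatically treats certain sequence items as junk.

Let's use SequenceMatcher to compare the two strings chinna and china: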

<code>from difflib import SequenceMatcher  # import

# creating a SequenceMatcher object comparing two strings
check = SequenceMatcher(None, 'chinna', 'china')
# printing a similarity ratio on a scale of 0 (lowest) to 1 (highest)
print(check.ratio())
# Output
# 0.9090909090909091</code>

The code above uses the ratio() method, which returns a measure of the sequences' similarity as a float in the range [0, 1].
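To see where 0.909 comes from: the difflib documentation defines ratio() as 2.0*M / T, where T is the total number of elements in both sequences and M is the number of matches. The sketch below recomputes that by hand for 'chinna' and 'china', using get_matching_blocks() to count the matched characters. The variable names are illustrative.

```python
from difflib import SequenceMatcher

a, b = 'chinna', 'china'
matcher = SequenceMatcher(None, a, b)

# get_matching_blocks() returns (a_index, b_index, size) triples; the last
# one is a zero-length sentinel, so it adds nothing to the sum.
matches = sum(block.size for block in matcher.get_matching_blocks())
total = len(a) + len(b)

print(matches, total)         # 5 11
print(2.0 * matches / total)  # 0.9090909090909091
print(matcher.ratio())        # 0.9090909090909091
```

Five of the characters line up ('chin' plus the final 'a'), out of 11 characters in total, giving 10/11.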

# get_close_matches

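We now have a way to compare two strings by similarity.

But what if we want to find every string (stored in a database, say) that is similar to a particular string?

get_close_matches() returns a list of the best matches from a list of possibilities.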

<code>difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)</code>

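· word: the string to find matches for.

· possibilities: the list of strings to match word against.

· n (optional): the maximum number of matches to return. Defaults to 3 and must be greater than 0.

· cutoff (optional): a float in the range [0, 1]. Candidates whose similarity to word scores below this value are ignored. Defaults to 0.6.

The best matches (at most n of them) are returned in a list, sorted by similarity score with the most similar first.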

Consider the following example:

<code>from difflib import get_close_matches

print(get_close_matches("chinna", ['china', 'france', 'india', 'usa']))
# Output
# ['china']</code>
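To get a feel for the two optional parameters, the following sketch runs the same query with different n and cutoff values. The threshold values 0.95 and 0.5 are arbitrary choices for illustration, not recommendations.

```python
from difflib import get_close_matches

countries = ['china', 'france', 'india', 'usa']

# Default settings (n=3, cutoff=0.6)
print(get_close_matches('chinna', countries))  # ['china']

# A stricter cutoff rejects even 'china', whose similarity is about 0.91
strict = get_close_matches('chinna', countries, cutoff=0.95)
print(strict)  # []

# A looser cutoff admits weaker candidates; n caps how many come back
loose = get_close_matches('chinna', countries, n=2, cutoff=0.5)
print(loose)
```

A higher cutoff means fewer false suggestions but more "no match" results. The right balance depends on how similar the valid keys are to one another.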

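Putting It All Together

Now that we know how to use difflib, let's combine everything and build a typo-tolerant Python dictionary.

The case that needs attention is when the country name supplied by the user is not in population_dict.keys(). We should try to find a country whose name is similar to the user's input and then print its population.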

<code>from difflib import get_close_matches

# pass Country_Name as word and the dict keys as possibilities
maybe_country = get_close_matches(Country_Name, population_dict.keys())
# then pick the first (most similar) string from the returned list
print(population_dict[maybe_country[0]])</code>

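The final code has to handle a few more cases: for example, there may be no similar string at all, and we should confirm with the user that the suggested string is the one they meant. Here it is: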

<code>from difflib import get_close_matches

population_dict = {'china': 1439, 'india': 1380, 'usa': 331, 'france': 65, 'germany': 83, 'spain': 46}
while True:
    Country_Name = input('Please enter Country Name (type exit to close): ').lower()
    # break out of the loop if the user enters exit
    if Country_Name == 'exit':
        break
    if Country_Name in population_dict:
        print(population_dict[Country_Name])
    else:
        # look for similar strings
        maybe_country = get_close_matches(Country_Name, population_dict.keys())
        if not maybe_country:  # no similar string
            print("Please check for any typos. Data not Available for", Country_Name)
        else:
            # user confirmation
            ans = input("Do you mean %s? Type y or n. " % maybe_country[0])
            if ans == 'y':
                # if y, print the population
                print(population_dict[maybe_country[0]])
            else:
                # if n, start again
                print("Bad input. Try again.")</code>
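The interactive loop is hard to reuse or test. As one possible variation (not the article's own code), the same idea can be packaged as a dict subclass through the __missing__ hook, which dict calls when d[key] does not find the key. The class name FuzzyDict is hypothetical. The sketch assumes all keys are strings, and it skips the user-confirmation step, so it silently accepts the closest match.

```python
from difflib import get_close_matches


class FuzzyDict(dict):
    """Hypothetical dict subclass that tolerates small typos in string keys."""

    def __missing__(self, key):
        # Called by dict only when d[key] fails to find the key.
        matches = get_close_matches(key, list(self.keys()), n=1)
        if not matches:
            # Nothing is similar enough, so behave like a normal dict.
            raise KeyError(key)
        return self[matches[0]]


population = FuzzyDict({'china': 1439, 'india': 1380, 'usa': 331,
                        'france': 65, 'germany': 83, 'spain': 46})

print(population['france'])   # exact key: 65
print(population['frrance'])  # typo resolved to 'france': 65
print(population['inida'])    # typo resolved to 'india': 1380

try:
    population['xyz']
except KeyError:
    print('no close match for xyz')
```

Note that get() and the in operator do not go through __missing__, so population.get('frrance') still returns None. Silently accepting a guess can also be risky, which is why the loop above asks the user to confirm.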

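Output:

Here, Inida is really India.

With this in place, mixed-up letter case and typos in user input are no longer a problem. You can take the idea further in other applications, for example using NLP to better understand user input, or showing similar results in a search engine. So, have you learned how to build a smart Python dictionary?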
