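Testing the Cloud Vision API (beta) that Google released today. I'm not very familiar with Google's cloud platform, so I'm keeping a record of the steps so I don't have to look everything up from scratch next time.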
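1. Create a project at https://console.developers.google.com/project
2. Once it is created, click the project name to open its management page (https://console.developers.google.com/home/dashboard?project=<ur_prj_name>)
3. On the dashboard, click "Enable and manage APIs", type "vision" in the search box to bring up the Cloud Vision API, open it, and press the Enable button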
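4. Set up billing for the project (on the dashboard, click the three-line menu button to the left of the "Google Developers Console" title; Billing is in that menu, or you can search for billing in the search box on the right)
5. Create a service account (on the dashboard, open the same three-line menu; Permissions is in that menu, or search for permissions in the search box on the right):
- Under the Service accounts tab, create a new service account, check "Furnish a new private key" and choose JSON as the key type. When you finish, a JSON key file is downloaded for you; you will need it later.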
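Before moving on, it can save some head-scratching to confirm that the downloaded file really is a service-account key. The sketch below is only a sanity check I'm assuming would be useful; the field names are those of the standard service-account JSON key format, and `check_service_account_key` is a hypothetical helper, not part of any Google library.

```python
import json

# Fields a service-account JSON key is expected to contain
# (assumption based on the standard key format; compare with your own file).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Load a downloaded key file and return (project_id, client_email).

    Raises ValueError if a field from REQUIRED_FIELDS is missing or the
    key is not of type 'service_account'.
    """
    with open(path) as f:
        key = json.load(f)
    missing = [name for name in REQUIRED_FIELDS if name not in key]
    if missing:
        raise ValueError("key file is missing fields: %s" % ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("expected type 'service_account', got %r" % key["type"])
    return key["project_id"], key["client_email"]
```

Calling `check_service_account_key("credentials-key.json")` should return your project ID and the service account's e-mail address.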
6. Prepare test data
- Create a local project directory myPrj
- Inside myPrj, create an image directory imgs
- Put a few images to recognize into imgs (e.g. imgs/IMG_3076.jpg and imgs/IMG_3077.jpg)
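The text-detect.py script in the next step sends every file that os.walk finds to the API, so a stray non-image file in imgs (for example the .DS_Store that macOS Finder creates) would be submitted too. A minimal sketch of a filter, assuming that images can be recognised by file extension; `list_images` and the extension list are my own illustration and could replace the loop that builds allfileslist in main().

```python
import os

# Extensions treated as images here; this list is an assumption, adjust as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


def list_images(input_dir):
    """Walk input_dir recursively and return the sorted paths of image files.

    Hidden files (names starting with '.') and files whose extension is not
    in IMAGE_EXTENSIONS are skipped.
    """
    paths = []
    for folder, _subdirs, files in os.walk(input_dir):
        for name in files:
            if name.startswith("."):
                continue
            # str.endswith accepts a tuple; compare case-insensitively.
            if name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(folder, name))
    return sorted(paths)
```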
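7. Test. https://github.com/GoogleCloudPlatform/cloud-vision has complete sample programs for each platform (iOS, Android, Java, Python, Go, ...), but the Python one needlessly uses Redis, which requires extra setup, so I modified its textindex.py to just recognize the text in the images: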
shell> atom text-detect.py
#!/usr/bin/env python
"""Detect text in every image under a directory with the Cloud Vision API.

Trimmed down from textindex.py in GoogleCloudPlatform/cloud-vision: the
Redis/NLTK indexing is removed and the detected text is simply printed.
"""
import argparse
# [START detect_text]
import base64
import os

from googleapiclient import discovery
from googleapiclient import errors
from oauth2client.client import GoogleCredentials

DISCOVERY_URL = 'https://{api}.googleapis.com/$discovery/rest?version={apiVersion}'  # noqa
BATCH_SIZE = 10


class VisionApi:
    """Construct and use the Google Vision API service."""

    def __init__(self):
        # Picks up the key file named by GOOGLE_APPLICATION_CREDENTIALS.
        self.credentials = GoogleCredentials.get_application_default()
        self.service = discovery.build(
            'vision', 'v1', credentials=self.credentials,
            discoveryServiceUrl=DISCOVERY_URL)

    def detect_text(self, input_filenames, num_retries=3, max_results=6):
        """Uses the Vision API to detect text in the given files.

        Returns a dict mapping each filename to its list of text
        annotations (an empty list if no text was found).
        """
        images = {}
        for filename in input_filenames:
            with open(filename, 'rb') as image_file:
                images[filename] = image_file.read()

        batch_request = []
        for filename in images:
            batch_request.append({
                'image': {
                    # The request body is JSON, so decode the base64 bytes to str.
                    'content': base64.b64encode(images[filename]).decode('ascii')
                },
                'features': [{
                    'type': 'TEXT_DETECTION',
                    'maxResults': max_results,
                }]
            })
        request = self.service.images().annotate(
            body={'requests': batch_request})

        try:
            responses = request.execute(num_retries=num_retries)
            if 'responses' not in responses:
                return {}
            text_response = {}
            # Responses come back in the same order as the requests.
            for filename, response in zip(images, responses['responses']):
                if 'error' in response:
                    print("API Error for %s: %s" % (
                        filename, response['error'].get('message', '')))
                    continue
                text_response[filename] = response.get('textAnnotations', [])
            return text_response
        except errors.HttpError as e:
            print("Http Error for %s: %s" % (', '.join(images), e))
        except KeyError as e2:
            print("Key error: %s" % e2)
        # Return an empty result instead of None so callers can still iterate.
        return {}
# [END detect_text]


# [START extract_descrs]
def extract_description(texts):
    """Returns all the text in text annotations as a single string."""
    document = ''
    for text in texts:
        try:
            document += text['description']
        except KeyError as e:
            print('KeyError: %s\n%s' % (e, text))
    return document


def extract_descriptions(input_filename, texts):
    """Prints the text that was detected in the image."""
    if texts:
        document = extract_description(texts)
        print("documents in %s :\n%s" % (input_filename, document))
    else:
        print('%s had no discernible text.' % input_filename)
# [END extract_descrs]


# [START get_text]
def get_text_from_files(vision, input_filenames):
    """Call the Vision API on a batch of files and print the results."""
    texts = vision.detect_text(input_filenames)
    for filename, text in texts.items():
        extract_descriptions(filename, text)


def batch(iterable, batch_size=BATCH_SIZE):
    """Group an iterable into batches of size batch_size.

    >>> tuple(batch([1, 2, 3, 4, 5], batch_size=2))
    ((1, 2), (3, 4), (5,))
    """
    b = []
    for i in iterable:
        b.append(i)
        if len(b) == batch_size:
            yield tuple(b)
            b = []
    if b:
        yield tuple(b)


def main(input_dir):
    """Walk through all the files in the given directory and print any
    text detected in them.
    """
    # Create a client object for the Vision API
    vision = VisionApi()

    allfileslist = []
    # Recursively construct a list of all the files in the given input
    # directory.
    for folder, subs, files in os.walk(input_dir):
        for filename in files:
            allfileslist.append(os.path.join(folder, filename))

    # Send BATCH_SIZE images per API request.
    for filenames in batch(allfileslist):
        get_text_from_files(vision, filenames)
# [END get_text]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Detects text in the images in the given directory.')
    parser.add_argument(
        'input_directory',
        help='the image directory you\'d like to detect text in.')
    args = parser.parse_args()
    main(args.input_directory)
shell> chmod +x text-detect.py
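For reference, each image in a batch becomes one entry of the "requests" list sent to images:annotate. The sketch below pulls that dict construction out of VisionApi.detect_text into a standalone function so the payload shape is easy to inspect without credentials; `build_text_detection_request` is a hypothetical name of mine, and the field names are simply the ones the script above already uses.

```python
import base64
import json


def build_text_detection_request(image_bytes, max_results=6):
    """Return one entry of the 'requests' list for images:annotate.

    The raw image bytes are base64-encoded and decoded to an ASCII str,
    because on Python 3 b64encode returns bytes, which json cannot serialize.
    """
    return {
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "TEXT_DETECTION", "maxResults": max_results}],
    }


# A batch body for two tiny fake "images", just to show the structure.
body = {"requests": [build_text_detection_request(b"abc"),
                     build_text_detection_request(b"hello", max_results=1)]}
print(json.dumps(body, indent=2))
```

Batching several images into one body is why the script groups files with batch(): one HTTP call covers up to BATCH_SIZE images.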
8. Environment setup (ugh~~~~ what a hassle, right~~~ right~~~~~!)
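- Install the Google Cloud SDK
shell> curl https://sdk.cloud.google.com | bash
Log in again (open a new shell), then initialize your project
shell> gcloud init
This opens a browser for the required authentication; then go back to your terminal and select the project you created earlier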
- Install the required Python libs (the script also imports oauth2client):
shell> sudo easy_install pip
shell> sudo pip install google-api-python-client oauth2client
- Copy the key JSON file you downloaded earlier into the project directory, rename it to credentials-key.json, and then run
shell> export GOOGLE_APPLICATION_CREDENTIALS=credentials-key.json
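Because the export above uses a relative path, the script has to be run from myPrj for the file to be found (assuming the client library just opens whatever path the variable holds). A small pre-flight check can make that failure obvious; `credentials_path` is a hypothetical helper, not part of oauth2client.

```python
import os


def credentials_path():
    """Return the absolute path named by GOOGLE_APPLICATION_CREDENTIALS.

    A relative value is resolved against the current working directory.
    Raises RuntimeError if the variable is unset or the file is missing.
    """
    value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = os.path.abspath(value)
    if not os.path.isfile(path):
        raise RuntimeError("credentials file not found: %s" % path)
    return path
```

Calling this at the top of main() would turn a confusing authentication error into a clear message about which path was checked.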
9. Run the test
shell> ./text-detect.py imgs
The results:
documents in imgs/IMG_3077.jpg :
Panasonic KX-T7665
1215
MESSAGE
PROGRAM
DIGITAL
SUPER HYBRID SYSTEM
VOLUME
INTERCOM
TRANSFER
AUTO ANS
AUTO DIAL
documents in imgs/IMG_3076.jpg :
ER e
VL EXIT
Conclusion: none of the numbers I wanted were picked up. A little sad T_T
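One thing worth knowing when reading this output: as I understand the Vision API v1 documentation, the first element of textAnnotations holds the whole detected text block and the following elements hold individual words. extract_description in the script concatenates every description, so if your output shows the words again after the full block, splitting the list like this (a hypothetical helper under that assumption) keeps them apart.

```python
def split_text_annotations(annotations):
    """Split a textAnnotations list into (full_text, words).

    Assumes the v1 TEXT_DETECTION layout: annotations[0] is the full text
    block and the remaining entries are single words.
    """
    if not annotations:
        return "", []
    full_text = annotations[0].get("description", "")
    words = [a.get("description", "") for a in annotations[1:]]
    return full_text, words
```

In extract_descriptions, printing only full_text would then give one clean block per image.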
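P.S.
1. Official Getting Started guide
2. Authentication at run time may require initializing with the Google Cloud SDK first; see https://cloud.google.com/sdk/ for installation and setup steps
3. If installing google-api-python-client fails with an error saying six cannot be uninstalled, try adding --ignore-installed six to the command, and add export PYTHONPATH=/Library/Python/2.7/site-packages to ~/.bash_profile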