
Wednesday, March 2, 2016

[Notes] Google Cloud Vision API first try

A technical post for the geek world. Only open it if you really want to read it; read at your own risk!




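Testing the Cloud Vision API (beta) that Google made public today. I'm not very familiar with Google's cloud services, so I'm keeping a record of the steps so I won't have to look everything up from scratch next time.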

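1. Create a project at https://console.developers.google.com/project
2. Once it is created, click the project name to open the project dashboard (https://console.developers.google.com/home/dashboard?project=<ur_prj_name>)
3. On the dashboard, click "Enable and manage APIs" and type "vision" in the search box; Cloud Vision API shows up. Open it and click the Enable button.
4. Set up billing for the project (on the dashboard, click the three-line menu button to the left of the "Google Developers Console" title; Billing is in that menu. Searching for "billing" in the search box on the right also works.)
5. Create a service account (click the same three-line menu to the left of the "Google Developers Console" title; Permissions is in that menu, or search for "permissions" in the search box on the right):
  - On the Service accounts tab, create a new service account, check the option to furnish a new private key, and choose JSON as the key type. When you finish, a JSON key file is downloaded for you; you will need it later.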
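Before moving on, it can be worth checking that the downloaded file really is a service-account key. This is a minimal Python sketch, not part of the original steps: `check_service_account_key` is a hypothetical helper, and `REQUIRED_FIELDS` is my assumption about the minimum a key file carries (real key files contain more fields than these).

```python
import json

# Assumed minimum set of fields in a service-account key file;
# the real files Google generates contain additional fields.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Load a downloaded key file and return its client_email.

    Raises ValueError if the file does not look like a service-account key.
    """
    with open(path) as f:
        key = json.load(f)
    missing = [name for name in REQUIRED_FIELDS if name not in key]
    if missing:
        raise ValueError("missing fields: %s" % ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("unexpected key type: %r" % key["type"])
    return key["client_email"]
```

Pass it the path of the downloaded JSON file; the returned e-mail address should match the service account shown in the console.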

5. Prepare the test data
  - Create a local project directory myPrj
  - Inside myPrj, create an image directory imgs
  - Put a few images to be recognized into imgs, e.g. imgs/IMG_3076.jpg and imgs/IMG_3077.jpg
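6. Test. https://github.com/GoogleCloudPlatform/cloud-vision has complete sample programs for each platform (iOS, Android, Java, Python, Go ...), but the Python sample goes out of its way to use Redis, which needs extra setup. So I modified its textindex.py to do nothing but detect the text in images: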

shell> atom text-detect.py

#!/usr/bin/env python

import argparse
# [START detect_text]
import base64
import os

# nltk and redis from the original textindex.py sample are not needed here.
from googleapiclient import discovery
from googleapiclient import errors
from oauth2client.client import GoogleCredentials

DISCOVERY_URL = 'https://{api}.googleapis.com/$discovery/rest?version={apiVersion}'  # noqa
BATCH_SIZE = 10


class VisionApi:
    """Construct and use the Google Vision API service."""

    def __init__(self, api_discovery_file='vision_api.json'):
        self.credentials = GoogleCredentials.get_application_default()

        self.service = discovery.build(
            'vision', 'v1', credentials=self.credentials,
            discoveryServiceUrl=DISCOVERY_URL)

    def detect_text(self, input_filenames, num_retries=3, max_results=6):
        """Uses the Vision API to detect text in the given files.
        """
        images = {}
        for filename in input_filenames:
            with open(filename, 'rb') as image_file:
                images[filename] = image_file.read()

        batch_request = []
        for filename in images:
            batch_request.append({
                'image': {
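                    # The API expects base64 text; decoding keeps the request
                    # body JSON-serializable on Python 3 as well.
                    'content': base64.b64encode(images[filename]).decode('UTF-8')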
                },
                'features': [{
                    'type': 'TEXT_DETECTION',
                    'maxResults': max_results,
                }]
            })
        request = self.service.images().annotate(
            body={'requests': batch_request})

        try:
            responses = request.execute(num_retries=num_retries)
            if 'responses' not in responses:
                return {}
            text_response = {}
            for filename, response in zip(images, responses['responses']):
                if 'error' in response:
                    print("API Error for %s: %s" % (
                            filename,
                            response['error']['message']
                            if 'message' in response['error']
                            else ''))
                    continue
                if 'textAnnotations' in response:
                    text_response[filename] = response['textAnnotations']
                else:
                    text_response[filename] = []
            return text_response
        except errors.HttpError as e:
            print("Http Error: %s" % e)
        except KeyError as e2:
            print("Key error: %s" % e2)
        # On failure, return an empty dict so callers can still iterate.
        return {}
# [END detect_text]

# [START extract_descrs]
def extract_description(texts):
    """Returns all the text in text annotations as a single string"""
    document = ''
    for text in texts:
        try:
            document += text['description']
        except KeyError as e:
            print('KeyError: %s\n%s' % (e, text))
    return document


def extract_descriptions(input_filename, texts):
    """Prints the text that was detected in the image."""
    if texts:
        document = extract_description(texts)
        print("documents in %s :\n%s" % (input_filename, document))
    else:
        if texts == []:
            print('%s had no discernible text.' % input_filename)
# [END extract_descrs]

# [START get_text]
def get_text_from_files(vision, input_filenames):
    """Call the Vision API on a batch of files and print the results."""
    texts = vision.detect_text(input_filenames)
    for filename, text in texts.items():
        extract_descriptions(filename, text)


def batch(iterable, batch_size=BATCH_SIZE):
    """Group an iterable into batches of size batch_size.

    >>> tuple(batch([1, 2, 3, 4, 5], batch_size=2))
    ((1, 2), (3, 4), (5,))
    """
    b = []
    for i in iterable:
        b.append(i)
        if len(b) == batch_size:
            yield tuple(b)
            b = []
    if b:
        yield tuple(b)

def main(input_dir):
    """Walk through all the image files in the given directory and print
    any text the Vision API detects in them.
    """
    # Create a client object for the Vision API
    vision = VisionApi()

    allfileslist = []
    # Recursively construct a list of all the files in the given input
    # directory.
    for folder, subs, files in os.walk(input_dir):
        for filename in files:
            allfileslist.append(os.path.join(folder, filename))

    for filenames in batch(allfileslist):
        get_text_from_files(vision, filenames)
# [END get_text]

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Detects text in the images in the given directory.')
    parser.add_argument(
        'input_directory',
        help='the image directory you\'d like to detect text in.')
    args = parser.parse_args()

    main(args.input_directory)

shell> chmod +x text-detect.py
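For reference, the core of detect_text is the JSON body it posts to images:annotate: one entry per image, each carrying the base64-encoded bytes and a TEXT_DETECTION feature. This sketch isolates that step as a standalone function; `build_text_detection_request` is a hypothetical name, and the body layout simply mirrors what the script above builds rather than the full API schema.

```python
import base64


def build_text_detection_request(image_bytes_list, max_results=6):
    """Build the images:annotate JSON body for a list of raw image bytes.

    Mirrors the batch_request assembled in detect_text: one request per
    image, each asking only for TEXT_DETECTION.
    """
    requests = []
    for image_bytes in image_bytes_list:
        requests.append({
            # Image bytes are sent as a base64 text string inside the JSON.
            "image": {
                "content": base64.b64encode(image_bytes).decode("UTF-8"),
            },
            # OCR only, with a cap on the number of annotations returned.
            "features": [{"type": "TEXT_DETECTION", "maxResults": max_results}],
        })
    return {"requests": requests}
```

Dumping the result with json.dumps is a quick way to see exactly what goes over the wire for a given image.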

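7. Environment setup (ugh~~~~ what a hassle, isn't it~~~ isn't it~~~~~ !)
  - Install the Google Cloud SDK
shell> curl https://sdk.cloud.google.com | bash
Log in again (restart your shell), then initialize your project:
shell> gcloud init
This opens a browser for the required authentication; then go back to your terminal and choose the project you created earlier.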

  - Install the required Python libraries:
shell> sudo easy_install pip
shell> sudo pip install google-api-python-client

  - Copy the JSON key file downloaded earlier into the project directory, rename it to credentials-key.json, and then run
shell> export GOOGLE_APPLICATION_CREDENTIALS=credentials-key.json
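GoogleCredentials.get_application_default() picks up the key through this variable, and since credentials-key.json is a relative path it only resolves when the script is run from the project directory. A small sketch that fails early with a readable message; `resolve_credentials_path` is a hypothetical helper, not part of the Google client library:

```python
import os


def resolve_credentials_path(env=None):
    """Return the absolute path named by GOOGLE_APPLICATION_CREDENTIALS.

    A relative value is resolved against the current working directory.
    Raises RuntimeError if the variable is unset or the file is missing.
    """
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = os.path.abspath(value)
    if not os.path.isfile(path):
        raise RuntimeError("credentials file not found: %s" % path)
    return path
```

Alternatively, exporting an absolute path, e.g. export GOOGLE_APPLICATION_CREDENTIALS=$PWD/credentials-key.json, sidesteps the working-directory issue altogether.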

8. Run the test
shell> ./text-detect.py imgs
The results:

documents in imgs/IMG_3077.jpg :
Panasonic KX-T7665
1215
MESSAGE
PROGRAM
DIGITAL
SUPER HYBRID SYSTEM
VOLUME
INTERCOM
TRANSFER
AUTO ANS
AUTO DIAL

documents in imgs/IMG_3076.jpg :
ER e
VL EXIT
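When reading this output, keep in mind the layout documented for TEXT_DETECTION responses: as I understand it, the first entry of textAnnotations holds the whole detected text block and the following entries are the individual words. Since extract_description concatenates every description, whenever the per-word entries are present it would print the full text followed by the words run together. A sketch that separates the two; `full_text` and `words` are hypothetical helpers built on that assumed layout, not verified against the 2016 beta:

```python
def full_text(annotations):
    """Return the whole detected text from a textAnnotations list.

    Assumes the first annotation carries the complete text block;
    returns '' when nothing was detected.
    """
    if not annotations:
        return ""
    return annotations[0].get("description", "")


def words(annotations):
    """Return the per-word descriptions that follow the full-text entry."""
    return [a.get("description", "") for a in annotations[1:]]
```

In extract_descriptions, calling full_text(texts) in place of extract_description(texts) would print each image's text only once.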


Conclusion: none of the numbers I actually wanted were picked up. A bit sad T_T

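P.S.
   1. The official Getting Started guide
   2. Authentication at run time may require initializing the Google Cloud SDK first; see https://cloud.google.com/sdk/ for installation and setup steps.
   3. If installing google-api-python-client fails with an error saying it cannot uninstall six, try appending --ignore-installed six to the install command, and add export PYTHONPATH=/Library/Python/2.7/site-packages to ~/.bash_profile.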
