Building a Search Engine with Datastore

29 Oct 2016

Daigo Ikeda

Knightso, LLC

Profile

Daigo Ikeda
@hogedigo

Knightso, LLC
http://www.knightso.co.jp/
Shizuoka, JAPAN

Datastore Index Review

Sample: Book Store

Model:

type Book struct {
    ID          string
    Title       string
    Category    string
    Price       int
    PublishDate string
}

Sample: Book Store

Run Query:

q := datastore.NewQuery("Book").Limit(20)

// apply filters here

var books []*Book
_, err := q.GetAll(ctx, &books)

Equality Filter:

q = q.Filter("Category =", "趣味") // "趣味" means "hobbies"

Inequality Filter:

q = q.Filter("Price >= ", 1000)

Sort:

q = q.Order("Price")

Single Property Index

type Hoge struct {
    Value string `datastore:",noindex"`
}

Composite Property Index

index.yaml

indexes:

- kind: Book
  properties:
  - name: Category
  - name: Price
    direction: desc

Index Example

List Property

type Hoge struct {
    Values []string
}

List Property Index Example

ZigZag Merge JOIN
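
As a rough illustration of the idea behind a zigzag merge join, the sketch below intersects several sorted key lists by repeatedly seeking each list forward to the current candidate key. Each list stands in for one index scan (one equality filter). The function name `zigzagIntersect` is hypothetical, the lists are assumed to be sorted with no duplicates, and this is a toy model, not Datastore's actual implementation.

```go
package main

import "sort"

// zigzagIntersect returns the keys that appear in every list.
// Assumption: each list is sorted ascending and has no duplicates,
// like the keys produced by one index scan for one equality filter.
func zigzagIntersect(lists ...[]string) []string {
	var out []string
	if len(lists) == 0 || len(lists[0]) == 0 {
		return out
	}
	pos := make([]int, len(lists)) // cursor of each scan
	candidate := lists[0][0]
	agreed := 0 // how many scans in a row contain candidate
	for i := 0; ; i = (i + 1) % len(lists) {
		l := lists[i]
		// Seek this scan forward to the first key >= candidate.
		pos[i] += sort.SearchStrings(l[pos[i]:], candidate)
		if pos[i] == len(l) {
			return out // one scan is exhausted, so no more matches
		}
		if l[pos[i]] != candidate {
			// This scan skipped past the candidate: adopt its key instead.
			candidate = l[pos[i]]
			agreed = 1
			continue
		}
		agreed++
		if agreed >= len(lists) {
			// Every scan contains the candidate: emit it and move on.
			out = append(out, candidate)
			pos[i]++
			if pos[i] == len(l) {
				return out
			}
			candidate = l[pos[i]]
			agreed = 1
		}
	}
}
```

Because each scan jumps straight to the other scan's candidate, long runs of non-matching keys are skipped rather than read one by one.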

Let's build a search engine with Datastore alone!

How

N-gram(Bigram)

AppEngine → ap, pp, pe, en, ng, gi, in, ne

Prefix

AppEngine → a, ap, app, appe, appen, appeng, appengi, appengin, appengine

Morphological analysis

貴社の記者が汽車で帰社した ("Your company's reporter returned to the office by train") → 貴社, の, 記者, が, 汽車, で, 帰社, した

Combining them

Why not Search API?

// search for documents with pianos that cost less than $5000
index.Search(ctx, "Product = piano AND Price < 5000", nil)

Why not BigQuery?

Why not Cloud SQL?

Implementation

Entity Model

type Book struct {
    ID          string `datastore:",noindex"`
    Title       string `datastore:",noindex"`
    Category    string `datastore:",noindex"`
    Price       int    `datastore:",noindex"`
    PublishDate string `datastore:",noindex"`
}

Index Model

Prepare a separate entity for storing the indexes

type BookIndex struct {
    Indexes     []string
    Title       string
    Category    string
    Price       int
    PublishDate string
}

Save it under the same key name as the Book (only the Kind differs)

Store the analyzed index terms in the Indexes property
Also store the properties needed for sorting

Save Indexes

Composite Property Index

e.g. ORDER BY PublishDate DESC, Price, Category

index.yaml

- kind: BookIndex
  properties:
  - name: Indexes
  - name: PublishDate
    direction: desc
  - name: Price
  - name: Category

Search!

q := datastore.NewQuery(KindBookIndex).Limit(QUERY_LIMIT + 1).KeysOnly()

if req.Title != "" {
    for _, w := range bigram(req.Title) {
        q = q.Filter("Indexes =", "t " + w)
    }
}

if req.Category != "" {
    q = q.Filter("Indexes =", "c " + req.Category)
}

if req.Price != "" {
    q = q.Filter("Indexes =", "p " + req.Price)
}

// An inequality filter can be used on the property of the first sort order!
if req.PublishDateFrom != "" {
    q = q.Filter("PublishDate >=", req.PublishDateFrom)
}

if noParams {
    q = q.Filter("Indexes =", createIndex("", "ALL"))
}

q = q.Order("-PublishDate").Order("Price").Order("Category")

keys := make([]*datastore.Key, 0, QUERY_LIMIT)

ite := q.Run(ctx)

for len(keys) < QUERY_LIMIT {
    idxKey, err := ite.Next(nil)
    if err == datastore.Done {
        break
    }

    ...snip...

    key := datastore.NewKey(ctx, KindBook, idxKey.StringID(), 0, nil)
    keys = append(keys, key)
}

books := make([]*Book, len(keys))
if len(books) > 0 {
    if err := datastore.GetMulti(ctx, keys, books); err != nil {
        return nil, fmt.Errorf("GetMulti failed: %s", err)
    }
}

On-Memory Filter

Inequality filters can only be applied to the property of the first sort order
To filter on any other property, you have to do it in your own code

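// Drop KeysOnly so that the index entity's properties are loaded
for len(keys) < QUERY_LIMIT {
    var idx BookIndex
    idxKey, err := ite.Next(&idx)

    ...snip...

    // Keep only prices in the half-open range [priceFrom, priceTo)
    if idx.Price < priceFrom || idx.Price >= priceTo {
        continue
    }

    key := datastore.NewKey(ctx, KindBook, idxKey.StringID(), 0, nil)
    keys = append(keys, key)
}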

Custom composite indexes

Merge join has its weaknesses too

To work around these weaknesses, build index terms in advance for frequently used combinations of conditions.
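
As a small illustration, a combined term for "Category = X AND Price = Y" could be generated as below and stored in Indexes alongside the single-condition terms. The "cp" prefix, the term layout, and the function name are all hypothetical; the deck does not specify a format.

```go
package main

import "strconv"

// categoryPriceTerm builds one index term that stands for the
// combination "Category = category AND Price = price".
// The "cp <category> <price>" layout is an assumption for illustration.
func categoryPriceTerm(category string, price int) string {
	return "cp " + category + " " + strconv.Itoa(price)
}
```

A query with both conditions can then use a single equality filter on this term instead of merging two index scans, as long as the same function is used when saving and when searching.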

Projection Query

SELECT PublishDate, Price, Category FROM Book

Fetch only the properties you need

q := datastore.NewQuery(KindBookIndex).
    Project("PublishDate", "Price", "Category").
    Limit(QUERY_LIMIT + 1)

Summary

Pros

Cons

I want to implement this with single-property indexes only!

Index Model

type BookIndex struct {
    Indexes []string
}

Index Key

e.g. ORDER BY Price, PublishDate DESC, Category

<Price>:<PublishDate>:<Category>:<ID>
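
A sketch of how such a key name might be built so that plain ascending key order matches the desired sort. It assumes a non-negative price below ten digits and a fixed-width date such as "2016-10-29"; the zero padding and the digit inversion used for the descending PublishDate are my assumptions, not something the slides spell out.

```go
package main

import "fmt"

// invertDigits maps each ASCII digit d to 9-d, so that ascending
// string order of the result equals descending order of the input.
// Assumption: all inputs share the same fixed-width format.
func invertDigits(s string) string {
	b := []byte(s)
	for i, c := range b {
		if c >= '0' && c <= '9' {
			b[i] = '9' - (c - '0')
		}
	}
	return string(b)
}

// sortKey builds "<Price>:<PublishDate>:<Category>:<ID>".
// Price is zero-padded so that string order matches numeric order.
func sortKey(price int, publishDate, category, id string) string {
	return fmt.Sprintf("%010d:%s:%s:%s",
		price, invertDigits(publishDate), category, id)
}
```

The ID at the end keeps key names unique when two books share the same price, date, and category.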

Thank you
