[Paper] Inductive Matrix Completion Based on Graph Neural Networks
INTRODUCTION
1. A new method is proposed.
we propose an inductive matrix completion model without side information.
2. Existing methods are transductive; to make matrix completion inductive, previous works used content (side information).
by factorizing the rating matrix into the product of low-dimensional latent embeddings of rows and columns, the majority of existing matrix completion methods are transductive, since the learned embeddings cannot generalize to unseen rows/columns or to new matrices.
to make matrix completion inductive, most previous works use content (side information), such as a user's age or a movie's title, to make predictions.
3. Problem with existing methods: extracting content (side information) can be difficult.
however, high-quality content is not always available, and can be hard to extract.
4. Can this problem with existing methods be overcome?
under the extreme setting where no side information is available other than the matrix to complete, can we still learn an inductive matrix completion model?
5. This paper proposes a new solution.
in this paper, we propose an inductive graph-based matrix completion model to address this problem.
6. Description of the proposed method, IGMC
IGMC trains a GNN purely on 1-hop subgraphs around pairs (u, v) generated from the rating matrix and maps these subgraphs to their corresponding ratings.
7. Performance comparison with existing methods
it achieves highly competitive performance with state-of-the-art transductive baselines. in addition, the model is inductive: it can generalize to users/items unseen during training, and can even transfer to new tasks. our transfer learning experiments show that a model trained on the MovieLens dataset can be directly used to predict Douban movie ratings with surprisingly good performance.
8. Conclusion
our work demonstrates that: 1) it is possible to train inductive matrix completion models without using side information while achieving similar or better performance than state-of-the-art transductive methods; 2) local graph patterns around a target (user, item) pair are effective predictors of the rating this user gives to the item; and 3) long-range dependencies might not be necessary for modeling recommender systems.
IGMC
we now present our IGMC framework.
3. Notation
we use G to denote the undirected bipartite graph constructed from the given rating matrix R. in G, a node is either a user (denoted by u, corresponding to a row in R) or an item (denoted by v, corresponding to a column in R). edges can exist between a user and an item, but not between two users or two items. each edge (u, v) has a value corresponding to the rating that u gives to v. we use R' to denote the set of all possible ratings and N_r(u) to denote the set of u's neighbors that connect to u with edge type r.
3.1. Enclosing subgraph extraction
the first part of IGMC is enclosing subgraph extraction.
Extract subgraphs.
for each observed rating, we extract the h-hop enclosing subgraph around (u, v) from G. Algorithm 1 describes the BFS procedure for extracting h-hop enclosing subgraphs.
Feed the subgraphs to the model as training data.
we will feed these enclosing subgraphs to a GNN and regress on their ratings.
Given a target pair, the trained model predicts its rating.
then, for each testing (u, v) pair, we again extract its h-hop enclosing subgraph from G and use the trained GNN model to predict the rating. note that after extracting a training subgraph for (u, v), we should remove the edge (u, v), because it is the target to predict.
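As a concrete illustration, the BFS extraction above can be sketched in a few lines of plain Python. This is a hypothetical sketch, not the paper's Algorithm 1: the graph is assumed to be stored as two neighbor dictionaries, and removing the target edge is left to the caller.

```python
def extract_enclosing_subgraph(u, v, user_nbrs, item_nbrs, h=1):
    """BFS extraction of the h-hop enclosing subgraph around the pair (u, v).

    user_nbrs[u] -> set of items rated by user u
    item_nbrs[i] -> set of users who rated item i
    (For training pairs, the target edge (u, v) should be removed first.)
    Returns the user-node set and item-node set of the subgraph.
    """
    users, items = {u}, {v}
    fringe_users, fringe_items = {u}, {v}
    for _ in range(h):
        # one BFS hop: users reach the items they rated, items reach their raters
        new_items = set().union(*(user_nbrs.get(x, set()) for x in fringe_users)) - items
        new_users = set().union(*(item_nbrs.get(x, set()) for x in fringe_items)) - users
        items |= new_items
        users |= new_users
        fringe_users, fringe_items = new_users, new_items
    return users, items
```

with h = 1 this collects exactly the users who rated v and the items rated by u, matching the 1-hop subgraphs used in the paper.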
3.2. Node labeling
the second part of IGMC is node labeling
This step must be performed before training.
before we feed an enclosing subgraph to the GNN, we first apply a node labeling to it, which gives an integer label to every node in the subgraph.
The purpose of node labeling is to give different labels to the nodes within a subgraph so that they can play different roles.
the purpose is to use different labels to mark nodes' different roles in a subgraph.
The reasons for this purpose (having nodes play different roles) are 1) and 2) below: the GNN must be able to distinguish the nodes' roles.
ideally, our node labeling should be able to: 1) distinguish the target user and target item, between which the target rating is located, and 2) differentiate user-type nodes from item-type nodes. otherwise, the GNN cannot tell between which user and item to predict the rating, and might lose node-type information.
How is this implemented?
to satisfy these conditions, we propose a node labeling as follows: we first give labels 0 and 1 to the target user and target item, respectively. then we determine the other nodes' labels according to the hop at which they are included in the subgraph in Algorithm 1. if a user-type node is included at the i-th hop, we give it label 2i. if an item-type node is included at the i-th hop, we give it label 2i+1.
The expected effects are as follows.
such a node labeling can sufficiently discriminate: 1) target nodes from context nodes, 2) users from items, and 3) nodes of different distances to the target rating.
Node labeling can also be done in other ways.
note that this is not the only possible way of node labeling, but we empirically verified its excellent performance.
the one-hot encoding of these node labels will be treated as the initial node features x0 of the subgraph when fed to the GNN.
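The labeling rule and the one-hot initial features can be sketched as follows (a toy sketch; the hop dictionaries are hypothetical outputs of the BFS step, and user/item ids are assumed disjoint):

```python
def label_nodes(target_user, target_item, user_hops, item_hops):
    """IGMC-style node labeling: target user -> 0, target item -> 1,
    a user included at hop i -> 2i, an item included at hop i -> 2i + 1."""
    labels = {target_user: 0, target_item: 1}
    for u, hop in user_hops.items():
        labels.setdefault(u, 2 * hop)      # target user keeps label 0
    for v, hop in item_hops.items():
        labels.setdefault(v, 2 * hop + 1)  # target item keeps label 1
    return labels

def one_hot(label, num_labels):
    """One-hot encoding of a node label, used as the initial feature x0."""
    x = [0.0] * num_labels
    x[label] = 1.0
    return x
```

note how the labels depend only on roles and hops inside the subgraph, never on global node identities.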
Why the node labeling step is done
note that our node labels are determined completely inside each enclosing subgraph, and are thus independent of the global bipartite graph. given a new enclosing subgraph, we can still predict its rating even if all of its nodes are from a different bipartite graph, because IGMC relies purely on graph patterns within local enclosing subgraphs, without leveraging any global information specific to the bipartite graph.
3.3. GNN architecture
the third part of IGMC is to train a GNN model predicting ratings from the enclosing subgraphs.
Comparison with existing methods: previous methods apply a GNN to the entire graph to extract node embeddings.
in previous node-based approaches such as GC-MC, a node-level GNN is applied to the entire bipartite graph to extract node embeddings. then, the node embeddings of u and v are input to an inner product or ...
In contrast, the proposed method works as follows.
in contrast, IGMC applies a graph-level GNN to the enclosing subgraph around (u,v) and maps the subgraph to the rating.
(node level vs. graph level: predict from individual node embeddings vs. predict from a whole-graph representation.)
Architecture: step 1, extract node feature vectors; step 2, a pooling layer produces the subgraph representation.
there are thus two components in our GNN: 1) message passing layers that extract a feature vector for each node in the subgraph, and 2) a pooling layer to summarize a subgraph representation from the node features.
Step 1: R-GCN is used.
to learn the rich graph patterns introduced by the different edge types, we adopt the relational graph convolutional operator (R-GCN) as our GNN's message passing layers, which has the following form:
x_i^{l+1} = W_0^l x_i^l + Σ_{r∈R'} Σ_{j∈N_r(i)} (1/|N_r(i)|) W_r^l x_j^l

that is, at each message passing layer, node i's next-layer feature vector is built from node i itself and from its neighbors of each rating type. after running all L layers, the per-layer feature vectors of each node are concatenated into its final representation:
h_i = concat(x_i^1, x_i^2, ..., x_i^L)
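A minimal pure-Python sketch of one such message-passing step, with hypothetical toy dense weights (a real implementation would use a GNN library, e.g. RGCNConv in PyTorch Geometric):

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def rgcn_layer(feats, nbrs_by_type, W0, Wr):
    """One R-GCN message-passing step:
    x_i' = W0 x_i + sum_r sum_{j in N_r(i)} (1/|N_r(i)|) Wr[r] x_j

    feats[i]           -> current feature vector of node i
    nbrs_by_type[r][i] -> neighbors of node i connected by edge type r
    """
    out = {}
    for i, x in feats.items():
        acc = matvec(W0, x)  # self-transform
        for r, nbrs in nbrs_by_type.items():
            js = nbrs.get(i, [])
            for j in js:  # mean-aggregate messages from type-r neighbors
                msg = matvec(Wr[r], feats[j])
                acc = [a + m / len(js) for a, m in zip(acc, msg)]
        out[i] = acc
    return out
```

stacking L such layers and concatenating each node's per-layer outputs gives h_i; the nonlinearity the paper applies between layers is omitted here for brevity.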
Step 2: pooling layer
for the graph-level representation, the node representations computed above can be combined in various ways, such as averaging or summing, choosing whichever works best. here, concatenation is used to turn node representations into the graph representation. since the target nodes are more important than the context nodes, only the representations of the target user node and target item node are concatenated:
g = concat(hu, hv)
Final prediction
passing this graph representation through an MLP yields the predicted rating:
prediction r = w^T ReLU(W g)
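pooling and prediction together fit in a few lines (a sketch with hypothetical toy weights; W and w stand for the MLP parameters in the formula above):

```python
def predict_rating(h_u, h_v, W, w):
    """g = concat(h_u, h_v); r = w^T ReLU(W g)."""
    g = h_u + h_v  # list concatenation = concat of the two target-node features
    hidden = [max(0.0, sum(wij * gj for wij, gj in zip(row, g))) for row in W]
    return sum(wi * hi for wi, hi in zip(w, hidden))
```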
3.4. Model training
the MSE loss is used, together with a 0/1 mask matrix derived from the rating matrix: predicted and true ratings are compared only for observed entries (mask = 1, i.e. movies the user has actually rated).
the R-GCN layers that compute node representations have a separate set of parameters for each rating type. a limitation is that this ignores the magnitude and ordering of ratings. to mitigate this, an adjacent rating regularization (ARR) technique based on the Frobenius norm of the parameter matrices is applied.
finally, the model is trained with the MSE loss plus the ARR loss.
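the ARR term can be sketched as follows (a sketch: it sums the squared Frobenius distance between the parameter matrices of adjacent rating values, so that e.g. the parameters for rating 4 stay close to those for rating 5; the full objective is then the MSE plus a weighted ARR term):

```python
def arr_loss(weights_by_rating):
    """Adjacent rating regularization:
    sum over adjacent rating values of ||W_{r+1} - W_r||_F^2."""
    ratings = sorted(weights_by_rating)
    total = 0.0
    for r0, r1 in zip(ratings, ratings[1:]):
        W_lo, W_hi = weights_by_rating[r0], weights_by_rating[r1]
        total += sum((a - b) ** 2
                     for row_hi, row_lo in zip(W_hi, W_lo)
                     for a, b in zip(row_hi, row_lo))
    return total
```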
CONCLUSION
In this paper, we have proposed the IGMC model. instead of learning transductive latent features, it learns local graph patterns related to ratings inductively, based on GNNs. compared to previous inductive matrix completion methods, this approach does not rely on content (side information) of users/items. we show that it achieves highly competitive performance compared to state-of-the-art transductive baselines. in addition, it is transferable to new tasks without any retraining, a property much desired in recommendation tasks with few training data. we hope this can provide a new idea for matrix completion and recommender systems.