Similarity learning

Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn from examples a similarity function that measures how similar or related two objects are. It has applications in ranking, recommendation systems, visual identity tracking, face verification, and speaker verification.


Learning setup

There are four common setups for similarity and metric distance learning.

Regression similarity learning
In this setup, pairs of objects $(x_i^1, x_i^2)$ are given together with a measure of their similarity $y_i \in \mathbb{R}$. The goal is to learn a function that approximates $f(x_i^1, x_i^2) \approx y_i$ for every new labeled triplet example $(x_i^1, x_i^2, y_i)$. This is typically achieved by minimizing a regularized loss $\min_w \sum_i \mathrm{loss}(w; x_i^1, x_i^2, y_i) + \mathrm{reg}(w)$.
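
A minimal numpy sketch of this setup, assuming a bilinear similarity $f_W(x^1, x^2) = (x^1)^\top W x^2$, a squared loss, and an $L_2$ penalty for $\mathrm{reg}$ (all illustrative choices, on synthetic data):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200

# Synthetic pairs (x1, x2) whose similarity labels come from a hidden W_true.
W_true = rng.normal(size=(d, d))
X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y = np.einsum("ij,jk,ik->i", X1, W_true, X2)   # y_i = x1_i^T W_true x2_i

lam, lr = 0.01, 0.02        # regularization strength and learning rate (illustrative)
W = np.zeros((d, d))
for _ in range(1000):
    pred = np.einsum("ij,jk,ik->i", X1, W, X2)           # f_W(x1_i, x2_i) per pair
    resid = pred - y
    # Gradient of 0.5 * mean squared error + 0.5 * lam * ||W||_F^2 w.r.t. W.
    grad = X1.T @ (resid[:, None] * X2) / n + lam * W
    W -= lr * grad

print("mean squared error:", np.mean((np.einsum("ij,jk,ik->i", X1, W, X2) - y) ** 2))
```
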
Classification similarity learning
Given are pairs of similar objects $(x_i, x_i^+)$ and dissimilar objects $(x_i, x_i^-)$. An equivalent formulation is that every pair $(x_i^1, x_i^2)$ is given together with a binary label $y_i \in \{0, 1\}$ that determines whether the two objects are similar. The goal is again to learn a classifier that can decide whether a new pair of objects is similar.
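
A sketch of the equivalent binary-label formulation, assuming elementwise-product pair features and plain logistic regression (both illustrative design choices) on synthetic similar/dissimilar pairs:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 8, 300

# Similar pairs: x+ is a noisy copy of x; dissimilar pairs: independent draws.
X = rng.normal(size=(n, d))
X_pos = X + 0.1 * rng.normal(size=(n, d))
X_neg = rng.normal(size=(n, d))

# Binary-label formulation: pairs (x1, x2) with y = 1 (similar) or y = 0.
X1 = np.vstack([X, X])
X2 = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), np.zeros(n)])

Phi = X1 * X2                    # elementwise-product pair features (a design choice)
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):             # plain logistic regression by gradient descent
    p = 1.0 / (1.0 + np.exp(-(Phi @ w + b)))
    w -= lr * Phi.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

print("training accuracy:", np.mean((p > 0.5) == y))
```
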
Ranking similarity learning
Given are triplets of objects $(x_i, x_i^+, x_i^-)$ whose relative similarity obeys a predefined order: $x_i$ is known to be more similar to $x_i^+$ than to $x_i^-$. The goal is to learn a function $f$ such that for any new triplet of objects $(x, x^+, x^-)$, it obeys $f(x, x^+) > f(x, x^-)$. This setup assumes a weaker form of supervision than regression: instead of an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications.
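
A minimal sketch of this setup with the standard triplet hinge loss $\max(0, 1 - f(x, x^+) + f(x, x^-))$, assuming $f$ is a dot product in a learned linear embedding (an illustrative choice, on synthetic triplets):

```python
import numpy as np

rng = np.random.default_rng(2)
d, e, n = 10, 4, 500

L = 0.1 * rng.normal(size=(e, d))           # parameters of f(x, z) = (Lx) . (Lz)
X = rng.normal(size=(n, d))
X_pos = X + 0.2 * rng.normal(size=(n, d))   # more similar to X ...
X_neg = rng.normal(size=(n, d))             # ... than independent draws

lr = 0.01
for _ in range(200):
    A, P, N = X @ L.T, X_pos @ L.T, X_neg @ L.T          # embedded triplets
    margin = 1.0 - np.sum(A * P, axis=1) + np.sum(A * N, axis=1)
    act = margin > 0                                     # only violated triplets
    # Hinge-loss gradient with respect to L, summed over violated triplets.
    grad = (N[act] - P[act]).T @ X[act] + A[act].T @ (X_neg[act] - X_pos[act])
    L -= lr * grad / max(act.sum(), 1)

A, P, N = X @ L.T, X_pos @ L.T, X_neg @ L.T              # re-embed with final L
ordered = np.sum(A * P, axis=1) > np.sum(A * N, axis=1)
print("fraction of triplets correctly ordered:", ordered.mean())
```
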
Locality sensitive hashing (LSH)
LSH hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale, high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases.
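
A minimal sketch of one classic LSH scheme, random-hyperplane hashing for cosine similarity; the bit width and data below are illustrative:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)
d, n, n_bits = 32, 1000, 12             # 12-bit codes -> at most 4096 buckets

X = rng.normal(size=(n, d))
planes = rng.normal(size=(n_bits, d))   # one random hyperplane per hash bit

def lsh_code(v):
    """Pack the sign pattern of v against the hyperplanes into an integer key."""
    bits = (planes @ v) > 0
    return int(bits @ (1 << np.arange(n_bits)))

buckets = defaultdict(list)
for i, v in enumerate(X):
    buckets[lsh_code(v)].append(i)

# Candidate near neighbours of a query are simply its bucket's members.
q = X[0] + 0.05 * rng.normal(size=d)    # a slightly perturbed copy of X[0]
print(0 in buckets[lsh_code(q)])        # likely True: similar items tend to collide
```

Real systems typically build several independent hash tables and search the union of the matching buckets, trading memory for recall.
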

A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix $W$ that parametrizes the similarity function $f_W(x, z) = x^\top W z$.
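
Concretely, with $W = I$ the bilinear form reduces to the ordinary dot product, while a learned $W$ re-weights and mixes feature dimensions; a tiny numerical illustration (not any particular learned model):

```python
import numpy as np

def f_W(x, z, W):
    """Bilinear similarity x^T W z."""
    return x @ W @ z

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(f_W(x, z, np.eye(2)))            # 1.0, the plain dot product x . z
print(f_W(x, z, np.diag([2.0, 0.5])))  # 5.0: a W that re-weights dimensions
```
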


Metric learning

Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry, and subadditivity (the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric.

When the objects $x_i$ are vectors in $\mathbb{R}^d$, then any matrix $W$ in the symmetric positive semi-definite cone $S_+^d$ defines a distance pseudo-metric on the space of $x$ through the form $D_W(x_1, x_2)^2 = (x_1 - x_2)^\top W (x_1 - x_2)$. When $W$ is a symmetric positive definite matrix, $D_W$ is a metric. Moreover, as any symmetric positive semi-definite matrix $W \in S_+^d$ can be decomposed as $W = L^\top L$, where $L \in \mathbb{R}^{e \times d}$ and $e \geq \mathrm{rank}(W)$, the distance function $D_W$ can be rewritten equivalently as $D_W(x_1, x_2)^2 = (x_1 - x_2)^\top L^\top L (x_1 - x_2) = \|L(x_1 - x_2)\|_2^2$. The distance $D_W(x_1, x_2)^2 = \|x_1' - x_2'\|_2^2$ corresponds to the Euclidean distance between the projected feature vectors $x_1' = L x_1$ and $x_2' = L x_2$. Some well-known approaches for metric learning include large margin nearest neighbor (LMNN) and information-theoretic metric learning (ITML).
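
A quick numerical check of this equivalence, using an arbitrary random $L$ (nothing here depends on any particular metric learning algorithm):

```python
import numpy as np

rng = np.random.default_rng(4)
d, e = 6, 3
L = rng.normal(size=(e, d))     # any real e x d matrix
W = L.T @ L                     # symmetric positive semi-definite by construction

x1, x2 = rng.normal(size=d), rng.normal(size=d)
diff = x1 - x2

D2_via_W = diff @ W @ diff                   # (x1 - x2)^T W (x1 - x2)
D2_via_L = np.sum((L @ x1 - L @ x2) ** 2)    # ||L x1 - L x2||_2^2
print(np.isclose(D2_via_W, D2_via_L))        # True: the two forms agree
```
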

In statistics, the covariance matrix of the data is sometimes used to define a distance metric called the Mahalanobis distance, which takes $W$ to be the inverse of the covariance matrix.
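
A minimal sketch, taking $W$ to be the inverse sample covariance computed with numpy (the data and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.0, 0.0, 0.3]])  # correlated data

W = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance as the metric

def mahalanobis(x1, x2):
    """Mahalanobis distance sqrt((x1 - x2)^T W (x1 - x2))."""
    diff = x1 - x2
    return np.sqrt(diff @ W @ diff)

print(mahalanobis(X[0], X[1]))
```
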



Applications

Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Many other machine learning approaches also rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like the k-nearest neighbors algorithm, which relies on the labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches.



Scalability

Metric and similarity learning naively scale quadratically with the dimension of the input space, as can easily be seen when the learned metric has the bilinear form $f_W(x, z) = x^\top W z$. Scaling to higher dimensions can be achieved by enforcing a sparse structure over the matrix model, as done with HDSL and COMET.
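
A back-of-the-envelope illustration of the blow-up: a full bilinear $W$ has $d^2$ parameters, while a low-rank factorization $W = L^\top L$ with $L \in \mathbb{R}^{e \times d}$ has only $ed$ (the sparse models cited above make a related trade-off; the rank $e = 20$ here is illustrative):

```python
# Parameter counts for a full bilinear W (d^2) versus a rank-e factor L (e*d).
for d in (100, 1_000, 10_000):
    e = 20                     # illustrative low rank
    print(f"d={d:>6}: full W = {d*d:>12,} params, rank-{e} L = {e*d:>9,} params")
```
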



See also

  • Latent semantic analysis


Further reading

For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.

