CSC/ECE 517 Fall 2013/oss E815 saa: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 12: Line 12:
The following are the possible matches that can be found :
The following are the possible matches that can be found :


* If v and w are exactly the same. This match is given a weight value of 6.
* If v and w are <b>exactly</b> the same. This match is given a weight value of 6.


* If v and w are synonymous. This match is given a weight of 5.
* If v and w are synonymous. This match is given a weight of 5.
Line 20: Line 20:
* If v is a meronym of w (i.e., v is a sub-part of w) or vice versa. Or v is a holonym of w (i.e., v contains w as a sub-part) or vice-versa. For example, “leg” is a meronym of the token “body” and “body” is the holonym of the term “leg”. This type of match is given a weight of 3.
* If v is a meronym of w (i.e., v is a sub-part of w) or vice versa. Or v is a holonym of w (i.e., v contains w as a sub-part) or vice-versa. For example, “leg” is a meronym of the token “body” and “body” is the holonym of the term “leg”. This type of match is given a weight of 3.


*
* If v and w have common parents

Revision as of 21:13, 30 October 2013

Refactoring and Testing of wordnet_based_similarity.rb


Introduction to WordNet

Expertiza is an application that aims at 'reusable learning objects through peer review'. It allows students to submit their work and review other's work. Our OSS project aims at refactoring and testing of the wordnet_based_similarity.rb. The file determines how similar concepts are, based on the WordNet database and is used in NLP analysis of reviews. WordNet is widely used resource for measuring similarity. It is a network of nouns,verbs,adjectives and adverbs which are grouped into synsets(synonymous words), and linked by lexical relations. WordNet is faster to query and involves no additional pre-processing despite having limitations in terms of domains it covers and lack of real world knowledge when compared to Wikipedia.It allows comparisons across different word forms.

WordNet relations-based semantic metric

In order to identify similarity, a relations-based metric is used. Relatedness between two terms v and w, known as match(v, w) is one of those listed below. Each of these different types of matches is given a weight value based on the importance of the match that is found. Hence the matches are assigned values in the range of 0–6. A value of 6 is assigned when the best match (an exact match) occurs and a value of 0 is assigned when a distinct or non-match occurs. The following are the possible matches that can be found :

  • If v and w are exactly the same. This match is given a weight value of 6.
  • If v and w are synonymous. This match is given a weight of 5.
  • If v is a hypernym of w (i.e., v is more generic than the token w) or vice versa. Or v is a hyponym of w (i.e., v is a more specific form of w)or vice-versa. This match is given a weight of 4.
  • If v is a meronym of w (i.e., v is a sub-part of w) or vice versa. Or v is a holonym of w (i.e., v contains w as a sub-part) or vice-versa. For example, “leg” is a meronym of the token “body” and “body” is the holonym of the term “leg”. This type of match is given a weight of 3.
  • If v and w have common parents