`man -k' keyword retrieval mechanism for Unix man pages.
Effectiveness results were analysed using the measures of recall and precision:

R = Recall = Relevant doc. retrieved / Relevant doc. in the library

P = Precision = Relevant doc. retrieved / Total doc. retrieved

Short descriptions of man pages for 459 Unix commands were analysed. Effectiveness results were computed for two queries: Q1, `look for a string in a file', and Q2, `examine a document for lines matching a string'. The `grep' family commands (`search a file for a string') were the only commands in the analysed man pages considered relevant to the queries.
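
The two measures defined above can be sketched in a few lines of Python. This is only an illustration: the function name `recall_precision`, the set-based representation, and the example command names are assumptions, not part of the evaluated system. Returning 0 when nothing is retrieved is likewise a convention chosen here.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single search.

    retrieved: set of document names returned by the search
    relevant:  set of document names judged relevant to the query
    """
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    # Precision is undefined when nothing is retrieved; this sketch
    # reports 0.0 in that case (an assumption, not taken from the paper).
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: five commands retrieved, one of them relevant.
relevant = {"grep"}
retrieved = {"grep", "cmd_a", "cmd_b", "cmd_c", "cmd_d"}
print(recall_precision(retrieved, relevant))  # (1.0, 0.2)
```

With one relevant command in the library, retrieving it among five results gives full recall but a precision of only 0.2.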
Search by word | R | P |
---|---|---|
look | 0 | 0 |
string | 1 | 0.2 |
file | 1 | 0.008 |
Table 3 shows the values of recall and precision resulting from using `man -k' to search for commands with keywords extracted from Q1. Searching with the term `look' retrieved no relevant commands, giving a zero value for both recall and precision. Searching with the other terms in the query (`string' and `file') succeeded in retrieving the relevant command. However, very small precision values were reached for both terms (0.2 and 0.008, respectively).
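
The behaviour described above can be reproduced on a toy scale. The sketch below approximates a single-keyword search as a case-insensitive substring match over command names and short descriptions, which is a simplification of what `man -k' does. The five descriptions and the names `DESCRIPTIONS` and `keyword_search` are hypothetical and are not the 459 analysed man pages.

```python
# Toy library of short descriptions (hypothetical data for illustration).
DESCRIPTIONS = {
    "grep": "search a file for a string",
    "cat": "concatenate and display a file",
    "cp": "copy a file",
    "strings": "find a printable string in a binary",
    "ls": "list the contents of a directory",
}
RELEVANT = {"grep"}


def keyword_search(keyword, descriptions=DESCRIPTIONS):
    """Return the commands whose name or description contains the keyword."""
    kw = keyword.lower()
    return {
        name
        for name, text in descriptions.items()
        if kw in name.lower() or kw in text.lower()
    }


for word in ("look", "string", "file"):
    retrieved = keyword_search(word)
    hits = retrieved & RELEVANT
    recall = len(hits) / len(RELEVANT)
    # Report 0 precision when nothing is retrieved (a convention of this sketch).
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    print(word, sorted(retrieved), recall, round(precision, 3))
```

Even in this tiny library, `look' finds nothing because the relevant description uses `search', while `file' matches several unrelated commands and dilutes precision. Both effects grow with the size of the library.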
Query | R | P | Threshold (SCase > Threshold) | SCase for the retrieved doc. |
---|---|---|---|---|
'look for a string in a file' | 1 | 1 | 0.66 | 1 |
'examine a document for lines matching a string' | 1 | 1 | 0.33 | 0.41 |
As opposed to `man -k', very good effectiveness was reached for the query Q1 through the main retrieval mechanism (Table 4). The `grep' command was retrieved with a similarity value of 1: all semantic cases in the query had a match, and terms in semantic cases were identical or synonyms. The precision value was 1 for a threshold of 0.66, i.e. the next most similar descriptions had similarity values not greater than 0.66. Thus, non-relevant components, which would lower precision if retrieved, fell at or below the threshold because of their similarity values. Recall was ensured by handling synonyms: synonym (`look for', `search').
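
A minimal sketch of the two ideas in this paragraph, synonym handling and threshold-based retrieval, is given below. It does not compute SCase, whose definition is not given in this section. Only the synonym pair (`look for', `search'), the score of 1 for `grep', and the threshold of 0.66 come from the text; the other scores, the dictionary layout, and the function names are assumptions made for illustration.

```python
# Synonym pair taken from the text; storing it as a dict is an assumption.
SYNONYMS = {"look for": "search"}


def canonical(term):
    """Map a term to its canonical synonym, or return it unchanged."""
    return SYNONYMS.get(term, term)


def terms_match(a, b):
    """Two terms match if they are identical or synonyms."""
    return canonical(a) == canonical(b)


def retrieve(scores, threshold):
    """Return (name, score) pairs strictly above the threshold, best first."""
    hits = [(name, s) for name, s in scores.items() if s > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# grep's score and the threshold follow Table 4; other scores are hypothetical.
scores = {"grep": 1.0, "cmd_a": 0.66, "cmd_b": 0.33, "cmd_c": 0.0}

print(terms_match("look for", "search"))  # True
print(retrieve(scores, 0.66))             # [('grep', 1.0)]
```

Because the comparison is strict, a component scoring exactly 0.66 is not retrieved, which is what keeps precision at 1 in this example.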
Search by word | R | P |
---|---|---|
examine | 0 | 0 |
document | 0 | 0 |
lines | 0 | 0 |
match | 0 | 0 |
string | 1 | 0.2 |
Table 5 shows the values of recall and precision resulting from using `man -k' to search for commands with keywords extracted from Q2. Searching with four of the keywords retrieved no relevant command, giving a zero value for both recall and precision. Only searching with the term `string' succeeded in retrieving the relevant command. However, a small precision value (0.2) was reached. This example shows how simple keyword retrieval mechanisms like `man -k' are inappropriate for dealing with descriptions that are conceptually similar, like Q1 and Q2.
For Q2, the retrieval mechanism proposed in this work succeeded in retrieving the relevant command (Table 4) because it also looked for generalizations or specializations of descriptions. Also, it located the relevant command even though the term `string' was discarded (qualifiers in a sentence are currently ignored). The `grep' command was retrieved with a similarity value of 0.41, the highest score: all semantic cases in the query had a match and terms in the semantic cases of the retrieved command were generalizations or specializations of terms in Q2: