The relationship between dictionary look-up frequency and corpus frequency revisited: a log-file analysis of a decade of user interaction with a Swahili-English dictionary

Bibliographic Details
Main Authors: de Schryver, Gilles-Maurice; Wolfer, Sascha; Lew, Robert
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2019
Online Access: http://journalarticle.ukm.my/14095/1/34284-114239-1-PB.pdf
http://journalarticle.ukm.my/14095/
http://ejournal.ukm.my/gema/issue/view/1227
Citation: de Schryver, Gilles-Maurice and Wolfer, Sascha and Lew, Robert (2019) The relationship between dictionary look-up frequency and corpus frequency revisited: a log-file analysis of a decade of user interaction with a Swahili-English dictionary. GEMA: Online Journal of Language Studies, 19 (4). pp. 1-27. ISSN 1675-8021
Status: Peer-reviewed article, published November 2019 (PDF, English)
Repository: UKM Journal Article Repository (Institutional Repository), Tun Sri Lanang Library, Universiti Kebangsaan Malaysia, Malaysia; http://journalarticle.ukm.my/
Description: In an earlier publication it was claimed that there is no useful relationship between Swahili-English dictionary look-up frequencies and the occurrence frequencies of the same wordforms in Swahili-English corpora, at least not beyond the top few thousand wordforms. This result was challenged by a different team of researchers, who used German data and an improved methodology. In the present article the original Swahili-English data is revisited, using ten years' worth of it rather than just two, and applying the improved methodology. We conclude that there is indeed a positive relationship. In addition, we show that online dictionary look-up behaviour is remarkably similar across languages, even when, as in our case, the languages belong to very dissimilar language families. Furthermore, online dictionaries turn out to have minimum look-up success rates below which they cannot go. These minima are language-sensitive and vary with the regularity of the searched-for entries, but are otherwise constant regardless of the size of randomly sampled dictionaries. Corpus-informed sampling always improves on any random method. Lastly, from the point of view of the graphical user interface, we argue that the average user of an online bilingual dictionary is better served by a single search box than by separate search boxes for each side of the dictionary.