Background data (adapted from Jenset & McGillivray 2017) for: Down-sampling from hierarchically structured corpus data
Contributor(s):
Contact Person:
Producer:
Alan Turing Institute, University of Cambridge
Publisher Information:
DataverseNO
Year of publication:
2023
Language:
English
DOI:
Abstract:
Dataset description: This dataset, adapted from Jenset and McGillivray (2017), contains tabular files documenting the alternation between -(e)th and -(e)s as markers of third-person singular verb inflection in Early Modern English. The data provided by Jenset and McGillivray (2017) are drawn from the PPCEME corpus (Kroch et al. 2004) and cover the period from 1500 to 1700. In total, these authors annotated 13,757 third-person singular tokens (excluding the verb BE) for a range of variables. For the purposes of the present methodological study, the dataset was reduced to a subset of 11,645 tokens, and the coding of variables was revised, completed, or modified in places. The dataset identifies the Author and Verb Lemma of each token and includes a number of predictor variables: Genre, Year, Frequency (of the verb lemma in the third-person singular), Phonological Context (stem-final sound), and the Gender of the author.
Type:
Dataset
Keywords:
Early Modern English
verb inflection
language change
lexical diffusion
third person singular
methodology
down-sampling
corpus linguistics
PPCEME
Penn-Helsinki Parsed Corpus of Early Modern English
Format:
text/tab-separated-values
Version:
1.1
Permalink:
https://fis.uni-bamberg.de/handle/uniba/92242