Corpus Linguistics and Language Teaching (LFTM05)

Assignment 1

BY SUBMITTING THIS PAPER, YOU CONFIRM THAT YOU COMPLETED IT ON YOUR OWN, WITHOUT CONSULTING FELLOW STUDENTS.

YOU ARE ALLOWED TO USE ANY MATERIALS YOU WISH TO HELP YOU ANSWER THE QUESTIONS.

Type your answers in the boxes beneath the questions by clicking on ‘Click here to enter text’. You can save this file as you go along.

PLEASE ENTER YOUR NAME IN THE BOX BELOW:

Click here to enter text.

SEND YOUR COMPLETED TEST PAPER TO ME BY EMAIL ([email protected]) ON OR BEFORE 6.00pm GMT ON MONDAY 14TH MARCH 2016. THIS TEST IS WORTH 25% OF YOUR TOTAL MARK FOR THE MODULE. THERE ARE 100 MARKS AVAILABLE.

SECTION A [36 marks]

This section contains questions on using the Corpus of Contemporary American English (COCA).

  1. What search string is needed to carry out the following searches in COCA? Just write the search in the answer boxes, nothing else.

    1. A search for all instances of the lemma GROW. [1]

grow

    1. A search for synonyms of sad. [1]

[=sad]

    1. All instances of the word change functioning as a verb. [1]

[change* vb]

    1. All sequences of ‘in the X of a Y Z’ where X = any noun, Y = any adjective and Z = any noun. [3]

[*adj*]

    1. All forms of the verb get followed by an –ing participle. [2]

[get* ing]

    1. ‘What a(n)’ followed by any noun. [2]

[whata *n]

    1. The base form of all verbs beginning with the consonant cluster pl-. [2]

[pl*v]

  1. What is the most frequent item belonging to the following word classes in the whole of COCA? Just write the word in the answer boxes, nothing else.

    1. Reflexive pronoun [2]

himself

    1. Plural noun [2]

students

    1. Modal verb [2]

would

    1. Comparative adjective [2]

better

  1. List the top four singular nouns in the following sections of COCA. (You should write four words in each box, nothing else.)

    1. Science fiction/fantasy [4]

Way, man, head, hand

    1. Academic (Humanities) [4]

Music, art, work, education

    1. Academic (Medicine) [4]

Health, study, patient, care

    1. News (Money) [4]

Company, business, market, money

  1. In the whole of COCA, what are the most frequent collocates of the following words and phrases?

    1. Most frequent singular noun collocate of the word blue occurring in a span of 4 to the right of the node? [2]

sky

    1. Most frequent adjective collocate of the word feels occurring in a span of 1 to the right of the node? [2]

good

    1. Most frequent plural noun collocate of Obama occurring in a span of 4 to the right and 4 to the left of the node? [2]

Democrats

SECTION B [20 marks]

This section contains questions on using BNCweb.

  1. What is the most economical query syntax required for the following searches? Just write the search in the answer boxes, nothing else.

  1. All adjectives ending in -ly. [1]

{***-ly}

  1. Any word ending in -ful. [1]

{***-ful}

  1. All nouns ending in -sation or -zation. [2]

{*-sation*-zation}

  1. Using the ‘frequency list’ function, answer the following questions:

  1. What is the most frequent man’s first name beginning with B in the whole BNC? [2]

Brian

  2. What is the most frequent city beginning with C in the whole BNC? [2]

California

  1. What is the most frequent interjection in the whole of the BNC? [2]

What

  1. Use ‘standard query’ and ‘distribution’ to find out about the distribution of the lemma CHAT (use Simple Query Syntax help if you need to). Answer the following questions.

  1. Is the verb chat more common in ‘fiction and verse’ or ‘spoken conversation’? [2]

Spoken conversation

  1. Is the verb chat more common in written texts by males or written texts by females? [2]

Females

  2. Is the noun chat more common in ‘fiction and verse’ or ‘spoken conversation’? [2]

Fiction and verse

  1. The table below shows the distribution by gender of the word darling in the demographically sampled part of the spoken component of the BNC. What are the values for A, B, C and D? [4]

The following distribution was found:

Sex:

Category   No. of words   No. of hits   Dispersion (over speakers)   Frequency per million words
Female     2,264,094      308           102/559                      103.6
Male       1,454,344      126           44/509                       45.8
Total      3,718,438      541           146/1,068                    145.49

SECTION C [6 marks]

This section contains questions on using AntConc.

Load An Outcast of the Islands (in SunSpace Unit 5) into AntConc and answer the following questions (remember to ‘treat all data as lower case’).

  1. What is the most frequent word in the novel? [1]

Islands

  1. What is the most frequent 3-word cluster/n-gram in the novel? [2]

the

  1. Using the default wild-card settings, what query syntax is needed to search for all words ending with the sequence –ms? [2]

{***}

  1. Is the concordance plot below for Willems or Lingard? [1]

Lingard

SECTION D [32 marks]

This section tests your knowledge of concepts in corpus linguistics and language pedagogy. It consists of a series of definitions. You have to supply the words or phrases which match the definitions and write them in the box.

  1. Lexicon

The typical grammatical patterning into which a word or grammatical construction enters. [2]

  1. Cross-reference matrix

The number of words either side of a search item to set the parameters when searching for co-occurrence patterns. [2]

  1. General corpus

A large corpus which attempts to represent the general usage of a language through the careful selection of a cross-section of texts. [2]

  1. Semantic prosody

The tendency for certain words to be related to a particular area of evaluative or attitudinal meaning, even when the meaning of the word itself appears to be evaluatively neutral. [2]

  1. Lexical density

A measurement of the proportion of lexical words to function words in a text. [2]

  1. Inductive approach

An approach to learning in which students ‘discover’ rules and probabilities from the corpus examples they find. [2]

  1. Normalise

A process to enable direct comparisons of frequency to be made when comparing texts and corpora of different sizes. [2]

  1. Key Word In Context (KWIC)

The onscreen display of a concordance in which the search word is centred, with co-text on either side. [2]

  1. annotation

The term used to describe a corpus used as a background against which to compare findings from another corpus (particularly in relation to the analysis of keyness). [2]

  1. Noun

The canonical form of a word. [2]

  1. POS tagging

The process by which each word in a corpus is assigned a label according to which word class it belongs. [2]

  1. Lexical relations

The relationship between an individual word and a set of semantic categories. [2]

  1. Open class

This class of words is sometimes referred to as grammatical words and consists of pronouns, prepositions, determiners, conjunctions, etc. [2]

  1. Keyword

A word which is found significantly more often in one corpus when compared with a reference corpus. [2]

  1. Dispersion

A measure of the rate of occurrence of a word or phrase across a particular file or corpus. [2]

  1. Log-Likelihood

The default statistical measure of keyness in AntConc. [2]
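
Two of the definitions in this section, normalising frequencies across corpora of different sizes and measuring keyness against a reference corpus, describe calculations. The sketch below is one way they might be worked out in Python. It is illustrative only: the function names `per_million` and `log_likelihood` are hypothetical, the figures in the usage lines are invented, and the log-likelihood shown is the common two-term form, which is an assumption and not necessarily the exact calculation AntConc performs.

```python
import math


def per_million(hits, corpus_size):
    """Normalised frequency: occurrences per million words."""
    return hits / corpus_size * 1_000_000


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-term log-likelihood keyness score (assumed form).

    Compares the observed frequency of a word in a study corpus and in a
    reference corpus with the frequencies expected if the word were
    spread evenly across both.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    score = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        # A zero observed count contributes nothing to the sum.
        if observed > 0:
            score += observed * math.log(observed / expected)
    return 2 * score


# Invented figures: the same raw count in corpora of different sizes
# gives different normalised frequencies.
small = per_million(50, 500_000)
large = per_million(50, 1_000_000)
print(round(small, 2), round(large, 2))

# A word with the same relative frequency in both corpora scores 0;
# a word over-represented in the study corpus scores above 0.
print(log_likelihood(10, 1_000, 20, 2_000))
print(log_likelihood(100, 1_000, 10, 1_000) > 0)
```

Normalising first makes raw counts comparable across corpora. The keyness score then indicates whether a difference in frequency is larger than an even spread across both corpora would predict.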

END OF TEST