Wednesday, February 20, 2019
Corpus Linguistics Essay
Introduction

This paper gives an overview of corpus linguistics and its connection with lexicology and translation. The latter is the most important part for me, and I am keen on mastering and presenting something that is closely connected with my future profession. Frankly speaking, it was not an easy journey, but I hope it is destined to be successful.

A corpus is an electronically stored collection of samples of naturally occurring language. Most modern corpora are at least 1 million words in size and consist either of complete texts or of large extracts from long texts. Usually the texts are selected to represent a type of communication or a variety of language. For example, a corpus may be compiled to represent the English used in history textbooks, or Canadian French, or Internet discussions of genetic modification. Corpora are investigated through the use of dedicated software.

Corpus linguistics can be regarded as a sophisticated method of finding answers to the kinds of questions linguists have always asked. A large corpus can be a test bed for hypotheses and can be used to add a quantitative dimension to many linguistic studies. It is also true, however, that corpus software presents the researcher with language in a form that is not normally encountered, and that this can highlight patterning that often goes unnoticed. Corpus linguistics has also, therefore, led to a reassessment of what language is like.

During this journey we will try to find out what corpus linguistics is. We will then look at corpus linguistics terms and their meanings, the history of corpus linguistics, resources and methodologies for corpus linguistics (corpora), translation, and corpus linguistics and linguistic theory (corpus-based descriptions). So fasten your seat belts, we are flying!

What is Corpus Linguistics?
Corpus linguistics is a field of study and a method of linguistic analysis that uses a collection of natural, real-world texts known as a corpus. It is used to analyse and research a number of linguistic questions. It also offers a unique insight into the dynamics of language, which has made it one of the most widely used linguistic methodologies. Because corpus linguistics involves the use of large corpora that consist of millions or sometimes even billions of words, it relies heavily on computers to determine what rules govern the language and what patterns (grammatical or lexical, for instance) occur. It is therefore not surprising that corpus linguistics emerged in its modern form only after the computer revolution of the 1980s. However, the Brown Corpus, the first modern and electronically readable corpus, was created by Henry Kucera and W. Nelson Francis as early as the 1960s.

Corpus Linguistics Terms and Their Meanings

Corpus (plural corpora). A collection of systematically or randomly collected texts of natural language that is electronically stored and processed. A corpus can consist of texts in a single language or in multiple languages. It contains a large number of texts, which allows researchers to analyse linguistic rules. However, no matter how large a corpus is, it does not represent the entire language.

Multilingual corpus. As its name suggests, a multilingual corpus consists of texts in multiple languages.

Parsed corpus (treebank). A collection of texts in naturally occurring language in which every sentence is parsed, that is, syntactically analysed and annotated. The syntactic analysis is typically given in a tree-like structure, which is why a parsed corpus is also known as a treebank.

Parallel corpora. A collection of texts that are translations of each other.

Annotation. The extension of a text by the addition of various kinds of linguistic information. Examples include parsing, tagging, etc. The term is often used to distinguish annotated corpora from unannotated ones, which consist of plain text in its raw state.

Collocation. A sequence or pattern in which words appear together, or co-occur.

Concordance. A word or phrase shown together with its immediate context. In corpus linguistics, concordances are used to analyse the different uses of a single word, word frequency, and phrases or idioms.

Orthography. The standardised writing system of a particular language, including rules for spelling, capitalisation and punctuation. Orthography can pose a problem in the analysis of writing systems that use accents, because native speakers of these languages sometimes substitute other characters for the accented letters or omit the accents completely.

Token. An occurrence of an individual word. Tokens play an important role in so-called tokenisation, the division of a text or collection of words into tokens. This method is often used in the study of languages that do not delimit words with spaces.

Lemmatisation. The term derives from the word lemma, the base form that stands for a set of different forms of a single word, such as laugh and laughed. Lemmatisation is the process of grouping together the different inflected forms of the same word under their lemma.

Wildcard. A special character, such as a question mark (?) or an asterisk (*), that can stand in for a character or a sequence of characters in a search.

3A perspective. A research method used in corpus linguistics, introduced by S. Wallis and G. Nelson. 3A stands for annotation, abstraction and analysis.

History of Corpus Linguistics

The history of corpus linguistics is typically divided into two stages: early corpus linguistics, also known as pre-Chomsky corpus linguistics, and modern corpus linguistics. The earliest examples of corpus linguistics date to late 19th-century Germany. In 1897, the German linguist J. Kading used a large corpus of about 11 million words to analyse the distribution of letters and their sequences in the German language. This impressively sized corpus, comparable to a modern corpus, was revolutionary at the time.

Other early linguists who used corpora to study language include Franz Boas (Handbook of American Indian Languages, 1911), Leonard Bloomfield (Language, 1933), Zellig Harris (Methods in Structural Linguistics, 1951), Charles C. Fries (The Structure of English, 1952), Archibald A. Hill and others, mostly American structuralist and field linguists. Some of them, such as Fries and A. Aileen Traver, also started to use corpora in the teaching of foreign languages.

In 1961, Henry Kucera and W. Nelson Francis of Brown University started to work on the Brown University Standard Corpus of Present-Day American English, commonly known simply as the Brown Corpus, the first modern, electronically readable corpus. It consists of 1 million words of American English texts organised into 15 categories. By the standards of modern corpus linguistics, the Brown Corpus is rather small. Nevertheless, it is widely considered one of the most important works in the history of corpus linguistics.

This was also the time of Chomsky's criticism of corpus linguistics, which would result in a period of decline. Chomsky rejected the corpus as a tool for linguistic study, arguing that linguists must model language on competence instead of performance, and according to Chomsky, a corpus does not allow language to be modelled on competence. Corpus linguistics was not abandoned completely, but it was not until the 1980s that linguists began to show renewed interest in using corpora for research.
The revival of corpus linguistics and its emergence in its modern form were greatly influenced by the advent of computers and network technology in the 1980s, which allowed linguists to use electronic language samples as well as electronic tools. The use of computers, however, dates back to the early 1970s. At that time the Montreal French Project compiled the first computerised collection of spoken language, and Jan Svartvik began to work on the London-Lund Corpus with the aid of the Brown Corpus and the Survey of English Usage (SEU) at University College London. All the works mentioned before the 1980s, as well as the early examples of corpus linguistics, paved the way for the modern corpus-based study of language as we know it today. The term corpus linguistics was finally adopted after J. Aarts and W. Meijs published Corpus Linguistics: Recent Developments in the Use of Computer Corpora in English Language Research in 1984.

Resources and Methodologies for Corpus Linguistics: Corpora

The basic resource for corpus linguistics is a collection of texts, called a corpus. Corpora can be of varying sizes, are compiled for different purposes, and are composed of texts of different types. All corpora are homogeneous to a certain extent: they are composed of texts from one language, or one variety of a language, or one field of study, etc. They are also all heterogeneous to a certain extent, in that at the very least they are composed of a number of different texts. Most corpora contain information in addition to the texts that make them up, such as information about the texts themselves, part-of-speech tags for each word, and parsing information.

What Corpus Linguistics Does

Gives access to naturally occurring linguistic information.
As mentioned before, corpora consist of real-world texts, which are mostly a product of real-life situations. This makes corpora a valuable research resource for dialectology, sociolinguistics and stylistics.

Facilitates linguistic research.
Electronically readable corpora have dramatically reduced the time needed to find particular words or phrases. Research that would take months or even years to complete manually can be done in a matter of seconds with a high degree of accuracy.

Enables the study of wider patterns and collocation of words.
Before the advent of computers, corpus linguistics studied only single words and their frequency. Modern technology has made it possible to study wider patterns and the collocation of words.

Allows analysis of multiple parameters at the same time.
Various corpus linguistics software programmes and online analytical tools allow researchers to analyse a large number of parameters simultaneously. In addition, many corpora are enriched with various kinds of linguistic information, such as annotation.

Facilitates the study of a second language.
Studying a second language through naturally occurring texts gives students a better feel for the language and shows it as it is used in real rather than invented situations.

What Corpus Linguistics Does Not Do

Does not explain why.
The study of corpora tells us what happened and how, but it does not tell us why. For instance, it can show that the frequency of a particular word has increased over time, but not why it has increased.

Does not represent the entire language.
Corpus linguistics studies language by means of randomly or systematically selected corpora. These typically consist of a large number of naturally occurring texts, but they do not represent the entire language. Consequently, linguistic analyses that use the methods and tools of corpus linguistics do not represent the entire language either.

Searches, Software, and Methodologies

Corpora are interrogated through the use of dedicated software, the nature of which inevitably reflects assumptions about methodology in corpus investigation. At the most basic level, corpus software:
• searches the corpus for a given target item,
• counts the number of instances of the target item in the corpus and calculates relative frequencies,
• displays instances of the target item so that the corpus user can carry out further investigation.

It is apparent that corpus methodologies are essentially quantitative. Indeed, corpus linguistics has been criticized for allowing only the observation of relative quantity and for failing to expand the explanatory power of linguistic theory (for discussion, see Meyer, 2002: 25). It is shown in this article that corpus linguistics can indeed enrich language theory, though only if preconceptions about what that theory consists of are allowed to change. Here, however, we leave that argument aside as we review corpus investigation software in more detail.

Corpus Linguistics and Linguistic Theory: Corpus-Based Descriptions

As has been noted, corpus linguistics is essentially a methodology or set of methodologies, rather than a theory of language description. Essentially, corpus linguistics means this:
• looking at naturally occurring language;
• looking at comparatively large amounts of such language;
• observing relative frequencies, either in raw form or mediated through statistical operations;
• observing patterns of association, either between a feature and a text type or between groups of words.

Reduced to its essence in this way, corpus linguistics appears to be theory neutral. The practice of doing corpus linguistics, however, is never neutral, because each practitioner defines what is meant by a feature and what frequencies should be observed, in line with a theoretical approach to what matters in language. Approaches to the use of a corpus that essentially rely on categories derived from noncorpus investigations of language are sometimes referred to as corpus based (Tognini-Bonelli, 2001). Studies of this kind can test hypotheses arising from grammatical descriptions based on intuition or on limited data.
Experiments have been designed specifically to do this (Nelson et al., 2002: 257–283). For example, Meyer (2002: 78) describes work on ellipsis from a typological and psycholinguistic point of view. This work predicts that, of the three possible clause locations of ellipsis in American spoken English, one will be much more frequent than the others, and a corpus study confirms this prediction. On the other hand, the study of pseudo-titles mentioned in the section Languages and Varieties shows that assumptions about language, in this instance about the influence of one variety of English on another, can be shown to be false. Biber et al. (1999: 7) comment that corpus-based analysis of grammatical structure can uncover characteristics that were previously unsuspected. As examples, they mention the surprisingly high frequency of complex relative clause constructions in conversation and the frequency of simplified grammatical constructions in academic prose.

A clearer integration of linguistic theory and corpus linguistics is demonstrated by Matthiessen's work on probability (see the section Probability). This work takes its categories from an existing description of English (Halliday's (1985) systemic functional grammar), but the corpus study was more integral to the theory, because it was the only way to make accurate statements about the probability of occurrence of each item in the system.

Corpus-Driven Descriptions

However, more radical challenges to language description can also be found. Sinclair (1991, 2004) argues that the kind of patterning observable in a corpus (and nowhere else) requires descriptions of a markedly different kind from those commonly available. Both the descriptions and the theories that they in turn inspire are, in Tognini-Bonelli's (2001) term, corpus driven. Some of the challenges to tradition that corpus-driven theories involve are these:
• Lexis and grammar are not distinct, and grammar is not an abstract system underlying language.
• Choice of any kind is heavily constrained by the choice of lexis.
• Meaning is not atomistic, residing in words, but prosodic, belonging to variable units of meaning and always located in texts.

Evidence for these claims is presented in the section Observing Patterned Behavior above. The concept of pattern grammar focuses on the way that different lexical items behave differently in terms of how they are complemented. Grammatical generalizations about complementation cannot be made without describing that individual lexical behavior. Similarly, the choice between features such as positive and negative depends to some extent on the lexical item, as some verbs (such as afford) occur in the negative much more frequently than most. In other words, the probability of any grammatical feature occurring is strongly affected not only by the register but also by the lexis used. Finally, the evidence of phraseology is that it makes more sense to see meaning as belonging to phrases than to individual words.

Findings such as these have led many writers to see a need for descriptions of language that are radically different from those currently available. Sinclair (1991, 2004) proposes, for example, that meaning be seen as belonging to units of meaning, each unit being describable in the way set out in … He criticized conventional grammar for distinguishing between structures (a series of slots) and lexis (the fillers), so that it appears that any slot can be filled by any filler, with no restrictions other than what the speaker wishes to say. This is clearly sometimes the case, and when it is, Sinclair …

Translation

Corpora can be used to train translators, as a resource for practising translators, and as a means of studying the process of translation and the kinds of choices that translators make.
Parallel corpora are often used in these applications, and software exists that will align two corpora so that the translation of each sentence in the original text is immediately identifiable. This allows one to observe how a given word has been translated in different contexts. One interesting finding is that apparently equivalent words, such as English go and Swedish gå, or English with and German mit (Viberg, 1996; Schmied and Fink, 2000), occur as translations of each other in only a minority of instances. This suggests differences in the ways those languages use the items concerned.

More generally, examination of parallel corpora emphasizes that what translators translate is not the word but a larger unit (Teubert and Čermáková, 2004). Although a single word may have many equivalents when translated, a word in context may well have only one such equivalent. For example, although travail as an individual word is sometimes translated as work and sometimes as labor, the phrase travaux préparatoires is translated only as preparatory work. Thus, Teubert and Čermáková argue, travaux préparatoires and preparatory work may be considered equivalent translation units, whereas no such claim can be made for travaux and work.

As well as giving information about languages, corpus studies have also indicated that translated language is not the same as nontranslated language. Studies of corpora of translated texts have shown that they tend to have higher incidences of very frequent words and that they tend to be more explicit in terms of grammar (Baker, 1993). They may also be influenced by the structure of the source language, as was indicated in the discussion of wh-clefts in English and Swedish in the section Languages and Varieties. In communities where people read a large number of translated texts, the foreign language, via its translations, may even influence the home language.
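The kind of parallel-corpus query behind findings such as the with/mit comparison can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the corpus is assumed to be already sentence-aligned, the tokeniser is deliberately crude, and the sentence pairs and the function names tokens and equivalence_rate are hypothetical. None of it is taken from Schmied and Fink's data or from any real alignment tool, and the toy counts say nothing about the real with/mit ratio.

```python
import re


def tokens(sentence: str) -> list[str]:
    """Split a sentence into lower-case word tokens (a deliberately crude tokeniser)."""
    return re.findall(r"\w+", sentence.lower())


def equivalence_rate(pairs, source_word, target_forms):
    """Count how often source_word is rendered by one of target_forms.

    pairs is a list of (source sentence, target sentence) tuples assumed to be
    already aligned. Returns (hits, total): total is the number of pairs whose
    source side contains source_word, hits is how many of those have any of
    target_forms on the target side.
    """
    source_word = source_word.lower()
    forms = {form.lower() for form in target_forms}
    hits = total = 0
    for source, target in pairs:
        if source_word in tokens(source):
            total += 1
            if forms & set(tokens(target)):
                hits += 1
    return hits, total


# Hypothetical toy data: invented English-German sentence pairs, not a real corpus.
ALIGNED = [
    ("I came with my sister", "Ich kam mit meiner Schwester"),
    ("She spoke with a smile", "Sie sprach lächelnd"),
    ("What is wrong with you?", "Was ist los mit dir?"),
    ("He was filled with joy", "Er war voller Freude"),
    ("Good morning", "Guten Morgen"),  # no "with", so it is not counted
]

if __name__ == "__main__":
    hits, total = equivalence_rate(ALIGNED, "with", {"mit"})
    # prints: 'with' rendered as 'mit' in 2 of 4 aligned sentences
    print(f"'with' rendered as 'mit' in {hits} of {total} aligned sentences")
```

Matching exact word forms is a simplification. A real study would lemmatise the target side, so that Swedish går and gick would count as forms of gå. It would also usually rely on word-level rather than sentence-level alignment.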
Gellerstam (1996) notes that some Swedish words have taken on the meanings of similar-looking English words. He argues that this happens because translators tend to translate an English word with the similar-looking Swedish word, thereby using the Swedish word with a new meaning, which then enters the language. One example is the Swedish word dramatisk, which used to indicate something relating to drama but which now, like the English word dramatic, also means exciting and surprising.

Conclusion

So every journey has its end, and ours is no exception. It was a long journey, but it was worth it. Corpus linguistics is a relatively new discipline, and a fast-changing one. As computer resources, particularly web-based ones, develop, sophisticated corpus investigations come within the reach of the ordinary translator, language learner, or linguist. Our understanding of the ways that types of language might vary from one another, and our appreciation of the ways that words pattern in language, have been immeasurably improved by corpus studies. Even more significant, perhaps, is the development of new theories of language that take corpus research as their starting point.

List of Literature Used

1. Halliday, M. A. K. Lexicology and Corpus Linguistics.
2. Teubert, W. and Čermáková, A. (2004).
3. Wallis, S. and Nelson, G. (2001). Knowledge discovery in grammatically analysed corpora. Data Mining and Knowledge Discovery, 5, 307–340.