Week 2:
Workload
Lectures : 18 x 1 hr lectures
Practicals : 8 x 2 hr practicals
Private study : 54 hrs
Assessment : 10 hrs
Requirements
Good practical knowledge of logic programming (Prolog) and familiarity
with context-free grammars and syntactic analysis of formal languages
are expected (at the level of IPL). Students should be acquainted with
the basics of natural language grammar, e.g. they should be able
to distinguish a verb from a noun, and the past tense from the
present. NLP and MCL would suitably complement
this module with additional theoretical insight. However, neither
of those two modules is formally required
as a prerequisite, and the overlap of material will be limited.
Assessment
Closed : none
Open : a report covering both theoretical and practical aspects of the
subject. The discussion will be based on the application of
existing software tools to natural language data. (50 marks)
Description
Machine learning of language studies the acquisition of linguistic knowledge from examples. This approach is used in practice to develop tools able to analyse textual data at various levels, such as morpholexical analysis, part-of-speech tagging, and syntactic and semantic analysis. The wide availability of text corpora and suitable learning techniques makes language learning a less costly and more flexible alternative to the manual development of tools for natural language processing (NLP). While statistical learning methods are widespread nowadays, they have clear limitations. The need for adequate modelling of semantics and for the acquisition of human-comprehensible theories has led to an increased use of symbolic learning in recent years.
The module will provide an insight into some of the most up-to-date research in machine learning, and focus on three main issues involved in the symbolic learning of tailored NLP tools:
Aims
To become familiar with some of the
most commonly used linguistic resources, such as
the Penn Treebank corpus and WordNet;
to become familiar with the principles of two symbolic machine learning
methods, namely transformation-based learning and inductive
logic programming; and to gain hands-on experience with the
application of these methods to the automatic acquisition
of NLP tools.
Content
Lecture slides and important papers will be
made available on the module WWW page.
Recommended books
** Brill E. A Corpus-Based Approach to Language Learning. PhD Thesis,
UPenn, 1993. URL: http://www.cs.jhu.edu/~brill/dissertation.ps
** Kazakov D. Natural Language Processing Applications of Machine
Learning. PhD Thesis, Czech Technical University, Prague, 1999.
URL: ftp://ftp.cs.york.ac.uk/pub/aig/Papers/dimitar.kazakov/content.ps.gz
** Roberts S. An Introduction to Progol. Technical Manual. University
of York, 1997. URL: ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/Papers/progol-intro.ps.gz
++ Ritchie G., Russell G., Black A. and Pulman S. Computational
Morphology: Practical Mechanisms for the English Lexicon. The MIT
Press, 1992.
++ Pereira F., Shieber S. Prolog and Natural Language Analysis. CSLI/SRI,
1987.
+ Mitchell T. Machine Learning. McGraw-Hill, 1997.
+ van den Bosch A. Basics of memory-based and decision-tree learning.