Symbolic Learning of Natural Language (SLL)

Week 2:

Lecturer(s): DLK     Taken by: CS IIIb
 

Workload

Lectures      : 18 x 1 hr lectures
Practicals    :  8 x 2 hrs practicals
Private study : 54 hrs
Assessment    : 10 hrs

Requirements

Good practical knowledge of logic programming (Prolog) and familiarity with context-free grammars and the syntactic analysis of formal languages are expected (at the level of IPL). Students should be acquainted with the basics of natural language grammar; e.g. they should be able to distinguish a verb from a noun, and the past tense from the present. NLP and MCL would complement this module well with additional theoretical insight. However, neither of those two modules is formally required as a prerequisite, and the overlap of material will be limited.
 

Assessment

Closed : none

Open : a report covering both theoretical and practical aspects of the subject. The discussion will be based on the application of existing software tools to natural language data. (50 marks)
 

Description

Machine learning of language studies the acquisition of linguistic knowledge from examples. This approach is used in practice to develop tools able to analyse textual data at various levels, such as morpholexical analysis, part-of-speech tagging, and syntactic and semantic analysis. The wide availability of text corpora and of suitable learning techniques makes language learning a less costly and more flexible alternative to the manual development of tools for natural language processing (NLP). While statistical learning methods are now widespread, they have clear limitations. The need for adequate modelling of semantics and for the acquisition of human-comprehensible theories has led to an increased use of symbolic learning in recent years.

The module will provide insight into some of the most up-to-date research in machine learning, focusing on three main issues involved in the symbolic learning of tailored NLP tools:

  1. linguistic data and its efficient manipulation;
  2. transformation-based learning and inductive logic programming as the two symbolic learning methods of choice; and
  3. the application of those methods to particular NLP tasks.
The necessary minimum of linguistic theory will also be provided.
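As a rough illustration of transformation-based learning (item 2) applied to part-of-speech tagging, the sketch below shows the core loop in the style described by Brill (1993): tag every word with its most frequent tag, then greedily learn an ordered list of rules, each chosen because it gives the largest net reduction in tagging errors on the training data. It is a minimal sketch under simplifying assumptions, not Brill's implementation: the toy corpus, the single rule template ("change tag A to tag B when the previous tag is C"), the default tag NN for unknown words, and all function names are hypothetical, and each rule is checked against the tags as they were before it fired.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus: each sentence is a list of (word, gold_tag)
# pairs using Penn Treebank-style tags. Illustration only.
TRAIN = [
    [("the", "DT"), ("can", "NN"), ("rusted", "VBD")],
    [("i", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("you", "PRP"), ("can", "MD"), ("go", "VB")],
    [("a", "DT"), ("can", "NN"), ("fell", "VBD")],
    [("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("red", "JJ")],
]

DEFAULT_TAG = "NN"  # assumption: unknown words are tagged as nouns


def train_baseline(corpus):
    """Initial-state annotator: map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def apply_rule(tags, rule):
    """Rule (a, b, c): change tag a to b wherever the previous tag is c.

    Every position is checked against the tags as they were before the rule
    was applied (a simplifying assumption of this sketch).
    """
    from_tag, to_tag, prev_tag = rule
    return [
        to_tag if i > 0 and t == from_tag and tags[i - 1] == prev_tag else t
        for i, t in enumerate(tags)
    ]


def count_correct(tags, gold):
    return sum(t == g for t, g in zip(tags, gold))


def learn_rules(corpus, lexicon, max_rules=10):
    """Greedily learn an ordered rule list, each rule maximising net error reduction."""
    gold = [[tag for _, tag in s] for s in corpus]
    current = [[lexicon.get(w, DEFAULT_TAG) for w, _ in s] for s in corpus]
    rules = []
    for _ in range(max_rules):
        # Instantiate the template only at positions that are currently wrong.
        candidates = {
            (cur[i], gld[i], cur[i - 1])
            for cur, gld in zip(current, gold)
            for i in range(1, len(cur))
            if cur[i] != gld[i]
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = sum(
                count_correct(apply_rule(cur, rule), gld) - count_correct(cur, gld)
                for cur, gld in zip(current, gold)
            )
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule gives a net improvement: stop
            break
        rules.append(best_rule)
        current = [apply_rule(cur, best_rule) for cur in current]
    return rules


def tag(words, lexicon, rules):
    """Tag a new sentence: baseline tags, then every learned rule in order."""
    tags = [lexicon.get(w, DEFAULT_TAG) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


lexicon = train_baseline(TRAIN)
rules = learn_rules(TRAIN, lexicon)
print(rules)                                       # [('NN', 'MD', 'PRP')]
print(tag(["you", "can", "swim"], lexicon, rules))  # ['PRP', 'MD', 'VB']
```

On this toy data the baseline tags "can" as NN everywhere, and the learner finds a single rule, "change NN to MD when the previous tag is PRP", after which no candidate rule improves accuracy. A realistic learner would use many templates (neighbouring words, tags two positions away, and so on) and a large annotated corpus such as the Penn Treebank.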
 

Aims

To familiarise students with some of the most commonly used linguistic resources, such as the Penn Treebank corpus and WordNet; to make them familiar with the principles of two symbolic machine learning methods, namely transformation-based learning and inductive logic programming; and to give them hands-on experience in applying these methods to the automatic acquisition of NLP tools.
 

Content
 

Teaching material

Lecture slides and important papers will be made available on the module WWW page.
 

Recommended books

** Brill E. A Corpus-Based Approach to Language Learning. PhD Thesis,
   UPenn, 1993. URL: http://www.cs.jhu.edu/~brill/dissertation.ps

** Kazakov D. Natural Language Processing Applications of Machine
   Learning. PhD Thesis, Czech Technical University, Prague, 1999.
   URL: ftp://ftp.cs.york.ac.uk/pub/aig/Papers/dimitar.kazakov/content.ps.gz

** Roberts S. An Introduction to Progol. Technical Manual. University
   of York, 1997. URL: ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/Papers/progol-intro.ps.gz

++ Ritchie G., Russell G., Black A. and Pulman S. Computational
   Morphology: Practical Mechanisms for the English Lexicon. The MIT
   Press, 1992.

++ Pereira F. and Shieber S. Prolog and Natural Language Analysis. CSLI/SRI,
   1987.

Mitchell T. Machine Learning. McGraw-Hill, 1997.

+ van den Bosch A. Basics of memory-based and decision-tree learning (online visual demo).