Package gPy :: Module Data :: Class _ADTree

Class _ADTree

ADTree implementation

Ref:

@Article{moore98:_cached_suffic_statis_effic_machin,
author =    {Andrew Moore and Mary Soon Lee},
title =     {Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets},
journal =   {Journal of Artificial Intelligence Research},
year =      1998,
volume =    8,
pages =     {67--91},
url = {http://www.jair.org/media/453/live-453-1678-jair.pdf}
}
Instance Methods
__getstate__(self)
__setstate__(self, state)
__init__(self, numvals, records, depth, rmin)
Initialise an _ADTree object
list _flatten(self, variables_info, depth)
Return the data for a Factor
int size(self)
Return the number of nodes in the tree
__str__(self)
str(x)

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__

Instance Variables
int _count
A count of the number of records 'in' the tree
Various _data
May be 1) a tuple, each element of which is either an _ADTree object or None, with one element for each value of the variable corresponding to the top node of the tree; 2) a tuple of records; or 3) a single _ADTree object.
Various _mcvindex
In cases 1) and 3) of _data, this is an integer giving the index of the value with the most records. In case 2) this is None.
Properties

Inherited from object: __class__

Method Details

__init__(self, numvals, records, depth, rmin)
(Constructor)

Initialise an _ADTree object

Parameters:
  • numvals (list) - The ith element of this list is the number of values for the ith variable in the tree
  • records (list) - The data as a list. Each element is a tuple of value indices, one for each variable, plus an extra count field as the final element.
  • depth (int) - The depth of this tree within its containing CompactFactor. Also the index of the variable associated with the top node of this tree
  • rmin (int) - If the number of records is below rmin, tree growing stops and the records themselves are stored. Note that a single record may represent many datapoints, since every record has an extra 'count' field.
Overrides: object.__init__
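
The records format described above can be illustrated with a small sketch. The helper make_records and the sample data are hypothetical and not part of gPy; the sketch only shows one way to collapse raw datapoints into tuples of value indices with a trailing count field.

```python
from collections import Counter

def make_records(datapoints):
    """Collapse raw datapoints into records with a trailing count field.

    Hypothetical helper (not part of gPy): each datapoint is a tuple of
    value indices, one per variable.
    """
    counts = Counter(datapoints)
    # Append the count as the final element of each record.
    return [inst + (n,) for inst, n in sorted(counts.items())]

# Two binary variables and one three-valued variable (made-up data).
numvals = [2, 2, 3]
datapoints = [(0, 1, 2), (0, 1, 2), (1, 0, 0), (0, 1, 2), (1, 1, 1)]
records = make_records(datapoints)
# Each record stands for as many datapoints as its count field says.
total = sum(rec[-1] for rec in records)
```

Here three distinct records represent five datapoints, which is why the rmin threshold is compared against records rather than datapoints.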

_flatten(self, variables_info, depth)

Return the data for a Factor

The Factor's variables will generally be a subset of those for which data is stored in the tree

Parameters:
  • variables_info (list) - Contains the necessary information on the variables sought without naming them. The ith element of variables_info contains information on the ith variable sought. Each element of variables_info is a 3-element list:
      variables_info[i][0] is the depth in the _ADTree which deals with the ith variable sought.
      variables_info[i][1] is the number of values of the ith variable sought.
      variables_info[i][2] is the number of data values in the eventual factor which correspond to each value of the ith variable sought. (Clearly this depends on the number of values of 'later' variables.)
Returns: list
Values for the factor
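
As a rough illustration of the variables_info layout, the sketch below builds such a list from the depths of the variables sought. The helper make_variables_info is hypothetical and not part of gPy, and it assumes that the third entry is the product of the numbers of values of all later variables sought, which is one reading of the remark about 'later' variables.

```python
def make_variables_info(depths, numvals):
    """Build a variables_info-style list for the variables sought.

    Hypothetical helper (not part of gPy). depths[i] is the tree depth
    dealing with the ith variable sought; numvals is indexed by depth.
    """
    info = []
    stride = 1
    # Walk backwards so each stride is the product of the numbers of
    # values of all 'later' variables sought (an assumption).
    for d in reversed(depths):
        info.append([d, numvals[d], stride])
        stride *= numvals[d]
    info.reverse()
    return info

# Four variables in the tree; only those at depths 0, 2 and 3 are sought.
numvals = [2, 2, 3, 4]
info = make_variables_info([0, 2, 3], numvals)
```

Under this assumption the eventual factor would hold 2 * 3 * 4 = 24 values, with each value of the first variable sought covering 12 of them.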

size(self)

Return the number of nodes in the tree

Returns: int
The number of nodes in the tree

__str__(self)
(Informal representation operator)

str(x)

Overrides: object.__str__
(inherited documentation)

Instance Variable Details

_data

May be one of:
1) a tuple, each element of which is either an _ADTree object or None. There is one element for each value of the variable corresponding to the top node of the tree. A None value indicates 0 records for the corresponding value.
2) a tuple of records.
3) a single _ADTree object. This is a space-saving mechanism used when only one value of the variable has any records associated with it.

_mcvindex

In cases 1) and 3) of _data, this is an integer giving the index of the value with the most records. *All* records are associated with this value, rather than just those that actually take it. In case 2) this is None.
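
The role of the most common value can be sketched with a toy example. This is a simplified illustration of the idea, not gPy's code: the helper split_by_value and the data are hypothetical. It assumes that, because the slot for the most common value covers all records, the count for that value itself can be recovered by subtracting the counts of the other values from the total.

```python
def split_by_value(records, depth, nvals):
    """Count records per value index of the variable at position `depth`.

    Hypothetical helper (not part of gPy). Returns the per-value counts
    and the index of the most common value.
    """
    counts = [0] * nvals
    for rec in records:
        counts[rec[depth]] += rec[-1]  # the last field is the count
    mcvindex = max(range(nvals), key=counts.__getitem__)
    return counts, mcvindex

# Made-up records: three value indices followed by a count field.
records = [(0, 1, 2, 3), (1, 0, 0, 1), (1, 1, 1, 1)]
counts, mcvindex = split_by_value(records, 0, 2)
total = sum(rec[-1] for rec in records)
# Recover the most common value's count by subtraction (assumed scheme).
others = sum(c for v, c in enumerate(counts) if v != mcvindex)
mcv_count = total - others
```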