Have a look at make_data.py. It reads in the ever-present Asia example and generates data from it by forward sampling, saving the data to a file whose name is supplied on the command line. Run it to create some data.
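For reference, forward (ancestral) sampling visits the variables in a topological order and samples each one from its CPT, given the values already drawn for its parents. The sketch below is not the code in make_data.py. It assumes a hypothetical dict-based representation of a two-node network with made-up probabilities, and only binary variables.

```python
import random

# Hypothetical two-node network Smoking -> Cancer. The probabilities are
# invented for illustration and are NOT the Asia network's parameters.
# Each entry maps a variable to (parents, cpt), where cpt maps a tuple of
# parent values to P(variable = True | parents).
NETWORK = {
    "Smoking": ((), {(): 0.5}),
    "Cancer": (("Smoking",), {(True,): 0.1, (False,): 0.01}),
}
ORDER = ["Smoking", "Cancer"]  # topological order: parents before children


def forward_sample(network, order, rng):
    """Draw one joint instantiation, sampling each variable given its parents."""
    sample = {}
    for var in order:
        parents, cpt = network[var]
        parent_values = tuple(sample[p] for p in parents)
        # rng.random() is uniform on [0, 1), so this is True with prob cpt[...]
        sample[var] = rng.random() < cpt[parent_values]
    return sample


def make_samples(n, seed=0):
    """Draw n independent joint samples with a seeded generator."""
    rng = random.Random(seed)
    return [forward_sample(NETWORK, ORDER, rng) for _ in range(n)]


if __name__ == "__main__":
    for row in make_samples(5):
        print(row)
```

Because parents are always sampled before their children, every CPT lookup finds its parent values already fixed.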
The resulting file is a bulky representation, since repeated occurrences of a given instantiation are each written out in full. Alter make_data.py to make a new program called make_data2.py. This new program should produce output where each instantiation is written out once, together with a count of how many times it was sampled. See data.txt for an example of the desired output.
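One way to collapse repeated instantiations is to count tuples of values with `collections.Counter` and write one row per distinct tuple. The sketch below assumes samples are dicts from variable name to value and that the output is a CSV header of variable names followed by a final `count` column; check data.txt for the layout actually required, which may differ.

```python
import csv
import io
from collections import Counter


def count_instantiations(samples, variables):
    """Map each distinct joint instantiation (a tuple of values) to its count."""
    return Counter(tuple(s[v] for v in variables) for s in samples)


def write_counted_csv(counts, variables, out):
    """Write a header, then one row per instantiation with its count last.

    The column layout is an assumption; adjust it to match data.txt.
    """
    writer = csv.writer(out)
    writer.writerow(list(variables) + ["count"])
    for values, n in sorted(counts.items()):
        writer.writerow(list(values) + [n])


if __name__ == "__main__":
    demo = [{"A": True, "B": False}, {"A": True, "B": False}, {"A": False, "B": True}]
    buf = io.StringIO()
    write_counted_csv(count_instantiations(demo, ["A", "B"]), ["A", "B"], buf)
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module recommends.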
Have a look at score_asia.py. Run it by supplying a CSV data file as a command-line argument. (Use data.txt if you like.) Note that it uses a BN method called bdeu_score to score an entire BN, and also a CPT method of the same name to score individual 'families' (a variable together with its parents). The BDeu score is simply the log marginal likelihood under a particular choice of Dirichlet prior parameters, one which ensures likelihood equivalence, so that Markov-equivalent ADGs receive the same score.
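To see what a family score computes, here is a from-scratch sketch of the BDeu family score. It is not the bdeu_score method used by score_asia.py. It assumes the data are (row dict, count) pairs, as in the counted format, and that `ess` (the equivalent sample size, a hypothetical default of 1.0 here) is spread uniformly over each CPT: each parent configuration gets prior weight `ess / q` and each cell gets `ess / (q * r)`.

```python
import math
from collections import defaultdict


def bdeu_family_score(data, child, parents, arities, ess=1.0):
    """BDeu log marginal likelihood of one family (child plus its parents).

    data:    iterable of (row, count) pairs; row maps variable -> value
    arities: variable -> number of states r
    Score = sum_j [lgamma(a/q) - lgamma(a/q + N_j)]
          + sum_{j,k} [lgamma(a/(q*r) + N_jk) - lgamma(a/(q*r))]
    Zero counts contribute exactly 0, so only observed cells are summed.
    """
    r = arities[child]
    q = math.prod(arities[p] for p in parents)  # 1 when there are no parents
    a_j, a_jk = ess / q, ess / (q * r)
    n_j = defaultdict(int)   # parent configuration -> count
    n_jk = defaultdict(int)  # (parent configuration, child value) -> count
    for row, n in data:
        cfg = tuple(row[p] for p in parents)
        n_j[cfg] += n
        n_jk[(cfg, row[child])] += n
    score = 0.0
    for nj in n_j.values():
        score += math.lgamma(a_j) - math.lgamma(a_j + nj)
    for njk in n_jk.values():
        score += math.lgamma(a_jk + njk) - math.lgamma(a_jk)
    return score
```

As a check, a single observation of a binary variable with `ess=1` scores ln 0.5, the marginal probability of one draw under a Beta(0.5, 0.5) prior.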
Add to score_asia.py so that you also get the score for the BNM whose ADG is identical to that of Asia except that the arrow from Smoking to Cancer is missing.
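Because the BDeu score is a sum of family scores, deleting the Smoking to Cancer arrow changes only Cancer's family term. The sketch below makes that concrete. It repeats a compact BDeu family score so that it runs on its own, and it uses hypothetical variable names (VisitAsia, TbOrCa and so on) for the standard Asia structure; the names in score_asia.py may differ.

```python
import math
from collections import defaultdict


def bdeu_family_score(data, child, parents, arities, ess=1.0):
    """BDeu score of one family; data is a list of (row dict, count) pairs."""
    r = arities[child]
    q = math.prod(arities[p] for p in parents)
    n_j, n_jk = defaultdict(int), defaultdict(int)
    for row, n in data:
        cfg = tuple(row[p] for p in parents)
        n_j[cfg] += n
        n_jk[(cfg, row[child])] += n
    s = sum(math.lgamma(ess / q) - math.lgamma(ess / q + c) for c in n_j.values())
    a = ess / (q * r)
    return s + sum(math.lgamma(a + c) - math.lgamma(a) for c in n_jk.values())


def network_score(data, structure, arities, ess=1.0):
    """Decomposable score: sum over variables of their family scores."""
    return sum(bdeu_family_score(data, v, pa, arities, ess)
               for v, pa in structure.items())


# Asia structure as variable -> parents (hypothetical variable names).
ASIA = {
    "VisitAsia": (), "Smoking": (),
    "Tuberculosis": ("VisitAsia",), "Cancer": ("Smoking",),
    "Bronchitis": ("Smoking",), "TbOrCa": ("Tuberculosis", "Cancer"),
    "XRay": ("TbOrCa",), "Dyspnea": ("TbOrCa", "Bronchitis"),
}
# Same ADG with the Smoking -> Cancer arrow removed.
NO_SMOKING_CANCER = dict(ASIA, Cancer=())
```

Comparing `network_score(data, NO_SMOKING_CANCER, arities)` with `network_score(data, ASIA, arities)` therefore amounts to comparing Cancer's family score with and without Smoking as a parent.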
Write a program which finds the highest-scoring ADG (Bayesian network model) for data.txt. Use BDeu scoring, and take advantage of the fact that this score is a sum of variable-specific scores. To make your life easier, only consider:
You will find it useful to cannibalise bits of the previous programs.
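Decomposability is what makes the search tractable. If the parents of each variable are restricted to variables earlier in some fixed order, every choice of parent sets yields an acyclic graph, so the best network for that order is found by choosing the best parent set for each variable independently. The sketch below assumes such a fixed order and a bound `max_parents`; these are illustrative restrictions only and are not necessarily the ones this exercise intends. It again repeats a compact BDeu family score so that it runs on its own.

```python
import itertools
import math
from collections import defaultdict


def bdeu_family_score(data, child, parents, arities, ess=1.0):
    """BDeu score of one family; data is a list of (row dict, count) pairs."""
    r = arities[child]
    q = math.prod(arities[p] for p in parents)
    n_j, n_jk = defaultdict(int), defaultdict(int)
    for row, n in data:
        cfg = tuple(row[p] for p in parents)
        n_j[cfg] += n
        n_jk[(cfg, row[child])] += n
    s = sum(math.lgamma(ess / q) - math.lgamma(ess / q + c) for c in n_j.values())
    a = ess / (q * r)
    return s + sum(math.lgamma(a + c) - math.lgamma(a) for c in n_jk.values())


def best_network(data, order, arities, max_parents=2, ess=1.0):
    """Best ADG consistent with `order` (assumed restriction).

    Parents of order[i] are drawn from order[:i], so the graph is always
    acyclic and each family can be optimised separately.
    """
    structure, total = {}, 0.0
    for i, child in enumerate(order):
        best_pa = ()
        best = bdeu_family_score(data, child, (), arities, ess)
        for k in range(1, max_parents + 1):
            for pa in itertools.combinations(order[:i], k):
                s = bdeu_family_score(data, child, pa, arities, ess)
                if s > best:
                    best_pa, best = pa, s
        structure[child] = best_pa
        total += best
    return structure, total
```

The returned total is the network's BDeu score, since it is the sum of the chosen family scores.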
Last modified: Fri Dec 7 09:31:41 GMT 2007