Construct a list of records from a CSV file
The CSV file must have 3 sections.
Section 1 has lines of the form
variable:value1,value2,...
with one line for each variable.
Section 2 is a single line of sep-separated variable names. If
the jth name is varname, then the jth field of each record is a value for
varname.
Section 3 consists of records, one per line, with sep
separating the fields. Either every record has an extra 'count' field or
none does.
No further lines are read after an empty line (so trailing empty lines
do not cause an IOError).
Here's part of an acceptable CSV file (where the optional extra count
field is present):
A:N,Y
S:0,1
T:0,1
L:0,1
B:0,1
E:0,1
D:0,1
X:0,1
A,S,T,L,B,E,D,X
N,1,0,1,1,1,1,1,12
N,1,0,0,0,0,0,0,66
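As a rough illustration of how Sections 1 and 2 could be told apart, here is a minimal sketch. The helper name `read_declarations` is hypothetical and not part of the documented API. It assumes that every Section 1 line contains a ':' while the Section 2 line does not, and that the values in Section 1 are separated by the same sep as the rest of the file.

```python
import io

def read_declarations(fobj, sep=","):
    """Hypothetical helper: read Sections 1 and 2 from an open file object.

    Returns (values, header). Assumes Section 1 lines contain ':' and
    the Section 2 line does not.
    """
    values = {}
    for line in fobj:
        line = line.rstrip("\n")
        if ":" not in line:
            # The first line without ':' is taken to be the Section 2 header.
            return values, line.split(sep)
        name, _, vals = line.partition(":")
        values[name] = vals.split(sep)
    raise IOError("no Section 2 header line found")

# A shortened version of the example file above.
sample = io.StringIO("A:N,Y\nS:0,1\nA,S\nN,1,12\n")
values, header = read_declarations(sample)
rest = sample.readline()  # the file object is now positioned at Section 3
```

After the call, `values` maps each variable to its list of values and `header` holds the variable names in file order, so the remaining lines of the file object are the records of Section 3.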
- Parameters:
fobj (File) - CSV file object (NOT the file name)
sep (String) - The field separator in fobj.
- Returns: Tuple
(header, values, variables, records) where:
- header is a list of the variables in the data, in the order they are
given in the original data file:
[A,S,T,L,B,E,D,X] in the example above.
- values is a dictionary mapping each variable to the list of its values:
{'A':['N','Y'],'S':['0','1'],...} in the example above.
- variables is an ordered list of the variables in
the data: [A,B,D,E,L,S,T,X] in the example above.
- records is a list of records; each record is a
tuple of integers. In each tuple, the jth element is the index
(within values) of the value that the jth variable of variables takes
in that record. The final integer is a count of how often the
record appeared.
- Raises:
IOError - If any record has the wrong number of fields
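The record encoding and the IOError condition described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the helper name `encode_records` is hypothetical, `variables` is assumed to be the alphabetically sorted header, and a count of 1 is assumed when the count field is absent. The sketch does not check that the count field is consistently present or absent across records.

```python
def encode_records(lines, header, values, variables, sep=","):
    """Hypothetical helper: turn Section 3 lines into tuples of value indices."""
    n = len(header)
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "":
            break  # no further lines are read after an empty line
        fields = line.split(sep)
        if len(fields) == n + 1:
            count = int(fields[-1])  # optional trailing count field
        elif len(fields) == n:
            count = 1  # assumption: a record without a count appears once
        else:
            raise IOError("wrong number of fields: %r" % line)
        by_name = dict(zip(header, fields))  # zip ignores the count field
        indices = tuple(values[v].index(by_name[v]) for v in variables)
        records.append(indices + (count,))
    return records

header = ["A", "S", "T", "L", "B", "E", "D", "X"]
values = {v: ["0", "1"] for v in header}
values["A"] = ["N", "Y"]
variables = sorted(header)  # assumption: ordered means alphabetical
lines = ["N,1,0,1,1,1,1,1,12\n", "N,1,0,0,0,0,0,0,66\n", "\n", "ignored\n"]
records = encode_records(lines, header, values, variables)
# records == [(0, 1, 1, 1, 1, 1, 0, 1, 12), (0, 0, 0, 0, 0, 1, 0, 0, 66)]
```

Note that the indices follow the order of `variables`, not the order of `header`, so the value of S from the second field of the file ends up in the sixth position of each tuple.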
To Do:
It would be nice to allow this to be an iterator where possible.