[1]:
import emat
import pandas as pd
TableParser Example
In this notebook, we will illustrate the use of a TableParser with a few simple examples.
[2]:
from emat.model.core_files.parsers import (
    TableParser,
    loc, loc_sum, loc_mean,
    iloc, iloc_sum, iloc_mean,
)
Parsing a Labeled Table
First, let’s consider a TableParser for extracting values from a simple CSV table of traffic counts by time period. We’ll begin by writing such a table as a temporary file to be processed:
[3]:
sample_file_labeled_table = """
LinkID,Count_AM,Count_MD,Count_PM,Count_EV
123,3498,2340,3821,1820
234,4011,2513,4101,1942
345,386,103,441,251
"""
with open('/tmp/emat_sample_file_labeled_table.csv', 'wt') as f:
    f.write(sample_file_labeled_table)
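The hard-coded `/tmp` path assumes a Unix-like system. As a minimal sketch of a more portable variant, the standard library's `tempfile` module can supply the directory instead. The names `tmp_dir` and `csv_path` are illustrative and not part of emat.

```python
import os
import tempfile

sample_file_labeled_table = """
LinkID,Count_AM,Count_MD,Count_PM,Count_EV
123,3498,2340,3821,1820
234,4011,2513,4101,1942
345,386,103,441,251
"""

# Create a fresh scratch directory; its location depends on the platform.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, 'emat_sample_file_labeled_table.csv')

with open(csv_path, 'wt') as f:
    f.write(sample_file_labeled_table)
```

With this variant, `tmp_dir` would be passed wherever the examples in this notebook use `'/tmp'`.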
If we wanted to read this table one time, we could easily do so using pandas.read_csv:
[4]:
df = pd.read_csv('/tmp/emat_sample_file_labeled_table.csv', index_col='LinkID')
df
[4]:
LinkID | Count_AM | Count_MD | Count_PM | Count_EV |
---|---|---|---|---|
123 | 3498 | 2340 | 3821 | 1820 |
234 | 4011 | 2513 | 4101 | 1942 |
345 | 386 | 103 | 441 | 251 |
It is then simple to extract individual values manually, either by label or by position. We could also sum a row to get the daily total count for a link, or take the mean of a column:
[5]:
{
    'A': df.loc[123,'Count_AM'],  # by label
    'B': df.iloc[1,0],            # by position
    'C': df.loc[345,:].sum(),     # sum a row
    'D': df.iloc[:,1].mean(),     # mean of a column
}
[5]:
{'A': 3498, 'B': 4011, 'C': 1181, 'D': 1652.0}
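To check these four results without pandas, the same lookups can be sketched using only the standard library. This is a verification aid rather than a recommended approach. The names `rows`, `ordered`, and `columns` are illustrative.

```python
import csv
import io

text = """LinkID,Count_AM,Count_MD,Count_PM,Count_EV
123,3498,2340,3821,1820
234,4011,2513,4101,1942
345,386,103,441,251
"""

reader = csv.DictReader(io.StringIO(text))
columns = reader.fieldnames[1:]  # data columns, in file order

# Map each LinkID to the list of its integer counts.
rows = {int(r['LinkID']): [int(r[c]) for c in columns] for r in reader}
ordered = list(rows.values())    # rows in file order, for positional access

second_column = [row[1] for row in ordered]
result = {
    'A': rows[123][columns.index('Count_AM')],      # by label
    'B': ordered[1][0],                             # by position
    'C': sum(rows[345]),                            # sum a row
    'D': sum(second_column) / len(second_column),   # mean of a column
}
print(result)  # {'A': 3498, 'B': 4011, 'C': 1181, 'D': 1652.0}
```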
The TableParser object makes it easy to bundle these instructions, so that the same values can be extracted from the same file after any model run.
[6]:
parser = TableParser(
    'emat_sample_file_labeled_table.csv',
    {
        'A': loc[123,'Count_AM'],  # by label
        'B': iloc[1,0],            # by position
        'C': loc_sum[345,:],       # sum a row
        'D': iloc_mean[:,1],       # mean of a column
    },
    index_col='LinkID',
)
We can now execute all these instructions by using the read method of the parser.
[7]:
parser.read(from_dir='/tmp')
[7]:
{'A': 3498.0, 'B': 4011.0, 'C': 1181, 'D': 1652.0}
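The bracket syntax in the getters works without touching any data because Python hands whatever appears inside the brackets to `__getitem__`, with a `:` arriving as a `slice` object. The following is a minimal sketch of that idea for a positional row sum over a list of rows. `DeferredRowSum` is a hypothetical class for illustration, not emat's implementation, and it only handles a slice in the column position.

```python
class DeferredRowSum:
    """Hypothetical positional 'sum' getter; not emat's implementation."""

    def __getitem__(self, key):
        row, cols = key  # e.g. (2, slice(None)) for the expression [2, :]
        # Nothing is read yet; return a function to apply to a table later.
        return lambda table: sum(table[row][cols])


row_sum = DeferredRowSum()
get_c = row_sum[2, :]  # instruction only: "sum every column of row 2"

# Count columns from the sample table above, as a list of rows.
counts = [
    [3498, 2340, 3821, 1820],
    [4011, 2513, 4101, 1942],
    [386, 103, 441, 251],
]
print(get_c(counts))  # 1181
```

Because `get_c` holds only the instruction, the same object could be applied to the table produced by each successive model run.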
Using the TableParser has some advantages over just writing a custom function for each table to be processed. The most important is that the names of the keys in the parser’s output are available without actually parsing any file.
[8]:
parser.measure_names
[8]:
['A', 'B', 'C', 'D']
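The reason the names are available up front is that they live in the mapping given at construction time, not in the file. A minimal sketch of this design follows, assuming a simplified parser that reads numeric CSV text. `MiniParser` and its `read(text)` signature are hypothetical and are not emat's API.

```python
import csv
import io


class MiniParser:
    """Hypothetical, simplified analogue of a table parser; not emat's code."""

    def __init__(self, measure_getters):
        # Map of measure name -> function that takes the parsed rows.
        self._getters = dict(measure_getters)

    @property
    def measure_names(self):
        # Names come from the mapping alone; no file is opened here.
        return list(self._getters)

    def read(self, text):
        # Parse numeric CSV text, skipping blank lines, then apply each getter.
        rows = [[float(v) for v in r] for r in csv.reader(io.StringIO(text)) if r]
        return {name: get(rows) for name, get in self._getters.items()}


mini = MiniParser({
    'upper_left': lambda rows: rows[0][0],
    'row_total': lambda rows: sum(rows[0]),
})
print(mini.measure_names)            # ['upper_left', 'row_total']
print(mini.read("1,2,3\n4,5,6\n"))   # {'upper_left': 1.0, 'row_total': 6.0}
```

A hand-written function per table would typically have to be run on real output before its result keys could be discovered.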
Parsing Labeled Values
The TableParser can also be used to read performance measures from a file that contains simply a list of labeled values, as this can readily be interpreted as a table with one index column and a single data column.
[9]:
sample_file_labeled_values = """
Mean Highway Speed (mph),56.34
Mean Arterial Speed (mph),31.52
Mean Collector Speed (mph),24.80
"""
with open('/tmp/emat_sample_file_labeled_values.csv', 'wt') as f:
    f.write(sample_file_labeled_values)
Reading this file with pandas.read_csv can be done neatly by giving a few extra keyword arguments:
[10]:
pd.read_csv(
    '/tmp/emat_sample_file_labeled_values.csv',
    header=None,
    names=['Label','Value'],
    index_col=0,
)
[10]:
Label | Value |
---|---|
Mean Highway Speed (mph) | 56.34 |
Mean Arterial Speed (mph) | 31.52 |
Mean Collector Speed (mph) | 24.80 |
We can simply pass these same keyword arguments on to the TableParser, and proceed as above to define the values to extract.
[11]:
parser = TableParser(
    'emat_sample_file_labeled_values.csv',
    {
        'Highway Speed': loc['Mean Highway Speed (mph)','Value']
    },
    header=None,
    names=['Label','Value'],
    index_col=0,
)
[12]:
parser.read(from_dir='/tmp')
[12]:
{'Highway Speed': 56.34}
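The "one index column, one data column" reading of such a file can also be seen without pandas: each line is a label paired with a value, so the whole file behaves like a lookup table. A minimal standard-library sketch follows, with `values` as an illustrative name.

```python
import csv
import io

sample_file_labeled_values = """
Mean Highway Speed (mph),56.34
Mean Arterial Speed (mph),31.52
Mean Collector Speed (mph),24.80
"""

# Each non-empty line is a (label, value) pair: the label plays the role of
# the index column and the number is the single data column.
values = {
    label: float(value)
    for label, value in csv.reader(io.StringIO(sample_file_labeled_values.strip()))
}
print(values['Mean Highway Speed (mph)'])  # 56.34
```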
Parsing an Unlabeled Array
Lastly, the TableParser can be used to read performance measures from a file that contains an unlabeled array of values, as is sometimes generated by popular transportation modeling tools.
[13]:
sample_file_unlabeled_array = """
11,22,33
44,55,66
77,88,99
"""
with open('/tmp/emat_sample_file_unlabeled_array.csv', 'wt') as f:
    f.write(sample_file_unlabeled_array)
Labels are not required to read this data using pandas.read_csv, as a default set of row and column index labels is generated.
[14]:
pd.read_csv(
    '/tmp/emat_sample_file_unlabeled_array.csv',
    header=None,
)
[14]:
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 11 | 22 | 33 |
| 1 | 44 | 55 | 66 |
| 2 | 77 | 88 | 99 |
Once the table is loaded, individual values or slices can be extracted by position using the iloc tool. Getters can also be combined arithmetically, as the top_corner_sum entry does.
[15]:
parser = TableParser(
    'emat_sample_file_unlabeled_array.csv',
    {
        'upper_left': iloc[0,0],
        'lower_right': iloc[-1,-1],
        'partial_row': iloc_sum[0,1:],
        'top_corner_sum': iloc[0,0] + iloc[0,-1],
    },
    header=None,
)
[16]:
parser.read(from_dir='/tmp')
[16]:
{'upper_left': 11.0,
'lower_right': 99.0,
'partial_row': 55,
'top_corner_sum': 44.0}
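An expression such as `iloc[0,0] + iloc[0,-1]` can be written before any file exists because the `+` acts on the getters themselves rather than on numbers. One way this could work is through operator overloading, sketched below over a plain list of rows. The `Getter` and `PositionalIndexer` classes are hypothetical and are not emat's implementation.

```python
class Getter:
    """Hypothetical deferred getter supporting '+'; not emat's implementation."""

    def __init__(self, func):
        self.func = func

    def __call__(self, table):
        # Apply the stored instruction to an actual table.
        return self.func(table)

    def __add__(self, other):
        # Adding two getters yields a new getter that adds their results.
        return Getter(lambda table: self(table) + other(table))


class PositionalIndexer:
    """Hypothetical stand-in for a positional indexing tool."""

    def __getitem__(self, key):
        row, col = key
        return Getter(lambda table: table[row][col])


my_iloc = PositionalIndexer()
table = [[11, 22, 33], [44, 55, 66], [77, 88, 99]]

top_corner_sum = my_iloc[0, 0] + my_iloc[0, -1]  # still just an instruction
print(top_corner_sum(table))  # 44, i.e. 11 + 33
```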