[1]:
import emat
import pandas as pd

TableParser Example

In this notebook, we will illustrate the use of a TableParser with a few simple examples.

[2]:
from emat.model.core_files.parsers import (
    TableParser,
    loc, loc_sum, loc_mean,
    iloc, iloc_sum, iloc_mean
)

Parsing a Labeled Table

First, let’s consider a TableParser for extracting values from a simple CSV table of traffic counts by time period. We’ll begin by writing such a table as a temporary file to be processed:

[3]:
sample_file_labeled_table = """
LinkID,Count_AM,Count_MD,Count_PM,Count_EV
123,3498,2340,3821,1820
234,4011,2513,4101,1942
345,386,103,441,251
"""

with open('/tmp/emat_sample_file_labeled_table.csv', 'wt') as f:
    f.write(sample_file_labeled_table)
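The cell above writes to the hard-coded directory `/tmp`, which exists on Linux and macOS but not on Windows. The sketch below is a more portable alternative that uses only the standard library `tempfile` module. The names `scratch_dir` and `table_path` are our own. It also assumes that, as in the examples here, any directory can be passed as `from_dir` when reading.

```python
import os
import tempfile

# Same sample table as above.
sample_file_labeled_table = """
LinkID,Count_AM,Count_MD,Count_PM,Count_EV
123,3498,2340,3821,1820
234,4011,2513,4101,1942
345,386,103,441,251
"""

# Create a fresh scratch directory instead of assuming '/tmp' exists.
# mkdtemp() does not delete the directory automatically, so the file
# remains available to later cells.
scratch_dir = tempfile.mkdtemp(prefix='emat_sample_')
table_path = os.path.join(scratch_dir, 'emat_sample_file_labeled_table.csv')
with open(table_path, 'wt') as f:
    f.write(sample_file_labeled_table)
```

With this approach, later reads would use `from_dir=scratch_dir` in place of `from_dir='/tmp'`.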

If we only wanted to read this table once, we could easily do so using pandas.read_csv:

[4]:
df = pd.read_csv('/tmp/emat_sample_file_labeled_table.csv', index_col='LinkID')
df
[4]:
        Count_AM  Count_MD  Count_PM  Count_EV
LinkID
123         3498      2340      3821      1820
234         4011      2513      4101      1942
345          386       103       441       251

From there it is simple to extract individual values manually, either by label or by position. We could also sum a row to get a daily total count for a link, or take the mean of a column:

[5]:
{
    'A': df.loc[123,'Count_AM'],  # by label
    'B': df.iloc[1,0],            # by position
    'C': df.loc[345,:].sum(),     # sum a row
    'D': df.iloc[:,1].mean(),     # mean of a column
}
[5]:
{'A': 3498, 'B': 4011, 'C': 1181, 'D': 1652.0}
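One way to reuse these steps in every model run is to wrap them in a hand-written function, one per table layout. The sketch below does this with a hypothetical `read_link_counts` function. It reads from an in-memory copy of the table, which is our choice to keep the example self-contained. The approach works, but the output keys can only be learned by reading the code or by calling the function on an actual file.

```python
import io

import pandas as pd


def read_link_counts(csv_source):
    """Hypothetical custom reader: file options and extractions are hard-coded."""
    df = pd.read_csv(csv_source, index_col='LinkID')
    return {
        'A': df.loc[123, 'Count_AM'],  # by label
        'B': df.iloc[1, 0],            # by position
        'C': df.loc[345, :].sum(),     # sum a row
        'D': df.iloc[:, 1].mean(),     # mean of a column
    }


sample = io.StringIO(
    "LinkID,Count_AM,Count_MD,Count_PM,Count_EV\n"
    "123,3498,2340,3821,1820\n"
    "234,4011,2513,4101,1942\n"
    "345,386,103,441,251\n"
)
result = read_link_counts(sample)
```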

The TableParser object makes it easy to combine these instructions to extract the same values from the same file in any model run.

[6]:
parser = TableParser(
    'emat_sample_file_labeled_table.csv',
    {
        'A': loc[123,'Count_AM'],  # by label
        'B': iloc[1,0],            # by position
        'C': loc_sum[345,:],       # sum a row
        'D': iloc_mean[:,1],       # mean of a column
    },
    index_col='LinkID',
)
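Expressions such as `loc[123,'Count_AM']` in the cell above are written before any table has been read, so they must store the indexing request instead of carrying it out. The sketch below shows one way such deferral could work. `DeferredIndexer` and the `my_*` names are hypothetical and are not emat's actual implementation. Each indexing operation returns a function that is applied later, once a DataFrame exists.

```python
import io

import pandas as pd


class DeferredIndexer:
    """Hypothetical stand-in for objects like `loc` or `iloc_sum`:
    indexing it records the key instead of reading any data."""

    def __init__(self, accessor, reducer=None):
        self.accessor = accessor  # 'loc' or 'iloc'
        self.reducer = reducer    # e.g. 'sum' or 'mean', or None

    def __getitem__(self, key):
        def instruction(df):
            # Apply the recorded key once a DataFrame is available.
            value = getattr(df, self.accessor)[key]
            return getattr(value, self.reducer)() if self.reducer else value
        return instruction


my_loc = DeferredIndexer('loc')
my_iloc = DeferredIndexer('iloc')
my_loc_sum = DeferredIndexer('loc', 'sum')
my_iloc_mean = DeferredIndexer('iloc', 'mean')

# Build the instructions before any file exists...
instructions = {
    'A': my_loc[123, 'Count_AM'],
    'B': my_iloc[1, 0],
    'C': my_loc_sum[345, :],
    'D': my_iloc_mean[:, 1],
}

# ...and apply them later, once the table has been read.
df = pd.read_csv(io.StringIO(
    "LinkID,Count_AM,Count_MD,Count_PM,Count_EV\n"
    "123,3498,2340,3821,1820\n"
    "234,4011,2513,4101,1942\n"
    "345,386,103,441,251\n"
), index_col='LinkID')
values = {name: instruction(df) for name, instruction in instructions.items()}
```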

We can now execute all these instructions by using the read method of the parser.

[7]:
parser.read(from_dir='/tmp')
[7]:
{'A': 3498.0, 'B': 4011.0, 'C': 1181, 'D': 1652.0}

Using a TableParser has some advantages over writing a custom function for each table to be processed. The most important is that the names of the keys in the parser’s output are available without actually parsing anything.

[8]:
parser.measure_names
[8]:
['A', 'B', 'C', 'D']
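Two features have appeared so far. A single `read` call applies every instruction, and a `measure_names` list is available before any file is read. Both can be illustrated with a deliberately simplified class. `MiniTableParser` is hypothetical and is not emat's implementation. It uses plain lambdas as instructions and forwards extra keyword arguments to `pandas.read_csv`, much as the examples here pass `index_col`.

```python
import os
import tempfile

import pandas as pd


class MiniTableParser:
    """Hypothetical, simplified analogue of TableParser (not emat's code)."""

    def __init__(self, filename, instructions, **read_csv_kwargs):
        self.filename = filename
        self.instructions = dict(instructions)  # name -> callable(DataFrame)
        self.read_csv_kwargs = read_csv_kwargs  # forwarded to pandas.read_csv

    @property
    def measure_names(self):
        # Known from the instructions alone; no file is opened.
        return list(self.instructions)

    def read(self, from_dir):
        # Read the table once, then apply every instruction to it.
        path = os.path.join(from_dir, self.filename)
        df = pd.read_csv(path, **self.read_csv_kwargs)
        return {name: func(df) for name, func in self.instructions.items()}


mini = MiniTableParser(
    'mini_labeled_table.csv',
    {
        'A': lambda df: df.loc[123, 'Count_AM'],
        'C': lambda df: df.loc[345, :].sum(),
    },
    index_col='LinkID',
)
names_before_reading = mini.measure_names  # works before the file exists

run_dir = tempfile.mkdtemp()
with open(os.path.join(run_dir, 'mini_labeled_table.csv'), 'wt') as f:
    f.write("LinkID,Count_AM,Count_MD,Count_PM,Count_EV\n"
            "123,3498,2340,3821,1820\n"
            "345,386,103,441,251\n")
mini_result = mini.read(from_dir=run_dir)
```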

Parsing Labeled Values

The TableParser can also be used to read performance measures from a file that contains just a list of labeled values, because such a file can readily be interpreted as a table with one index column and a single data column.

[9]:
sample_file_labeled_values = """
Mean Highway Speed (mph),56.34
Mean Arterial Speed (mph),31.52
Mean Collector Speed (mph),24.80
"""

with open('/tmp/emat_sample_file_labeled_values.csv', 'wt') as f:
    f.write(sample_file_labeled_values)

Reading this file with pandas.read_csv can be done neatly by giving a few extra keyword arguments:

[10]:
pd.read_csv(
    '/tmp/emat_sample_file_labeled_values.csv',
    header=None,
    names=['Label','Value'],
    index_col=0,
)
[10]:
                            Value
Label
Mean Highway Speed (mph)    56.34
Mean Arterial Speed (mph)   31.52
Mean Collector Speed (mph)  24.80
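Because the file has one index column and one data column, each label picks out exactly one number. The sketch below uses pandas directly with an in-memory copy of the data, which is our choice to keep it self-contained. It shows the equivalent pandas lookup for one measure. It also converts the whole column to a dictionary, which can help when checking which labels a model run actually produced.

```python
import io

import pandas as pd

text = (
    "Mean Highway Speed (mph),56.34\n"
    "Mean Arterial Speed (mph),31.52\n"
    "Mean Collector Speed (mph),24.80\n"
)
df = pd.read_csv(io.StringIO(text), header=None, names=['Label', 'Value'], index_col=0)

# One index column plus one data column: each label addresses exactly one value.
highway = df.loc['Mean Highway Speed (mph)', 'Value']

# The single data column can also be viewed as a plain label -> value mapping.
speeds = df['Value'].to_dict()
```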

We can simply pass these same keyword arguments on to the TableParser, and proceed as above to define the values to extract.

[11]:
parser = TableParser(
    'emat_sample_file_labeled_values.csv',
    {
        'Highway Speed': loc['Mean Highway Speed (mph)','Value']
    },
    header=None,
    names=['Label','Value'],
    index_col=0,
)
[12]:
parser.read(from_dir='/tmp')
[12]:
{'Highway Speed': 56.34}

Parsing an Unlabeled Array

Lastly, the TableParser can be used to read performance measures from a file that contains an unlabeled array of values, such as the files sometimes generated by popular transportation modeling tools.

[13]:
sample_file_unlabeled_array = """
11,22,33
44,55,66
77,88,99
"""

with open('/tmp/emat_sample_file_unlabeled_array.csv', 'wt') as f:
    f.write(sample_file_unlabeled_array)

Labels are not required to read this data with pandas.read_csv, because a default set of integer row and column labels is generated.

[14]:
pd.read_csv(
    '/tmp/emat_sample_file_unlabeled_array.csv',
    header=None,
)
[14]:
    0   1   2
0  11  22  33
1  44  55  66
2  77  88  99

Even without labels the table loads, and individual values or slices can be extracted by position using the iloc tools.
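With `header=None`, pandas labels the rows and columns 0, 1, 2, so label-based and position-based lookups agree for non-negative numbers. They diverge for negative numbers. For this reason, positional tools are the natural choice for arrays like this one, especially when counting from the end. The sketch below demonstrates the difference using pandas directly on an in-memory copy of the array.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("11,22,33\n44,55,66\n77,88,99\n"), header=None)

# With default integer labels, label 0 and position 0 coincide...
same_at_origin = df.loc[0, 0] == df.iloc[0, 0] == 11

# ...but negative numbers are positions only: iloc counts from the end,
# while loc looks for a label -1, which does not exist.
last_by_position = df.iloc[-1, -1]  # 99
try:
    df.loc[-1, -1]
    loc_negative_works = True
except KeyError:
    loc_negative_works = False
```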

[15]:
parser = TableParser(
    'emat_sample_file_unlabeled_array.csv',
    {
        'upper_left': iloc[0,0],
        'lower_right': iloc[-1,-1],
        'partial_row': iloc_sum[0,1:],
        'top_corner_sum': iloc[0,0] + iloc[0,-1],
    },
    header=None,
)
[16]:
parser.read(from_dir='/tmp')
[16]:
{'upper_left': 11.0,
 'lower_right': 99.0,
 'partial_row': 55,
 'top_corner_sum': 44.0}
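The `top_corner_sum` entry shows that instructions can be combined with `+` before any data are read. The result, 44.0, is the sum of the upper-left (11) and upper-right (33) values. One way to support arithmetic on deferred instructions is operator overloading. The sketch below uses hypothetical `Instruction` and `PositionIndexer` classes, and emat's own mechanism may differ.

```python
import io

import pandas as pd


class Instruction:
    """Hypothetical deferred instruction that supports `+` (not emat's code)."""

    def __init__(self, func):
        self.func = func  # callable taking a DataFrame

    def __call__(self, df):
        return self.func(df)

    def __add__(self, other):
        # Combining two instructions yields a new instruction that evaluates
        # both against the same table and adds the results.
        return Instruction(lambda df: self(df) + other(df))


class PositionIndexer:
    """Hypothetical stand-in for `iloc`: records the position to read later."""

    def __getitem__(self, key):
        return Instruction(lambda df: df.iloc[key])


my_iloc = PositionIndexer()
top_corner_sum = my_iloc[0, 0] + my_iloc[0, -1]  # nothing is read yet

df = pd.read_csv(io.StringIO("11,22,33\n44,55,66\n77,88,99\n"), header=None)
total = top_corner_sum(df)
```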