[1]:

import emat
import pandas as pd

TableParser Example

In this notebook, we will illustrate the use of a TableParser with a few simple examples.

[2]:

from emat.model.core_files.parsers import (
    TableParser,
    loc, loc_sum, loc_mean,
    iloc, iloc_sum, iloc_mean
)

Parsing a Labeled Table

First, let’s consider a TableParser for extracting values from a simple CSV table of traffic counts by time period. We’ll begin by writing such a table as a temporary file to be processed:

[3]:

sample_file_labeled_table = """
LinkID,Count_AM,Count_MD,Count_PM,Count_EV
123,3498,2340,3821,1820
234,4011,2513,4101,1942
345,386,103,441,251
"""

with open('/tmp/emat_sample_file_labeled_table.csv', 'wt') as f:
    f.write(sample_file_labeled_table)

If we wanted to read this table one time, we could easily do so using pandas.read_csv:

[4]:

df = pd.read_csv('/tmp/emat_sample_file_labeled_table.csv', index_col='LinkID')
df

[4]:

	Count_AM	Count_MD	Count_PM	Count_EV
LinkID
123	3498	2340	3821	1820
234	4011	2513	4101	1942
345	386	103	441	251

From here it is simple to extract individual values manually, either by label or by position. We could also sum a row to get a daily total count for a link, or take the mean of a column:

[5]:

{
    'A': df.loc[123,'Count_AM'],  # by label
    'B': df.iloc[1,0],            # by position
    'C': df.loc[345,:].sum(),     # sum a row
    'D': df.iloc[:,1].mean(),     # mean of a column
}

[5]:

{'A': 3498, 'B': 4011, 'C': 1181, 'D': 1652.0}

The TableParser object makes it easy to combine these instructions to extract the same values from the same file in any model run.

[6]:

parser = TableParser(
    'emat_sample_file_labeled_table.csv',
    {
        'A': loc[123,'Count_AM'],  # by label
        'B': iloc[1,0],            # by position
        'C': loc_sum[345,:],       # sum a row
        'D': iloc_mean[:,1],       # mean of a column
    },
    index_col='LinkID',
)

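We can now execute all these instructions by using the read method of the parser. The parser was given only a file name, so the directory that contains the file is passed as from_dir.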

[7]:

parser.read(from_dir='/tmp')

[7]:

{'A': 3498.0, 'B': 4011.0, 'C': 1181, 'D': 1652.0}

Using a TableParser has some advantages over writing a custom function for each table to be processed. The most important is that the names of the keys in the parser’s output are available without parsing any file.

[8]:

parser.measure_names

[8]:

['A', 'B', 'C', 'D']
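
To illustrate why the names can be known before any file is read, here is a minimal sketch that uses only the standard library. The class `SketchParser`, the file name `sketch_counts.csv`, and the measure names are hypothetical and invented for this illustration; this is not how EMAT implements TableParser, only one way the idea could work: the extraction rules are stored in a dictionary, so listing its keys requires no file access.

```python
import csv
import os
import tempfile


class SketchParser:
    """Hypothetical stand-in for a table parser; not EMAT's implementation."""

    def __init__(self, filename, measure_getters):
        self.filename = filename
        # Maps each measure name to a function of the loaded rows.
        self.measure_getters = dict(measure_getters)

    @property
    def measure_names(self):
        # Only the dictionary keys are needed, so no file is opened here.
        return list(self.measure_getters)

    def read(self, from_dir):
        path = os.path.join(from_dir, self.filename)
        with open(path, newline="") as f:
            rows = [[float(x) for x in row] for row in csv.reader(f) if row]
        return {name: get(rows) for name, get in self.measure_getters.items()}


sketch = SketchParser(
    "sketch_counts.csv",
    {
        "first_cell": lambda rows: rows[0][0],
        "first_row_total": lambda rows: sum(rows[0]),
    },
)

# The names are available even though the file does not exist yet.
names_before_read = sketch.measure_names

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "sketch_counts.csv"), "wt") as f:
        f.write("3498,2340,3821,1820\n4011,2513,4101,1942\n")
    values = sketch.read(from_dir=tmp)

print(names_before_read)
print(values)
```

Because the measure names do not depend on the file, a caller can declare the expected outputs of a model before any run has produced its output files.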

Parsing Labeled Values

The TableParser can also be used to read performance measures from a file that contains only a list of labeled values, as such a file can readily be interpreted as a table with one index column and a single data column.

[9]:

sample_file_labeled_values = """
Mean Highway Speed (mph),56.34
Mean Arterial Speed (mph),31.52
Mean Collector Speed (mph),24.80
"""

with open('/tmp/emat_sample_file_labeled_values.csv', 'wt') as f:
    f.write(sample_file_labeled_values)

Reading this file with pandas.read_csv can be done neatly by giving a few extra keyword arguments:

[10]:

pd.read_csv(
    '/tmp/emat_sample_file_labeled_values.csv',
    header=None,
    names=['Label','Value'],
    index_col=0,
)

[10]:

	Value
Label
Mean Highway Speed (mph)	56.34
Mean Arterial Speed (mph)	31.52
Mean Collector Speed (mph)	24.80

We can simply pass these same keyword arguments on to the TableParser, and proceed as above to define the values to extract.

[11]:

parser = TableParser(
    'emat_sample_file_labeled_values.csv',
    {
        'Highway Speed': loc['Mean Highway Speed (mph)','Value']
    },
    header=None,
    names=['Label','Value'],
    index_col=0,
)

[12]:

parser.read(from_dir='/tmp')

[12]:

{'Highway Speed': 56.34}
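
To make the "one index column, one data column" reading concrete, the following sketch builds the same lookup with only the standard library `csv` module. It is an illustration of the idea, not part of EMAT; the variable names `labeled` and `highway_speed` are chosen for this example.

```python
import csv
import io

sample = """
Mean Highway Speed (mph),56.34
Mean Arterial Speed (mph),31.52
Mean Collector Speed (mph),24.80
"""

# Each non-empty row is (label, value): the label plays the role of the
# index and the value is the single data column.
labeled = {
    label: float(value)
    for label, value in (row for row in csv.reader(io.StringIO(sample)) if row)
}

# Looking a value up by its label mirrors loc[label, 'Value'] above.
highway_speed = labeled["Mean Highway Speed (mph)"]
print(highway_speed)
```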

Parsing an Unlabeled Array

Lastly, the TableParser can be used to read performance measures from a file that contains an unlabeled array of values, as is sometimes generated by popular transportation modeling tools.

[13]:

sample_file_unlabeled_array = """
11,22,33
44,55,66
77,88,99
"""

with open('/tmp/emat_sample_file_unlabeled_array.csv', 'wt') as f:
    f.write(sample_file_unlabeled_array)

Labels are not required to read this data using pandas.read_csv, as a default set of row and column index labels is generated.

[14]:

pd.read_csv(
    '/tmp/emat_sample_file_unlabeled_array.csv',
    header=None,
)

[14]:

	0	1	2
0	11	22	33
1	44	55	66
2	77	88	99

Once the table is loaded, individual values or slices can be taken by position using the iloc tool, and extracted values can be combined arithmetically, as in top_corner_sum below.

[15]:

parser = TableParser(
    'emat_sample_file_unlabeled_array.csv',
    {
        'upper_left': iloc[0,0],
        'lower_right': iloc[-1,-1],
        'partial_row': iloc_sum[0,1:],
        'top_corner_sum': iloc[0,0] + iloc[0,-1],
    },
    header=None,
)

[16]:

parser.read(from_dir='/tmp')

[16]:

{'upper_left': 11.0,
 'lower_right': 99.0,
 'partial_row': 55,
 'top_corner_sum': 44.0}
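
The expression `iloc[0,0] + iloc[0,-1]` is written before any table exists, so each term must stand for a lookup to be performed later. The sketch below shows one way such deferred lookups could be made to support `+`, using Python's `__getitem__` and `__add__` hooks. The names `Deferred`, `_PositionIndexer`, and `pos` are hypothetical and invented for this illustration; EMAT's own implementation may differ, and this version handles only scalar positions, not slices.

```python
class Deferred:
    """Hypothetical deferred computation on a table (a list of rows)."""

    def __init__(self, func):
        self._func = func

    def __call__(self, table):
        # Evaluate the stored computation against an actual table.
        return self._func(table)

    def __add__(self, other):
        # Adding two deferred values yields a new deferred value.
        return Deferred(lambda table: self(table) + other(table))


class _PositionIndexer:
    """Records a [row, column] position for later lookup."""

    def __getitem__(self, key):
        row, col = key
        return Deferred(lambda table: table[row][col])


pos = _PositionIndexer()  # hypothetical analogue of iloc, scalars only

# Nothing is read here; the sum is just a recipe.
top_corner_sum = pos[0, 0] + pos[0, -1]

table = [[11, 22, 33], [44, 55, 66], [77, 88, 99]]
result = top_corner_sum(table)
print(result)
```

Applied to the sample array, the recipe adds the first and last values of the top row, matching the `top_corner_sum` value reported by the parser.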