Tutorial: Scripted Data Input

Overview

This lesson teaches you how to read data into the Partner system using a script. Almost all applications require access to external data of some sort, so this is a fundamental skill.

audience

  • system administrators and IT staff
  • power users
  • developers

objectives

  • learn about files and the Virtual File System (VFS)
  • learn about data input, especially DataRecordSource types
  • read a simple data file in and log the contents

prerequisites

You need a working Partner installation and basic familiarity with the Workbench. We’ll be working in a module. Tutorial: Making A Module
and its prerequisites should

give you everything you need.

Virtual File System

The Partner system provides a powerful abstraction layer for file I/O, called Virtual File System or VFS for short.

VFS provides three major benefits: * a cleaner and more consistent model of directories and files * simple, but powerful tools for searching, examining, and modifying a file system * an abstract directory-and-file interface that can also be applied to remote servers, hierarchical data stores, and similar systems

Like the majority of modern file systems, VFS is represented as a hierarchical tree of files and directories.

Nodes, Files, and Directories

All parts of a VFS are either a VfsFile or a VfsDirectory:

  • a VfsFile represents a handle to an actual file, which contains file data but cannot contain other files or directories
  • a VfsDirectory can contain both directories and files, but cannot contain file data

Thus, a VfsDirectory is a usually a branch on the tree structure, while a file is always a leaf node.

When referring to nodes in general, without reference to whether they are directories or files, they are called VfsNodes.

All nodes other than the root directory of the VFS have a single defined directory that they are in.

VfsNodes are really handles - they point at real files and directories, but you shouldn’t think of them as BEING that file or directory. When, for example, you move a file from one VfsFile to another, the VfsFile objects do not change, but the underlying file is no longer present for the first handle and is now available on the second.

All VfsNodes have an exists() method you can use to test to see if they are there.

Paths

All VfsNodes have a Path that uniquely identifies the node by providing the complete list of directory names above the node and of course the node’s name itself.

We always use forward slashes (Unix-style) to divide paths, since the DOS-style backslashes are escape characters in Java strings.

Here are some example paths:

  • data/Example/SomeTextFile.txt
  • logs/partner.log
  • update/info/
  • ../../system/apps/

The last uses the “parent directory” syntax of two periods (dot-dot-slash) to indicate a relative path to a directory two levels up then down the system/apps/ path.

Paths can be relative or absolute. Generally absolute paths have a slash as the prefix (e.g. /this/is/absolute.txt).

System VFS

The installation has a single, standard VFS defined which is the filesystem based in the installation directory. For example, if you installed to c:PartnerDemo, the system VFS would start in that directory as root.

Here is how you access the System VFS:

import com.partnersoft.system.SystemServices;
vfs = SystemServices.vfs();

Custom VFS

You can also create your own custom VFS at another specified location. That’s easy enough:

import com.partnersoft.io.vfs.FileVfs;
vfs = new FileVfs("c:/SomeDirectory/IAmInterestedIn/");

Get a Handle On It

Getting VfsNode handles is also easy, either from the VFS or from other node objects:

logFile = vfs.fileFor("logs/partner.log");
logsDir = logFile.getDirectory();

updateDir = vfs.directoryFor("update/");
infoDir = updateDir.directoryNamed("info");

VfsDirectory also has a number of useful finding and listing functions:

subdirs = updateDir.listDirectories();
textFiles = updateDir.findFilesWithExtension(".txt");

Data Files, Records, and Sources

explained

Data input is obviously a very large topic. Data comes in many forms and formats. There are some common patterns, however.

We can divide the problem roughly into how you get to the data and access its contents, and how you parse (understand) the data format.

Data is usually stored in either a file (locally or on a network), an SQL database, or a web service.

files

Data files come in an enormous number of formats. You can divide these roughly into text and binary, and separately into tabular and document-oriented. Tabular data is common enough that we have specific tools for reading tabular files.

Examples of data files include:

  • dBase .dbf (tabular binary)
  • .csv (tabular text)
  • .xml (document text)
  • .xls (binary, can be read as document or tabular)

databases

SQL databases are sort of a location and a format, and bear special mention. There are file-based SQL databases like Microsoft Access and the Hypersonic HSQLDB included with Partner. There are also server-based SQL databases like MySQL, Microsoft SQL Server, and Oracle.

Each one has its peculiarities, but you connect and communicate with all of them using Structured Query Language (SQL). SQL data is inherently tabular, and the details of formatting and storage are taken care of by the database server or driver.

web services

Web services are programmatic interfaces that utilize the now-standard HTTP transport mechanism. They are similar to web pages, but are designed for software-to-software communication rather than human use.

Web services are request-and-response oriented. Both the request and response are formatted as XML text documents.

DataRecordSource

The Partner platform uses a common abstraction, DataRecordSource, to represent tabular data with named field values. Generally, your script will:

  • locate the database, file, or web service
  • create a DataRecordSource specific to the format
  • iterate through each record (or row) in the source
  • close the DataRecordSource

File-Based DataRecordSource Example

overview

CSV (comma-separated values) is a useful, text-based tabular file format. It’s supported by most database and spreadsheet tools, and is easy to edit and troubleshoot with a text editor.

Let’s make a CSV file, then write an app script to read it back and log the values.

data file

Using the workbench, create a data directory (if it doesn’t already exist) for our module: data/Tutorial/. Create a file named PoleHits.csv in it. This will contain data about how many times a particular pole has been hit. Copy the following contents into it:

data/Tutorial/PoleHits.csv

pole,hits
15926,3
15730,5
15563,1
27699,4
54859,6
47507,2

The first line contains the field names, the other lines the data. All values are separated by commas. We won’t use commas in the values; if we did we would need to quote the values or escape the commas or do other nonsense. Best to keep it simple.

Input Script

Now, we need to make a reader for it. Create a workbench action in our module named Read Pole Hits.groovy:

modules/seat/Tutorial/workbench/actions/Read Pole Hits.groovy

import com.partnersoft.system.SystemServices;
import com.partnersoft.formats.csv.CsvDataRecordSource;

file = SystemServices.vfs().fileFor("data/Tutorial/PoleHits.csv");
source = new CsvDataRecordSource(file);
for (record in source) {
    pole = record.get("pole");
    hits = record.get("hits");
    log.info("Read record: " + record);
    log.info("Pole = " + pole);
    log.info("Hits = " + hits);
}
source.close();

Detailed Explanation

Let’s go through the script section by section.

import com.partnersoft.system.SystemServices;
import com.partnersoft.formats.csv.CsvDataRecordSource;

This part identifies Java classes in the Partner Platform that we will use. SystemServices provides various things including the standard VFS object, and CsvDataRecordSource knows how to read a CSV-formatted data file in.

file = SystemServices.vfs().fileFor("data/Tutorial/PoleHits.csv");
source = new CsvDataRecordSource(file);

We identify the file in the VFS we want, and then create a data record source for it.

for (record in source) {

Read this as “for each record in the source”. This is an iteration loop, and everything between the curly braces will be called once for each data line in the CSV file. Note that in Groovy you have to use the word “in” rather than a colon as in Java or BeanShell (unless you declare your types; then you can use the same syntax as Java).

pole = record.get("pole");
hits = record.get("hits");

These two lines show how to pull specific values out of the record, using their field names. They are now assigned to fields named “pole” and “hits”. CSV files are text, so these values are always Strings by default.

log.info("Read record: " + record);

This logs the entire record as a data structure. You should see all the values in the log.

        log.info("Pole = " + pole);
log.info("Hits = " + hits);

These two statements log the individual variables we created by pulling field values out.

}
source.close();

The closing brace closes the loop, and then we need to close the source, which releases any locks on the file. Closing is important, and easy to forget since it doesn’t always cause obvious problems in your code. It is especially important with database connections and SqlDataRecordSources.

If everything went according to plan, you should be able to click Run and see the following results in your log panel: [[Image:ScriptedInputResults.png|none|frame|Pole Hits log output.]]

Summary

This was an extremely basic introduction to the world of input. However you should now have some familiarity with the basic concepts, and some specific knowledge of files and data record sources. We will build on these basic tools in the remaining tutorials.

Moving On

Now that you’ve mastered data input (hah!), apply your skills to Tutorial: Making a Mapset.