Tutorial: Scripted Data Input ****************************** Overview ================================ This lesson teaches you how to read data into the Partner system using a script. Almost all applications require access to external data of some sort, so this is a fundamental skill. audience -------------------------------- * system administrators and IT staff * power users * developers objectives -------------------------------- * learn about files and the Virtual File System (VFS) * learn about data input, especially DataRecordSource types * read a simple data file in and log the contents prerequisites -------------------------------- You need a working Partner installation and basic familiarity with the Workbench. We'll be working in a module. :doc:`modules` and its prerequisites should give you everything you need. Virtual File System ================================ The Partner system provides a powerful abstraction layer for file I/O, called Virtual File System or VFS for short. VFS provides three major benefits: * a cleaner and more consistent model of directories and files * simple, but powerful tools for searching, examining, and modifying a file system * an abstract directory-and-file interface that can also be applied to remote servers, hierarchical data stores, and similar systems Like the majority of modern file systems, VFS is represented as a hierarchical tree of files and directories. Nodes, Files, and Directories -------------------------------- All parts of a VFS are either a VfsFile or a VfsDirectory: * a VfsFile represents a handle to an actual file, which contains file data but cannot contain other files or directories * a VfsDirectory can contain both directories and files, but cannot contain file data Thus, a VfsDirectory is a usually a branch on the tree structure, while a file is always a leaf node. When referring to nodes in general, without reference to whether they are directories or files, they are called VfsNodes. All nodes other than the root directory of the VFS have a single defined directory that they are in. VfsNodes are really handles - they point at real files and directories, but you shouldn't think of them as BEING that file or directory. When, for example, you move a file from one VfsFile to another, the VfsFile objects do not change, but the underlying file is no longer present for the first handle and is now available on the second. All VfsNodes have an exists() method you can use to test to see if they are there. Paths -------------------------------- All VfsNodes have a Path that uniquely identifies the node by providing the complete list of directory names above the node and of course the node's name itself. We always use forward slashes (Unix-style) to divide paths, since the DOS-style backslashes are escape characters in Java strings. Here are some example paths: * data/Example/SomeTextFile.txt * logs/partner.log * update/info/ * ../../system/apps/ The last uses the "parent directory" syntax of two periods (dot-dot-slash) to indicate a relative path to a directory two levels up then down the system/apps/ path. Paths can be relative or absolute. Generally absolute paths have a slash as the prefix (e.g. /this/is/absolute.txt). System VFS -------------------------------- The installation has a single, standard VFS defined which is the filesystem based in the installation directory. For example, if you installed to c:\PartnerDemo\, the system VFS would start in that directory as root. Here is how you access the System VFS:: import com.partnersoft.system.SystemServices; vfs = SystemServices.vfs(); Custom VFS -------------------------------- You can also create your own custom VFS at another specified location. That's easy enough:: import com.partnersoft.io.vfs.FileVfs; vfs = new FileVfs("c:/SomeDirectory/IAmInterestedIn/"); Get a Handle On It -------------------------------- Getting VfsNode handles is also easy, either from the VFS or from other node objects:: logFile = vfs.fileFor("logs/partner.log"); logsDir = logFile.getDirectory(); updateDir = vfs.directoryFor("update/"); infoDir = updateDir.directoryNamed("info"); VfsDirectory also has a number of useful finding and listing functions:: subdirs = updateDir.listDirectories(); textFiles = updateDir.findFilesWithExtension(".txt"); JavaDoc -------------------------------- * http://developer.partnersoft.com/releases/development/javadoc/com/partnersoft/io/vfs/VfsFile.html * http://developer.partnersoft.com/releases/development/javadoc/com/partnersoft/io/vfs/VfsDirectory.html * http://developer.partnersoft.com/releases/development/javadoc/com/partnersoft/io/vfs/VfsNode.html Data Files, Records, and Sources ================================ explained -------------------------------- Data input is obviously a very large topic. Data comes in many forms and formats. There are some common patterns, however. We can divide the problem roughly into how you get to the data and access its contents, and how you parse (understand) the data format. Data is usually stored in either a file (locally or on a network), an SQL database, or a web service. files -------------------------------- Data files come in an enormous number of formats. You can divide these roughly into text and binary, and separately into tabular and document-oriented. Tabular data is common enough that we have specific tools for reading tabular files. Examples of data files include: * dBase .dbf (tabular binary) * .csv (tabular text) * .xml (document text) * .xls (binary, can be read as document or tabular) databases -------------------------------- SQL databases are sort of a location and a format, and bear special mention. There are file-based SQL databases like Microsoft Access and the Hypersonic HSQLDB included with Partner. There are also server-based SQL databases like MySQL, Microsoft SQL Server, and Oracle. Each one has its peculiarities, but you connect and communicate with all of them using Structured Query Language (SQL). SQL data is inherently tabular, and the details of formatting and storage are taken care of by the database server or driver. web services -------------------------------- Web services are programmatic interfaces that utilize the now-standard HTTP transport mechanism. They are similar to web pages, but are designed for software-to-software communication rather than human use. Web services are request-and-response oriented. Both the request and response are formatted as XML text documents. DataRecordSource -------------------------------- The Partner platform uses a common abstraction, DataRecordSource, to represent tabular data with named field values. Generally, your script will: * locate the database, file, or web service * create a DataRecordSource specific to the format * iterate through each record (or row) in the source * close the DataRecordSource File-Based DataRecordSource Example ================================ overview -------------------------------- CSV (comma-separated values) is a useful, text-based tabular file format. It's supported by most database and spreadsheet tools, and is easy to edit and troubleshoot with a text editor. Let's make a CSV file, then write an app script to read it back and log the values. data file -------------------------------- Using the workbench, create a data directory (if it doesn't already exist) for our module: data/Tutorial/. Create a file named PoleHits.csv in it. This will contain data about how many times a particular pole has been hit. Copy the following contents into it: data/Tutorial/PoleHits.csv ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :: pole,hits 15926,3 15730,5 15563,1 27699,4 54859,6 47507,2 The first line contains the field names, the other lines the data. All values are separated by commas. We won't use commas in the values; if we did we would need to quote the values or escape the commas or do other nonsense. Best to keep it simple. Input Script -------------------------------- Now, we need to make a reader for it. Create a workbench action in our module named Read Pole Hits.groovy: modules/seat/Tutorial/workbench/actions/Read Pole Hits.groovy ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :: import com.partnersoft.system.SystemServices; import com.partnersoft.formats.csv.CsvDataRecordSource; file = SystemServices.vfs().fileFor("data/Tutorial/PoleHits.csv"); source = new CsvDataRecordSource(file); for (record in source) { pole = record.get("pole"); hits = record.get("hits"); log.info("Read record: " + record); log.info("Pole = " + pole); log.info("Hits = " + hits); } source.close(); Detailed Explanation -------------------------------- Let's go through the script section by section. :: import com.partnersoft.system.SystemServices; import com.partnersoft.formats.csv.CsvDataRecordSource; This part identifies Java classes in the Partner Platform that we will use. SystemServices provides various things including the standard VFS object, and CsvDataRecordSource knows how to read a CSV-formatted data file in. :: file = SystemServices.vfs().fileFor("data/Tutorial/PoleHits.csv"); source = new CsvDataRecordSource(file); We identify the file in the VFS we want, and then create a data record source for it. :: for (record in source) { Read this as "for each record in the source". This is an iteration loop, and everything between the curly braces will be called once for each data line in the CSV file. Note that in Groovy you have to use the word "in" rather than a colon as in Java or BeanShell (unless you declare your types; then you can use the same syntax as Java). :: pole = record.get("pole"); hits = record.get("hits"); These two lines show how to pull specific values out of the record, using their field names. They are now assigned to fields named "pole" and "hits". CSV files are text, so these values are always Strings by default. :: log.info("Read record: " + record); This logs the entire record as a data structure. You should see all the values in the log. :: log.info("Pole = " + pole); log.info("Hits = " + hits); These two statements log the individual variables we created by pulling field values out. :: } source.close(); The closing brace closes the loop, and then we need to close the source, which releases any locks on the file. Closing is important, and easy to forget since it doesn't always cause obvious problems in your code. It is especially important with database connections and SqlDataRecordSources. If everything went according to plan, you should be able to click Run and see the following results in your log panel: [[Image:ScriptedInputResults.png|none|frame|Pole Hits log output.]] Summary ================================ This was an extremely basic introduction to the world of input. However you should now have some familiarity with the basic concepts, and some specific knowledge of files and data record sources. We will build on these basic tools in the remaining tutorials. Moving On ================================ Now that you've mastered data input (hah!), apply your skills to :doc:`mapset`.