Principles

Data is Dirty

Data is generally kind of a mess. Even supposedly well-managed data tends to have at least a few issues, and the average data set is full of problems. This is Partner Axiom 1: “All data is dirty.”

Ultimately, data is entered by and maintained by human beings. They manage to introduce quirks into even the most well-constrained database. Dealing with those quirks by interpreting, translating, transforming, mangling, or otherwise messing with the data is something we do a lot of in the Partner System. This general process we refer to as “Turd Polishing”.

Humans are Flexible

And, let’s be fair - computers make distinctions about data that human beings don’t generally understand and don’t generally care about.

Take numbers, for example. A human looks at a number like 12345 and doesn’t care whether it’s an integer, a floating-point number, a sequence of characters, a character string, or a picture. They interpret the type based on need and context. So a human would expect to enter something like “12345” into a text box, and then be able to treat it as a number when they need to do math. And they don’t generally consider an integer of value 12345 and a string of value “12345” to be any different.
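A minimal sketch of what that tolerance could look like in Java. The `LenientNumbers` class and its `toLong` method are hypothetical names invented for this illustration, not part of any existing framework; the sketch assumes it is acceptable to silently drop a fractional part.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration only.
final class LenientNumbers {
    private LenientNumbers() {}

    /** Accepts a Number or a numeric string such as " 12345 " and returns it as a long. */
    static long toLong(Object value) {
        if (value instanceof Number) {
            // Integer, Long, Double, etc. all share longValue(); fractions are truncated.
            return ((Number) value).longValue();
        }
        if (value instanceof CharSequence) {
            // Trim first so stray spaces from a text box do not break parsing.
            String text = value.toString().trim();
            // BigDecimal parses both "12345" and "12345.0".
            return new BigDecimal(text).longValue();
        }
        throw new IllegalArgumentException("Not a number: " + value);
    }

    public static void main(String[] args) {
        // The integer and the string are treated as the same value.
        System.out.println(toLong(12345) == toLong("12345"));
        System.out.println(toLong(" 12345.0 ") + 1);
    }
}
```

Non-numeric text still fails with an exception here; a real framework would have to decide whether to reject, default, or log such values.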

Data structures have similar issues. Do you need a set, a list, a map, a stack, a tree, or some other structure? Is it sorted or unsorted? What if you have to treat the same set of objects in different ways for different purposes? How do you transform between them? What about messy data that’s got duplicates and other issues?

A lot of a programmer’s early training is learning about data types and type conversion, and non-programmers are usually baffled by the entire topic.

Computers in general, and Java in particular, tend to be case-sensitive, too. Very rarely do humans treat text case-sensitively, which causes all kinds of mistakes and aggravation when they try to deal with a case-sensitive computer, language, or system.
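One way to get case-insensitive keys from the standard library is a `TreeMap` built with `String.CASE_INSENSITIVE_ORDER`. The sketch below assumes plain lookups are all that is needed; the class name `CaseInsensitiveLookup` is made up for the example. Note that this comparator does not take locale into account and that the map rejects null keys.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class for illustration only.
final class CaseInsensitiveLookup {
    private CaseInsensitiveLookup() {}

    /** Returns a map whose String keys compare without regard to case. */
    static Map<String, String> newMap() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        Map<String, String> customers = newMap();
        customers.put("Customer", "Acme");

        // A plain HashMap would miss both of these lookups.
        System.out.println(customers.get("CUSTOMER"));
        System.out.println(customers.containsKey("customer"));
    }
}
```

Putting a second entry whose key differs only in case replaces the value but keeps the spelling of the first key, so the map never holds both "Customer" and "customer".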

Most humans also don’t care about whitespace in particular. (An exception is typesetters.) It’s there to keep things from running into each other - the specific type and amount of space is immaterial. Nothing is more frustrating than trying to find a file or key that has leading or trailing whitespace in it, which doesn’t show up at all in a printout or GUI.
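As a rough illustration, keys can be normalized before they are stored or looked up. The `Whitespace.normalize` helper below is a hypothetical name; it assumes that leading and trailing whitespace carries no meaning and that any internal run of whitespace is equivalent to a single space.

```java
// Hypothetical helper for illustration only.
final class Whitespace {
    private Whitespace() {}

    /** Drops leading/trailing whitespace and collapses internal runs to one space. */
    static String normalize(String raw) {
        // trim() removes leading and trailing characters up to U+0020,
        // which covers spaces, tabs, and newlines.
        return raw.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String fromFile = "  report\t 2024.csv \n";
        // Brackets make the otherwise invisible whitespace visible.
        System.out.println("[" + fromFile + "]");
        System.out.println("[" + normalize(fromFile) + "]");
    }
}
```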

So, generally, we want to ensure that data frameworks can convert naturally from one type to another at need, that strings are treated case-insensitively, and that things like leading/trailing whitespace don’t bite us in sensitive places.

There’s a Hole between Data Structures and Class Frameworks

Java provides a wealth of great general-purpose data structures - lists, maps, sets, etc. You can combine them in various ways. And you can define your own data structures and application-specific class frameworks.

However, quite often you’re writing a script, a conversion tool, or some other piece of software that doesn’t need a full-fledged object hierarchy and its accompanying implementation and maintenance costs. More importantly, you may need a complex data structure that is defined at runtime via configuration, rather than at compile time via Java classes. Or you just need a structure to bang on data with - you load an XML file, change some things around, and put it out as a CSV, for example, or you transform your class model into something different for a report.

While you can combine lists and maps and such into complex structures, in practice this tends to be annoying - you have to e.g. put in extra code to create parts of the structure as needed, check for nulls, etc.
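To make the annoyance concrete, here is a sketch of a three-level structure built from standard collections. The region/customer/order layout and the `NestedStructure` class are invented for the example. Even with `Map.computeIfAbsent` handling creation, every read path still needs its own null checks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: region -> customer -> list of order ids.
final class NestedStructure {
    private NestedStructure() {}

    /** Adds an order, creating the intermediate map and list on demand. */
    static void addOrder(Map<String, Map<String, List<String>>> data,
                         String region, String customer, String orderId) {
        data.computeIfAbsent(region, r -> new LinkedHashMap<>())
            .computeIfAbsent(customer, c -> new ArrayList<>())
            .add(orderId);
    }

    /** Reads the orders, checking for a missing branch at each level. */
    static List<String> orders(Map<String, Map<String, List<String>>> data,
                               String region, String customer) {
        Map<String, List<String>> customers = data.get(region);
        if (customers == null) {
            return Collections.emptyList();
        }
        List<String> found = customers.get(customer);
        return found == null ? Collections.<String>emptyList() : found;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> data = new LinkedHashMap<>();
        addOrder(data, "EU", "acme", "A-1");
        addOrder(data, "EU", "acme", "A-2");
        System.out.println(orders(data, "EU", "acme"));
        System.out.println(orders(data, "US", "acme"));
    }
}
```

The shape of the structure is also fixed in the generic type at compile time, which is exactly what a configuration-driven tool cannot rely on.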

Data Should Be Readable, Legible, and Portable

When all else fails, we end up pulling out the text editor and just looking at data. So, whatever fancy nonsense we come up with, and whatever sophisticated GUIs we build to dress up the nonsense, we should always ensure that we can edit the nonsense with a no-nonsense tool.

No framework can defend us from bad ideas and bad implementations, but a good data framework should always exhibit transparency and allow manipulation via brute-force methods.

Having a simple model for complex data structures, and using simple formats such as XML to store them, allows this and also provides for portability. While we primarily run in Java, and have discussed these principles in the context of Java, they apply equally well to any system: a well-designed data structure and format should work on any reasonable computer platform or in any programming language.
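As a small illustration of text-editor-friendly storage, the JDK's own `java.util.Properties` can round-trip through XML with `storeToXML` and `loadFromXML`. This is only a flat key/value stand-in for a real data structure, and the `ReadableStore` class is a name invented for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical example class for illustration only.
final class ReadableStore {
    private ReadableStore() {}

    /** Serializes the settings as human-readable XML text. */
    static String toXml(Properties settings) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            settings.storeToXML(out, "editable by hand", "UTF-8");
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses XML text, possibly edited by hand, back into settings. */
    static Properties fromXml(String xml) {
        try {
            Properties settings = new Properties();
            settings.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return settings;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        settings.setProperty("customer.name", "Acme");
        String xml = toXml(settings);
        System.out.println(xml); // plain text that any editor can open
        System.out.println(fromXml(xml).getProperty("customer.name"));
    }
}
```

Because the stored form is ordinary XML, the same file can be read by tools written in other languages, which is the portability point made above.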

Summary

all data is dirty
type conversion should be easy, flexible, and tolerant
strings are case-insensitive
white space should be ignored in general, especially leading/trailing
nontrivial applications often need a complex data structure, but not necessarily a full class framework
stored data should be readable, legible, and portable
