Goals
A “data model” should specify the metadata that describes the different types of data being used. In particular, it should support:
- units and unit conversions
- input/output serialization
- conversion to network-portable / db-neutral byte ordering
- descriptions
- ranges or more general validations, enumeration mappings, etc
- structured data consisting of multiple values (eg, PVT triplets)
- automatic generation of SQL statements to create/update/query tables
- automatic generation of HTML input forms and javascript validation code
- a specialization hierarchy: declaring one type to be more specialized/constrained than a base type
- scope (in the context of a state chart) when data is valid
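As a minimal sketch of the byte-ordering goal, the standard struct module can pack a value in network (big-endian) order independent of the host machine. The names pack_value and unpack_value are hypothetical, and the choice of an 8-byte double is an assumption, not a decision made in these notes.

```python
import struct

# "!" selects network (big-endian) byte order with standard sizes;
# "d" is an 8-byte IEEE 754 double.
_DOUBLE_FORMAT = "!d"


def pack_value(value):
    """Serialize a float to 8 bytes in network byte order (hypothetical helper)."""
    return struct.pack(_DOUBLE_FORMAT, value)


def unpack_value(payload):
    """Inverse of pack_value: recover the float from 8 network-order bytes."""
    (value,) = struct.unpack(_DOUBLE_FORMAT, payload)
    return value
```

The same bytes come out on little-endian and big-endian hosts, which is what makes the encoding safe to send over a network or store in a database blob.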
The Wikipedia articles on data modeling and data dictionaries are mostly oriented to the RDBMS aspects and don’t address most of the topics above. Another area with similar problems is in defining datatypes for network protocols, eg, RPC, SOAP, IDL, …
RPyC is a python RPC framework but since it is exclusively python-python, it probably doesn’t address my issues.
SOAP seems to pass data around as text (embedded in xml).
What about a DTD? This page describes how datatypes are specified in an XML schema.
Perhaps the problem should be decomposed into two separate concerns:
- “datavalue”: a specification of a particular type of scalar/primitive data value
- “datatype”: a specification of the structure of a (possibly vector/complex) type of data
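One way the datavalue/datatype split could look, sketched with dataclasses in modern Python 3. The class names DataValue and DataType, their fields, and the PVT limits are all illustrative assumptions rather than a settled design.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataValue:
    """Specification of one scalar value: name, unit, description, range."""
    name: str
    unit: str
    description: str = ""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def validate(self, value):
        """Return True if value lies within the optional bounds (inclusive)."""
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


@dataclass(frozen=True)
class DataType:
    """Specification of a structured type built from several DataValues."""
    name: str
    fields: Tuple[DataValue, ...]

    def validate(self, values):
        """Check a sequence of values against the fields, position by position."""
        if len(values) != len(self.fields):
            return False
        return all(f.validate(v) for f, v in zip(self.fields, values))


# A PVT (position, velocity, time) triplet; units and limits are illustrative.
pvt = DataType("pvt", (
    DataValue("position", "deg", minimum=-360.0, maximum=360.0),
    DataValue("velocity", "deg/s"),
    DataValue("time", "s"),
))
```

With this split, pvt.validate((10.0, 0.5, 100.0)) passes while a position of 1000.0 is rejected by the position field's range alone.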
Since (my)sql is an important interface, check out its data types…
- type spec includes total width and width after decimal place for numerics
- signed / unsigned
- tiny/small/medium/big-int
- float/double/decimal
SQL syntax for creating a column is here and rules for naming columns are here.
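The MySQL attributes listed above (width, decimals, signedness) map naturally onto a small generator for column definitions. The following is only a sketch: column_definition and create_table are hypothetical helpers, names are assumed to be trusted identifiers (no escaping is attempted), and only a fraction of the real column syntax is covered.

```python
def column_definition(name, sql_type, width=None, decimals=None,
                      unsigned=False, nullable=False):
    """Build a MySQL-style column definition string (hypothetical helper)."""
    spec = sql_type.upper()
    if width is not None and decimals is not None:
        # Total width and digits after the decimal point, eg DECIMAL(5,2).
        spec += "(%d,%d)" % (width, decimals)
    elif width is not None:
        spec += "(%d)" % width
    if unsigned:
        spec += " UNSIGNED"
    if not nullable:
        spec += " NOT NULL"
    return "`%s` %s" % (name, spec)


def create_table(table, columns):
    """Join already-built column definitions into a CREATE TABLE statement."""
    body = ",\n  ".join(columns)
    return "CREATE TABLE `%s` (\n  %s\n)" % (table, body)


statement = create_table("weather", [
    column_definition("air_temp", "decimal", width=5, decimals=2),
    column_definition("sample_count", "smallint", unsigned=True, nullable=True),
])
```

The table and column names here are made up for illustration; in the eventual design these strings would be derived from the valuetype declarations rather than typed by hand.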
Implementation
What do I want python valuetypes to look like?
t = data.temperature(75.1)
print t
-> "75.1 C"
type(t)
-> <type "temperature">
type(data.temperature)
-> <type "type">
data.temperature
-> <type "temperature">
convert(t,'F')
-> 167.18
t2 = data.temperature("167.18 F")
print t2
-> "75.1 C"
help(temperature)
-> A temperature value measured in C in the range (-274,1000).
-> ...description of its public methods...
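A minimal sketch of how such a valuetype might be written as a float subclass, in modern Python 3 syntax (so print is a function). This is one possible approach, not a settled design: the string parsing, the supported units, and the module-level convert function are assumptions. For reference, 75.1 C corresponds to 167.18 F.

```python
class temperature(float):
    """A temperature value measured in C in the range (-274,1000)."""

    unit = "C"
    minimum, maximum = -274.0, 1000.0

    def __new__(cls, value):
        # Accept either a number (taken to be in C) or a "<number> <unit>" string.
        if isinstance(value, str):
            number, unit = value.split()
            value = float(number)
            if unit == "F":
                value = (value - 32.0) * 5.0 / 9.0
            elif unit != "C":
                raise ValueError("unknown unit: %r" % unit)
        value = float(value)
        if not cls.minimum < value < cls.maximum:
            raise ValueError("temperature out of range: %r" % value)
        return super().__new__(cls, value)

    def __str__(self):
        return "%.1f %s" % (float(self), self.unit)


def convert(t, unit):
    """Return the plain float value of temperature t expressed in unit."""
    if unit == "C":
        return float(t)
    if unit == "F":
        return float(t) * 9.0 / 5.0 + 32.0
    raise ValueError("unknown unit: %r" % unit)


t = temperature(75.1)
t2 = temperature("167.18 F")
```

Here temperature is both a type (an instance of type) and a subclass of float, so t still participates in ordinary arithmetic, although the results of arithmetic are plain floats.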
Read up on python type system to figure out how to define my valuetypes. Is ‘temperature’ a ‘type’ or a subclass of ‘float’ or both!?
This page is good reading on the relationship between types and objects and classes:
- type is-a object and type is-an-instance-of-a type
- object is-an-instance-of-a type but has no base class (is-a relationship)
- yes, those are circular!
- in general, X.__class__ shows ‘is-an-instance-of-a’ relationships and X.__bases__ shows ‘is-a’ relationships
- isinstance is recursive, so isinstance(object,object) is True (via object is-an-instance-of-a type is-a object) even though object.__bases__ is empty
- only ‘type objects’ can be the base class for a new class (so ‘t’ cannot be a base class)
- a class that inherits from ‘type’ is a ‘metaclass’ (‘type’ is the only built-in metaclass)
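The relationships in the list above can be checked directly in an interpreter. The snippet below uses Python 3 and only built-in names, apart from the throwaway names t, Bad, and cannot_subclass_instance.

```python
# type is-a object, and type is-an-instance-of-a type.
assert issubclass(type, object)
assert type.__bases__ == (object,)
assert type.__class__ is type

# object is-an-instance-of-a type but has no base class.
assert object.__class__ is type
assert object.__bases__ == ()

# isinstance follows the inheritance chain of the instance's class:
# object -> (instance of) type -> (subclass of) object.
assert isinstance(object, object)
assert isinstance(object, type)

# Only type objects can be base classes: an instance such as a float cannot.
t = 75.1
try:
    class Bad(t):
        pass
    cannot_subclass_instance = False
except TypeError:
    cannot_subclass_instance = True
```

All the assertions pass, and the attempted class statement raises TypeError.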
Here is a good discussion of metaprogramming in three parts from IBM’s charming python series: one | two | three
- a metaclass constructor looks like the 3-arg type(name,bases,dict) ctor and so instances specify their base class(es), which is generally not the metaclass itself.
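To make the 3-arg constructor concrete, here is a small sketch in Python 3. ValueTypeMeta and its registry are hypothetical names invented for illustration; the point is only that the classes it creates list float, not the metaclass, as their base.

```python
class ValueTypeMeta(type):
    """Hypothetical metaclass that records each valuetype it creates."""

    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.registry[name] = cls
        return cls


# Calling the metaclass directly with (name, bases, dict), just like type().
angle = ValueTypeMeta("angle", (float,), {"unit": "deg"})


# The equivalent class statement; Python makes the same 3-arg call for us.
class velocity(float, metaclass=ValueTypeMeta):
    unit = "deg/s"
```

Both angle and velocity are instances of ValueTypeMeta but subclasses of float, so angle(1.5) behaves as an ordinary float with an extra unit attribute.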
Another charming python article on using ‘decorators’ to clean up metaprogramming syntax.
I have been struggling to understand python’s super() and whether it is important for correctly defining a valuetype metaclass. The definitive reference is Guido’s article on unifying types and classes which includes a pure python implementation of the super() class [ super is a class, not a built-in function ]. There is a spirited thread discussing super() that starts here and mainly concludes that:
- the name is misleading since super(A,B) returns a proxy for a superclass of type(B) (or of B itself, when B is a class) rather than a superclass of A
- in the case of multiple inheritance, the appropriate superclass is determined by the method resolution order (MRO) of type(B), which is a list of classes (it is a list of metaclasses, ie, classes inheriting from type, only when B is itself a class).
- super() delegates to the element in this MRO list immediately after A, which need not be a base class of A. This page explains it nice and clearly.
Using super() is the correct way to ensure that every ctor in an inheritance graph gets called, so I should use it, even though it only works if all the other upstream types in the MRO do the same, and I can’t imagine how valuetype would end up in a multiple inheritance graph…
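A small diamond hierarchy illustrates the MRO behaviour described above, in Python 3 where super() can be called without arguments inside a method. The class names are placeholders.

```python
calls = []


class Base:
    def __init__(self):
        calls.append("Base")


class Left(Base):
    def __init__(self):
        calls.append("Left")
        super().__init__()  # next class in the MRO of type(self), not always Base


class Right(Base):
    def __init__(self):
        calls.append("Right")
        super().__init__()


class Both(Left, Right):
    def __init__(self):
        calls.append("Both")
        super().__init__()


obj = Both()
mro_names = [c.__name__ for c in Both.__mro__]

# For an instance of Both, the class after Left in the MRO is Right,
# even though Right is not a base class of Left.
next_after_left = super(Left, obj).__init__.__func__
```

After this runs, calls is ['Both', 'Left', 'Right', 'Base'] and mro_names is ['Both', 'Left', 'Right', 'Base', 'object']: each ctor runs exactly once because every class in the chain cooperates by calling super().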
Refer to the TCC keyword dictionary to get a list of concrete valuetypes that I need to declare:
- temperature (C) of outside air
- angle (deg) of axis motor positions, limits
- velocity (deg/s) of axis motor
- angle error (arcsec) of axis motor
- time (s relative to ??) of TCC
- enumerations such as AxisCmdState, AxisCmdError, …
- rms error, focus offset (microns)
- guider, instrument center and limits (unbinned pixels) – not relevant for 2.5m?
- relative humidity (fraction 0-1)
- local apparent sidereal time (deg)
- various strings
- various small (but non-enumerated) integers
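Several entries in this list differ only in name, unit, and range, which suggests generating them from a table. The sketch below uses a hypothetical make_valuetype factory built on the 3-arg type() call; the humidity limits come from the list above, while the class names and the inclusive-bounds choice are assumptions.

```python
def make_valuetype(name, unit, minimum=None, maximum=None):
    """Create a float subclass with a unit and optional inclusive bounds."""

    def __new__(cls, value):
        value = float(value)
        if minimum is not None and value < minimum:
            raise ValueError("%s below minimum: %r" % (name, value))
        if maximum is not None and value > maximum:
            raise ValueError("%s above maximum: %r" % (name, value))
        return float.__new__(cls, value)

    def __str__(self):
        return "%g %s" % (float(self), unit)

    namespace = {"__new__": __new__, "__str__": __str__, "unit": unit}
    return type(name, (float,), namespace)


# Declarations taken from the list above; names are illustrative.
humidity = make_valuetype("humidity", "fraction", minimum=0.0, maximum=1.0)
axis_velocity = make_valuetype("axis_velocity", "deg/s")
```

For example, str(humidity(0.45)) gives "0.45 fraction", and humidity(1.5) raises ValueError. Enumerations and strings would need their own factories, since they are not float-based.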