http://people.csail.mit.edu/jaffer/MIXF/CMIXF11  
Representation of currency and SI units in character strings for information interchanges  
 
This document describes a character string encoding for numerical values, monetary units, and SI units which:
According to [NASA 1999] Arthur Stephenson, chairman of the Mars Climate Orbiter Mission Failure Investigation Board:
"The 'root cause' of the loss of the spacecraft was the failed translation of English units into metric units in a segment of ground-based, navigation-related mission software, ..."
Although the [ISO 6093] standard for automated interchange of numerical data is widely used, standardized measurement units (other than for page formatting) are not routinely attached to interchange data.
The audience for metric standards extends beyond scientists and engineers. In the preface to Guide for the Use of the International System of Units (SI) [NIST 811], B. Taylor writes:
The International System of Units, universally abbreviated SI, is the modern metric system of measurement. Long the dominant measurement system used in science, the SI is becoming the dominant measurement system used in international commerce.
For use in commerce, monetary units should be mixed with SI units.
But most currency symbols are not in codepage 0. Furthermore, monetary units are prefixes in some locales and suffixes in others. [ISO 4217] currency names are composed of characters from codepage 0 and are widely used. Because CMIXF is not a presentation format, we can unify the suffix treatment of SI units with monetary units.
The 1986 standard Representations for U.S. Customary, SI, and Other Units to Be Used in Systems with Limited Character Sets [ANSI X3.50] states:
This standard was not designed for ... usage by humans as input to, or output from, data systems. ... They should never be printed out for publication or for other forms of public information transfer.
[ANSI X3.50] representations of units are ambiguous. "min" is both "minute" and "milli-inch"; "cd" is both "candela" and "centiday".
Apart from SI units, [ANSI X3.50] supports only U.S. local units, is not complete in that support, and has no provision for extension to other locales. But non-SI unit systems are in such disarray that using them for interchange is not practical. Unit names signify different volumes in different locales; the Canadian gallon is 4.54609 liters, while the U.S. gallon is 3.785412 liters. The CRC Handbook of Chemistry and Physics [CRC] lists no fewer than six distinct (incompatible) systems of wire gauges.
The character set limitations targeted by [ANSI X3.50], namely single alphabetic case, are no longer common in data interchanges. But many of its double-case "Form I" SI unit representations are similar to those presented here.
Cascading Style Sheets [CSS2] permit certain units directly appended to numbers. Only a fixed set of units is supported; these units cannot be combined with other prefixes or each other.
"CSS2 syntax and basic data types" describes the units:
Of these CSS2 units, cm, mm, rad, ms, s, Hz, and kHz are compatible with the Metric Interchange Formats described here. The remainder, em, ex, px, %, in, pt, pc, deg, and grad do not spoof any metric-interchange units. Note that the CSS2 degree unit, deg, is not recognized as the metric-interchange degree unit, o.
Guide for the Use of the International System of Units (SI) [NIST 811] details a methodology for expressing measurement units in both text and symbolic form in scientific and other documents. Its unit expressions combine over 40 metric base and derived unit symbols unambiguously. Taylor's unit symbols are the basis for this metric interchange format.
A monetary unit symbol is formed from 3 uppercase alphabetic characters. The metric symbols are from the symbol column of the Metric Unit Symbols table.
Within a compound unit, each of the base and derived symbols can optionally have an attached SI prefix. The binary prefixes can be used with base units B (byte) and bit.
Unit symbols formed from other unit symbols by multiplication are indicated by means of a PERIOD (".") placed between them.
Unit symbols formed from other unit symbols by division are indicated by means of a SOLIDUS ("/") or negative exponents. The SOLIDUS must not be repeated in the same compound unit unless contained within a parenthesized subexpression.
The grouping formed by a prefix symbol attached to a unit symbol constitutes a new inseparable symbol (forming a multiple or submultiple of the unit concerned) which can be raised to a positive or negative power and which can be combined with other unit symbols to form compound unit symbols.
The grouping formed by surrounding compound unit symbols with parentheses ("(" and ")") constitutes a new inseparable symbol which can be raised to a positive or negative power and which can be combined with other unit symbols to form compound unit symbols.
Compound prefix symbols, that is, prefix symbols formed by the juxtaposition of two or more prefix symbols, are not permitted.
Prefix symbols are not used with the time-related unit symbols min (minute), h (hour), d (day). No prefix symbol may be used with dB (decibel). Only submultiple prefix symbols may be used with the unit symbols L (liter), Np (neper), o (degree), oC (degree Celsius), rad (radian), and sr (steradian). Submultiple prefix symbols may not be used with the unit symbols t (metric ton), r (revolution), or Bd (baud).
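The prefix restrictions above can be sketched as a small lookup. This is only an illustration: the set names and the function `prefix_allowed` are hypothetical, the prefix lists are taken from the grammar at the end of this document, and the sketch does not check that the unit symbol itself is known.

```python
# Decimal prefix symbols (as listed in the grammar of this document).
MULTIPLE = {"da", "h", "k", "M", "G", "T", "P", "E", "Z", "Y"}
SUBMULTIPLE = {"d", "c", "m", "u", "n", "p", "f", "a", "z", "y"}

# Unit symbols with restricted prefix use, per the paragraph above.
NO_PREFIX = {"min", "h", "d", "dB"}
SUBMULTIPLE_ONLY = {"L", "Np", "o", "oC", "rad", "sr"}
MULTIPLE_ONLY = {"t", "r", "Bd"}


def prefix_allowed(prefix: str, unit: str) -> bool:
    """Return True if the decimal prefix may be attached to the unit symbol."""
    if prefix not in MULTIPLE and prefix not in SUBMULTIPLE:
        return False
    if unit in NO_PREFIX:
        return False
    if unit in SUBMULTIPLE_ONLY:
        return prefix in SUBMULTIPLE
    if unit in MULTIPLE_ONLY:
        return prefix in MULTIPLE
    return True  # no restriction stated for this unit


print(prefix_allowed("m", "L"))    # milliliter
print(prefix_allowed("k", "L"))    # kiloliter
print(prefix_allowed("k", "min"))  # kilominute
```

Under these assumptions, "mL" and "kt" are accepted while "kL", "mt", and "kmin" are rejected.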
A unit exponent follows the unit, separated by a CIRCUMFLEX ("^"). Exponents may be positive or negative. Fractional exponents must be parenthesized.
Monetary units must be composed of 3 uppercase letters.
The case of letters in metric unit symbols must match the symbols specified in the Metric Unit Symbols table. Metric unit symbols are composed of lowercase letters except that:
The prefix symbols Y (yotta), Z (zetta), E (exa), P (peta), T (tera), G (giga), and M (mega) are printed in uppercase letters while all other prefix symbols are printed in lowercase letters.
When used with a numerical value of a quantity, the unit symbol is placed after the numerical value. A PERIOD ("."), SPACE (" "), or nothing is placed between the numerical value and the unit symbol. A SPACE is preferred.


These binary prefixes are valid only with the units B (byte) and bit. However, decimal prefixes can also be used with bit; and decimal multiple (not submultiple) prefixes can also be used with B (byte).
Factor  Power-of-2  Name  Symbol 

1.152921504606846976e18  2^60  exbi  Ei 
1.125899906842624e15  2^50  pebi  Pi 
1.099511627776e12  2^40  tebi  Ti 
1.073741824e9  2^30  gibi  Gi 
1.048576e6  2^20  mebi  Mi 
1.024e3  2^10  kibi  Ki 
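The factors in this table are exact powers of two, so they can be computed rather than stored. A minimal sketch follows; the names `BINARY_PREFIXES`, `binary_prefix_factor`, and `to_bytes` are hypothetical, and only the byte unit B is handled.

```python
# Binary prefix symbol -> exponent of 2, as in the table above.
BINARY_PREFIXES = {"Ki": 10, "Mi": 20, "Gi": 30, "Ti": 40, "Pi": 50, "Ei": 60}


def binary_prefix_factor(symbol: str) -> int:
    """Return the exact integer factor for a binary prefix symbol."""
    return 2 ** BINARY_PREFIXES[symbol]


def to_bytes(value: float, prefixed_unit: str) -> float:
    """Convert a value in e.g. "KiB" or "MiB" to bytes (B)."""
    prefix = prefixed_unit[:-1]
    if not prefixed_unit.endswith("B") or prefix not in BINARY_PREFIXES:
        raise ValueError("expected a binary prefix followed by B")
    return value * binary_prefix_factor(prefix)


print(binary_prefix_factor("Ei"))  # 1152921504606846976
print(to_bytes(1.5, "KiB"))        # 1536.0
```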
Type of Quantity  Name  Symbol  Equivalent 

time  second  s  
time  minute  min  = 60.s 
time  hour  h  = 60.min 
time  day  d  = 24.h 
frequency  hertz  Hz  s^-1 
signaling rate  baud  Bd  s^-1 
length  meter  m  
volume  liter  L  dm^3 
plane angle  radian  rad  
solid angle  steradian  sr  rad^2 
plane angle  revolution  r  =*6.283185307179586.rad 
plane angle  degree  o  =*2.777777777777778e-3.r 
information capacity  bit  bit  
information capacity  byte, octet  B  = 8.bit 
mass  gram  g  
mass  ton  t  Mg 
mass  unified atomic mass unit  u  = 1.660538782e-27.kg 
amount of substance  mole  mol  
catalytic activity  katal  kat  mol/s 
thermodynamic temperature  kelvin  K  
temperature  degree Celsius  oC  
luminous intensity  candela  cd  
luminous flux  lumen  lm  cd.sr 
illuminance  lux  lx  lm/m^2 
force  newton  N  m.kg.s^-2 
pressure, stress  pascal  Pa  N/m^2 
energy, work, heat  joule  J  N.m 
energy  electronvolt  eV  = 1.602176487e-19.J 
power, radiant flux  watt  W  J/s 
logarithm of power ratio  neper  Np  
logarithm of power ratio  decibel  dB  =*0.1151293.Np 
electric current  ampere  A  
electric charge  coulomb  C  s.A 
electric potential, EMF  volt  V  W/A 
capacitance  farad  F  C/V 
electric resistance  ohm  Ohm  V/A 
electric conductance  siemens  S  A/V 
magnetic flux  weber  Wb  V.s 
magnetic flux density  tesla  T  Wb/m^2 
inductance  henry  H  Wb/A 
radionuclide activity  becquerel  Bq  s^-1 
absorbed dose energy  gray  Gy  m^2.s^-2 
dose equivalent  sievert  Sv  m^2.s^-2 
*  The exact formulas are:  

r/rad  = 8 * atan(1)  
o/r  = 1 / 360  
dB/Np  = ln(10) / 20 
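The three exact formulas can be evaluated directly to obtain the rounded factors used for revolution, degree, and decibel. This is a small check using Python's standard math module; the variable names are illustrative only.

```python
import math

rad_per_rev = 8 * math.atan(1)   # r/rad: radians in one revolution (2*pi)
rev_per_deg = 1 / 360            # o/r: revolutions in one degree
np_per_db = math.log(10) / 20    # dB/Np: nepers in one decibel

print(rad_per_rev)
print(rev_per_deg)
print(np_per_db)
```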
Most of these are from [NIST 811]  Examples of SI derived units ... and Essentials of the SI: Base & derived units
Type of Quantity  Name  Symbol 

area  square meter  m^2 
volume  cubic meter  m^3 
speed, velocity  meter per second  m/s 
acceleration  meter per second squared  m/s^2 
wave number  reciprocal meter  m^-1 
mass density (density)  kilogram per cubic meter  kg/m^3 
specific volume  cubic meter per kilogram  m^3/kg 
current density  ampere per square meter  A/m^2 
magnetic field strength  ampere per meter  A/m 
concentration  mole per cubic meter  mol/m^3 
luminance  candela per square meter  cd/m^2 
angular velocity  radian per second  rad/s 
angular acceleration  radian per second squared  rad/s^2 
dynamic viscosity  pascal second  Pa.s 
moment of force  newton meter  N.m 
surface tension  newton per meter  N/m 
heat flux density  watt per square meter  W/m^2 
radiant intensity  watt per steradian  W/sr 
radiance  watt per square meter steradian  W/(m^2.sr) 
heat capacity, entropy  joule per kelvin  J/K 
specific heat or entropy  joule per kilogram kelvin  J/(kg.K) 
specific energy  joule per kilogram  J/kg 
thermal conductivity  watt per meter kelvin  W/(m.K) 
energy density  joule per cubic meter  J/m^3 
electric field strength  volt per meter  V/m 
electric charge density  coulomb per cubic meter  C/m^3 
electric flux density  coulomb per square meter  C/m^2 
permittivity  farad per meter  F/m 
permeability  henry per meter  H/m 
molar energy  joule per mole  J/mol 
molar entropy or heat  joule per mole kelvin  J/(mol.K) 
exposure (x and gamma rays)  coulomb per kilogram  C/kg 
absorbed dose rate  gray per second  Gy/s 
rotational speed  revolution per minute  r/min 
catalytic concentration  katal per cubic meter  kat/m^3 
data rate  mebibit per second  Mibit/s 
noise voltage density  nanovolt per root hertz  nV/Hz^(1/2) 
hourly rate  US Dollars per hour  USD/h 
price  Euros per kilogram  EUR/kg 
exchange rate  Japanese Yen per US Dollar  JPY/USD 
Programming language support for metric unit interchange should be provided by a function of two unit arguments returning a conversion factor. Multiplying a numerical value expressed in the second unit by the returned conversion factor yields the numerical value expressed in the first unit. This function must return a non-positive number if either of its arguments is not a syntactically valid unit; or if the conversion factor does not exist.
UCF("km/s", "m/s" ) --> 0.001
UCF("N"   , "m/s" ) --> 0
UCF("moC" , "oC"  ) --> 1000
UCF("mK"  , "oC"  ) --> 0
UCF("rad" , "o"   ) --> 0.0174533
UCF("K"   , "o"   ) --> 0
UCF("K"   , "K"   ) --> 1
UCF("oK"  , "oK"  ) --> -3
UCF(""    , "s/s" ) --> 1
UCF("km/h", "mph" ) --> -2
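A conversion-factor function of this kind can be sketched for a small subset of the format. The following is an illustration under stated assumptions, not a reference implementation: the unit table, the helper names, and the particular negative return codes (-1 first argument invalid, -2 second invalid, -3 both) are hypothetical choices; parentheses, fractional exponents, and most units and prefixes are omitted.

```python
import re

# Hypothetical, heavily reduced unit table for illustration only:
# symbol -> (factor to SI base units, exponents of (m, kg, s)).
UNITS = {
    "m":   (1.0,    (1, 0, 0)),
    "g":   (1e-3,   (0, 1, 0)),
    "s":   (1.0,    (0, 0, 1)),
    "min": (60.0,   (0, 0, 1)),
    "h":   (3600.0, (0, 0, 1)),
    "Hz":  (1.0,    (0, 0, -1)),
    "N":   (1.0,    (1, 1, -2)),
}
NO_PREFIX = {"min", "h"}
# Only single-character prefixes are included in this sketch.
PREFIXES = {"G": 1e9, "M": 1e6, "k": 1e3, "c": 1e-2, "m": 1e-3, "u": 1e-6, "n": 1e-9}
TERM = re.compile(r"([A-Za-z]+)(?:\^(-?[0-9]+))?")


def _single(term):
    """Parse one (optionally prefixed) unit with an optional integer exponent."""
    match = TERM.fullmatch(term)
    if match is None:
        return None
    symbol, exponent = match.group(1), int(match.group(2) or 1)
    prefix, base = "", symbol
    if symbol not in UNITS:
        prefix, base = symbol[:1], symbol[1:]
        if prefix not in PREFIXES or base not in UNITS or base in NO_PREFIX:
            return None
    factor, dims = UNITS[base]
    factor *= PREFIXES.get(prefix, 1.0)
    return factor ** exponent, tuple(d * exponent for d in dims)


def _parse(unit):
    """Return (factor, dimensions) for a compound unit, or None if invalid."""
    if unit == "":
        return 1.0, (0, 0, 0)  # the empty unit is dimensionless
    numerator, slash, denominator = unit.partition("/")
    terms = [(t, 1) for t in numerator.split(".")]
    if slash:
        terms.append((denominator, -1))  # at most one SOLIDUS, one unit after it
    factor, dims = 1.0, (0, 0, 0)
    for term, sign in terms:
        parsed = _single(term)
        if parsed is None:
            return None
        f, d = parsed
        factor *= f ** sign
        dims = tuple(a + sign * b for a, b in zip(dims, d))
    return factor, dims


def ucf(to_unit, from_unit):
    """Factor that converts a value in from_unit to a value in to_unit."""
    to_parsed, from_parsed = _parse(to_unit), _parse(from_unit)
    if to_parsed is None or from_parsed is None:
        return -((to_parsed is None) + 2 * (from_parsed is None))
    if to_parsed[1] != from_parsed[1]:
        return 0  # dimensions differ: no conversion factor exists
    return from_parsed[0] / to_parsed[0]


print(ucf("km/s", "m/s"))
print(ucf("N", "m/s"))
print(ucf("km/h", "mph"))
```

Tracking each unit as a scale factor plus a vector of base-dimension exponents keeps the convertibility test simple: two units are convertible exactly when their exponent vectors are equal.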
Lexical numerical constants in the programming languages C, Pascal, and Scheme could be extended to incorporate Metric Interchange Syntax compatibly with their current syntaxes; but this is not required for supporting input and output of units.
"Representation of numerical values in character strings for information interchanges", [ISO 6093], specifies the three machine-readable presentations in widespread use (Integer, Decimal, and Exponential notations) using only the characters:
<space>  
<left-parenthesis>  ( 
<right-parenthesis>  ) 
<comma>  , 
<plus-sign>  + 
<hyphen-minus>  - 
<period>  . 
<E>  E 
<e>  e 
<digit>  0 - 9 
In [UTF-7] the character PLUS-SIGN ("+") is not directly encoded, requiring multi-octet encoding. But every [ISO 6093] numeric value can be expressed without the use of PLUS-SIGN. So the number syntax given here does not include PLUS-SIGN.
Locale charsets all support the digits 0 to 9. There are only 3 LC_NUMERIC attributes: decimal_point, thousands_sep, and grouping. [ISO 6093] specifies use of either "." or "," for the decimal point. [ISO 6093] does not allow grouping. There is no LC_NUMERIC attribute for exponent. Thus Latin characters ("e" or "E") must be available in all languages which support [ISO 6093].
The programming languages C, Fortran, PL/I, Pascal, and Scheme accept [ISO 6093] numbers both as lexical constants and as input data.
Of the SI symbols, the "micro" prefix (GREEK-SMALL-LETTER-MU or MICRO-SIGN), "ohm" symbol (GREEK-CAPITAL-LETTER-OMEGA), and "degree" symbol (DEGREE-SIGN) are not supported by all charset encodings. By substituting "u", "Ohm", and "o" respectively, the unit symbols remain readable while preserving the system's unambiguity.
Taylor recommends using the MIDDLE-DOT character between multiplied unit symbols. To support those charset encodings lacking MIDDLE-DOT, metric interchange format instead uses PERIOD (".").
The unit superscript exponents could be formed using SUPERSCRIPT-MINUS, SUPERSCRIPT-ONE, SUPERSCRIPT-TWO, SUPERSCRIPT-THREE, etc. But these characters are not universal. So the CIRCUMFLEX ("^") is placed between a unit and its exponent, which is written with portable digits (preceded by HYPHEN-MINUS when negative).
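The substitutions described above can be illustrated by a converter from typeset unit symbols to the portable spelling. This is a sketch only: the function name `to_cmixf` and the mapping tables are hypothetical, and the result is not validated against the grammar.

```python
# Typeset characters -> portable replacements described in the text.
SUBSTITUTIONS = {
    "\u00b5": "u",    # MICRO SIGN
    "\u03bc": "u",    # GREEK SMALL LETTER MU
    "\u03a9": "Ohm",  # GREEK CAPITAL LETTER OMEGA
    "\u00b0": "o",    # DEGREE SIGN
    "\u00b7": ".",    # MIDDLE DOT
}
# Superscript characters -> the plain characters written after "^".
SUPERSCRIPTS = {
    "\u207b": "-", "\u2070": "0", "\u00b9": "1", "\u00b2": "2", "\u00b3": "3",
    "\u2074": "4", "\u2075": "5", "\u2076": "6", "\u2077": "7", "\u2078": "8",
    "\u2079": "9",
}


def to_cmixf(typeset: str) -> str:
    """Rewrite a typeset unit expression using only portable characters."""
    out = []
    in_exponent = False
    for ch in typeset:
        if ch in SUPERSCRIPTS:
            if not in_exponent:
                out.append("^")  # a run of superscripts starts an exponent
                in_exponent = True
            out.append(SUPERSCRIPTS[ch])
        else:
            in_exponent = False
            out.append(SUBSTITUTIONS.get(ch, ch))
    return "".join(out)


print(to_cmixf("m\u00b7s\u207b\u00b2"))  # m.s^-2
print(to_cmixf("\u00b5\u03a9"))          # uOhm
```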
The symbol for the liter, L, was adopted by the General Conference on Weights and Measures in order to avoid the risk of confusion between the letter l and the number 1 (see [NIST 811]  Units Outside the SI).
Metric Interchange Format (including numbers) uses only the characters:
<left-parenthesis>  ( 
<right-parenthesis>  ) 
<comma>  , 
<hyphen-minus>  - 
<period>  . 
<solidus>  / 
<circumflex>  ^ 
<digit>  0 - 9 
<upper>  A - Z 
<lower>  a - z 
Computer professionals sometimes use the term "kilobyte" to mean 1024 bytes. However, standards for data interchange must be unambiguous in all contexts. In December 1998 the International Electrotechnical Commission (IEC) approved as an IEC International Standard [IEC 60027-2] names and symbols for prefixes for binary multiples for use in the fields of data processing and data transmission.
As of 2000, the units bit and byte have not been accepted for use with SI, but are in widespread use. The IEC symbols are "B" for byte and "bit" for bit. To avoid conflict for "B", the bel was replaced by the decibel (dB).
Because white noise power in a bandwidth is proportional to that bandwidth, electronic noise units can have fractional exponents as in nV/Hz^(1/2) (nanovolt per root hertz).
Degree Celsius (oC) is not convertible to kelvin (K) by multiplication by a constant. Thus the formula "oC = K - 273.15" does not appear in the "Unit Symbols" table; and the conversion-factor function must return a non-positive number when called to convert between oC and K.
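The reason no single factor exists is that the relation between the two scales includes an offset of 273.15. A short sketch (function names are illustrative) shows that the ratio between the kelvin and Celsius values changes with temperature:

```python
KELVIN_OFFSET = 273.15


def celsius_to_kelvin(t_celsius: float) -> float:
    """Convert a temperature in oC to K (an offset, not a scaling)."""
    return t_celsius + KELVIN_OFFSET


def kelvin_to_celsius(t_kelvin: float) -> float:
    """Convert a temperature in K to oC."""
    return t_kelvin - KELVIN_OFFSET


# The ratio K/oC differs between temperatures, so no constant factor works.
ratio_at_10 = celsius_to_kelvin(10.0) / 10.0
ratio_at_100 = celsius_to_kelvin(100.0) / 100.0
print(ratio_at_10, ratio_at_100)
```

Temperature differences are another matter: an interval of one degree Celsius equals an interval of one kelvin, which is consistent with the example UCF("moC", "oC") returning 1000.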
Because a PERIOD (".") after a numerical lexical constant is not specified in the syntax of the programming languages C, Pascal, and Scheme, the syntax of their lexical constants could be extended to incorporate SI unit symbol suffixes separated by a PERIOD. The syntax of "double" in Java could similarly be extended.
Arnold G. Reinhold and Jon Krom helped develop both MIXF and CMIXF.
quantity_value : real | real ' ' unit | real '.' unit | real unit ;
unit : unit_product | unit_product '/' single_unit ;
unit_product : single_unit | unit_product '.' single_unit ;
single_unit : punit | punit '^' uxponent | '(' unit ')' | '(' unit ')^' uxponent ;
uxponent : uinteger | '-' uinteger | '(' uinteger '/' uinteger ')' | '(-' uinteger '/' uinteger ')' ;
punit : decimal_multiple_prefix unit_p_symbol | decimal_submultiple_prefix unit_n_symbol | decimal_multiple_prefix unit_b_symbol | decimal_submultiple_prefix unit_b_symbol | binary_prefix 'B' | binary_prefix 'bit' | unit_p_symbol | unit_n_symbol | unit_b_symbol | unit___symbol ;
decimal_multiple_prefix : 'E' | 'G' | 'M' | 'P' | 'T' | 'Y' | 'Z' | 'da' | 'h' | 'k' ;
decimal_submultiple_prefix : 'a' | 'c' | 'd' | 'f' | 'm' | 'n' | 'p' | 'u' | 'y' | 'z' ;
binary_prefix : 'Ei' | 'Gi' | 'Ki' | 'Mi' | 'Pi' | 'Ti' ;
unit_p_symbol : 'B' | 'Bd' | 'r' | 't' ;
unit_n_symbol : 'L' | 'Np' | 'o' | 'oC' | 'rad' | 'sr' ;
unit_b_symbol : 'A' | 'Bq' | 'C' | 'F' | 'Gy' | 'H' | 'Hz' | 'J' | 'K' | 'N' | 'Ohm' | 'Pa' | 'S' | 'Sv' | 'T' | 'V' | 'W' | 'Wb' | 'bit' | 'cd' | 'eV' | 'g' | 'kat' | 'lm' | 'lx' | 'm' | 'mol' | 's' | currency_symbol ;
unit___symbol : 'd' | 'dB' | 'h' | 'min' | 'u' ;
currency_symbol : upper_letter upper_letter upper_letter ;
upper_letter : 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;
real : ureal | '-' ureal ;
ureal : numerical_value | numerical_value suffix ;
numerical_value : uinteger | dot uinteger | uinteger dot uinteger | uinteger dot ;
dot : '.' | ',' ;
uinteger : digit uinteger | digit ;
suffix : exponent_marker uinteger | exponent_marker '-' uinteger ;
exponent_marker : 'e' | 'E' ;
digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
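The number part of the grammar (the rule named real) is regular, so it can be sketched as a regular expression. This assumes the reading of the grammar in which HYPHEN-MINUS is the only sign character and either PERIOD or COMMA serves as the decimal point; the names `REAL` and `parse_real` are hypothetical.

```python
import re

# real: optional '-', digits with an optional decimal point on either side,
# then an optional exponent such as e3 or E-3 (no '+' anywhere).
REAL = re.compile(r"-?(?:[0-9]+(?:[.,][0-9]*)?|[.,][0-9]+)(?:[eE]-?[0-9]+)?")


def parse_real(text: str):
    """Return the float value of a number in this syntax, or None if invalid."""
    if REAL.fullmatch(text) is None:
        return None
    return float(text.replace(",", "."))


print(parse_real("1,5e-3"))  # 0.0015
print(parse_real("12."))     # 12.0
print(parse_real("+1"))      # None: PLUS-SIGN is not part of the syntax
```

Splitting a full quantity string such as "1.5.m" into number and unit needs more care than this, because PERIOD can be both a decimal point and the separator before the unit.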
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and these terms are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to Voluntocracy, except as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by Voluntocracy or its successors or assigns.
This document and the information contained herein is provided on an "as is" basis and Voluntocracy disclaims all warranties, express or implied, including but not limited to any warranty that the use of the information herein will not infringe any rights or any implied warranties of merchantability or fitness for a particular purpose.
I am a guest and not a member of the MIT Computer Science and Artificial Intelligence Laboratory.
My actions and comments do not reflect in any way on MIT.  
Aubrey Jaffer  agj @ alum.mit.edu