beatl2dump

This documentation describes the functionality of the beatl2dump tool which is part of the Basic Envisat Atmospheric Toolbox (BEAT).

General description

With beatl2dump you can ingest data from any product file that is supported by BEAT-II. The tool allows you to view the ingested BEAT-II record structure or to export the record into one of the supported export formats. The tool also allows you to retrieve a BEAT-II record by importing it from one of the supported import file formats.

The operations of beatl2dump can be separated into two stages. First there is the data retrieval stage, in which either data is ingested from one or more product files, or a BEAT-II record is imported from one of the supported import file formats. The second stage involves an action on the data that was retrieved. Currently beatl2dump allows a simple 'list' operation that provides an overview of the fields of the data record and their attributes, an 'export' operation that allows you to export the data record into one of the supported export formats, and a 'test' operation that just stops after the first stage. The way you provide options to beatl2dump for each combination of retrieval method and action is shown below.

    beatl2dump list [-f '<filter>'] <product files>
        List the fields of the product file ingestion record

    beatl2dump list -i <import format> <import file>
        List the fields of the BEAT-II record that is stored in 'import file'

    beatl2dump test [-f '<filter>'] <product files>
        Only perform the ingestion (and print any error messages)

    beatl2dump test -i <import format> <import file>
        Only perform the import (and print any error messages)

    beatl2dump <export format> [-o <export file>] [-f '<filter>'] <product files>
        Export the product file ingestion record

    beatl2dump <export format> [-o <export file>] -i <import format> <import file>
        Export the BEAT-II record that is stored in 'import file' to the format
        as specified by 'export format'

The import and export functionality is taken from the BEAT-II C library, so the supported import/export formats are the same as those supported by this C library. Currently these formats include "ASCII", "BINARY", "NETCDF", "HDF4" and "HDF5". A description of these formats is given below.

When ingesting data from product files you can (and sometimes must) provide an ingestion filter. The contents of this filter string depend on the type of product you want to ingest. For more information on what type of filter options you can pass, look at the documentation for the specific product type you want to ingest in the BEAT-II Data Description documentation that came with the BEAT package.

BEAT-II import/export file formats

BEAT-II supports several file formats into which you can store BEAT-II record data. You are also free to modify this stored data (for instance, to provide your own processing). As long as your modifications to the data stay within the specifications of the data format, BEAT will be able to import your data again. Below is an overview of the format specifications for each of the supported import/export formats of BEAT-II.

ASCII

A BEAT-II ASCII file contains a single BEAT-II record. Fields are stored in sequential order in the file and are separated by a single empty line. For each field a single header line and one or more lines of data are stored. The header for a field has the following format:

fieldname (datatype) [dim1,dim2,...]

The name of the field should be a valid identifier, which means that the first character should be an alphabetical character (A-Za-z) and the second and following characters may be alphabetical characters, digits (0-9), or the character '_'. After the field name comes a space and a '(' followed by the data type of the field. This should be either "int32", "double", or "string". The data type should be terminated with a ')'. If the field contains zero-dimensional data (i.e. only a single element) then the ')' is immediately followed by an end of line character. Otherwise the ')' is followed by a space, a dimension specification (a comma-separated list of dimension sizes without spaces) between '[' and ']', and a final end of line character. The size of each dimension should be larger than zero. The total number of dimensions should not be larger than the internal limit that is defined in the BEAT-II C library (this limit is defined in the beatl2.h include file for this library).
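The header rules above can be checked mechanically. The following is a minimal sketch in Python, not part of BEAT itself: the function name parse_field_header is hypothetical, the line is expected without its end of line character, and the maximum number of dimensions from beatl2.h is not checked because its value is not given here.

```python
import re

# Header line layout: fieldname (datatype) [dim1,dim2,...]
# The dimension part is optional (absent for zero-dimensional fields).
HEADER_RE = re.compile(
    r"([A-Za-z][A-Za-z0-9_]*) "          # identifier followed by one space
    r"\((int32|double|string)\)"         # data type between parentheses
    r"(?: \[([0-9]+(?:,[0-9]+)*)\])?"    # optional " [d1,d2,...]" without spaces
)


def parse_field_header(line):
    """Return (name, datatype, dims) for one header line (without newline)."""
    match = HEADER_RE.fullmatch(line)
    if match is None:
        raise ValueError("not a valid field header: %r" % line)
    name, datatype, dim_spec = match.groups()
    dims = tuple(int(d) for d in dim_spec.split(",")) if dim_spec else ()
    if any(d <= 0 for d in dims):
        raise ValueError("dimension sizes must be larger than zero")
    return name, datatype, dims
```

A zero-dimensional field yields an empty tuple for dims, so callers can treat scalars and arrays uniformly.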

All line terminations in the BEAT-II ASCII file use the Unix convention. This means that lines are terminated by a single end of line character and no carriage return characters are used.

After the header line comes the data for the field. Each array element of the field is printed on a single line. Each line begins with the array indices for each dimension, each index printed as an integer and followed by a single space character. Depending on the data type the data for an array element is printed as follows:

All array elements should be stored in ascending order applying the C style ordering for multidimensional arrays (this means that the last dimension should be the fastest running).

A small example of a BEAT-II ascii file that contains a single string and a 2x2 integer array is given below:

description (string)
This is a test data set containing some sample data

data (int32) [2,2]
0 0 357
0 1 219
1 0 403
1 1 534
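As an illustration of the index prefix and the C style ordering, the sketch below writes an int32 field in the layout of the example above. It is a hedged illustration only: format_int32_field is a hypothetical helper, and it covers int32 data exclusively because the exact text representation of the other data types is not reproduced in this document.

```python
import itertools
import math


def format_int32_field(name, dims, values):
    """Format an int32 field as a header line plus one line per element."""
    if len(values) != math.prod(dims):
        raise ValueError("number of values does not match the dimensions")
    if dims:
        header = "%s (int32) [%s]" % (name, ",".join(str(d) for d in dims))
    else:
        header = "%s (int32)" % name
    lines = [header]
    # itertools.product varies the last index fastest, which is the
    # C style ordering for multidimensional arrays.
    indices = itertools.product(*(range(d) for d in dims))
    for index, value in zip(indices, values):
        prefix = "".join("%d " % i for i in index)  # each index plus a space
        lines.append(prefix + "%d" % value)
    # Unix convention: a single end of line character per line.
    return "\n".join(lines) + "\n"
```

Calling format_int32_field("data", (2, 2), [357, 219, 403, 534]) reproduces the "data" block of the example file.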

BINARY

The binary export method uses a private binary storage format to store a single BEAT-II record. The advantage of this format is that it usually produces smaller data files than the other formats. The disadvantage, however, is that it is more difficult to access data from these files if you do not make use of the available BEAT-II interfaces.

Files in BINARY format have a header section and a sequence of zero or more blocks containing the data for each of the fields. The header for a BINARY file has the following format:

    name          type                 description
    identifier    array [8] of char    The identifier has a fixed value: BEATL2DF. This value is used as magic number for the BEAT-II BINARY file format.
    version       uint8                The product format version, starting with a value of 0.
    num_fields    int32                The number of fields in the BEAT-II record (>= 0).
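For readers who want to inspect a BINARY file without the BEAT-II interfaces, the header can be decoded with Python's struct module. This is a minimal sketch under two assumptions: the three header fields are stored back to back without padding, and the big-endian byte order described further down in this section applies. The function name parse_binary_header is hypothetical.

```python
import struct

MAGIC = b"BEATL2DF"
# identifier: array [8] of char, version: uint8, num_fields: int32.
# The '>' prefix selects big-endian byte order and disables padding.
HEADER = struct.Struct(">8sBi")


def parse_binary_header(buf):
    """Return (version, num_fields, header_size) for a BINARY file buffer."""
    identifier, version, num_fields = HEADER.unpack_from(buf, 0)
    if identifier != MAGIC:
        raise ValueError("not a BEAT-II BINARY file (bad magic number)")
    if num_fields < 0:
        raise ValueError("num_fields should be >= 0")
    return version, num_fields, HEADER.size
```

The returned header_size is the offset at which the first field block would start under the no-padding assumption.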

The header is followed by num_fields blocks with the data for each field. Each block has the following format:

    name            type                           description
    name_length     int32                          The length of the fieldname (in bytes).
    name            array [name_length] of char    The fieldname. This should be a valid identifier, which means that the first character should be an alphabetical character (A-Za-z) and the second and following characters may be either alphabetical characters, digits (0-9) or the character '_'.
    basic_type      uint8                          The basic type of the data in this field (0 = uint8; 1 = int32; 2 = double; 3 = string).
    num_dims        uint8                          The number of dimensions for the data in this field (>= 0; a value of 0 means the data is a scalar).
    dim             array [num_dims] of int32      The size for each of the dimensions. Each dimension should have a size > 0.
    num_elements    int32                          The total number of elements in this field. This value should equal the product of the dimensions in dim (or 1 if num_dims = 0).
    data            array [num_elements] of        The actual data for this field.
                    case basic_type of

        value    type
        0        uint8
        1        int32
        2        double
        3        record

            name             type                             description
            string_length    int32                            The length of the string (in bytes).
            string           array [string_length] of char    The string value.

All array elements should be stored in ascending order applying the C style ordering for multidimensional arrays (this means that the last dimension should be the fastest running).
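The block layout can be turned into a small reader. The sketch below is an illustration rather than the BEAT-II implementation: read_field is a hypothetical name, and it assumes big-endian values (as stated further down), no padding between the entries of a block, and ASCII-encoded characters.

```python
import math
import struct

# struct format codes for the fixed-size basic types (0, 1 and 2).
TYPE_CODES = {0: "B", 1: "i", 2: "d"}


def read_field(buf, offset=0):
    """Decode one field block; return (name, dims, values, next_offset)."""
    (name_length,) = struct.unpack_from(">i", buf, offset)
    offset += 4
    name = buf[offset:offset + name_length].decode("ascii")
    offset += name_length
    basic_type, num_dims = struct.unpack_from(">BB", buf, offset)
    offset += 2
    dims = struct.unpack_from(">%di" % num_dims, buf, offset)
    offset += 4 * num_dims
    (num_elements,) = struct.unpack_from(">i", buf, offset)
    offset += 4
    # math.prod(()) == 1, which matches the scalar case (num_dims = 0).
    if num_elements != math.prod(dims):
        raise ValueError("num_elements does not match the dimensions")
    if basic_type == 3:
        # Each string element is a record: int32 length plus characters.
        values = []
        for _ in range(num_elements):
            (string_length,) = struct.unpack_from(">i", buf, offset)
            offset += 4
            values.append(buf[offset:offset + string_length].decode("ascii"))
            offset += string_length
    elif basic_type in TYPE_CODES:
        fmt = ">%d%s" % (num_elements, TYPE_CODES[basic_type])
        values = list(struct.unpack_from(fmt, buf, offset))
        offset += struct.calcsize(fmt)
    else:
        raise ValueError("unknown basic_type %d" % basic_type)
    return name, dims, values, offset
```

The values come back as a flat list in C style ordering; feeding next_offset back in as offset reads the following block.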

The basic types are defined as follows:

    type      size [bytes]    description
    char      1               ASCII character
    uint8     1               unsigned 8-bit integer
    int32     4               signed 32-bit integer
    double    8               IEEE754 double

All values in the BEAT-II BINARY format are stored in big-endian format (i.e. the most significant byte is stored first).

The preferred extension to use for exported files in BINARY format is .bl2 (which is an abbreviation of 'BEAT Layer 2').

NETCDF

A netCDF file contains a single BEAT-II record. Each field of the record is stored as a netCDF variable. No attributes (either global or variable attributes) are written to the file. A variable gets the same name as the field. Uint8 data is stored as NC_BYTE, int32 as NC_INT, double as NC_DOUBLE, and string as NC_CHAR. Since netCDF does not support strings directly, a BEAT-II record field that contains string data is converted to a variable of type NC_CHAR. This means that the variable will receive an extra dimension at the end whose size is the maximum string length of all the strings in the BEAT-II record field. The string data is written to this variable without the null termination character, and a string that is shorter than the maximum string length is padded with white space characters. When a variable is imported again, all white space at the end of a string is removed. As a side effect, if you export a string that ends with one or more spaces to a netCDF file and import it again, these trailing spaces will be gone.
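The padding and stripping behaviour can be illustrated without a netCDF library. The following sketch only models the string handling described above. The helper names are hypothetical and the padding character is assumed to be a plain space.

```python
def pad_strings(strings):
    """Pad all strings to the maximum length, as done on export."""
    width = max((len(s) for s in strings), default=0)
    # width is the size of the extra (last) dimension of the NC_CHAR variable.
    return [s.ljust(width) for s in strings], width


def unpad_strings(rows):
    """Remove all trailing white space from each row, as done on import."""
    return [row.rstrip() for row in rows]
```

Passing ["ab ", "wxyz"] through pad_strings and then unpad_strings returns ["ab", "wxyz"]: the trailing space of the first string does not survive the round trip.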

The preferred extension to use for exported files in netCDF format is .nc

HDF4

An HDF4 file contains a single BEAT-II record. Each field of the record is stored in a separate SD (Scientific Dataset). All SDs are stored in the root of the HDF4 file (i.e. VGroups are not used). An SD is given the same name as the field. For uint8, int32 and double data the dimensionality of the SD is the same as that of the field and the data type is also kept equal (i.e. UINT8, INT32 and FLOAT64). There is one exception, however. Because HDF4 does not support zero-dimensional data, a BEAT-II record field with zero dimensions is stored as a one-dimensional SD in HDF4, and the SD gets an attribute "num_dims" of type INT8 with a single value of 0. During import this attribute is used to create a field of zero dimensions again.

Since HDF4 does not support strings directly, a BEAT-II record field that contains string data is converted to an SD of type CHAR. This approach is identical to the way string data is handled for the netCDF export (see above).

The preferred extension to use for exported files in HDF4 format is .hdf

HDF5

Just as for HDF4, an HDF5 file contains a single BEAT-II record. Each field of a record is stored in a separate Dataset. All Datasets are stored in the root group ("/") and are given the same name as the field of the BEAT-II record. The dimensionality of the Dataspace used for a Dataset is the same as that of the field (this is also true for string data). The BEAT-II data types uint8, int32, double, and string are mapped to H5T_STD_U8BE, H5T_STD_I32BE, H5T_IEEE_F64BE, and H5T_C_S1, respectively. The H5T_C_S1 data is stored with the variable size property enabled.

The preferred extension to use for exported files in HDF5 format is .h5 or .hdf5