Segmentation Settings

Segmentation is a vital aspect of how 1010data operates. Segments are the files into which a table's data is divided on the 1010data servers, and tables are segmented so that all like values for a given segmentation arrangement are stored together. For instance, if a table is segmented first by date, then all records for each date in the table are guaranteed to be stored in the same logical space. A single logical storage space may contain many dates, but the records for a single date are never split across more than one. Segmentation is very important for using 1010data's most powerful features, such as g_functions and time-series analysis. Tenup provides several options for specifying the segmentation of your new table.

To begin, Tenup provides an option for specifying the target size of a table's segments, expressed as the number of records in each segment. 1010data servers can handle segments of more than 8 million rows; however, segments of 3 to 5 million rows are the most common. To specify the segment size of a table, use the -b option:

$ tenup64 -u [USERNAME] -p [PASSWORD] -C [CONNSTR] [PATH_TO_NEW_TABLE] [QUERY] -b 5000000

The default segment size for extract and load jobs that do not specify a segment size is 8388608 rows. Note that some segmentation strategies may cause the final segment size to differ from the size specified with the -b option.

Segmentation Columns

When loading data with Tenup, you may choose to segment the data by one or more columns. Specifying segmentation columns guarantees that rows with like values in those columns are never stored separately, so they are always stored and accessed in the same location. For example, if a table is segmented by date, then all rows for a given date are co-located in the same segment. The -j option specifies the segmentation columns:

$ tenup64 -u [USERNAME] -p [PASSWORD] -C [CONNSTR] [PATH_TO_NEW_TABLE] [QUERY] -b 5000000 -j [SEGBY_COL1],[SEGBY_COL2],...[SEGBY_COLn]

Note: If you use the -j option you must specify at least one column.
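
To make the co-location guarantee concrete, the following Python sketch packs rows into segments without ever splitting a group of like values. It is only an illustration of the idea, not Tenup's actual algorithm: the function name pack_segments, the greedy packing rule, and the tiny target size are all assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

def pack_segments(rows, key_cols, target_size):
    """Pack rows into segments without splitting a group of like key values.

    Hypothetical illustration only; not how Tenup is implemented.
    """
    key = itemgetter(*key_cols)
    ordered = sorted(rows, key=key)  # bring like values together
    segments, current = [], []
    for _, group in groupby(ordered, key=key):
        group = list(group)
        # Close the current segment if this group would push it past the target.
        if current and len(current) + len(group) > target_size:
            segments.append(current)
            current = []
        current.extend(group)  # a group larger than the target still stays whole
    if current:
        segments.append(current)
    return segments

dates = ["2024-01-01"] * 3 + ["2024-01-02"] * 2 + ["2024-01-03"] * 4
rows = [{"date": d, "transid": i} for i, d in enumerate(dates)]
segments = pack_segments(rows, ["date"], target_size=5)
print([len(s) for s in segments])  # [5, 4]
```

In this sketch the first segment holds two dates and the second holds one, and the second segment ends up smaller than the target. This mirrors the two points made above: many dates may share a segment, and the final segment size may differ from the size requested with -b.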

It is also possible to specify how the table is to be sorted within a segment after the data is loaded in 1010data. Sort order is specified with the -$ option:

$ tenup64 -u [USERNAME] -p [PASSWORD] -C [CONNSTR] [PATH_TO_NEW_TABLE] [QUERY] -b 5000000 -j date,transid -$ [SORT_COL1],[SORT_COL2],...[SORT_COLn] 

Note that the table is sorted in reverse order of the column names provided. In the example above, the table is sorted first by [SORT_COLn], then by each preceding column down to [SORT_COL2], and finally by [SORT_COL1].
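
The practical effect of sorting in reverse column order can be sketched in Python. The sketch assumes that each sorting pass is stable (rows that tie keep their previous relative order), which this document does not state; under that assumption the first column listed ends up as the most significant sort key. The function and column names are hypothetical.

```python
from operator import itemgetter

def sort_in_reverse_column_order(rows, sort_cols):
    """Apply one stable sort per column, starting with the last column listed.

    Assumes stable passes; this is an illustration, not Tenup's implementation.
    """
    result = list(rows)
    for col in reversed(sort_cols):
        result = sorted(result, key=itemgetter(col))  # Python's sort is stable
    return result

rows = [
    {"store": 2, "date": "2024-01-02"},
    {"store": 1, "date": "2024-01-02"},
    {"store": 2, "date": "2024-01-01"},
    {"store": 1, "date": "2024-01-01"},
]
result = sort_in_reverse_column_order(rows, ["store", "date"])
for r in result:
    print(r["store"], r["date"])
```

Here the rows come out ordered by store and, within each store, by date, which is the same order a single sort on the pair (store, date) would produce.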

SortSeg Control

A SortSeg is a special kind of segmentation that stipulates that each group of like values in a segment starts with the lowest or earliest value and ends with the highest or latest value. SortSeg improves performance for some kinds of analysis by enabling the system to assume the start and end points and skip past entire groups of like values or even entire segments. SortSeg is controlled with the -J option.
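
The following Python sketch shows why a guaranteed ordering allows groups and segments to be skipped. It is not a description of 1010data internals. It assumes, purely for illustration, that each segment is a non-empty list of (key, value) rows ordered by key and then by value; the function name rows_at_or_after is hypothetical.

```python
from bisect import bisect_left

def rows_at_or_after(segments, key, start):
    """Return rows for `key` whose value is >= `start`.

    Assumes every segment is non-empty and ordered by (key, value).
    """
    hits = []
    for seg in segments:
        # The first and last rows bound the keys a sorted segment can contain.
        if key < seg[0][0] or key > seg[-1][0]:
            continue  # skip the entire segment without scanning it
        lo = bisect_left(seg, (key, start))  # jump to the first candidate row
        for row in seg[lo:]:
            if row[0] != key:
                break  # past the group, so nothing later can match
            hits.append(row)
    return hits

segments = [
    [("A", "2024-01-01"), ("A", "2024-01-03"), ("B", "2024-01-02")],
    [("C", "2024-01-01"), ("C", "2024-01-02"), ("C", "2024-01-05")],
]
print(rows_at_or_after(segments, "C", "2024-01-02"))
```

The lookup for key C never scans the first segment, and within the second segment it starts at the first row that can match rather than at the top of the group.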


Indexing Columns

Tenup provides an option for indexing specific columns after they are loaded in 1010data. Indexing improves the performance of some kinds of analysis by recording where specific values of a column are located within each segment. Indexing is performed with the -X option.
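
As a rough mental model, an index of this kind can be pictured as a map from each distinct value to the row positions where it occurs in a segment. The Python sketch below builds such a map; the data layout and the name build_index are assumptions made for illustration and say nothing about how 1010data stores its indexes.

```python
from collections import defaultdict

def build_index(segment, col):
    """Map each distinct value in `col` to the row positions where it occurs."""
    index = defaultdict(list)
    for pos, row in enumerate(segment):
        index[row[col]].append(pos)
    return dict(index)

segment = [{"sku": "A1"}, {"sku": "B2"}, {"sku": "A1"}, {"sku": "C3"}]
index = build_index(segment, "sku")
print(index["A1"])  # [0, 2]
```

With the map in hand, a selection on sku can go directly to the matching positions instead of scanning every row in the segment.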

Linking to an Existing Table in 1010data

Tenup provides an option for linking to an existing table during a load process. Linking with Tenup performs a prelink between the table that is being loaded and the foreign table. A prelink creates a map between the two tables, based on the columns specified in the list of local columns and the list of foreign columns. The map is then saved with the new table. In Tenup, the -, option specifies a link.

The -, option also accepts an optional argument: a list of columns in the foreign table to denormalize into the loaded table.

Note: When a list of columns to denormalize is included, the foreign table is not prelinked to the base table.
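
The difference between saving a prelink map and denormalizing columns can be sketched in Python. This is a conceptual model only, not Tenup's implementation: the function names, the choice to keep the first matching foreign row, and the sample tables are all assumptions made for the example.

```python
def build_prelink(local_rows, foreign_rows, local_cols, foreign_cols):
    """For each local row, record the position of the matching foreign row.

    Rows without a match map to None. Hypothetical illustration only.
    """
    lookup = {}
    for pos, row in enumerate(foreign_rows):
        # Keep the first foreign row seen for each key.
        lookup.setdefault(tuple(row[c] for c in foreign_cols), pos)
    return [lookup.get(tuple(row[c] for c in local_cols)) for row in local_rows]

def denormalize(local_rows, foreign_rows, link_map, cols):
    """Copy the chosen foreign columns into each local row."""
    out = []
    for row, pos in zip(local_rows, link_map):
        extra = {c: (foreign_rows[pos][c] if pos is not None else None) for c in cols}
        out.append({**row, **extra})
    return out

sales = [{"store": 2, "amt": 10}, {"store": 1, "amt": 5}, {"store": 9, "amt": 7}]
stores = [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}]

link_map = build_prelink(sales, stores, ["store"], ["id"])
print(link_map)  # [1, 0, None]

wide = denormalize(sales, stores, link_map, ["city"])
print([r["city"] for r in wide])  # ['Boston', 'Austin', None]
```

In the first case the map itself is what gets kept alongside the new table. In the second case the foreign values are copied into the loaded rows and the map is only a temporary step, which is consistent with the note above that no prelink is saved when columns are denormalized.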