Reading TDMS files

To read a TDMS file, create an instance of the TdmsFile class using one of the static nptdms.TdmsFile.read() or nptdms.TdmsFile.open() methods, passing the path to the file, or an already opened file. The read() method will read all channel data immediately:

tdms_file = TdmsFile.read("my_file.tdms")

If using the open() method, only the file metadata will be read initially, and the returned TdmsFile object should be used as a context manager to keep the file open and allow channel data to be read on demand:

with TdmsFile.open("my_file.tdms") as tdms_file:
    # Use tdms_file
    ...

Using an instance of TdmsFile, groups within the file can be accessed by indexing into the file with a group name, or all groups can be retrieved as a list with the groups() method:

group = tdms_file["group name"]
all_groups = tdms_file.groups()

A group is an instance of the TdmsGroup class, and can contain multiple channels of data. You can access channels in a group by indexing into the group with a channel name or retrieve all channels as a list with the channels() method:

channel = group["channel name"]
all_group_channels = group.channels()

Channels are instances of the TdmsChannel class and act like arrays. They can be indexed with an integer index to retrieve a single value or with a slice to retrieve all data or a subset of data as a numpy array:

all_channel_data = channel[:]
data_subset = channel[100:200]
first_channel_value = channel[0]

If the channel contains waveform data and has the wf_start_offset and wf_increment properties, you can get an array of relative time values for the data using the time_track() method:

time = channel.time_track()

In addition, if the wf_start_time property is set, you can pass absolute_time=True to get an array of absolute times in UTC.
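For example:

# Absolute times in UTC, based on the wf_start_time property
absolute_time = channel.time_track(absolute_time=True)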

A TDMS file, group and channel can all have properties associated with them, so each of the TdmsFile, TdmsGroup and TdmsChannel classes provides access to these properties as a dictionary via its properties attribute:

# Iterate over all items in the file properties and print them
for name, value in tdms_file.properties.items():
    print("{0}: {1}".format(name, value))

# Get a single property value from the file
property_value = tdms_file.properties["my_property_name"]

# Get a group property
property_value = tdms_file["group name"].properties["group_property_name"]

# Get a channel property
property_value = tdms_file["group name"]["channel name"].properties["channel_property_name"]

In addition to the properties dictionary, all groups and channels have name and path attributes. The name is the human-readable name of the group or channel, and the path is the full path to the TDMS object, which includes the group name for channels:

group = tdms_file["group name"]
channel = group["channel name"]
print(group.name)    # Prints "group name"
print(group.path)    # Prints "/'group name'"
print(channel.name)  # Prints "channel name"
print(channel.path)  # Prints "/'group name'/'channel name'"

Reading large files

TDMS files are often too large to fit comfortably in memory, so npTDMS offers a few ways to deal with this. A TDMS file can be opened for reading without immediately reading all the data by using the static open() method; channel data is then read as required:

with TdmsFile.open(tdms_file_path) as tdms_file:
    channel = tdms_file[group_name][channel_name]
    all_channel_data = channel[:]
    data_subset = channel[100:200]

TDMS files are written in multiple segments, where each segment can in turn have multiple chunks of data. When accessing a value or a slice of data in a channel, npTDMS will read whole chunks at a time. npTDMS also allows streaming data from a file chunk by chunk using nptdms.TdmsFile.data_chunks(). This is a generator that produces instances of DataChunk. For example, to compute the mean of a channel:

channel_sum = 0.0
channel_length = 0
with TdmsFile.open(tdms_file_path) as tdms_file:
    for chunk in tdms_file.data_chunks():
        channel_chunk = chunk[group_name][channel_name]
        channel_length += len(channel_chunk)
        channel_sum += channel_chunk[:].sum()
channel_mean = channel_sum / channel_length

This approach can be useful for streaming TDMS data to another format on disk or into a data store. It’s also possible to stream data chunks for a single channel using nptdms.TdmsChannel.data_chunks():

with TdmsFile.open(tdms_file_path) as tdms_file:
    channel = tdms_file[group_name][channel_name]
    for chunk in channel.data_chunks():
        channel_chunk_data = chunk[:]

If you don’t need to read the channel data at all and only need to read metadata, you can also use the static read_metadata() method:

tdms_file = TdmsFile.read_metadata(tdms_file_path)
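
The returned TdmsFile can then be used to inspect groups, channels and their properties without reading any channel data. As a minimal sketch:

for group in tdms_file.groups():
    print(group.path)
    for channel in group.channels():
        # Properties are available even though no channel data has been read
        print(channel.path, dict(channel.properties))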

In cases where you need to work with large arrays of channel data as if all the data were in memory, you can also pass the memmap_dir argument when reading a file. This will read data into memory-mapped numpy arrays on disk, and your operating system will then page data in and out of memory as required:

import tempfile

with tempfile.TemporaryDirectory() as temp_memmap_dir:
    tdms_file = TdmsFile.read(tdms_file_path, memmap_dir=temp_memmap_dir)

Timestamps

By default, timestamps are read as numpy datetime64 objects with microsecond precision. However, TDMS files are capable of storing times with a precision of 2^-64 seconds. If you need access to this higher precision timestamp data, all methods for constructing a TdmsFile accept a raw_timestamps parameter. When this is true, any timestamp properties will be returned as a TdmsTimestamp object. This has seconds and second_fractions attributes, which are the number of seconds since the epoch 1904-01-01 00:00:00 UTC and a positive number of 2^-64 fractions of a second, and the class has methods for converting to a numpy datetime64 object or datetime.datetime.
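For example, a file can be read with full precision timestamps by passing this parameter (reusing the example path from earlier):

tdms_file = TdmsFile.read("my_file.tdms", raw_timestamps=True)

A timestamp property read from such a file then looks like: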

>>> timestamp = channel.properties['wf_start_time']
>>> timestamp
TdmsTimestamp(3670436596, 11242258187010646344)
>>> timestamp.seconds
3670436596
>>> timestamp.second_fractions
11242258187010646344
>>> print(timestamp)
2020-04-22T21:43:16.609444
>>> timestamp.as_datetime64('ns')
numpy.datetime64('2020-04-22T21:43:16.609444037')
>>> timestamp.as_datetime()
datetime.datetime(2020, 4, 22, 21, 43, 16, 609444)

When setting raw_timestamps to true, channels with timestamp data will return data as a TimestampArray rather than as a datetime64 array. This is a subclass of numpy.ndarray with additional properties and an as_datetime64() method for converting to a datetime64 array, and elements in the array are returned as TdmsTimestamp instances:

>>> timestamp_data = channel[:]
>>> timestamp_data
TimestampArray([(8942011409353408512, 3670436596), (9643130391967563776, 3670436596),
                (9661619779500244992, 3670436596), ..., (1366710545511612416, 3670502040),
                (1476995959824056320, 3670502040), (1587685994415521792, 3670502040)],
               dtype=[('second_fractions', '<u8'), ('seconds', '<i8')])
>>> timestamp_data[0]
TdmsTimestamp(3670436596, 8942011409353408512)
>>> timestamp_data.seconds
array([3670436596, 3670436596, 3670436596, ..., 3670502040, 3670502040, 3670502040], dtype=int64)
>>> timestamp_data.second_fractions
array([8942011409353408512, 9643130391967563776, 9661619779500244992, ..., 1366710545511612416,
       1476995959824056320, 1587685994415521792], dtype=uint64)
>>> timestamp_data.as_datetime64('us')
array(['2020-04-22T21:43:16.484747', '2020-04-22T21:43:16.522755', '2020-04-22T21:43:16.523757', ...,
       '2020-04-23T15:54:00.074089', '2020-04-23T15:54:00.080068', '2020-04-23T15:54:00.086068'],
      dtype='datetime64[us]')

Timestamps in TDMS files are stored in UTC time and npTDMS does not do any timezone conversions. If timestamps need to be converted to the local timezone, the arrow package is recommended. For example:

import datetime
import arrow

timestamp = channel.properties['wf_start_time']
local_time = arrow.get(timestamp.astype(datetime.datetime)).to('local')
print(local_time.format())

Here we first convert the numpy datetime64 object to Python’s built-in datetime type, convert that to an arrow time, and then convert it from UTC to the local timezone.

Scaled data

The TDMS format supports different ways of scaling data, and DAQmx raw data in particular is usually scaled. The data retrieved from a TdmsChannel has scaling applied. If you have opened a TDMS file with read(), you can access the raw unscaled data with the raw_data property of a channel. Note that DAQmx channels may have multiple raw scalers rather than a single raw data channel, in which case you need to use the raw_scaler_data property to access the raw data as a dictionary of scaler id to raw data array.
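
As a minimal sketch, reusing the example file, group and channel names from earlier:

tdms_file = TdmsFile.read("my_file.tdms")
channel = tdms_file["group name"]["channel name"]
# Raw unscaled data for a channel with a single raw data array
unscaled_data = channel.raw_data
# For DAQmx channels with multiple raw scalers, use instead:
# unscaled_scaler_data = channel.raw_scaler_data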

When you’ve opened a TDMS file with open(), you instead need to use the read_data() method, passing scaled=False:

with TdmsFile.open(tdms_file_path) as tdms_file:
    channel = tdms_file[group_name][channel_name]
    unscaled_data = channel.read_data(scaled=False)

This will return an array of raw data, or a dictionary of scaler id to raw scaler data for DAQmx data.

Conversion to other formats

npTDMS has convenience methods to convert data to Pandas DataFrames or HDF5 files. The TdmsFile class has as_dataframe() and as_hdf() methods to convert a whole file to a DataFrame or HDF5 file, and TdmsGroup and TdmsChannel each have an as_dataframe() method for converting a single group or channel to a Pandas DataFrame.
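
As a minimal sketch, reusing the example file, group and channel names from earlier ("my_file.h5" is just an example output path):

tdms_file = TdmsFile.read("my_file.tdms")

# Convert the whole file to a DataFrame or write it to an HDF5 file
df = tdms_file.as_dataframe()
tdms_file.as_hdf("my_file.h5")

# Convert a single group or channel to a DataFrame
group_df = tdms_file["group name"].as_dataframe()
channel_df = tdms_file["group name"]["channel name"].as_dataframe()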

Thread safety

When a TDMS file is opened with open(), the returned TdmsFile object is not thread-safe, and reading from it concurrently will result in undefined behaviour. If you need to read from the same file concurrently, you should open a new TdmsFile per thread.
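
As a sketch of that approach, each worker thread opens its own handle to the file (reusing the tdms_file_path, group_name and channel_name placeholders from the earlier examples):

from concurrent.futures import ThreadPoolExecutor

def read_slice(start, stop):
    # Each thread opens the file itself rather than sharing one TdmsFile object
    with TdmsFile.open(tdms_file_path) as tdms_file:
        return tdms_file[group_name][channel_name][start:stop]

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(read_slice, i * 1000, (i + 1) * 1000) for i in range(4)]
    slices = [future.result() for future in futures]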

When a TDMS file is read with read(), the returned TdmsFile is safe to read from concurrently as all data has been read from the file upfront.