Multifractal analysis of heart rate data

This tutorial demo will show how to perform multifractal analysis on a database of heart rate data.

It will illustrate the use of class MF_BS_tool_inter_db to analyze an entire database in a single command.

Roberto Leonarduzzi, Patrice Abry June 2018

Contents

Setup path

Add root folder of toolbox to path (subfolders will be added automatically)

addpath ~/Documents/code/mf_bs_tool

Input data

We start by reading the data.

The file 'data_hrv.mat' contains a a part of the "Normal Sinus Rhythm RR Interval Database', made available by Phyisionet: https://physionet.org/physiobank/database/nsr2db/

The file only contains the first 6 hours of each record, in a cell array containing the time of each beat and the RR interval.

% Load the cell array data
load ('data_hrv.mat', 'data')

% Plot an example
select = 3;
figure (99)
time = data{select, 1} / 3600; % Convert time to hours
rr = data{select, 2};
plot (time, rr , 'r')

Preprocessing

As we can see from figure 99, the data contains a fair number of large outliers. Multifractal anlaysis is extremely sensitive to such large events, and we need to remove them.

We remove the ouliers in a simple way. We reject interval RR_i if:

1) RR_i is larger than 0.35 s OR RR_i is larger than 2 s

2) RR_i is more than 20% larger than the median interval on a 41 beat moving window.

After the filtering, we interpolate the data to a uniform grid with a sampling frequency of 8 Hz, using cubic splines.

Finally, we verify the performance of the procedure on a selected record.

rr_min = 0.35;  % Smallest accepted rr_interval
rr_max = 2;     % Smallest accepted rr_interval
win_size = 41;  % Size of local median window
halfw = floor (win_size / 2);  % Half window size
fs = 8;  % Sampling frequency for interpolation
data_orig = data;

for ir = 1 : length (data)
    rr = data{ir, 2};
    t = data{ir, 1};

    % Mark intervals exceeding thresholds
    rr(rr < rr_min | rr > rr_max) = NaN;

    % Moving median filters
    for k = 1 : length (rr)
        win_beg = max (k - halfw + 1, 1);
        win_end = min (k + halfw, length (rr));
        if rr(k) > 1.2 * nanmedian (rr(win_beg : win_end));
            rr(k) = NaN;  % Mark unacceptable intervals
        end
    end

    % Remove marked
    t = t(~isnan (rr));
    rr = rr(~isnan (rr));

    % Uniform spline interpolation to target sampling frequency
    t_new = t(1) : 1 / fs : t(end);
    rr_new = interp1 (t, rr, t_new, 'spline');

    data{ir, 1} = t_new;
    data{ir, 2} = rr_new;
end

% Verify denoising on selected record
figure (99)
plot (data_orig{select, 1} / 3600, data_orig{select, 2}, 'k')
hold on
plot (data{select, 1} / 3600, data{select, 2}, 'r')
hold off
axis tight
ylim ([rr_min rr_max])
legend ({'original', 'clean'}, 'Location', 'SouthEast')
xlabel ('Time [hours]')
ylabel ('RR [seconds]')

Multifractal analysis of entire database: setup

The toolbox allows to perform the analysis of an entire database with a single command. In this case, we need to use the class 'MF_BS_tool_inter_db', where 'db' stands for database.

% Create an object of class 'MF_BS_tool_inter_db'
mo = MF_BS_tool_inter_db;

% Select which multiresolution quantities to use.
% 1: wavelet coefficients (DWT), 2: wavelet leaders (LWT)
mo.method_mrq = [1 2];

% Select the number of vanishing moments for de Daubechies wavelet
mo.nwt = 3;

% Select the statistical orders q
mo.q = [-5 -3 -2 -1 1 2 3 5];
% This will produce 10 values of q between 0.01 and 10, and then add
% their opossites also.

% Select the number of cumulants to compute
mo.cum = 2;

% Indicate starting number for figures
mo.fig_num = 100;

% Set up versbosity level.
mo.verbosity = 1;   % Number xy, with x,y either 0 or 1
                    %    x = 1: display figures
                    %    y = 1: display tables with estimates

% Select way to present results:
mo.bs_plot_type = 'superimpose';

Reader object

The tool can analyze databases in many formats, like a matrices or cell arrays with all the data, a list of files, or a directory. To have an abstract access to all these formats, it needs a 'reader' object that manage the access to the data.

In this case, since we have our data in a cell array, we need to use the function 'create_db_reader_cell' to create the proper reader.

% Select only rr_data from cell array:
rr_data = data(:, 2);

% Create reader object to read the cell array:
reader = create_db_reader_cell (rr_data);

Perform multifractal analysis

Finally, we perform the analysis by calling the analyze method with the reader we have just created.

As usuall, we concentrate first on the determination of the scaling range and fractional integration order, using figures 103 and 105. Since we selected bs_plot_type = 'superimpose', we see the logscale diagrams for the individual realizations, which is useful for this task.

% Analyze the database accessed by the reader
mo.analyze (reader);

% Momentarily close figures we don't need
for f = [100, 101, 102, 104]; close(f); end

Setup scaling range:

From figures 103 and 105 we see that the default scaling range is not a good choice, and that a better one would be, for instance, [6-11].

We also see that, in that scaling range, the logscale plot for h_min shows slopes around 0, both positive and negative. Just to be conservative, we will select a value of gamint = 1, equivalent to performing a full integration on the data.

We also increase the range of q, to produce smoother estimates of the spectrum.

We repeat the analysis with the new setup. We need to create a new instance of the reader, since the old one has already been exhausted.

% Adjust scaling range
mo.j1 = 6;
mo.j2 = 12;

% Adjust franctional integration
mo.gamint = 1;

% Increase range of q
mo.q = build_q_log (0.01, 10, 10);

% Analyze the data again
reader = create_db_reader_cell (rr_data);
mo.analyze (reader);

% Momentarily close figures we don't need
for f = [103, 104]; close(f); end

Show means and confidence intervals

In figure 100 we see boxplots of the estimates for c1 and c2, as well as their mean (black line). They already show an interesting fact: the confidence intervals for c2 computed from LWT indicate that c2 is significantly different from 0, confirming that the data are multifractal.

Figure 101 shows the estimates of the multifractal spectrum. However, it is difficult to visualize with the superposition of all spectra. We will change the visualization type, indicating the toolbox to show confidence intervals instead.

In this case we do not need to recompute everything. If we call
the analyze method without parameters, the tool will recreate
the plot without recomputing.
% Change visualization type to confidence intervals
mo.bs_plot_type = 'ci';

% Recreate the plots without recomputing
mo.analyze ();

% Momentarily close figures we don't need
for f = [100, 103, 104, 105]; close(f); end

Scatter plots of results

Finally, we create a scatter plot with the multifractal parameters c1 and c2.

The mean estimates are stored in field mo.est.LWT.t, and the individual estimates of each signals are stored in mo.est.LWT.BS.T.

As usual, both fields contain all the estimates: zeta(q), D(q), h(q) and cp. To extract only the log-cumulants, we use the get_cid function to get the corresponding indices.

There doesn't seem to be a large structure on the scatter plot.

% Get indices for log cumulants
cid = mo.get_cid ();

% Access mean estimates
cp_mean = mo.est.LWT.t(cid);

% Access individual estimates for each signal
cp_all = mo.est.LWT.BS.T(:, cid);

% Load gender grouping variables as logical values:
gender = load ('genders_binary.txt');  % 0: feminin, 1: masculin
gender = logical (gender);

% Create scatter plot
figure (110)
% Plot feminine subjects
scatter (cp_all(~gender, 1), cp_all(~gender, 2), 'ro')
hold on
% Plot masculine subjects
scatter (cp_all(gender, 1), cp_all(gender, 2), 'k+')
hold off
grid on
xlabel ('c_1')
ylabel ('c_2')
title ('Scatter plot: multifractal parameters')
legend ('Masc', 'Fem', 'Location', 'NorthWest')

Classify using k-NearestNeighbors

As an illustration, we attempt to predict gender based on multifractal attributes.

We use a simple k-NearestNeighbors classifier. We use multifractal parameters $c_1$ and $c_2$ (columns of variable cp_all) as predictors, and the gender as response.

% Train classifier using 11 neighbours, and 5-fold cross-validation
knn_model = fitcknn(cp_all, gender, ...
                    'NumNeighbors', 11, 'Standardize', 1, 'kFold', 10);

% Measure classification accuracy:
accuracy = 1 - knn_model.kfoldLoss ();

fprintf ('Accuracy: %g\n', accuracy)
Accuracy: 0.703704