Correlation and inference within and across brain areas¶

Our project aims at inferring or predicting the neuronal activity and thoroughly studying the strength of the correlation measure. We chose diferent models and methods to perform these analysis and understand if and how these results change on the basis of the structure of brain areas.

0. Importing libraries¶

In [1]:
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import classification_report
import seaborn as sns
import matplotlib.pyplot as plt
from mycolorpy import colorlist as mcp
import pickle
import plotly.express as px
import math
from matplotlib.patches import Patch
/Users/isottamagistrali/opt/anaconda3/envs/envallen/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

1. The data¶

1.1 Loading the data¶

The first step is to load the data from the Allen Brain Observatory database.

In [2]:
# Change this path to the location of the data in your computer

data_dir = "/Users/isottamagistrali/Library/Mobile Documents/com~apple~CloudDocs/Bocconi/Semestre 3.2/Neurosciencee/allendata"

# Loading the metadata contained in the cache which refers to all experiments

manifest_path = os.path.join(data_dir, "manifest.json")
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)

The available data is divided in sessions. Each session corresponds to an experiment conducted on a different individual. Each session has some information regarding the type of experiment done and informations about the specimen as can be seen in the following table.

In [ ]:
# Showing the first five entries of the session table

cache.get_session_table().head()
published_at specimen_id session_type age_in_days sex full_genotype unit_count channel_count probe_count ecephys_structure_acronyms
id
715093703 2019-10-03T00:00:00Z 699733581 brain_observatory_1.1 118.0 M Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 884 2219 6 [CA1, VISrl, nan, PO, LP, LGd, CA3, DG, VISl, ...
719161530 2019-10-03T00:00:00Z 703279284 brain_observatory_1.1 122.0 M Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 755 2214 6 [TH, Eth, APN, POL, LP, DG, CA1, VISpm, nan, N...
721123822 2019-10-03T00:00:00Z 707296982 brain_observatory_1.1 125.0 M Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 444 2229 6 [MB, SCig, PPT, NOT, DG, CA1, VISam, nan, LP, ...
732592105 2019-10-03T00:00:00Z 717038288 brain_observatory_1.1 100.0 M wt/wt 824 1847 5 [grey, VISpm, nan, VISp, VISl, VISal, VISrl]
737581020 2019-10-03T00:00:00Z 718643567 brain_observatory_1.1 108.0 M wt/wt 568 2218 6 [grey, VISmma, nan, VISpm, VISp, VISl, VISrl]

There is a lot more material that could be extracted from the metadata contained inside the cache besides the list of all sessions, but given the vastness of the available information, going deeper inside the structure of the data of all sessions combined is beyond the scope of our research.

For our research we are going to focus only on the data for one specific session, although the work that will follow could be extended by including data from multiple sessions. We will discuss at the end our ideas for possible further work.

In [3]:
# Loading data for our chosen session

session_id = 798911424 
oursession = cache.get_session_data(session_id, timeout=3000)
/Users/isottamagistrali/opt/anaconda3/envs/envallen/lib/python3.11/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.1.3 because version 1.8.0 is already loaded.
  return func(args[0], **pargs)
/Users/isottamagistrali/opt/anaconda3/envs/envallen/lib/python3.11/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'core' version 2.2.2 because version 2.6.0-alpha is already loaded.
  return func(args[0], **pargs)

1.2 Available data¶

Even for a single session there are a lot of data at our disposal, which can be summarized in the following categories:

  • Metadata
  • Probes
  • Channels
  • Units
  • Stimuli
  • Spikes

Metadata: general information regarding the session and the individual taking part in the experiment

In [ ]:
oursession.metadata
{'specimen_name': 'Vip-IRES-Cre;Ai32-421338',
 'session_type': 'brain_observatory_1.1',
 'full_genotype': 'Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt',
 'sex': 'F',
 'age_in_days': 110.0,
 'rig_equipment_name': 'NP.1',
 'num_units': 825,
 'num_channels': 2233,
 'num_probes': 6,
 'num_stimulus_presentations': 70931,
 'session_start_time': datetime.datetime(2018, 12, 21, 0, 2, 57, tzinfo=tzoffset(None, -28800)),
 'ecephys_session_id': 798911424,
 'structure_acronyms': ['LP',
  'DG',
  'CA1',
  'VISam',
  nan,
  'APN',
  'TH',
  'Eth',
  'CA3',
  'VISrl',
  'HPF',
  'ProS',
  'SUB',
  'VISp',
  'CA2',
  'VISl',
  'MB',
  'NOT',
  'LGv',
  'VISal'],
 'stimulus_names': ['spontaneous',
  'gabors',
  'flashes',
  'drifting_gratings',
  'natural_movie_three',
  'natural_movie_one',
  'static_gratings',
  'natural_scenes',
  'drifting_gratings_contrast']}

Here we can see some information that will be useful later such as the acronymns for the structures of the brain and the list of all stymulus that have been presented during the experiment. We show them again separately for more clarity.

In [ ]:
oursession.structure_acronyms
['LP',
 'DG',
 'CA1',
 'VISam',
 nan,
 'APN',
 'TH',
 'Eth',
 'CA3',
 'VISrl',
 'HPF',
 'ProS',
 'SUB',
 'VISp',
 'CA2',
 'VISl',
 'MB',
 'NOT',
 'LGv',
 'VISal']
No description has been provided for this image
In [ ]:
oursession.stimulus_names
['spontaneous',
 'gabors',
 'flashes',
 'drifting_gratings',
 'natural_movie_three',
 'natural_movie_one',
 'static_gratings',
 'natural_scenes',
 'drifting_gratings_contrast']

Spontaneous refers to a period in which there is nothing shown on the screen. We can also see that in this particular experiment the 'Dot motion' stimulus is not present.

alt text

Probes and Channels: information regarding the phisical instruments used to collect the data, which consist of probes with numerous channels each. We are going to ignore this data in our research, but we show it for completeness since it might be used in possible extensions.

In [ ]:
print('Number of probes:', len(oursession.probes))
oursession.probes.head()
Number of probes: 6
description location sampling_rate lfp_sampling_rate has_lfp_data
id
800036196 probeA See electrode locations 29999.965974 1249.998582 True
800036198 probeB See electrode locations 29999.917201 1249.996550 True
800036200 probeC See electrode locations 29999.996048 1249.999835 True
800036202 probeD See electrode locations 29999.919900 1249.996662 True
800036204 probeE See electrode locations 29999.999422 1249.999976 True
In [ ]:
print('Number of channels:', len(oursession.channels))
oursession.channels.head()
Number of channels: 2233
filtering probe_channel_number probe_horizontal_position probe_id probe_vertical_position structure_acronym ecephys_structure_id ecephys_structure_acronym anterior_posterior_ccf_coordinate dorsal_ventral_ccf_coordinate left_right_ccf_coordinate
id
849854220 AP band: 500 Hz high-pass; LFP band: 1000 Hz l... 4 43 800036198 60 LP 218.0 LP 7647.0 3029.0 7470.0
849854224 AP band: 500 Hz high-pass; LFP band: 1000 Hz l... 6 59 800036198 80 LP 218.0 LP 7650.0 3011.0 7469.0
849854226 AP band: 500 Hz high-pass; LFP band: 1000 Hz l... 7 27 800036198 80 LP 218.0 LP 7651.0 3002.0 7469.0
849854230 AP band: 500 Hz high-pass; LFP band: 1000 Hz l... 9 11 800036198 100 LP 218.0 LP 7654.0 2985.0 7468.0
849854234 AP band: 500 Hz high-pass; LFP band: 1000 Hz l... 11 27 800036198 120 LP 218.0 LP 7657.0 2967.0 7466.0
Probes Channels in a probe
No description has been provided for this image
No description has been provided for this image

Units: information about neruons and their general properties across the whole experiment

In [ ]:
print('Number of units:', len(oursession.units))
oursession.units.head()
Number of units: 825
waveform_PT_ratio waveform_amplitude amplitude_cutoff cluster_id cumulative_drift d_prime firing_rate isi_violations isolation_distance L_ratio ... ecephys_structure_id ecephys_structure_acronym anterior_posterior_ccf_coordinate dorsal_ventral_ccf_coordinate left_right_ccf_coordinate probe_description location probe_sampling_rate probe_lfp_sampling_rate probe_has_lfp_data
unit_id
951088679 0.620607 82.147455 0.023654 1 480.97 2.575648 7.429131 0.053350 51.364291 0.028062 ... 215.0 APN 8328.0 3018.0 7095.0 probeA See electrode locations 29999.965974 1249.998582 True
951088664 0.587044 78.399165 0.001739 0 427.07 3.085334 6.843864 0.033732 40.358293 0.051513 ... 215.0 APN 8328.0 3018.0 7095.0 probeA See electrode locations 29999.965974 1249.998582 True
951088734 0.560996 187.504005 0.001249 5 311.01 5.152564 1.418476 0.356931 56.950961 0.001292 ... 215.0 APN 8316.0 2988.0 7099.0 probeA See electrode locations 29999.965974 1249.998582 True
951088721 0.474457 196.741545 0.000059 4 355.17 4.295014 11.517381 0.000271 58.358552 0.020218 ... 215.0 APN 8316.0 2988.0 7099.0 probeA See electrode locations 29999.965974 1249.998582 True
951088862 0.534674 92.125020 0.089901 16 239.50 4.043348 19.953126 0.013619 79.934237 0.011335 ... 215.0 APN 8304.0 2958.0 7103.0 probeA See electrode locations 29999.965974 1249.998582 True

5 rows × 40 columns

Stimuli: information about each stimulus presented during the experiment, including type of stimulus, time interval and stimulus specific informations

As we saw before there are only a handful of stimulus categories, but each category includes a lot of different stimuli, which are in turn presented multiple times each

In [ ]:
stimulus_table = oursession.get_stimulus_table()

print('Number of stimuli shown:', len(stimulus_table))
print('Number of unique stimuli:', len(stimulus_table['stimulus_condition_id'].unique()))
print('Number of unique stimulus names:', len(oursession.stimulus_names))

stimulus_table.head()
Number of stimuli shown: 70931
Number of unique stimuli: 5063
Number of unique stimulus names: 9
stimulus_block start_time stop_time phase spatial_frequency stimulus_name orientation frame color x_position contrast temporal_frequency y_position size duration stimulus_condition_id
stimulus_presentation_id
0 null 24.875987 84.942787 null null spontaneous null null null null null null null null 60.066800 0
1 0.0 84.942787 85.176306 [3644.93333333, 3644.93333333] 0.08 gabors 90.0 null null 20.0 0.8 4.0 -10.0 [20.0, 20.0] 0.233519 1
2 0.0 85.176306 85.426505 [3644.93333333, 3644.93333333] 0.08 gabors 0.0 null null 30.0 0.8 4.0 40.0 [20.0, 20.0] 0.250199 2
3 0.0 85.426505 85.676704 [3644.93333333, 3644.93333333] 0.08 gabors 45.0 null null 40.0 0.8 4.0 -30.0 [20.0, 20.0] 0.250199 3
4 0.0 85.676704 85.926904 [3644.93333333, 3644.93333333] 0.08 gabors 90.0 null null 0.0 0.8 4.0 -40.0 [20.0, 20.0] 0.250199 4

Spikes: information about the activity of the neurons. Each neuron is associated to an array of timestamps signaling that the neuron has spiked in that instant.

In [ ]:
list(oursession.spike_times.items())[:5]
[(951088862,
  array([3.71142233e+00, 3.72135568e+00, 3.73648903e+00, ...,
         1.04413344e+04, 1.04414476e+04, 1.04415437e+04])),
 (951088823,
  array([3.71175567e+00, 3.73015569e+00, 3.95735595e+00, ...,
         1.04408445e+04, 1.04415059e+04, 1.04415614e+04])),
 (951088939,
  array([3.73448903e+00, 3.96052262e+00, 4.44302316e+00, ...,
         1.04414054e+04, 1.04415481e+04, 1.04416177e+04])),
 (951088968,
  array([3.70888900e+00, 3.71578901e+00, 3.74058903e+00, ...,
         1.04415843e+04, 1.04416096e+04, 1.04416475e+04])),
 (951089020,
  array([3.71198900e+00, 3.72525568e+00, 3.75238905e+00, ...,
         1.04416045e+04, 1.04416177e+04, 1.04416401e+04]))]

In our study we are going to use primarily the spiking data. We are also going to take the information about the structure in which each neuron is located from the units data and information about the stimuli in order to restrict our study to specific areas and specific stimuli. From now on we are only going to consider this data.

In [ ]:
spike_times = oursession.spike_times
unit_area = oursession.units['ecephys_structure_acronym']
stimulus_table = oursession.get_stimulus_table()

1.3 Analyzing the data¶

Before starting to utilize the data to implement models to try to find an asnwer to our question we do a quick analysis in order to get a deeper understanding and subdivide it in smaller portions. This is to allow us to start with a more focused study with the hope of finding promising results in at least some of the subset of data and then progressively move on to using the whole dataset.

In [ ]:
print('Number of units in each area:')
print(unit_area.value_counts())
Number of units in each area:
VISam    135
CA1      134
VISp      94
VISal     89
VISl      78
LP        65
SUB       59
VISrl     47
DG        31
APN       25
CA3       21
NOT       16
LGv       16
ProS       9
CA2        2
TH         2
MB         1
Eth        1
Name: ecephys_structure_acronym, dtype: int64
In [ ]:
#static 3D spatial representation of neurons
units = oursession.units
color1 = mcp.gen_color(cmap="tab20",n=18)
fig = plt.figure(figsize = (15, 8))
ax = fig.add_subplot(projection='3d')
counter = 0
for struct in units.structure_acronym.unique():
    units1 = units[units.structure_acronym == struct]
    ax.scatter(units1['left_right_ccf_coordinate'], units1['dorsal_ventral_ccf_coordinate'], units1['anterior_posterior_ccf_coordinate'], color = color1[counter], s = 100, edgecolor = color1[counter]);
    counter += 1
ax.legend(units.structure_acronym.unique(), loc = (1,0.1), fontsize = 14.5);
ax.set_xlabel('anterior -> posterior')
ax.set_ylabel('left -> right')
ax.set_zlabel('dorsal -> ventral');
ax.view_init(azim=175, elev=30)
No description has been provided for this image
In [ ]:
#dynamic 3D spatial representation of neurons
fig = px.scatter_3d(x = units['left_right_ccf_coordinate'], y = units['dorsal_ventral_ccf_coordinate'], z = units['anterior_posterior_ccf_coordinate'], color = units['structure_acronym'], color_discrete_sequence= color1);
fig.update_layout(scene=dict(
    xaxis_title='left_right',
    yaxis_title='dorsal_ventral',
    zaxis_title='anterior_posterior'))
fig.update_layout(title='Spatial representation of neurons')
fig.show()
In [ ]:
print('Number of total stimuli for each type:')
print(stimulus_table['stimulus_name'].value_counts())
Number of total stimuli for each type:
natural_movie_three           36000
natural_movie_one             18000
static_gratings                6000
natural_scenes                 5950
gabors                         3645
drifting_gratings               630
drifting_gratings_contrast      540
flashes                         150
spontaneous                      16
Name: stimulus_name, dtype: int64
In [ ]:
print('Number of unique stimuli for each type:')
print(stimulus_table.groupby('stimulus_name')['stimulus_condition_id'].nunique().sort_values(ascending=False))
Number of unique stimuli for each type:
stimulus_name
natural_movie_three           3600
natural_movie_one              900
gabors                         243
static_gratings                121
natural_scenes                 119
drifting_gratings               41
drifting_gratings_contrast      36
flashes                          2
spontaneous                      1
Name: stimulus_condition_id, dtype: int64
In [ ]:
fig = plt.figure(figsize = (10,8))
counts = stimulus_table['stimulus_block'].value_counts().index
sns.set_palette('tab10')
sns.countplot(data = stimulus_table, y = 'stimulus_block', order = counts, hue = 'stimulus_name')
plt.ylabel('');
plt.title('Types of stimuli')
plt.savefig('/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/Types_of_stimuli.png', format = 'png')

We can see that there is a great variation in the number of unique stimuli presented in each category, hinting at different degrees of complexity of the stimuli themselves

In [ ]:
print('Number of iterations of the same stimulus:')
print(stimulus_table['stimulus_condition_id'].value_counts().value_counts())
Number of iterations of the same stimulus:
10     3600
20      900
15      319
50      144
49       40
48       29
47       14
46        7
45        4
75        2
194       1
44        1
30        1
16        1
Name: stimulus_condition_id, dtype: int64

We can see that most stimuli are repeated 10 times, with the majority of the remaining ones being shown 20, 15 or 50 times.

2. Empirical Correlation of Firing rates¶

2.0. Introduction¶

We start our study analysing the correlation between neurons.

In this first question we will consider the firing rate r(b) of a neuron b during a stimulus s as the number of spikes emitted by b during s divided by the length of s. Given two neurons b and c, the empirical correlation between the firing rates of b and c is defined in the usual way, considering each stimulus as an individual sample of r(b) and r(c). We proceed as follow:

Correlations of neurons in static Gratings:

    1. Correlation of neurons in VISam
    1. Correlation of neurons in VISam + Visal
    1. Correlation for every pair of areas
    1. Correlation in all the areas

And finally:

    1. Comparison with correlation from all the stimuli (not only static gratings)
    1. Simulating random data to check robustness

2.1. Correlation in VISam (Static gratings)¶

We start by looking at the correlation of neurons in VISam

We first load the data using the function get_rates, which has been defined to load the data stored on the repository

In [ ]:
visam = get_rates(["static_gratings"], {
  'VISam': 135,
}, dir = "../")
static_gratings

We calculate the empirical correlation and we do an heatmap

In [ ]:
plt.figure(figsize=(8,6))
c= pd.DataFrame(visam).corr()
plt.title("Correlation matrix, VISam, Static gratings")
sns.heatmap(c,cmap='coolwarm', center=0 );
No description has been provided for this image

In order to understand better the distribution of correlations, we plot the same matrix but ordering the units using the ward method for hierarchical clustering

In [ ]:
from scipy.cluster.hierarchy import linkage, leaves_list

link = linkage(c, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c.iloc[order, order]
In [ ]:
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered correlation matrix, VISam, Static gratings")
plt.show()
No description has been provided for this image

We can see that:

  1. There is a group of neurons (on the left) which are almost no correlated between each others but they are negatively correlated with the other neurons
  2. There is a group of neurons (in the middle, approximately between 77 and 6 (numbers on the x axis)) which are quite strongly correlated between each others
  3. There is a group of neurons which are not correlated with anyone (93- 45)

As a way to understand the structure of correlations, we plot them in a graph-style. We consider a graph where each node is a neuron and there is an edge between two nodes if their correlation is greater (or lower if we want to consider negative correlation) than a certain threshold.

IMPORTANT DISCLAIMER: This is just a study of the correlation, there is absolutely no relationship between edges between nodes in the following graphs and real connections between neurons

We plot it using the real corrdinates of neurons

Threshold 0.5

In [ ]:
alpha = 0.5
connections = np.copy(c)
for i in range(len(connections[0])):
    connections[i][i] = 0
connections[connections >= alpha] = 1
connections[connections < alpha] = 0

G=nx.from_numpy_array(connections)
In [ ]:
units = oursession.units
In [ ]:
units_visam = units[units.structure_acronym == 'VISam']
x = units_visam['anterior_posterior_ccf_coordinate']
y = units_visam['dorsal_ventral_ccf_coordinate']
my_di = {}
for i in range(0, len(x)):
    my_di[i]=[x.iloc[i], y.iloc[i]]
In [ ]:
plt.figure(figsize = (20,10))
nx.draw_networkx(G, node_color = "red", pos = my_di)
No description has been provided for this image
In [ ]:
G.number_of_edges()
15

8 out of 15 edges are between the two lines -- > no difference related to distance

Or we can even plot it in 3d. In this case there is an edge between neurons indicate whether there is a correlation > 0.5 or < -0.25 (we highlight that we DON'T mean there is aconnection, just correlation).

In [ ]:
alpha_pos = 0.5
alpha_neg = -0.25


G = nx.Graph()
pos = my_di
w_edges = []
edge_xyz = []
for i in range(0, 135):
    G.add_node(i)
    for j in range(i, 135):
        if i != j:
            if c[i][j] > alpha_pos or c[i][j] < alpha_neg:
                w_edges.append(c[i][j])
                edge_xyz.append((pos[i], pos[j]))
                G.add_edge(i, j, weight = c[i][j])
node_xyz = np.array([pos[v] for v in sorted(G)])
edge_xyz = np.array(edge_xyz)
# Create the 3D figure
fig = plt.figure(figsize = (10,8))
ax = fig.add_subplot(111, projection="3d")

# Plot the nodes - alpha is scaled by "depth" automatically

colors = ['darkred', 'darkgreen']
##Plot the edges
for vizedge, weight in zip(edge_xyz, w_edges):
    if weight > alpha_pos:
        ax.plot(*vizedge.T, color=colors[0], linewidth = weight, label = 'Positive Correlation')
        break
for vizedge, weight in zip(edge_xyz, w_edges):
    if weight < alpha_neg:
        ax.plot(*vizedge.T, color=colors[1], linewidth = - weight, label = 'Negative Correlation')
        break
ax.legend(['Positive correlation', 'Negative Correlation'])
for vizedge, weight in zip(edge_xyz, w_edges):
    if weight > alpha_pos:
        ax.plot(*vizedge.T, color=colors[0], linewidth = weight * 3, label = 'Positive Correlation')
    elif weight < alpha_neg:
        ax.plot(*vizedge.T, color=colors[1], linewidth = - weight * 3, label = 'Negative Correlation')
ax.scatter(*node_xyz.T,  s = 50, ec = 'b');


ax.set_xlabel("left_right")
ax.set_ylabel("dorsal_ventral")
ax.set_zlabel("anterior_posterior")
ax.view_init(azim=30, elev=30)
ax.set_title('3D model, Correlation of neurons in VISam', fontsize = 20)

plt.show()
No description has been provided for this image

There is less correlation if we are in two distinct "lines" and with high distance in the dorsal ventral coordinate. (however decreasing to threshold = 0.4 this is no more the case)

2.2. Correlation VISam + VISal (Static gratings)¶

We now add another area to our analysis: VISal

In [ ]:
visamsal = get_rates(["static_gratings"], {
  'VISam': 135,
  'VISal': 89
}, dir = "../" )
static_gratings

We calculate the correlation

In [ ]:
c = visamsal.corr()
In [ ]:
plt.figure(figsize=(8,6))
plt.title("Correlation matrix, VISam + VISal, Static gratings")
sns.heatmap(c,cmap='coolwarm', center=0 )
plt.hlines([135], xmin = 0 , xmax = 224)
plt.vlines([135], ymin = 0 , ymax = 224)
plt.text(70, 70, "VISam" )
plt.text(170, 170, "VISal" )
Text(170, 170, 'VISal')
No description has been provided for this image

It's possible to see that are slightly more correlated within the same area

In [ ]:
link = linkage(c, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c.iloc[order, order]
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered correlation matrix, VISam + VISal, Static gratings")
plt.show()
No description has been provided for this image

There is a clear distinction between groups of neurons

To check correlations between areas, we do a plot as before (notice that are plotted only the nodes with an edge)

Threshold = 0.5

In [ ]:
alpha = 0.5
connections = np.copy(c)
for i in range(len(connections[0])):
    connections[i][i] = 0
connections[connections >= alpha] = 1
connections[connections < alpha] = 0

G=nx.from_numpy_array(connections)

# Rimozione dei nodi isolati
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors =  np.array(["red" for i in range(135)] + ["blue" for i in range(89)])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
# Creazione degli handle per la legenda
legend_handles = [
    Patch(facecolor='red', label='VISam'),
    Patch(facecolor='blue', label='VISal')
]
# Aggiunta della legenda al plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_kamada_kawai(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
No description has been provided for this image

Considering only very high correlations, neurons in the same areaa are preferred, even though there are still nodes of different areas which have a strong correlation

Same graph but with real coordinates: threshold = 0.5

In [ ]:
units_visam = units[units.structure_acronym == 'VISam']
units_visal = units[units.structure_acronym == 'VISal']
x_visam = units_visam['anterior_posterior_ccf_coordinate']
y_visam = units_visam['dorsal_ventral_ccf_coordinate']
x_visal = units_visal['anterior_posterior_ccf_coordinate']
y_visal = units_visal['dorsal_ventral_ccf_coordinate']
my_di = {}
for i in range(0, len(x_visam)):
    my_di[i]=[x_visam.iloc[i], y_visam.iloc[i]]
for i in range(len(x_visam), len(x_visam) + len(x_visal)):
    my_di[i]=[x_visal.iloc[i - len(x_visam)], y_visal.iloc[i - len(x_visam)]]
In [ ]:
node_colors =  ["red" for i in range(135)] + ["blue" for i in range(89)]
In [ ]:
plt.figure(figsize = (20,10))
legend_handles = [
    Patch(facecolor='red', label='VISam'),
    Patch(facecolor='blue', label='VISal')
]
# Aggiunta della legenda al plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_networkx(G, node_color = node_colors, pos = my_di, with_labels= False)
No description has been provided for this image

"Long-range" correlation is present.

We note also there is a tail in visal (blue) (lower part) which do no have any high correlation with any neuron in visam, while the more you go "up", the more the neurons in visal are correlated to the ones in visal. Even though this could be just beacuse of randomness

2.3. Pair-wise correlation between areas (Static Gratings)¶

Now we want to study the correlation between neurons of different areas in general, for every possible pair of areas. While doing it, we include the possibility that the two areas in the analysis are the same

We load the data and define the dict map, which will be used after

In [ ]:
full_data = get_rates(["static_gratings"], areas, dir = "../")
static_gratings
In [ ]:
mappa = {}
temp = 0
for area in areas:
    mappa[area] = [i for i in range(temp, temp + areas[area])]
    temp += areas[area]
In [ ]:
c = full_data.corr()

We define a function to calculate the mean correlation between neurons of Area1 with neurons of Area2, for all possible combinations. (If area1 == area2 then we delete the diagonal)

In [ ]:
def mean_pair_corr(c, mappa):
    c_new = c.copy()
    for i in range(len(c)):
        c_new[i][i] = None
    k = len(mappa)
    result = np.zeros((k,k))
    for i,area1 in enumerate(mappa):
        for j,area2 in enumerate(mappa):
            if j<i:
                pass
            else:
                result[i][j] = np.nanmean(np.nanmean(np.abs(c_new[mappa[area1]].iloc[mappa[area2]])))
                if math.isnan(result[i][j]):
                    result[i][j] = 0
                result[j][i] = result[i][j]
    return result
In [ ]:
mean_pair = mean_pair_corr(c,mappa)

We can plot the result with a heatmap

In [ ]:
c3 = pd.DataFrame(mean_pair, index=list(areas.keys()), columns=list(areas.keys()))
link = linkage(c3, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c3.iloc[order, order]
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("ordered Mean of absolute value correlation areas, Static gratings")
plt.show()
No description has been provided for this image

If we exclude TH, it is possible to see that there is a stronger correlation between neurons on the left, that are from CA2 to SUB. This represents a macro group of areas that we will find also later in the analysis. The second group of areas is on the right, and it is composed mainly by the areas of the type "VIS". However the distinction is not so strong

For each area, we can print in an ordered way the areas to which is more correlated

In [ ]:
for area in c3:
    print(area) 
    print(sorted(c3.loc[area].items(), reverse = True, key = lambda x: x[1]))
APN
[('APN', 0.12312738235037245), ('TH', 0.11419636558713973), ('MB', 0.110588140810455), ('NOT', 0.09990303931783591), ('CA2', 0.0916253019265115), ('SUB', 0.09021517524792873), ('ProS', 0.08835269730453453), ('LGv', 0.0837080314288179), ('DG', 0.08249329497818586), ('CA1', 0.07941135079717937), ('CA3', 0.07484984612387481), ('LP', 0.07397366356828472), ('VISam', 0.06929214244012581), ('VISrl', 0.0689161107962807), ('VISl', 0.06787723099316009), ('VISal', 0.0666781981477204), ('Eth', 0.059148395667809765), ('VISp', 0.0582369331240436)]
CA1
[('CA1', 0.09455999381562175), ('CA2', 0.08971390766323684), ('CA3', 0.08503923645907448), ('SUB', 0.08478915261569638), ('ProS', 0.08182610660019125), ('DG', 0.08152907988375972), ('APN', 0.07941135079717937), ('NOT', 0.06943535293489968), ('MB', 0.06695748015795028), ('LGv', 0.06366007490636454), ('TH', 0.060261990607070395), ('VISrl', 0.05743145743228213), ('VISam', 0.05710112509751047), ('LP', 0.05476977339807684), ('VISl', 0.0524322466299936), ('VISal', 0.051571651949880955), ('VISp', 0.04447694580544469), ('Eth', 0.036387534529382475)]
CA2
[('CA2', 0.20561101996301057), ('ProS', 0.1323695184701155), ('TH', 0.09273490623803435), ('APN', 0.0916253019265115), ('CA1', 0.08971390766323684), ('DG', 0.0895544055235609), ('MB', 0.08342086181034475), ('CA3', 0.08322747050432183), ('SUB', 0.06868056382373912), ('NOT', 0.06281772628896598), ('VISrl', 0.058023426988845245), ('VISam', 0.0550821699778706), ('LP', 0.05163057513379939), ('VISal', 0.049343812487993806), ('LGv', 0.04829941948161433), ('Eth', 0.04607934560274251), ('VISp', 0.04563597453607007), ('VISl', 0.04449140876675233)]
CA3
[('CA3', 0.08901693451153883), ('CA1', 0.08503923645907448), ('CA2', 0.08322747050432183), ('SUB', 0.07516681496209354), ('APN', 0.07484984612387481), ('ProS', 0.07214819931033241), ('DG', 0.0707390695432312), ('NOT', 0.0647017654004669), ('MB', 0.06107182419237277), ('LGv', 0.06028012261345254), ('TH', 0.05562981509231202), ('VISrl', 0.054387901283727176), ('VISam', 0.054255055724614525), ('LP', 0.052825685972040105), ('VISl', 0.048400670228927765), ('VISal', 0.047513147712161224), ('Eth', 0.044132658061415864), ('VISp', 0.040538593357979005)]
DG
[('ProS', 0.10197359031404475), ('DG', 0.09915486819237482), ('CA2', 0.0895544055235609), ('MB', 0.08442278059026626), ('APN', 0.08249329497818586), ('CA1', 0.08152907988375972), ('NOT', 0.07818036935217514), ('SUB', 0.07765112282992011), ('CA3', 0.0707390695432312), ('LGv', 0.06959817463281413), ('LP', 0.059007277450778856), ('VISl', 0.057062246032824095), ('VISal', 0.05364378283433595), ('VISam', 0.05343479962675175), ('TH', 0.053184165787395314), ('VISrl', 0.05227008180289224), ('VISp', 0.048260388934753526), ('Eth', 0.02935351667126987)]
Eth
[('APN', 0.059148395667809765), ('CA2', 0.04607934560274251), ('ProS', 0.04492473718942198), ('CA3', 0.044132658061415864), ('TH', 0.040284155126311554), ('CA1', 0.036387534529382475), ('SUB', 0.03413440363286329), ('VISam', 0.03232232810232816), ('DG', 0.02935351667126987), ('VISrl', 0.028423237802596018), ('NOT', 0.027023037789695573), ('LGv', 0.025773403261783952), ('VISp', 0.025281516684373697), ('LP', 0.025068728890092172), ('VISl', 0.024619177877270366), ('VISal', 0.023384932893865), ('MB', 0.005139553051917564), ('Eth', 0.0)]
LGv
[('LGv', 0.12846594074884535), ('MB', 0.10435864124859394), ('NOT', 0.10343315229354078), ('APN', 0.0837080314288179), ('VISp', 0.07872472140640324), ('VISal', 0.0724940307006034), ('SUB', 0.07063417878899635), ('DG', 0.06959817463281413), ('VISl', 0.06956310306360386), ('LP', 0.06752348936525888), ('ProS', 0.06650832565746582), ('CA1', 0.06366007490636454), ('VISrl', 0.06267157185066369), ('VISam', 0.06251509238898431), ('CA3', 0.06028012261345254), ('CA2', 0.04829941948161433), ('TH', 0.046292718801978855), ('Eth', 0.025773403261783952)]
LP
[('MB', 0.08424056824860766), ('NOT', 0.08391890938360556), ('APN', 0.07397366356828472), ('LGv', 0.06752348936525888), ('LP', 0.06421644555871042), ('SUB', 0.06174468431882669), ('ProS', 0.061361642695262966), ('DG', 0.059007277450778856), ('VISal', 0.058307789278875234), ('VISl', 0.055923770663281684), ('CA1', 0.05476977339807684), ('VISrl', 0.05357348795595679), ('VISam', 0.05347872185735793), ('CA3', 0.052825685972040105), ('CA2', 0.05163057513379939), ('VISp', 0.05111286675525842), ('TH', 0.04716566636780038), ('Eth', 0.025068728890092172)]
MB
[('NOT', 0.16774745881700515), ('APN', 0.110588140810455), ('VISl', 0.10589374061772659), ('LGv', 0.10435864124859394), ('VISam', 0.09884997781362934), ('VISp', 0.09410729205314883), ('ProS', 0.09072879643612462), ('VISal', 0.08923250329536997), ('DG', 0.08442278059026626), ('LP', 0.08424056824860766), ('CA2', 0.08342086181034475), ('SUB', 0.08340260980648741), ('VISrl', 0.07181290389095207), ('CA1', 0.06695748015795028), ('CA3', 0.06107182419237277), ('TH', 0.052744524697909564), ('Eth', 0.005139553051917564), ('MB', 0.0)]
NOT
[('MB', 0.16774745881700515), ('NOT', 0.14884011922377574), ('LGv', 0.10343315229354078), ('APN', 0.09990303931783591), ('VISal', 0.08989172741733721), ('ProS', 0.08477779792500018), ('LP', 0.08391890938360556), ('VISl', 0.08089062036231921), ('VISp', 0.0790859533584018), ('SUB', 0.0788819642156339), ('DG', 0.07818036935217514), ('VISam', 0.0774510123930317), ('VISrl', 0.07585926151762301), ('CA1', 0.06943535293489968), ('TH', 0.06526496879747626), ('CA3', 0.0647017654004669), ('CA2', 0.06281772628896598), ('Eth', 0.027023037789695573)]
ProS
[('CA2', 0.1323695184701155), ('DG', 0.10197359031404475), ('ProS', 0.09546224793972732), ('MB', 0.09072879643612462), ('APN', 0.08835269730453453), ('NOT', 0.08477779792500018), ('CA1', 0.08182610660019125), ('SUB', 0.08068734680591692), ('CA3', 0.07214819931033241), ('LGv', 0.06650832565746582), ('TH', 0.062428836234069966), ('LP', 0.061361642695262966), ('VISrl', 0.05834391861374082), ('VISam', 0.05730810710454194), ('VISl', 0.057017527158679526), ('VISal', 0.05493106953833446), ('VISp', 0.04873577487489248), ('Eth', 0.04492473718942198)]
SUB
[('SUB', 0.09124163203451092), ('APN', 0.09021517524792873), ('CA1', 0.08478915261569638), ('MB', 0.08340260980648741), ('ProS', 0.08068734680591692), ('NOT', 0.0788819642156339), ('DG', 0.07765112282992011), ('TH', 0.07541572622000985), ('CA3', 0.07516681496209354), ('LGv', 0.07063417878899635), ('CA2', 0.06868056382373912), ('VISam', 0.062281201429440686), ('LP', 0.06174468431882669), ('VISrl', 0.06109152649128866), ('VISl', 0.05968763977809013), ('VISal', 0.057438484577878784), ('VISp', 0.04855130890804684), ('Eth', 0.03413440363286329)]
TH
[('TH', 0.26792659516327716), ('APN', 0.11419636558713973), ('CA2', 0.09273490623803435), ('SUB', 0.07541572622000985), ('NOT', 0.06526496879747626), ('VISam', 0.06506735698073443), ('VISrl', 0.06419189958564947), ('ProS', 0.062428836234069966), ('CA1', 0.060261990607070395), ('CA3', 0.05562981509231202), ('VISl', 0.05418065531032463), ('VISal', 0.05413859362842249), ('DG', 0.053184165787395314), ('MB', 0.052744524697909564), ('LP', 0.04716566636780038), ('LGv', 0.046292718801978855), ('VISp', 0.04165479405564982), ('Eth', 0.040284155126311554)]
VISal
[('NOT', 0.08989172741733721), ('MB', 0.08923250329536997), ('VISal', 0.07825975111318538), ('LGv', 0.0724940307006034), ('VISl', 0.07112060449171467), ('VISp', 0.06839174011755207), ('VISam', 0.06717159480868491), ('APN', 0.0666781981477204), ('VISrl', 0.06616088599189691), ('LP', 0.058307789278875234), ('SUB', 0.057438484577878784), ('ProS', 0.05493106953833446), ('TH', 0.05413859362842249), ('DG', 0.05364378283433595), ('CA1', 0.051571651949880955), ('CA2', 0.049343812487993806), ('CA3', 0.047513147712161224), ('Eth', 0.023384932893865)]
VISam
[('MB', 0.09884997781362934), ('NOT', 0.0774510123930317), ('VISam', 0.0719858674237911), ('APN', 0.06929214244012581), ('VISal', 0.06717159480868491), ('VISl', 0.06594549611867936), ('TH', 0.06506735698073443), ('LGv', 0.06251509238898431), ('SUB', 0.062281201429440686), ('VISrl', 0.06224861143943695), ('VISp', 0.05741898232869215), ('ProS', 0.05730810710454194), ('CA1', 0.05710112509751047), ('CA2', 0.0550821699778706), ('CA3', 0.054255055724614525), ('LP', 0.05347872185735793), ('DG', 0.05343479962675175), ('Eth', 0.03232232810232816)]
VISl
[('MB', 0.10589374061772659), ('NOT', 0.08089062036231921), ('VISl', 0.07198567305323106), ('VISal', 0.07112060449171467), ('LGv', 0.06956310306360386), ('APN', 0.06787723099316009), ('VISam', 0.06594549611867936), ('VISrl', 0.06350865228706969), ('VISp', 0.06328733532613531), ('SUB', 0.05968763977809013), ('DG', 0.057062246032824095), ('ProS', 0.057017527158679526), ('LP', 0.055923770663281684), ('TH', 0.05418065531032463), ('CA1', 0.0524322466299936), ('CA3', 0.048400670228927765), ('CA2', 0.04449140876675233), ('Eth', 0.024619177877270366)]
VISp
[('MB', 0.09410729205314883), ('VISp', 0.0793190235583615), ('NOT', 0.0790859533584018), ('LGv', 0.07872472140640324), ('VISal', 0.06839174011755207), ('VISl', 0.06328733532613531), ('APN', 0.0582369331240436), ('VISam', 0.05741898232869215), ('VISrl', 0.0538300251793051), ('LP', 0.05111286675525842), ('ProS', 0.04873577487489248), ('SUB', 0.04855130890804684), ('DG', 0.048260388934753526), ('CA2', 0.04563597453607007), ('CA1', 0.04447694580544469), ('TH', 0.04165479405564982), ('CA3', 0.040538593357979005), ('Eth', 0.025281516684373697)]
VISrl
[('NOT', 0.07585926151762301), ('MB', 0.07181290389095207), ('VISrl', 0.07079114152647305), ('APN', 0.0689161107962807), ('VISal', 0.06616088599189691), ('TH', 0.06419189958564947), ('VISl', 0.06350865228706969), ('LGv', 0.06267157185066369), ('VISam', 0.06224861143943695), ('SUB', 0.06109152649128866), ('ProS', 0.05834391861374082), ('CA2', 0.058023426988845245), ('CA1', 0.05743145743228213), ('CA3', 0.054387901283727176), ('VISp', 0.0538300251793051), ('LP', 0.05357348795595679), ('DG', 0.05227008180289224), ('Eth', 0.028423237802596018)]

We can now repeat the same analysis with the maximum correlation, that is, the maximum correlation there is between a neuron of the first area and a neuron of the second area.

Our goal is to see whether there are neurons of different areas which are strongly correlated.

In [ ]:
def max_pair_corr(c, mappa):
    c_new = c.copy()
    for i in range(len(c)):
        c_new[i][i] = 0
    k = len(mappa)
    result = np.zeros((k,k))
    for i,area1 in enumerate(mappa):
        for j,area2 in enumerate(mappa):
            if j<i:
                pass
            else:
                result[i][j] = np.max(np.max((c_new[mappa[area1]].iloc[mappa[area2]])))
                result[j][i] = result[i][j]
    return result
In [ ]:
max_pair = max_pair_corr(c,mappa)
In [ ]:
max_pair.shape
(18, 18)
In [ ]:
c1 = pd.DataFrame(max_pair, index=list(areas.keys()), columns=list(areas.keys()))
link = linkage(c1, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c1.iloc[order, order]
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("ordered Max correlation areas, Static gratings")
plt.show()
No description has been provided for this image

Indeed there are, also with very high correlations and between different areas. For instance, there is a neuron in CA1 and a neuron in Virsl that are highly correlated

We can do the same also with the min (for negative correlation)

In [ ]:
def min_pair_corr(c, mappa):
    c_new = c.copy()
    for i in range(len(c)):
        c_new[i][i] = 0
    k = len(mappa)
    result = np.zeros((k,k))
    for i,area1 in enumerate(mappa):
        for j,area2 in enumerate(mappa):
            if j<i:
                pass
            else:
                result[i][j] = np.min(np.min((c_new[mappa[area1]].iloc[mappa[area2]])))
                result[j][i] = result[i][j]
    return result
In [ ]:
min_pair = min_pair_corr(c,mappa)
In [ ]:
c2 = pd.DataFrame(min_pair, index=list(areas.keys()), columns=list(areas.keys()))
link = linkage(c2, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c2.iloc[order, order]
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("ordered Min correlation areas, Static gratings")
plt.show()
No description has been provided for this image

VISal , LP and DG are the ones with some units that are the most negative correlated with other areas

2.4. Correlation in all areas (Static Gratings)¶

We now consider all areas together

In [ ]:
full_data = get_rates(["static_gratings"], areas, dir = "../")
static_gratings
In [ ]:
c = full_data.corr()

We can do the heatmap ordered by similarity

In [ ]:
link = linkage(c, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c.iloc[order, order]
plt.figure(figsize=(12,9))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered correlation matrix, All_areas, Static gratings")
plt.show()
No description has been provided for this image

There is a group of neurons which are highly correlated between each others, while another group of neurons which are negatively correlated with the first one, and not correlated between each others. However it is also possible to see some subgroups of highly correlated neurons that are negatively correlated with some other highly correlated ones.

We proceed by plotting some image with a graph style

In [ ]:
colors = {0 : "#544148",
1: "#8e86a0",
2 : "#25680d",
3: "#bd83f7",
4 : "#cecbd3",
5: "#afd19c",
6: "green",
7: "#704431",
8: "purple",
9: "#932039",
10: "yellow",
11: "#e5967e",
12: "pink",
13:"blue",
14: "red",
15: "#0dbbe2",
16: "#937aa5",
17: "#5ee6ff"}
In [ ]:
areas2 = ['APN', 'CA1', 'CA2', 'CA3', 'DG', 'Eth', 'LGv', 'LP', 'MB', 'NOT',
       'ProS', 'SUB', 'TH', 'VISal', 'VISam', 'VISl', 'VISp', 'VISrl']
In [ ]:
classes = [areas2.index(area) for area in areas for j in range(areas[area])]

Threshold 0.6

In [ ]:
alpha = 0.6
connections = np.copy(c)
for i in range(len(connections[0])):
    connections[i][i] = 0
connections[connections >= alpha] = 1
connections[connections < alpha] = 0

G=nx.from_numpy_array(connections)

# Rimozione dei nodi isolati
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors =  np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
# Creazione degli handle per la legenda
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]

# Aggiunta della legenda al plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_spring(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
No description has been provided for this image

Since there are many connected components, let's plot only the first one

In [ ]:
largest_component = max(nx.connected_components(G), key=len)
node_colors = [colors[classes[node]] for node in largest_component]
subgraph = G.subgraph(largest_component)

plt.figure(figsize=(10, 10))

# Creazione degli handle per la legenda
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]

# Aggiunta della legenda al plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_kamada_kawai(subgraph, node_color=node_colors)
plt.show()
No description has been provided for this image

Let's try with negative correlation

In [ ]:
alpha = -0.4
connections = np.copy(c)
for i in range(len(connections[0])):
    connections[i][i] = 0
connections[connections >= alpha] = 0
connections[connections < alpha] = 1


G=nx.from_numpy_array(connections)

# Rimozione dei nodi isolati
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors =  np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
nx.draw_kamada_kawai(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
No description has been provided for this image

The structure is completely different now: there are few nodes with very high degree

(We have to consider the fact that here the threshold is smaller in absolute value)

In [ ]:
alpha = -0.5
connections = np.copy(c)
for i in range(len(connections[0])):
    connections[i][i] = 0
connections[connections >= alpha] = 0
connections[connections < alpha] = 1


G=nx.from_numpy_array(connections)

# Rimozione dei nodi isolati
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors =  np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
# Creazione degli handle per la legenda
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]

# Aggiunta della legenda al plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_kamada_kawai(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
No description has been provided for this image

About differences between positive and negative correlations:

  1. When we consider the highest-end positive correlations, we have more nodes with high correlations but those have lower degree, and they are in a sort of chain shape
  2. When we consider the highest-end negative correlations, we have fewer nodes with high negative correlations but those have a very high degree, and they are in a sort of star shape

This could be related only by the fact that we have more neurons with high and similar positive correlations

2.5. Correlation in all areas (all the stimuli)¶

We load the data

In [ ]:
names = ["spontaneous",
"gabors",
"flashes",
"drifting_gratings",
"natural_movie_three",
"natural_movie_one",
"static_gratings",
"natural_scenes",
"drifting_gratings_contrast"]
In [ ]:
full_full_data = get_rates(names, areas, dir = "../")
spontaneous
gabors
flashes
drifting_gratings
natural_movie_three
natural_movie_one
static_gratings
natural_scenes
drifting_gratings_contrast

and compute the correlation as usual

In [ ]:
c = full_full_data.corr()
In [ ]:
c.shape
(825, 825)
In [ ]:
link = linkage(c, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c.iloc[order, order]
plt.figure(figsize=(12,9))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered correlation matrix, All_areas, All stimuli")
plt.show()
No description has been provided for this image

The correlation drops dramatically. However, there is still some structure.

Let's see the min and the max

In [ ]:
for i in range(825):
    c[i][i] = 0
In [ ]:
np.max(np.max(c))
c:\Users\lgand\anaconda3\envs\envallen\Lib\site-packages\numpy\core\fromnumeric.py:84: FutureWarning: In a future version, DataFrame.max(axis=None) will return a scalar max over the entire DataFrame. To retain the old behavior, use 'frame.max(axis=0)' or just 'frame.max()'
  return reduction(axis=axis, out=out, **passkwargs)
0.8458585764461427

We still have some neurons with very high correlation!!!

In [ ]:
np.min(np.min(c))
c:\Users\lgand\anaconda3\envs\envallen\Lib\site-packages\numpy\core\fromnumeric.py:84: FutureWarning: In a future version, DataFrame.min(axis=None) will return a scalar min over the entire DataFrame. To retain the old behavior, use 'frame.min(axis=0)' or just 'frame.min()'
  return reduction(axis=axis, out=out, **passkwargs)
-0.3817121948958405

And also very high negative correlation!

In [ ]:
areas2 = ['APN', 'CA1', 'CA2', 'CA3', 'DG', 'Eth', 'LGv', 'LP', 'MB', 'NOT',
       'ProS', 'SUB', 'TH', 'VISal', 'VISam', 'VISl', 'VISp', 'VISrl']
k = len(areas2)
In [ ]:
classes = [areas2.index(area) for area in areas for j in range(areas[area])]

Just some plots

In [ ]:
alpha = 0.40
connections = np.copy(c)
for i in range(len(connections[0])):
    connections[i][i] = 0
connections[connections >= alpha] = 1
connections[connections < alpha] = 0

G=nx.from_numpy_array(connections)

# Rimozione dei nodi isolati
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors =  np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
# Creazione degli handle per la legenda
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]

# Aggiunta della legenda al plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_spring(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
No description has been provided for this image

Lasgest connected component

In [ ]:
largest_component = max(nx.connected_components(G), key=len)
node_colors = [colors[classes[node]] for node in largest_component]
subgraph = G.subgraph(largest_component)

plt.figure(figsize=(10, 10))

# Creazione degli handle per la legenda
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]

# Aggiunta della legenda al plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_kamada_kawai(subgraph, node_color=node_colors)
plt.show()
No description has been provided for this image

This should be exactly the same as the one seen before!

And in general this has a meaning also for predictions: it should be simpler to predict the activity of a neuron in CA1 given the rates of the other neurons in CA1

Indeed this will be verified in the prediction part

Negative correlation

In [ ]:
alpha = -0.2
connections = np.copy(c)
for i in range(len(connections[0])):
    connections[i][i] = 0
connections[connections >= alpha] = 0
connections[connections < alpha] = 1


G=nx.from_numpy_array(connections)

# Rimozione dei nodi isolati
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors =  np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
# Creazione degli handle per la legenda
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]

# Aggiunta della legenda al plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_spring(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
No description has been provided for this image

Similar structure, and nodes, as before

In [ ]:
nodi_collegati
array([  3,  11,  13,  24,  61,  65,  77, 106, 112, 115, 116, 117, 142,
       144, 157, 201, 207, 209, 224, 225, 301, 307, 318, 324, 327, 369,
       379, 384, 397, 401, 438, 443, 478, 482, 503, 544, 550, 557, 611,
       612, 622, 630, 691, 705, 723, 748, 751, 780, 795])
In [ ]:
G.edges
EdgeView([(3, 397), (11, 397), (13, 142), (13, 224), (13, 225), (13, 438), (13, 622), (13, 748), (24, 397), (61, 142), (61, 224), (61, 225), (61, 438), (61, 622), (61, 748), (65, 225), (77, 224), (106, 379), (112, 224), (112, 748), (115, 397), (116, 397), (117, 397), (142, 379), (142, 550), (144, 397), (157, 224), (157, 225), (201, 397), (207, 397), (209, 379), (224, 301), (224, 443), (224, 550), (224, 691), (224, 705), (224, 795), (225, 307), (225, 544), (318, 397), (324, 379), (327, 369), (369, 397), (379, 780), (384, 397), (397, 401), (397, 478), (397, 482), (397, 503), (397, 557), (397, 611), (397, 612), (397, 630), (397, 723), (397, 751), (550, 748)])

Conclusions: passing from static grating to all the stimuli the overall amount of correlation decreased, however there is still some structure in the data. In particular, there are still pairs of neurons with very high positive correlation, and pairs with negative correlations comparable as with considering only static gratings. It seems that some (actually, the majority of) neurons were correlated only imposing the static gratings stimuli, and using all the types of stimuli they were removed. However, as seen for the largest connected component with high positive correlation, its structure remained very similar, which was not totally trivial, since we are considering a lot more, and different, stimuli. This could mean that the correlation obtained considering only a subset of types of stimuli could be a good approximation for a larger set of types of stimuli.

2.6. Brief analysis on random data¶

This part is meant to be just a test to check that our measure of empirical correlation is not affected only by randomness. We simulate 6000 independent samples (the number of stimuli in gratings), and we see what is the maximal correlation achieved.

Let's see first the distribution of the firing rates. This analysis is done on static gratings.

We often get a shape like this one:

In [ ]:
full_data.iloc[:,0].hist(bins = 20)
<Axes: >
No description has been provided for this image

Even though sometimes we get a bell-shaped distribution:

In [ ]:
full_data.iloc[:,302].hist(bins = 20)
<Axes: >
No description has been provided for this image
In [ ]:
areas2 = ['APN', 'CA1', 'CA2', 'CA3', 'DG', 'Eth', 'LGv', 'LP', 'MB', 'NOT',
       'ProS', 'SUB', 'TH', 'VISal', 'VISam', 'VISl', 'VISp', 'VISrl']

Plot of the distribution of the first neuron for each area:

In [ ]:
temp = 0
img, axs = plt.subplots(3, 6, figsize=(20, 14))
for x in range(6):
    for y in range(3):
        axs[y][x].hist(full_data.iloc[:,temp], bins=20)
        area = x + 6*y
        temp += areas[areas2[area]]
        axs[y][x].set_title(areas2[area])
        axs[y][x].set_xlabel(f"Firing rate") 

    
    
No description has been provided for this image

Now, we decide to approximate each neuron's activity with a Poisson distribution. We sample random data for each neuron with a poisson distribution with mean equal to the empirical mean of the neuron in static gratings.

Since there are 6000 stimuli in static gratings, we sample 6000 rates for each neuron

In [ ]:
from scipy.stats import poisson
In [ ]:
def generate_poisson_random_data(n_units, n_samples, mean):
    result = np.zeros((n_units, n_samples))
    for i in range(n_units):
        result[i] = poisson.rvs(mean[i], size = n_samples)
    return pd.DataFrame(result.T)
    
In [ ]:
mean = np.array(full_data.mean(axis = 0))
In [ ]:
random_full_data = generate_poisson_random_data(825, 6000, mean)
In [ ]:
c = random_full_data.corr()
In [ ]:
for i in range(825):
    c.iloc[i,i] = 0 
In [ ]:
c.max().max()
0.05588429522922792
In [ ]:
c.min().min()
-0.058898055518087036

From a random distribution, with 6000 samples, the maximum absolute value of correlation is 0.059, which confirms that the correlation studied in the past sections does not come from randomness.

2.bis Studying correlation between spike trains¶

Motivated by the presence of a large number of spikes inside the spike trains, which carry a lot of information about their distribution in time, we decided to test and analyze another correlation metric, the Spike Time Tiling Coeficient (STTC).

Simply put, computing the STTC of two spike trains means determining the degree of similarity of the two on a spike to spike basis, with a value ranging from -1 to 1, as in the classical correlation.

More specifically, we can see in the following image how the coefficient is computed:

alt text

In [ ]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import pickle

from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

data_dir = "/Users/gabri/Library/Mobile Documents/com~apple~CloudDocs/Python/Neuroscience/allendata"
manifest_path = os.path.join(data_dir, "manifest.json")
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)

session_id = 798911424 
oursession = cache.get_session_data(session_id, timeout=3000)

stimulus_table = oursession.get_stimulus_table()
spike_times = oursession.spike_times
/Users/gabri/opt/anaconda3/envs/envallen/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
/Users/gabri/opt/anaconda3/envs/envallen/lib/python3.11/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.1.3 because version 1.8.0 is already loaded.
  return func(args[0], **pargs)
/Users/gabri/opt/anaconda3/envs/envallen/lib/python3.11/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'core' version 2.2.2 because version 2.6.0-alpha is already loaded.
  return func(args[0], **pargs)

Functions to implement the method¶

In [ ]:
# Computing T and P for the STTC formula

def get_T(spike_train, dt, T):
    current = None
    previous = 0
    area = 0
    for spike in spike_train:
        current = spike
        area += min(2*dt, current - previous)
        if T-spike < dt:
            area -= dt - (T - spike)
        previous = current

    return area/T


def get_P(spike_train_1, spike_train_2, dt):
    if len(spike_train_1) > 0:
        count = 0
        for spike_1 in spike_train_1:
            for i, spike_2 in enumerate(spike_train_2):

                if abs(spike_1-spike_2) <= dt:
                    count += 1
                    spike_train_2 = spike_train_2[i:]
                    break

                elif spike_2 - spike_1 > dt:
                    spike_train_2 = spike_train_2[i:]
                    break
        
        return count / len(spike_train_1)
    
    else:
        return 1
In [ ]:
# Getting spikes in a specific interval (stimulus block)

def get_multiple_data(stimulus_block):
    
    start = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['start_time'].min()
    stop = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['stop_time'].max()

    spike_times_list = oursession.spike_times.copy()
    
    for unit_id in spike_times_list.keys():
        spike_times_list[unit_id] = spike_times_list[unit_id][(spike_times_list[unit_id] >= start) & (spike_times_list[unit_id] <= stop)] - start

    return spike_times_list
In [ ]:
# Computing STTC on the whole spike trains of specified areas

def STTC_all_area(areas, dt = 0.1):
    spike_trains = oursession.spike_times.copy()

    unit_ids = oursession.units[oursession.units['ecephys_structure_acronym'].isin(areas)].index
    spike_trains = {unit_id: spike_trains[unit_id] for unit_id in unit_ids}

    start = 0
    stop = 10441.665860980896           # max([max(spike_times) for spike_times in oursession.spike_times.values()])

    T = stop - start

    spike_trains = {unit_id: spike_times - start for unit_id, spike_times in spike_trains.items()}

    STTC = np.zeros((len(spike_trains), len(spike_trains)))

    index = unit_ids

    T_array = dict()

    for a in index:
        T_array[a] = get_T(spike_trains[a], dt, T)

    for i, a in enumerate(index):
        print('unit', i+1, 'out of', len(index))
        for j, b in enumerate(index):
            if j >= i:
                Pa = get_P(spike_trains[a], spike_trains[b], dt)
                Pb = get_P(spike_trains[b], spike_trains[a], dt)

                STTC[i][j] = 1/2 * ((Pa-T_array[b])/(1-Pa*T_array[b]) + (Pb-T_array[a])/(1-Pb*T_array[a]))
        print('Done')
    STTC = STTC + STTC.T - np.diag(np.diag(STTC))

    return STTC
In [ ]:
# Computing STTC on spikes trains restricted to an interval corresponding to a single stimulus block of specified areas

def STTC_block_area(stimulus_block, areas, dt = 0.1):
    spike_trains = get_multiple_data(stimulus_block)
    
    unit_ids = oursession.units[oursession.units['ecephys_structure_acronym'].isin(areas)].index
    spike_trains = {unit_id: spike_trains[unit_id] for unit_id in unit_ids}

    start = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['start_time'].min()
    stop = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['stop_time'].max()

    T = stop - start

    STTC = np.zeros((len(spike_trains), len(spike_trains)))

    index = unit_ids

    T_array = dict()

    for a in index:
        T_array[a] = get_T(spike_trains[a], dt, T)

    for i, a in enumerate(index):
        print('unit', i+1, 'out of', len(index))
        for j, b in enumerate(index):
            if j >= i:
                Pa = get_P(spike_trains[a], spike_trains[b], dt)
                Pb = get_P(spike_trains[b], spike_trains[a], dt)

                STTC[i][j] = 1/2 * ((Pa-T_array[b])/(1-Pa*T_array[b]) + (Pb-T_array[a])/(1-Pb*T_array[a]))
        print('Done')
    STTC = STTC + STTC.T - np.diag(np.diag(STTC))

    return STTC
In [ ]:
# STTC of all areas during a single stimulus block

def STTC_multiple(stimulus_block, dt = 0.1):
    spike_trains = get_multiple_data(stimulus_block)

    start = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['start_time'].min()
    stop = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['stop_time'].max()

    T = stop - start

    STTC = np.zeros((len(spike_trains), len(spike_trains)))

    index = oursession.units.index

    T_array = dict()

    for a in index:
        T_array[a] = get_T(spike_trains[a], dt, T)

    for i, a in enumerate(index):
        print('unit', i+1, 'out of', len(index))
        for j, b in enumerate(index):
            if j >= i:
                Pa = get_P(spike_trains[a], spike_trains[b], dt)
                Pb = get_P(spike_trains[b], spike_trains[a], dt)

                STTC[i][j] = 1/2 * ((Pa-T_array[b])/(1-Pa*T_array[b]) + (Pb-T_array[a])/(1-Pb*T_array[a]))
        print('Done')
    STTC = STTC + STTC.T - np.diag(np.diag(STTC))

    return STTC
In [ ]:
# Function that automatically saves/loads and plots STTC of the whole spike train for given areas

def compute_save_plot_all(areas, plot = True, ordered = False):
    if os.path.exists(f'/Users/gabri/Desktop/STTC/{areas}_all.pkl'):
        matrix = pickle.load(open(f'/Users/gabri/Desktop/STTC/{areas}_all.pkl', 'rb'))
    else:
        matrix = STTC_all_area(areas)
        pickle.dump(matrix, open(f'/Users/gabri/Desktop/STTC/{areas}_all.pkl', 'wb'))
    
    if plot:

        if ordered:
            sns.clustermap(matrix, cmap='bwr', center=0, figsize=(10, 8), dendrogram_ratio=(0.1,0.1), xticklabels=False, yticklabels=False, method='ward')
        else:
            plt.subplots(figsize=(10, 8))
            sns.heatmap(matrix, cmap='bwr', center=0)
        plt.title(f'{areas} all stimuli')
        plt.show()

    return matrix
In [ ]:
# Function that automatically saves/loads and plots STTC of the spikes during a stimulus block for given areas

def compute_save_plot_block_area(block, areas, plot = True, ordered = False):
    
    if os.path.exists(f'/Users/gabri/Desktop/STTC/block_{block}_{areas}.pkl'):
        matrix = pickle.load(open(f'/Users/gabri/Desktop/STTC/block_{block}_{areas}.pkl', 'rb'))
    else:
        matrix = STTC_block_area(block, areas)
        pickle.dump(matrix, open(f'/Users/gabri/Desktop/STTC/block_{block}_{areas}.pkl', 'wb'))
    
    if plot:

        if ordered:
            sns.clustermap(matrix, cmap='bwr', center=0, figsize=(10, 8), dendrogram_ratio=(0.1,0.1), xticklabels=False, yticklabels=False, method='ward')
        else:
            plt.subplots(figsize=(10, 8))
            sns.heatmap(matrix, cmap='bwr', center=0)
        plt.title(f'Block {block} {areas}')
        plt.show()

    return matrix
In [ ]:
# Function that automatically saves/loads and plots STTC of the spikes during a stimulus block for all areas

def compute_save_plot_block(block, plot = True, ordered = False):
    
    if os.path.exists(f'/Users/gabri/Desktop/STTC/block_{block}.pkl'):
        matrix = pickle.load(open(f'/Users/gabri/Desktop/STTC/block_{block}.pkl', 'rb'))
    else:
        matrix = STTC_multiple(block)
        pickle.dump(matrix, open(f'/Users/gabri/Desktop/STTC/block_{block}.pkl', 'wb'))
    
    if plot:

        if ordered:
            sns.clustermap(matrix, cmap='bwr', center=0, figsize=(10, 8), dendrogram_ratio=(0.1,0.1), xticklabels=False, yticklabels=False, method='ward')
        else:
            plt.subplots(figsize=(10, 8))
            sns.heatmap(matrix, cmap='bwr', center=0)
        plt.title(f'Block {block}')
        plt.show()

    return matrix
In [ ]:
# Function that automatically saves/loads and plots STTC of the whole spike train for given areas, ordered by area

def compute_save_plot_all_ordered(areas, plot = True):

    matrix = compute_save_plot_all(areas, plot = False)
    
    area_indeces = pickle.load(open('/Users/gabri/Desktop/STTC/area_indeces.pkl', 'rb'))

    # order areas by length of indeces

    ordered_area_indeces = dict(sorted(area_indeces.items(), key=lambda item: len(item[1]), reverse=True))

    indeces = []
    for _, area_indeces in ordered_area_indeces.items():
        indeces.extend(area_indeces)
    
    reordered_matrix = matrix[indeces]
    reordered_matrix_2 = reordered_matrix.copy()
    for i, x in enumerate(reordered_matrix):
        reordered_matrix_2[i] = x[indeces]

    if plot:
        plt.subplots(figsize=(100, 100))
        sns.heatmap(reordered_matrix_2, cmap='bwr', center=0, square=True)
        plt.title(f'STTC computed on the whole spike trains', fontsize=100)
        # plt.xticks([])
        # plt.yticks([])

        area_sizes = oursession.units['ecephys_structure_acronym'].value_counts().values
        
        increments = np.zeros(len(area_sizes))
        for i in range(len(area_sizes)):
            increments[i] = sum(area_sizes[:i+1])

        plt.hlines(increments, *plt.xlim(), colors='black')
        plt.vlines(increments, *plt.ylim(), colors='black')

        # plot area names in the middle of their section

        for i, area in enumerate(ordered_area_indeces.keys()):
            try:
                plt.text(increments[i] - area_sizes[i]//2, increments[i] - area_sizes[i]//2, area, fontsize=50, color='black')
            except:
                break

        plt.show()
    return matrix
In [ ]:
# Function that automatically saves/loads and plots STTC of the spikes during a stimulus block for all areas, ordered by area

def compute_save_plot_block_ordered(block, plot=True):
    matrix = compute_save_plot_block(block, plot = False)
    
    area_indeces = pickle.load(open('/Users/gabri/Desktop/STTC/area_indeces.pkl', 'rb'))

    # order areas by length of indeces

    ordered_area_indeces = dict(sorted(area_indeces.items(), key=lambda item: len(item[1]), reverse=True))

    indeces = []
    for _, area_indeces in ordered_area_indeces.items():
        indeces.extend(area_indeces)
    
    reordered_matrix = matrix[indeces]
    reordered_matrix_2 = reordered_matrix.copy()
    for i, x in enumerate(reordered_matrix):
        reordered_matrix_2[i] = x[indeces]

    if plot:
        plt.subplots(figsize=(10, 10))
        sns.heatmap(reordered_matrix_2, cmap='bwr', center=0, square=True)
        plt.title(f'STTC computed on block {block}')
        plt.xticks([])
        plt.yticks([])

        area_sizes = oursession.units['ecephys_structure_acronym'].value_counts().values
        
        increments = np.zeros(len(area_sizes))
        for i in range(len(area_sizes)):
            increments[i] = sum(area_sizes[:i+1])

        plt.hlines(increments, *plt.xlim(), colors='black')
        plt.vlines(increments, *plt.ylim(), colors='black')

        # plot area names in the middle of their section

        for i, area in enumerate(ordered_area_indeces.keys()):
            try:
                plt.text(increments[i] - area_sizes[i]//2, increments[i] - area_sizes[i]//2, area, fontsize=12, color='black')
            except:
                break

        plt.show()
    return matrix

Visualization of the results¶

As can be seen above, we implemented functions that enable the computation of the coefficient for specific time intervals or for neuron in specific areas. This was done to make the calculations less computationally expensive and analyze specific sections of the data.

However, we were able to compute the metric for the whole dataset (complete spike trains and all areas), so we are going to show the results obtained from this.

In [ ]:
compute_save_plot_all_ordered(oursession.structure_acronyms, plot=True)
No description has been provided for this image
array([[ 1.        ,  0.0750257 ,  0.15931823, ...,  0.04312051,
         0.04811847, -0.031305  ],
       [ 0.0750257 ,  1.        ,  0.14971448, ..., -0.01397203,
        -0.03893257, -0.04039993],
       [ 0.15931823,  0.14971448,  1.        , ...,  0.00951153,
         0.0338185 ,  0.00144769],
       ...,
       [ 0.04312051, -0.01397203,  0.00951153, ...,  1.        ,
         0.46211766,  0.22177768],
       [ 0.04811847, -0.03893257,  0.0338185 , ...,  0.46211766,
         1.        ,  0.21329055],
       [-0.031305  , -0.04039993,  0.00144769, ...,  0.22177768,
         0.21329055,  1.        ]])

Studying the data¶

In [ ]:
STTC_matrix = pickle.load(open("/Users/gabri/Desktop/STTC/['LP', 'DG', 'CA1', 'VISam', nan, 'APN', 'TH', 'Eth', 'CA3', 'VISrl', 'HPF', 'ProS', 'SUB', 'VISp', 'CA2', 'VISl', 'MB', 'NOT', 'LGv', 'VISal']_all.pkl", 'rb'))

To make it easier to analyze we are reorganizing the matrix to cluster neurons from the same area together in order of area size

In [ ]:
area_indeces = pickle.load(open('/Users/gabri/Desktop/STTC/area_indeces.pkl', 'rb'))

ordered_area_indeces = dict(sorted(area_indeces.items(), key=lambda item: len(item[1]), reverse=True))

indeces = []
for _, area_indeces in ordered_area_indeces.items():
    indeces.extend(area_indeces)

temp = STTC_matrix[indeces]
STTC_matrix_ordered = temp.copy()
for i, x in enumerate(temp):
    STTC_matrix_ordered[i] = x[indeces]
In [ ]:
ordered_area_names = list(ordered_area_indeces.keys())[:-2]
ordered_area_names
['VISam',
 'CA1',
 'VISp',
 'VISal',
 'VISl',
 'LP',
 'SUB',
 'VISrl',
 'DG',
 'APN',
 'CA3',
 'NOT',
 'LGv',
 'ProS',
 'TH',
 'CA2',
 'Eth',
 'MB']

Now we compute the sizes of the areas that can be used as indeces to restrict the analysis to a specific part of the matrix

In [ ]:
area_sizes = oursession.units['ecephys_structure_acronym'].value_counts().values
increments = np.zeros(len(area_sizes), dtype=int)
for i in range(len(area_sizes)):
    increments[i] = sum(area_sizes[:i+1])
increments
array([135, 269, 363, 452, 530, 595, 654, 701, 732, 757, 778, 794, 810,
       819, 821, 823, 824, 825])

From now on (in this section), we are going to consider only the first 8 biggest areas, since the remaining ones have a low number of neurons and may lead to inaccurate behaviours

In [ ]:
ordered_area_names = ordered_area_names[:8]

Average strength of correlation between neurons in the same area¶

We compute the average absolute value of the correlation between neurons of the same area

In [ ]:
average_STTC = dict()

for i, area in enumerate(ordered_area_names):
    if i == 0:
        length = increments[i]
        average_STTC[area] = (np.sum(np.abs(STTC_matrix_ordered[:increments[i], :increments[i]])) - length) / ( length*(length-1) )
    else:
        length = (increments[i]-increments[i-1])

        average_STTC[area] = (np.sum(np.abs(STTC_matrix_ordered[increments[i-1]:increments[i], increments[i-1]:increments[i]])) - length) / ( length*(length-1) )
average_STTC
{'VISam': 0.08255662344317649,
 'CA1': 0.1001113941634092,
 'VISp': 0.10531725606406021,
 'VISal': 0.07665673167802156,
 'VISl': 0.07007841320001308,
 'LP': 0.07686991884234151,
 'SUB': 0.0842123747695893,
 'VISrl': 0.08313645214815685}

We sort them

In [ ]:
sorted(average_STTC.items(), key=lambda x: x[1], reverse=True)
[('VISp', 0.10531725606406021),
 ('CA1', 0.1001113941634092),
 ('SUB', 0.0842123747695893),
 ('VISrl', 0.08313645214815685),
 ('VISam', 0.08255662344317649),
 ('LP', 0.07686991884234151),
 ('VISal', 0.07665673167802156),
 ('VISl', 0.07007841320001308)]

We can see that, on average, neurons are not very correlated.

To give a better interpretation to these values, it might be of more interest to see the average correlations also between different areas and order them

In [ ]:
def area_STTC(area):
    area_STTCs = dict()
    if area != 'VISam':
        area_start = increments[ordered_area_names.index(area)-1]
    else:
        area_start = 0
    area_end = increments[ordered_area_names.index(area)]
    for i, area_name in enumerate(ordered_area_names):
        if area_name == area:
            area_STTCs[area_name] = average_STTC[area_name]
            continue
        if i == 0:
            area_STTCs[area_name] = np.mean(np.abs(STTC_matrix_ordered[:increments[i], area_start:area_end]))
        else:
            area_STTCs[area_name] = np.mean(np.abs(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end]))
    return area_STTCs
In [ ]:
STTC_averages = dict()
for area in ordered_area_names:
    STTC_averages[area] = sorted(area_STTC(area).items(), key=lambda x: x[1], reverse=True)

for area, corrs in STTC_averages.items():
    print(area, ': ', corrs)
VISam :  [('VISam', 0.08255662344317649), ('VISp', 0.07246637608277162), ('VISal', 0.07078178263623246), ('VISrl', 0.06918459347278622), ('CA1', 0.06682707142135616), ('SUB', 0.0654369121733156), ('VISl', 0.0634595320180365), ('LP', 0.062225983051985116)]
CA1 :  [('CA1', 0.1001113941634092), ('SUB', 0.08039862642473626), ('VISrl', 0.07019368459990617), ('VISam', 0.06682707142135616), ('LP', 0.06499226682240566), ('VISp', 0.06278932375997498), ('VISal', 0.05769961157971161), ('VISl', 0.052464472986628576)]
VISp :  [('VISp', 0.10531725606406021), ('VISam', 0.07246637608277162), ('VISal', 0.0723934542401987), ('VISrl', 0.06645831784213206), ('VISl', 0.06537822488422153), ('LP', 0.06317560386805418), ('CA1', 0.06278932375997498), ('SUB', 0.061344691312894585)]
VISal :  [('VISal', 0.07665673167802156), ('VISp', 0.0723934542401987), ('VISam', 0.07078178263623246), ('VISrl', 0.06579576204969023), ('VISl', 0.06424429769528353), ('SUB', 0.05887962259519398), ('CA1', 0.05769961157971159), ('LP', 0.054870603464347724)]
VISl :  [('VISl', 0.07007841320001308), ('VISp', 0.06537822488422151), ('VISal', 0.06424429769528353), ('VISam', 0.06345953201803652), ('VISrl', 0.06342364346999328), ('SUB', 0.05473712451935701), ('CA1', 0.05246447298662857), ('LP', 0.05024867914900693)]
LP :  [('LP', 0.07686991884234151), ('SUB', 0.0660672236065325), ('CA1', 0.06499226682240566), ('VISp', 0.06317560386805417), ('VISam', 0.062225983051985116), ('VISrl', 0.05559671817688621), ('VISal', 0.05487060346434773), ('VISl', 0.05024867914900692)]
SUB :  [('SUB', 0.0842123747695893), ('CA1', 0.08039862642473629), ('VISrl', 0.06799762846617104), ('LP', 0.0660672236065325), ('VISam', 0.06543691217331558), ('VISp', 0.06134469131289457), ('VISal', 0.05887962259519398), ('VISl', 0.05473712451935701)]
VISrl :  [('VISrl', 0.08313645214815685), ('CA1', 0.07019368459990619), ('VISam', 0.06918459347278622), ('SUB', 0.06799762846617104), ('VISp', 0.06645831784213206), ('VISal', 0.06579576204969023), ('VISl', 0.06342364346999328), ('LP', 0.05559671817688621)]

There are two important results that can be drawn from this. Firstly, we can see that every area is mostly correlated within itself. Secondly, we can also see that VIS areas tend to be more correlated between themselves rather than with other areas, while CA1, SUB and LP form a secondary group.

It is hard to extrapolate further results from this, since this correlation data doesn't imply by no means that the neurons are connected. Nontheless, the fact that this is a recurrent result for all areas, might follow from the presence of an underlying process or structure at the neural level that could be responsible for this.

Looking for maximal and minimal correlation between areas¶

Another interesting thing that we could look at is the maximal and minimal correlation between neurons across areas. This will probably catch outliers, but it is still interesting to see how similar or opposite two spike trains can be.

In [ ]:
def area_STTC_max(area):

    area_STTCs = dict()
    if area != 'VISam':
        area_start = increments[ordered_area_names.index(area)-1]
    else:
        area_start = 0
        
    area_end = increments[ordered_area_names.index(area)]
    for i, area_name in enumerate(ordered_area_names):
        if i == 0:
            if area_name == area:
                area_STTCs[area_name] = np.max(STTC_matrix_ordered[:increments[i], area_start:area_end]-np.diag(np.diag(STTC_matrix_ordered[:increments[i], area_start:area_end])))
            else:
                area_STTCs[area_name] = np.max(STTC_matrix_ordered[:increments[i], area_start:area_end])
        else:
            if area_name == area:
                area_STTCs[area_name] = np.max(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end]-np.diag(np.diag(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end])))
            else:
                area_STTCs[area_name] = np.max(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end])
    return area_STTCs
In [ ]:
STTC_maxs = dict()

for area in ordered_area_names:

    STTC_maxs[area] = sorted(area_STTC_max(area).items(), key=lambda x: x[1], reverse=True)

for area, corrs in STTC_maxs.items():
    print(area, ': ', corrs)
VISam :  [('CA1', 0.6651158975296159), ('VISam', 0.6350849368523841), ('VISp', 0.5800964689197329), ('LP', 0.5598944905807833), ('SUB', 0.5476996795227088), ('VISal', 0.5454190445814023), ('VISl', 0.536936135487955), ('VISrl', 0.5243718541081331)]
CA1 :  [('CA1', 0.7229560496911289), ('VISam', 0.6651158975296159), ('VISp', 0.639178207718907), ('SUB', 0.6080687741427093), ('VISrl', 0.5948133724252141), ('LP', 0.5919274306095716), ('VISl', 0.5347432576074619), ('VISal', 0.4847897527930829)]
VISp :  [('VISp', 0.7197770797952999), ('CA1', 0.639178207718907), ('VISal', 0.5850056720337684), ('VISam', 0.5800964689197329), ('VISl', 0.5419931163703515), ('LP', 0.5363600275458325), ('SUB', 0.44749243644458014), ('VISrl', 0.4173519614925355)]
VISal :  [('VISal', 0.7116330546720138), ('LP', 0.6033140211296097), ('VISp', 0.5850056720337684), ('VISl', 0.5836344014645951), ('VISam', 0.5454190445814023), ('SUB', 0.5110129379484085), ('VISrl', 0.5016421738810275), ('CA1', 0.4847897527930829)]
VISl :  [('LP', 0.6925808517866816), ('VISal', 0.5836344014645951), ('SUB', 0.5637227273834009), ('VISp', 0.5419931163703515), ('VISam', 0.536936135487955), ('CA1', 0.5347432576074619), ('VISl', 0.49897339782199657), ('VISrl', 0.4590807307862382)]
LP :  [('LP', 0.805278258710598), ('VISl', 0.6925808517866816), ('VISal', 0.6033140211296097), ('CA1', 0.5919274306095716), ('SUB', 0.5867792358722027), ('VISam', 0.5598944905807833), ('VISp', 0.5363600275458325), ('VISrl', 0.37015812323930314)]
SUB :  [('CA1', 0.6080687741427093), ('LP', 0.5867792358722027), ('VISl', 0.5637227273834009), ('VISrl', 0.5493323238278809), ('VISam', 0.5476996795227088), ('SUB', 0.536850745243445), ('VISal', 0.5110129379484085), ('VISp', 0.44749243644458014)]
VISrl :  [('VISrl', 0.868529791741372), ('CA1', 0.5948133724252141), ('SUB', 0.5493323238278809), ('VISam', 0.5243718541081331), ('VISal', 0.5016421738810275), ('VISl', 0.4590807307862382), ('VISp', 0.4173519614925355), ('LP', 0.37015812323930314)]
In [ ]:
def area_STTC_min(area):

    area_STTCs = dict()
    if area != 'VISam':
        area_start = increments[ordered_area_names.index(area)-1]
    else:
        area_start = 0
        
    area_end = increments[ordered_area_names.index(area)]
    for i, area_name in enumerate(ordered_area_names):
        if i == 0:
            if area_name == area:
                area_STTCs[area_name] = np.min(STTC_matrix_ordered[:increments[i], area_start:area_end]-np.diag(np.diag(STTC_matrix_ordered[:increments[i], area_start:area_end])))
            else:
                area_STTCs[area_name] = np.min(STTC_matrix_ordered[:increments[i], area_start:area_end])
        else:
            if area_name == area:
                area_STTCs[area_name] = np.min(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end]-np.diag(np.diag(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end])))
            else:
                area_STTCs[area_name] = np.min(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end])
    return area_STTCs
In [ ]:
STTC_mins = dict()

for area in ordered_area_names:

    STTC_mins[area] = sorted(area_STTC_min(area).items(), key=lambda x: x[1])

for area, corrs in STTC_mins.items():
    print(area, ': ', corrs)
VISam :  [('CA1', -0.430786862411099), ('VISp', -0.3773482044469345), ('VISal', -0.3510943437121914), ('SUB', -0.34147698357597467), ('VISam', -0.3357315653943669), ('VISrl', -0.3141679115156154), ('VISl', -0.31220707100558076), ('LP', -0.25641605367963805)]
CA1 :  [('VISrl', -0.4353173805662587), ('VISam', -0.430786862411099), ('CA1', -0.3798163712676609), ('SUB', -0.34876007655892), ('VISp', -0.34655454124303753), ('VISl', -0.33840381880335446), ('VISal', -0.32544032059193484), ('LP', -0.2842342581065946)]
VISp :  [('VISam', -0.3773482044469345), ('CA1', -0.34655454124303753), ('SUB', -0.31920061186430604), ('VISrl', -0.3157738865228304), ('VISp', -0.2872294636971585), ('VISal', -0.2861474743007289), ('LP', -0.2751852609753331), ('VISl', -0.23298831816302507)]
VISal :  [('VISl', -0.3655834494929514), ('VISam', -0.3510943437121914), ('VISal', -0.34607952046700446), ('VISrl', -0.3411993649531014), ('SUB', -0.32882782700336877), ('CA1', -0.32544032059193484), ('LP', -0.31912271296107664), ('VISp', -0.2861474743007289)]
VISl :  [('VISal', -0.3655834494929514), ('LP', -0.3489079364386581), ('CA1', -0.33840381880335446), ('VISam', -0.31220707100558076), ('VISrl', -0.3120568676745754), ('VISl', -0.2696441045757123), ('SUB', -0.25150597571612704), ('VISp', -0.23298831816302507)]
LP :  [('LP', -0.38763823869604636), ('VISl', -0.3489079364386581), ('VISal', -0.31912271296107664), ('SUB', -0.2919104816513442), ('CA1', -0.2842342581065946), ('VISp', -0.2751852609753331), ('VISrl', -0.2578374035862638), ('VISam', -0.25641605367963805)]
SUB :  [('SUB', -0.3553883995007206), ('CA1', -0.34876007655892), ('VISam', -0.34147698357597467), ('VISal', -0.32882782700336877), ('VISp', -0.31920061186430604), ('LP', -0.2919104816513442), ('VISrl', -0.26457427969283887), ('VISl', -0.25150597571612704)]
VISrl :  [('CA1', -0.4353173805662587), ('VISal', -0.3411993649531014), ('VISrl', -0.3192718625720402), ('VISp', -0.3157738865228304), ('VISam', -0.3141679115156154), ('VISl', -0.3120568676745754), ('SUB', -0.26457427969283887), ('LP', -0.2578374035862638)]

It is interesting to see that both for the maximal and minimal correlations we have signifacntly different results. Not only it is not possible to distinguish the two structures that we had before, but in half of the areas for the maximal and in three quarters for the minimal we see that the strongest correlation between a neuron of that area and another neuron is not with a neuron belonging to the same area. The combination of these two changes, together with the fact that the results deviate significantly from the mean of the absolute value computed above, lead us to think that studying this correlations would be misleading, since maximal and minimal correlations are with high probability outliers.

Additional plot

Since we were able to compute the correlation matrix for the whole length of the spike trains we decided to focus our analysis on that data. In this way we were able to abstract from specific stimuli presentations and gather the most general results possible.

However, we thought it could be interesting to show also the plot for the second block of stimuli presentations (Drifting gabors), which presented the same results described above in a more visible way. In particular, what is interesting here is the presence of higher correlations for blocks in the diagonal, which are correlations within the same area, and the presence of a darker area for the correlations between VISam, VISp, VISal and VISl with respect to a lighter area in the blocks of correlation between CA1 and the afore mentioned areas.

In [ ]:
compute_save_plot_block_ordered(2, plot=True);
No description has been provided for this image

Final remark

In all the correlation matrices we plotted, we always noted the presence of some neurons that were correlated with strength around 0.5 with all other neurons. This is an interesting result that we tryed to further analyzed, but we did not reach any relevant conclusion on why this might be the case. In any case, this did not happen for neurons in the 8 biggest areas that we selected, so our results were not byased by the presence of such neurons, which were located primarily in APN and TH.

3. Maximal entropy distribution over configurations of activity¶

The model we will implement in this section aims at capturing patterns of activity of neurons through a probability distribution. It is the contact point between our analysis on the correlation, explained in the above sections, and the last part which will be about predicting neuronal activity. We use a maximal entropy model, since it returns the most unbiased probability distribution, and we constrain the mean and correlation of the probability distribution we want to find to match the empirical mean and correlation computed from the data. More precisely the structure will be the following (reference: https://arxiv.org/pdf/2112.14735):

We select one type of stimulus and construct $T$ binary vectors of length $N$:

$\{x_i\}^\mu = \{x_1, x_2, ...., x_N\}^\mu$, $\mu = 1,...,T$

where $N$ in the number of neurons we are considering and $T$ is the number of times that stimulus is presented.

Every vector will take a value $x_i^\mu = \{+1, -1\}$ indicating whether the $i_{th}$ neuron spiked or not during presentation $\mu$.

Through these vectors we can compute the values of empirical mean and correlation as follows:

$m_{i}^{emp} = \frac{1}{T}\sum_{\mu}x_i^\mu$

$C_{ij}^{emp} = \frac{1}{T}\sum_{\mu}x_i^\mu x_j^\mu$

Now we want to find the probability distribution $P$ that satisfies the following optimization problem:

$max$ $S := - \sum_{\{x_i\}} P(\{x_i\})logP(\{x_i\})$

$subject$ $to$

$\langle x_i \rangle := \sum_{\{x_i\}} x_i P(\{x_i\}) = m_{i}^{emp}$

$\langle x_i x_j \rangle := \sum_{\{x_i\}} x_i x_j P(\{x_i\}) = C_{ij}^{emp}$

We can solve this optimization problem by taking the partial derivatives of the Lagrangian and setting it to zero. Doing so, we obtain the following expression, which corresponds to the Boltzmann distribution in the setting of the Spin glass model:

$P(\{x_i\}) = \frac{1}{Z} exp[-E(\{x_i\})]$

where $E(\{x_i\}) = - \sum_{i}h_i x_i - \frac{1}{2} \sum_{i,j}J_{ij} x_i x_j$

and $Z = \sum_{\{x_i\}}exp[-E(\{x_i\})]$

$h_i$ can be seen as local magnetic field and $J_{ij}$ as coupling constants. Thus, we want to infer $h_i$ and $J_{ij}$ to have the compelte probability distribution.

To do so, the best known method is the so called Boltzman machine, which comprises a Monte Carlo simulation step, done to estimate $\langle x_i \rangle$ and $\langle x_i x_j \rangle$ from given $h_i$ and $J_{ij}$, and a gradient descent step to update the parameters to match our initial equations. Nonetheless, this is quite computationally expensive, so we opted for an approximation method known as Mean field learning. Estimating $h_i$ and $J_{ij}$ becomes just an algebraic computation and it follows the following steps:

  • if $C^{emp}$ is not full rank, we do an initial step wich is to consider a matrix $C = (1-\alpha)\cdot C^{emp} + \alpha \cdot I$, where $I$ is the identity matrix and $\alpha \in (0,1)$ is a hyperparameter that can be tuned
  • Then, we compute $J = - [C]^{-1}$ and set the diagonal to $0$
  • Finally, we can compute $h = - J \cdot m^{emp} + tanh^{-1}[m^{emp}]$

This is already a pretty good approximation, but the metohd could be extended to increase precision. A popular way, which could be explored in furteher implementations, is to use the values obtained with mean field as initialization values for the pseudo-likelihood method, and then what one obtains as a result there, as initialization for the Boltzman machine described above.

We will firstly consider the neurons in one brain area (we chose VISam because in our dataset it is the most numerous and thus, measures on neurons themselves can be considered as strong, but it can be repeated in other brain areas) filtering neuronal activation during the presentation of the stimulus "static gratings" with a particular orientation. Doing so, we can understand weather some effective local interactions might be present and, as we said, it might be possible to capture them through the correlation measure.

We will also compare our statistical physics model with a random model (i.e. random patterns of activity which present the same empirical mean across trials, but with zero correlation being drawn independently) to see if there is indeed an improvement.

Lastly, since we are also interested in studying how different areas relate to each other and how results change when moving across them, we will include in our analysis neurons in other brain areas to infer the distribution of configuration in VISam. In particular, we will consider one brain area which is spatially close to VISam and one which is instead further away and compare the results to try to extract some meaningful conclusion

We start by defining some functions to make the computations described easier to be performed multiple times

In [4]:
def create_configuration(n): 
    """
    Function to create a random configuration of n spins (neurons)
    """
    configuration = np.random.choice([1, -1], size=n)
    return configuration

def energy_configuration(s,h,J):
    """
    Function to calculate the energy of a configuration s
    """
    single_term = np.dot(h,s)
    double_term = s @ J @ s.T
    E = - single_term - (double_term)/2
    return E

def delta_E(i,s,delta_s, h, J):
    """
    Function to calculate the difference in energy starting from configuration s

    Parameters
    ----------
    i : position of the spin flip proposed by the algorithm
    s : initial configuration
    delta_s : difference between the initial and final spin value in position i
    h : array of values of local fields
    J : matrix of values of coupling constants
    """
    single_delta = h[i]*delta_s
    double_delta = np.dot(J[i],delta_s*s) 
    return single_delta+double_delta

def monte_carlo_simulation(n,nsteps,h,J, mean_corr = False, configs =False, T=1):

    """
    Function to run monte carlo simulations using metropolis algorithm

    Parameters
    ----------
    n : number of neurons considered
    h : array of values of local fields
    J : matrix of values of coupling constants
    mean_corr : if set to True it returns the final estimation of mean and correlation
    configs : if set to True it returns all the configurations obtained through sampling
    """

    configuration = create_configuration(n)
    res = dict()
    si = configuration.copy()
    sisj = np.outer(configuration.copy(), configuration.copy())
    spins = []
    spins.append(configuration.copy())
    effective_steps = 1
    for j in range(nsteps):       
        i = np.random.randint(n) ## random sampling of the position of the spin to flip
        s_0 = configuration[i]
        s_f = - s_0
        dE = delta_E(i,configuration,(-s_f+s_0),h,J) ## calculation of the change in energy with the selected spin flip
        if dE < 0 or np.random.random() < np.exp(-(dE/T)): ## metropolis 
            configuration[i] = s_f
        if (j >= 50000) and (j % 1000 == 0): ## to make sampling more reliable we start saving configurations after the 50000th step and every 1000 steps
            effective_steps+=1
            config_copy = configuration.copy()
            spins.append(config_copy)
            si += config_copy
            sisj += np.outer(config_copy, config_copy)
        if j % 100000 == 0:
            print(f'Step {j} out of {nsteps}')
    m_e = si/effective_steps ## empirical mean
    c_e = sisj/effective_steps - np.outer(m_e,m_e) 
    res['effective_steps'] = effective_steps
    if mean_corr:
        res['m_e'] = m_e
        res['c_e'] = c_e
    if configs:
        res['spin_configs'] = spins
    return res

def mean_field_learning(n,m_real,c_real,alpha,full_rank=False):
    """
    Function to perform mean field learning.
    Used to infer the values of h and J starting from a mean field theory assumption.

    Parameters
    ----------
    n : number of neurons considered
    m_real : empirical values of mean 
    c_real : empirical value of correlation
    alpha : weight given to identity matrix in linear combination with correlation matrix (hyperparameter)
    full_rank : bool to indicate wether the matrix is full rank i.e. when there is no necessity of rescaling with identity
    """
    if not full_rank:
        c = (1-alpha)*c_real + alpha*np.identity(n) ## taking the empirical correlation and "making it invertible"
    if full_rank:
        c = c_real 
    c_inv =  np.linalg.inv(c) ## inverting the empirical correlation matrix and changing sign
    c_inv -= np.diag(np.diag(c_inv)) ## setting the diagonal to zero
    J = - c_inv ## this is the coupling constants matrix 
    ## if m_i real is 1 or -1 the arctanh explodes so we rescale it by a small epsilon
    for i,neuron in enumerate(m_real):
        if np.abs(neuron) == 1:
            m_real[i] -= neuron * 0.01

    h = - (J@m_real) + np.arctanh(m_real)
    return h,J
In [31]:
def empirical_mean_corr(region_ids, df_stimulus):
    """
    Compute the empirical mean and correlation of neurons across different presentations of the same stimulus

    Parameters
    ----------
    region_ids : oursession.units[oursession.units["ecephys_structure_acronym"]==area].index.values where area can be one of the visual areas
    (region_ids can be comprehensive of two or more brain areas)
    df_stimulus : oursession.get_stimulus_table("stimulus") where stimulus is the type of stimulus chosen 

    """
    
    si = np.zeros(len(region_ids))
    sisj = np.zeros(shape=(len(region_ids),len(region_ids)))
    for i in range(df_stimulus.shape[0]):
        activity_temp = np.ones(len(region_ids))
        for j,id in enumerate(region_ids):
            spikes = oursession.spike_times[id]
            masking = spikes[(spikes > df_stimulus.iloc[i,1]) & (spikes < df_stimulus.iloc[i,2])]
            if len(masking) == 0:
                activity_temp[j] = -1
        si += activity_temp
        sisj += np.outer(activity_temp, activity_temp)

    m_real = si/(df_stimulus.shape[0]) ## empirical mean
    c_real = sisj/(df_stimulus.shape[0]) ## empirical correlation 

    return m_real,c_real 

def activity(region_ids, df_stimulus):
    """
    Detect the activity of neurons in a given area, in a small interval of time when shown a particular type of stimulus (1 : spike, -1 : no spike)
    
    Parameters
    ----------
    region_ids : oursession.units[oursession.units["ecephys_structure_acronym"]==area].index.values where area can be one of the visual areas
    (region_ids can be comprehensive of two or more brain areas)
    df_stimulus : oursession.get_stimulus_table("stimulus") where stimulus is the type of stimulus chosen and it can be selected one single type of it (e.g. only positive flashes)

    """
    activity = []
    for i in range(df_stimulus.shape[0]):
        activity_temp = np.ones(len(region_ids))
        for j,id in enumerate(region_ids):
            spikes = oursession.spike_times[id]
            masking = spikes[(spikes > df_stimulus.iloc[i,1]) & (spikes < df_stimulus.iloc[i,2])]
            if len(masking) == 0:
                activity_temp[j] = -1
        activity.append(activity_temp)
    return activity
In [6]:
def calculate_h_J(region_ids,df_stimulus,alpha,full_rank=False):
    """
    Function that returns the values of h and J
    
    Parameters
    ----------
    region_ids : oursession.units[oursession.units["ecephys_structure_acronym"]==area].index.values where area can be one of the visual areas
    (region_ids can be comprehensive of two or more brain areas)
    df_stimulus : oursession.get_stimulus_table("stimulus") where stimulus is the type of stimulus chosen and it can be selected one single type of it (e.g. only positive flashes)
    alpha : weight given to identity matrix in linear combination with correlation matrix (hyperparameter)
    full_rank : bool to indicate wether the matrix is full rank i.e. when there is no necessity of rescaling with identity
    """
    
    m_real,c_real = empirical_mean_corr(region_ids,df_stimulus)
    n = len(region_ids)
    h,J = mean_field_learning(n,m_real,c_real,alpha,full_rank)
    
    return h,J

We must choose some metric to see how the model performs.

Once we have computed all the parameters of our probability distribution, we can do some parameter-free estimations and compare the model with the data.

We chose to compute (and plot):

  1. the probability that K out of N neurons are active at the same time (denoted P(K)) together with the same probability but derived empirically from the data
  2. the probability to have a certain energy configuration (denoted P(E)) together with the same probability but derived empirically from the data

Indeed, the following functions perform the computations and relative plots of histograms and probabilities for points 1. and 2. In particular:

For point 1:

  • histogram displays the value of K (in steps of 2) in the x-axis VS the number of configurations that present that number of active neurons on the y-axis
  • probability plot dislays the the value of K (in steps of 2) in the x-axis VS the probability of having that number of active neurons on the y-axis

For point 2:

  • histogram displays the value E of energies (in steps of 2 from the minimum to the maximum of the empirically observed ones) in the x-axis VS the number of configurations that present that value of energy on the y-axis
  • probability plot dislays the value E of energies (in steps of 2 from the minimum to the maximum of the empirically observed ones) in the x-axis VS the probability of having that value of energy on the y-axis
In [71]:
def K_outof_N_histogram(s,empirical_activity_K):
    
    configs = s['spin_configs']
    
    ## making configurations 0-1 for easier calculations later
    for i,x in enumerate(configs):
        configs[i] = (configs[i]+1)//2

    ## for every configuration sampled, computes the number of active neurons in that configuration
    n_active_neurons = []
    for i,x in enumerate(configs):
        n_active_neurons.append(np.sum(x))

    ## making empirical activity 0-1 for easier calculations later
    for i,x in enumerate(empirical_activity_K):
        empirical_activity_K[i] = (empirical_activity_K[i]+1)//2

    ## for every empirical configuration, computes the number of active neurons in that configuration
    emp_active_neurons = []
    for i,x in enumerate(empirical_activity_K):
        emp_active_neurons.append(np.sum(x))

    ## creating the bins for the histogram : from min of the empirical activity to max in step of 1
    ## min of empirical activity is the minimum number of simultaneously active neurons 
    ## max of empirical activity is the maximum number of simultaneously active neurons 

    bins_K = np.arange(min(emp_active_neurons),max(emp_active_neurons)+2,2) 
    counts_K, edges_K = np.histogram(n_active_neurons, bins=bins_K)
    counts_K_emp, edges_K_emp = np.histogram(emp_active_neurons, bins=bins_K)
    edges_K = edges_K[:-1]
    edges_K_emp = edges_K_emp[:-1]

    return counts_K, counts_K_emp, bins_K, edges_K

def plot_K_outof_N_histogram(counts_K,counts_K_emp,bins_K,area:str,stimulus:str):
    fig,ax = plt.subplots(1,2,figsize=(15,5))
    ax[0].stairs(counts_K, bins_K, color='lightblue')
    ax[1].stairs(counts_K_emp, bins_K, color='pink')
    for x in [0,1]:
        ax[x].set_xlabel('K')
        ax[x].set_ylabel('# configurations with K active neruons')
    ax[0].set_title('Model')
    ax[1].set_title('Data')
    plt.suptitle(f'{area}, {stimulus}')
    plt.show()

def K_outof_N_prob(s,counts_K,counts_K_emp,df_stimulus):
    ## from the counts, computing the probabilities, both in the model and empirical cases
    prob_K = np.array(counts_K)/s['effective_steps']
    prob_K_emp = np.array(counts_K_emp)/df_stimulus.shape[0]
    return prob_K,prob_K_emp

def plot_K_outof_N_prob(prob_K,prob_K_emp,edges_K,area:str,stimulus:str):
    plt.figure()
    plt.plot(edges_K,prob_K,color='lightblue',label='model')
    plt.plot(edges_K,prob_K_emp,color='pink',label='data')
    plt.xlabel('K simultaneous active neurons')
    plt.ylabel('P(K)')
    plt.suptitle(f'{area}, {stimulus}')
    plt.legend()
    plt.show()

def E_histogram(s,h,J,empirical_activity_K):
    configs = s['spin_configs']
    ## the process is the same as before but computing the energies of the configurations 
    energies = []
    for config in configs:
        energies.append(energy_configuration(config,h,J))

    emp_energies = []
    for config in empirical_activity_K:
        emp_energies.append(energy_configuration(config,h,J))

    bins_E = np.arange(min(emp_energies),max(emp_energies),2) 
    counts_E, edges_E = np.histogram(energies, bins=bins_E)
    counts_E_emp, edges_E_emp = np.histogram(emp_energies, bins=bins_E)
    edges_E = edges_E[:-1]
    edges_E_emp = edges_E_emp[:-1]
    
    return counts_E, counts_E_emp, bins_E, edges_E

def plot_E_histogram(counts_E,counts_E_emp,bins_E,area:str,stimulus:str):
    fig,ax = plt.subplots(1,2,figsize=(15,5))
    ax[0].stairs(counts_E, bins_E, color='lightblue')
    ax[1].stairs(counts_E_emp, bins_E, color='pink')
    for x in [0,1]:
        ax[x].set_xlabel('E')
        ax[x].set_ylabel('# configurations with energy E')
    ax[0].set_title('Model')
    ax[1].set_title('Data')
    plt.suptitle(f'{area}, {stimulus}')
    plt.show()

def E_prob(s,counts_E,counts_E_emp,df_stimulus):
    prob_E = np.array(counts_E)/s['effective_steps']
    prob_E_emp = np.array(counts_E_emp)/df_stimulus.shape[0]
    return prob_E,prob_E_emp

def plot_E_prob(prob_E,prob_E_emp,edges_E,area:str,stimulus:str):
    plt.figure()
    plt.plot(edges_E,prob_E,color='lightblue',label='model')
    plt.plot(edges_E,prob_E_emp,color='pink',label='data')
    plt.xlabel('Energy E')
    plt.ylabel('P(E)')
    plt.legend()
    plt.title(f'{area}, {stimulus}')
    plt.show()

3.1 VISam -> VISam, static gratings¶

We chose to consider the activity of neurons during the presentation of static gratings since it is the stimulus for which we see a higher correlation of activity. Furthermore, we decided to focus on a particular orientation for the following 2 reasons:

  • hope, again, for a stronger correlation in the activity, given the same entity of the type of stimulus considered, which would be more coherent with the assumptions and constraints of our model
  • gratings are presented around 6000 times, so to have acomparable sample of monte carlo - simulated configurations, we would need a lot of iterations in the algorithm. Instead, a single orientation is seen around 1000 times which, compared to other stimuli, is still a fair sample to compute empirical probabilities but at the same time would make computations ~ 6 times shorter

Clearly, this model has two main limitations:

  1. this number is still not enough to conclude that it is a good "empirical probability", so one could hope for better results for example merging other sessions and, thus, considering a higer amount of observations
  2. the time bin between two consecutive stimuli is relatively big compared to the time of a spike. Thus, since in our model we are just interested in whether a neuron spiked (+1) or not (-1), we are considering neurons that spike > 1 times in the same way as neurons that just spike 1 time, with a risk of loosing important information. Therefore, it would be interesting for further research to repeat the analysis considering a time bin small enough to have maximum one spike per neuron, or directly considering the firing rates instead of binary vectors.
In [32]:
df_stimulus = oursession.get_stimulus_table("static_gratings")
df_stimulus = df_stimulus[df_stimulus['orientation'] == 30]

As we said in the introduction, we restricted our analysis on extracting patterns of activity in VISam, being it the richest region in our dataset

In [33]:
area = 'VISam'
region_ids_VISam = oursession.units[oursession.units["ecephys_structure_acronym"]==area].index.values
In [56]:
h,J = calculate_h_J(region_ids=region_ids_VISam, df_stimulus=df_stimulus, alpha=0.7)
print('Local fields: \n\n', h)
print('\n\n Coupling constants: \n\n', J)
Local fields: 

 [ 2.59083484e-01  2.66342125e-02  4.86465327e-01 -1.08180469e-01
  5.11262057e-01 -2.13161809e-03  1.02087384e-01  9.48455336e-03
 -1.32653521e-01  7.60690201e-02 -2.96514142e-01  2.03269035e-01
 -2.06844425e-02  5.72143940e-03 -1.15785327e-02 -8.71256462e-05
 -1.36322001e-02 -3.07563694e-02 -3.41492677e-02 -2.35054268e-02
  5.95295788e-02  9.45378047e-02  4.18903887e-01 -1.34663007e-01
  5.22469821e-02  9.13114888e-01  2.39397923e-01  1.43219709e+00
  2.11190136e+00  3.99624011e-02  2.55760009e-03  4.99652862e-04
  7.83740574e-01  1.41575194e-01  3.17415422e-01 -7.48999419e-03
 -2.09333056e-02  1.17157611e-02  2.95551518e-03  3.38123128e-02
 -2.43789627e-01 -4.23205642e-01 -8.93776835e-02  1.23520736e+00
 -4.05809142e-02 -3.77849590e-03 -4.69356300e-01  8.46962276e-03
 -7.45287235e-03 -6.59756776e-02 -3.32663239e-01 -4.48348355e-01
 -1.95389267e-01 -2.61709123e-02 -1.56077367e-02  7.29034573e-02
 -7.47071611e-03 -1.96078055e-02 -1.48562511e-02 -5.16875440e-02
 -1.23783177e-01 -6.52016503e-02 -2.50176981e-02  7.89038447e-01
  1.97901899e-01 -2.11556230e-01 -7.49998542e-02 -1.96992164e-01
  9.43898199e-03 -8.58222121e-03 -4.08821759e-02 -1.57820280e-01
 -1.23811461e-01  6.62398783e-03 -5.75316195e-03  9.92912589e-03
 -1.20643042e-02 -2.29997352e-02 -1.21445413e-02  8.79557911e-01
  3.36100816e-02 -1.36965656e-02 -2.35091446e-02 -5.68720128e-04
 -1.25922634e-02 -1.32186108e-02  3.07308875e-01  2.92562414e-04
  1.49430908e-02 -1.57557494e-02 -2.46063573e-02 -5.46432877e-03
 -7.25882822e-03  3.05571349e-02  4.91836430e-01  1.03523835e-01
 -3.18744649e-01  1.85131226e-01  4.11949884e-01  6.37631336e-02
  7.12982368e-01 -8.06817502e-02  1.32424340e+00  1.51611194e-02
 -5.46470210e-01 -5.69291238e-03 -2.40964195e-01 -5.36742174e-03
  2.28903689e-02 -3.23948766e-02 -6.07332802e-03 -1.73573479e-03
  9.10502776e-03  1.92278647e-01 -1.16333482e-02 -2.11996188e-02
 -1.01551940e-01 -1.99366638e-02 -2.97448689e-01 -9.96319060e-03
  2.77116580e-02 -3.86098095e-02 -1.29040400e-02 -2.09304054e-02
 -1.36955524e-01 -6.50141896e-02 -1.72096652e-02 -4.25860116e-03
 -6.71168746e-03 -1.81902842e-01 -1.05341533e-01  1.35108032e-01
  3.12783364e-01 -5.44244230e-02  8.73698323e-01]


 Coupling constants: 

 [[-0.          0.009083    0.02239585 ...  0.01371031 -0.0195145
   0.02083298]
 [ 0.009083   -0.          0.00755833 ...  0.01536652  0.00337205
   0.0074803 ]
 [ 0.02239585  0.00755833 -0.         ...  0.01666217 -0.00865041
   0.021728  ]
 ...
 [ 0.01371031  0.01536652  0.01666217 ... -0.          0.00111876
   0.02686237]
 [-0.0195145   0.00337205 -0.00865041 ...  0.00111876 -0.
  -0.01086308]
 [ 0.02083298  0.0074803   0.021728   ...  0.02686237 -0.01086308
  -0.        ]]

We choose a number of monte carlo steps in order for us to generate a sample of configurations comparable wit the true one coming from our dataset

In [61]:
mc_steps = 1000000
In [62]:
## sampling spin configurations with monte carlo simulation
s = monte_carlo_simulation(n=len(region_ids_VISam), nsteps=mc_steps, h=h, J=J, configs=True)

## computing the empirical activity with the function defined earlier
empirical_activity_K = activity(region_ids=region_ids_VISam, df_stimulus=df_stimulus)
Step 0 out of 1000000
Step 100000 out of 1000000
Step 200000 out of 1000000
Step 300000 out of 1000000
Step 400000 out of 1000000
Step 500000 out of 1000000
Step 600000 out of 1000000
Step 700000 out of 1000000
Step 800000 out of 1000000
Step 900000 out of 1000000
In [76]:
pickle.dump(s, open('/Users/isottamagistrali/Library/Mobile Documents/com~apple~CloudDocs/Bocconi/Semestre 3.2/Neurosciencee/data_mcs/mcs_VISam_gratings.pkg', 'wb'))

Probability that K out of N neurons are active at the same time:

In [72]:
counts_K,counts_K_emp,bins_K,edges_K = K_outof_N_histogram(s=s, empirical_activity_K=empirical_activity_K)
plot_K_outof_N_histogram(counts_K=counts_K, counts_K_emp=counts_K_emp, bins_K=bins_K, area='VISam', stimulus='gratings')
No description has been provided for this image
In [64]:
prob_K,prob_K_emp = K_outof_N_prob(s=s, counts_K=counts_K, counts_K_emp=counts_K_emp, df_stimulus=df_stimulus)
plot_K_outof_N_prob(prob_K=prob_K, prob_K_emp=prob_K_emp, edges_K=edges_K, area='VISam', stimulus='gratings')
No description has been provided for this image

From what we can see in the plot above, the model is pretty good at capturing the general trend of the empirical probability. As it is often the case with monte carlo simulations, it oversamples more probable configurations giving less importance to the tails of the distribution. Indeed, some probability mass that in the true data lies in the tails, is instead put in the mean value of the distribution. There are methods that account for this issue, like considering a "temperature" in the exponent of the probability distribution (which would correspond to the physical distribution) or many others, so this would be an interesting starting point for further research.

Probability to have a certain energy configuration:

In [73]:
counts_E,counts_E_emp,bins_E,edges_E=E_histogram(s=s, h=h, J=J, empirical_activity_K=empirical_activity_K)
plot_E_histogram(counts_E=counts_E, counts_E_emp=counts_E_emp, bins_E=bins_E, area='VISam', stimulus='gratings')
No description has been provided for this image
In [74]:
prob_E,prob_E_emp=E_prob(s=s, counts_E=counts_E, counts_E_emp=counts_E_emp, df_stimulus=df_stimulus)
plot_E_prob(prob_E=prob_E, prob_E_emp=prob_E_emp, edges_E=edges_E, area='VISam', stimulus='gratings')
No description has been provided for this image

The energy resents a bit more of our approximation, but we believe the trend is captured still pretty well

We can compare our model with a randomly generated model of neurons activation, wihch means a model which matches our empirical mean but has 0 correlation. This is a way to see wether adding the correlation constraint can capture a deeper level of complexity in our biological network

In [77]:
m_real,c_real = empirical_mean_corr(region_ids=region_ids_VISam, df_stimulus=df_stimulus)
In [78]:
proba = (m_real + 1)/2
In [79]:
sample = np.zeros((len(region_ids_VISam),mc_steps//1000))
for neuron in range(len(region_ids_VISam)):
    sample[neuron] = np.random.choice([0,1], mc_steps//1000, p = [1-proba[neuron], proba[neuron]])
sample = sample.T
In [80]:
s_prime = dict()
s_prime['spin_configs'] = sample
In [81]:
counts_K_prime,counts_K_emp_prime,bins_K_prime,edges_K_prime = K_outof_N_histogram(s_prime, empirical_activity_K)
In [82]:
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_K, bins_K, color='lightblue')
ax[1].stairs(counts_K_emp, bins_K, color='pink')
ax[2].stairs(counts_K_prime, bins_K, color='green')
for x in [0,1,2]:
    ax[x].set_xlabel('K')
    ax[x].set_ylabel('# configurations with K active neruons')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Random model')
#plt.grid(True)
#plt.legend()
plt.suptitle(f'VISam, gratings')
plt.show()
No description has been provided for this image
In [83]:
s_prime['effective_steps'] = mc_steps//1000
In [84]:
prob_K_prime,prob_K_emp_prime = K_outof_N_prob(s_prime, counts_K_prime, counts_K_emp, df_stimulus)
In [85]:
plt.figure()
plt.plot(edges_K,prob_K,color='lightblue',label='model')
plt.plot(edges_K,prob_K_emp,color='pink',label='data')
plt.plot(edges_K,prob_K_prime,color='green',label='random model')
plt.xlabel('K simultaneous active neurons')
plt.ylabel('P(K)')
plt.suptitle(f'VISam, gratings')
plt.legend()
plt.show()
No description has been provided for this image

Our model seems to perform better that a ranom model for two main reasons:

  1. it is less peaked around te mean and it is instead able to distribute some probability mass also in the tails (even though not in a perfect way)
  2. it captures the shape of the empirical distribution in a more reliable way with respect to the random and uncorrelated one

One thing that can be deduced from this is that, indeed, our model is able to account for some structure of the underlying network and the correlation measure seems to be informative for this matter

3.2 VISam + VISal -> VISam, static gratings¶

We now try to include neurons of different areas to infer the activity of VISam. We start by considering an area that is closer to VISam and then move on to a further one, as explained in the introduction

We would like to understand whether this could give more information improving our results or, instead, worsen our analysis because our assumptions get weaker (i.e. the correlation is less strong when cnsidering neurons coming from differnet areas in the brain)

In [86]:
region_ids_VISal =  oursession.units[(oursession.units["ecephys_structure_acronym"] == 'VISal')].index.values
region_ids_visam_visal = np.array(list(region_ids_VISam)+list(region_ids_VISal))
# region_ids_visam_visal.shape, region_ids_visam_visal_old.shape
In [87]:
h2,J2 = calculate_h_J(region_ids_visam_visal, df_stimulus, 0.75)
In [106]:
## sampling spin configurations with monte carlo simulation
s2 = monte_carlo_simulation(len(region_ids_visam_visal), mc_steps, h2, J2,configs=True) 
pickle.dump(s2, open('/Users/isottamagistrali/Library/Mobile Documents/com~apple~CloudDocs/Bocconi/Semestre 3.2/Neurosciencee/data_mcs/mcs_VISam_VISal_gratings.pkg', 'wb'))
s_VISam = dict()
s_VISam['spin_configs'] = [x[:len(region_ids_VISam)] for x in s2['spin_configs']]
s_VISam['effective_steps'] = s2['effective_steps']
## computing the empirical activity with the function defined earlier
#empirical_activity_K = activity(region_ids_VISam,flashes)
Step 0 out of 1000000
Step 100000 out of 1000000
Step 200000 out of 1000000
Step 300000 out of 1000000
Step 400000 out of 1000000
Step 500000 out of 1000000
Step 600000 out of 1000000
Step 700000 out of 1000000
Step 800000 out of 1000000
Step 900000 out of 1000000

Probability that K out of N neurons are active at the same time:

In [107]:
counts_K2,counts_K_emp2,bins_K2,edges_K2 = K_outof_N_histogram(s_VISam, empirical_activity_K)
In [108]:
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_K, bins_K, color='lightblue')
ax[1].stairs(counts_K_emp, bins_K, color='pink')
ax[2].stairs(counts_K2, bins_K, color='purple')
for x in [0,1,2]:
    ax[x].set_xlabel('K')
    ax[x].set_ylabel('# configurations with K active neruons')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Double model (VISal)')
#plt.grid(True)
#plt.legend()
plt.suptitle(f'VISam, gratings')
plt.show()
No description has been provided for this image
In [109]:
prob_K2,prob_K_emp2 = K_outof_N_prob(s_VISam, counts_K2, counts_K_emp, df_stimulus)
In [110]:
plt.figure()
plt.plot(edges_K,prob_K,color='lightblue',label='model')
plt.plot(edges_K,prob_K_emp,color='pink',label='data')
plt.plot(edges_K,prob_K2,color='purple',label='double model (VISal)')
plt.xlabel('K simultaneous active neurons')
plt.ylabel('P(K)')
plt.suptitle(f'VISam, gratings')
plt.legend()
plt.show()
No description has been provided for this image

Probability to have a certain energy configuration:

In [111]:
configs = s_VISam['spin_configs']

energies = []
for config in configs:
    energies.append(energy_configuration(config,h2[:len(region_ids_VISam)],J2[:len(region_ids_VISam),:len(region_ids_VISam)]))

counts_E2, edges_E2 = np.histogram(energies, bins=bins_E)
edges_E2 = edges_E2[:-1]
In [112]:
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_E, bins_E, color='lightblue')
ax[1].stairs(counts_E_emp, bins_E, color='pink')
ax[2].stairs(counts_E2, bins_E, color='purple')
for x in [0,1,2]:
    ax[x].set_xlabel('E')
    ax[x].set_ylabel('# configurations with energy E')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Double model VISal')
plt.suptitle(f'VISam, gratings')
plt.show()
No description has been provided for this image
In [113]:
prob_E2,prob_E_emp2=E_prob(s_VISam, counts_E2, counts_E_emp, df_stimulus)
In [115]:
plt.figure()
plt.plot(edges_E,prob_E,color='lightblue',label='model')
plt.plot(edges_E,prob_E_emp,color='pink',label='data')
plt.plot(edges_E,prob_E2,color='purple',label='double model (VISal)')
plt.xlabel('Energy E')
plt.ylabel('P(E)')
plt.legend()
plt.title(f'VISam, gratings')
plt.show()
No description has been provided for this image

Whether we don't see a big difference in the probability of having a number of active neurons, the analysis on the energy configuration is way worsened here with respect to when we were considering one area alone. In this case, indeed, we tend to have more configurations concentrated around higher values of energy

3.3 VISam + VISp -> VISam, static gratings¶

In [116]:
## trying to include area further to each other

region_ids_VISp =  oursession.units[(oursession.units["ecephys_structure_acronym"] == 'VISp')].index.values
region_ids_visam_visp = np.array(list(region_ids_VISam)+list(region_ids_VISp))
In [117]:
h2,J2 = calculate_h_J(region_ids_visam_visp, df_stimulus, 0.75)
In [128]:
## sampling spin configurations with monte carlo simulation
s2 = monte_carlo_simulation(len(region_ids_visam_visp),mc_steps,h2,J2,configs=True) 
pickle.dump(s2, open('/Users/isottamagistrali/Library/Mobile Documents/com~apple~CloudDocs/Bocconi/Semestre 3.2/Neurosciencee/data_mcs/mcs_VISam_VISp_gratings.pkg', 'wb'))
s_VISam = dict()
s_VISam['spin_configs'] = [x[:len(region_ids_VISam)] for x in s2['spin_configs']]
s_VISam['effective_steps'] = s2['effective_steps']
Step 0 out of 1000000
Step 100000 out of 1000000
Step 200000 out of 1000000
Step 300000 out of 1000000
Step 400000 out of 1000000
Step 500000 out of 1000000
Step 600000 out of 1000000
Step 700000 out of 1000000
Step 800000 out of 1000000
Step 900000 out of 1000000

Probability that K out of N neurons are active at the same time:

In [129]:
counts_K2,counts_K_emp2,bins_K2,edges_K2 = K_outof_N_histogram(s_VISam, empirical_activity_K)
In [130]:
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_K, bins_K, color='lightblue')
ax[1].stairs(counts_K_emp, bins_K, color='pink')
ax[2].stairs(counts_K2, bins_K, color='purple')
for x in [0,1,2]:
    ax[x].set_xlabel('K')
    ax[x].set_ylabel('# configurations with K active neruons')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Double model (VISp)')
#plt.grid(True)
#plt.legend()
plt.suptitle(f'VISam, gratings')
plt.show()
No description has been provided for this image
In [131]:
prob_K2,prob_K_emp2 = K_outof_N_prob(s_VISam,counts_K2,counts_K_emp,df_stimulus)
#plot_K_outof_N_prob(prob_K,prob_K_emp,edges_K,'VISam with VISal','flashes')
In [132]:
plt.figure()
plt.plot(edges_K,prob_K,color='lightblue',label='model')
plt.plot(edges_K,prob_K_emp,color='pink',label='data')
plt.plot(edges_K,prob_K2,color='purple',label='double model (VISp)')
plt.xlabel('K simultaneous active neurons')
plt.ylabel('P(K)')
plt.suptitle(f'VISam, gratings')
plt.legend()
plt.show()
No description has been provided for this image

We can already see that we don't have any improvement in our analysis.

Probability to have a certain energy configuration:

In [133]:
configs = s_VISam['spin_configs']

energies = []
for config in configs:
    energies.append(energy_configuration(config,h2[:len(region_ids_VISam)],J2[:len(region_ids_VISam),:len(region_ids_VISam)]))

counts_E2, edges_E2 = np.histogram(energies, bins=bins_E)
edges_E2 = edges_E2[:-1]
In [134]:
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_E, bins_E, color='lightblue')
ax[1].stairs(counts_E_emp, bins_E, color='pink')
ax[2].stairs(counts_E2, bins_E, color='purple')
for x in [0,1,2]:
    ax[x].set_xlabel('E')
    ax[x].set_ylabel('# configurations with energy E')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Double model (VISp)')
plt.suptitle(f'VISam, gratings')
plt.show()
No description has been provided for this image
In [135]:
prob_E2,prob_E_emp2=E_prob(s_VISam, counts_E2, counts_E_emp, df_stimulus)
In [136]:
plt.figure()
plt.plot(edges_E,prob_E,color='lightblue',label='model')
plt.plot(edges_E,prob_E_emp,color='pink',label='data')
plt.plot(edges_E,prob_E2,color='purple',label='double model (VISp)')
plt.xlabel('Energy E')
plt.ylabel('P(E)')
plt.legend()
plt.title(f'VISam, gratings')
plt.show()
No description has been provided for this image

The energy, again, is shifted towards the right resulting in a worse estimation.

Even though one might expect some improvement when including more areas in the analysis having more information, instead, the results are a bit worsened. We found the following possible reasons:

  • Neuronal activity is highly noisy
  • To infer the parameters of the probability distribution, we used an approximation method which disregards second order fluctuations; this clearly looses some information about the second moment of the distribution which might instead turn out to be important
  • The most informative point is that, since the method performs fairly just in some of the settings considered, it means that, when that is the case, the constraints initially given are sufficient to capture the complexity of the biological network we are considering and so the correlation would be a strong measure; instead, when that is not the case, correlation might not be enough and there might be some underlying dynamics or properties that we are not capturing.

Two final important takeaways are that, first of all, with this model, we have much more information having inferred a probability distribution over all the possible configurations of neural activity in VISam. Moreover, we have a further confirmation of the fact that when restricting ourselves to a single area the correlation becomes stronger, meaning that interactions are stronger when considered locally and they enable us to infer better models on the real patterns of activity.

This analysis could be repeated considering other combinations of area-stimulus, which we do not perform here to avoid redundancy and because of time constraints. Finally, an interesting further analysis could be performed considering a brain area which is intermediate in the hierarchy of the visual path, and see whether results change when considering neurons in earlier or subsequent areas.

We now move on to the use of machine learning models, which might be able to find more hidden patterns and relations.

4. Predictions¶

The ultimate goal of our analysis is to predict neural activity. Take a group of neurons, call it A, to predict the firing rate of a neuron b. Then, the more we are able to predict the activity of neuron b, the more the firing rate of neuron b is correlated with the firing rates of neurons in group A. Througout this analysis, we will define the firing rate of a neuron during a stimulus in the same way we defined it in the analysis of correlation, that is, the firing rate is equal to the number of spikes emitted during the stimulus divided bt the length of the stimulus. As a metric, we will mainly consider explained variance, which tells how much of the total variance is “explained” by each component. Therefore, when performing a regression, the higher the explained variance, the more we are able to say about neuron b. We will proceed in the following way: We start by considering rates during static gratings. First we try some different types of regression on the same neuron, in order to see which one is the better for our analysis. For brevity reasons, we do not include this part on the report: summaryzing, we tried with a linear regressor, random forest regressor, support vector machine regressor , polynomial features + sgd regressor and with a multilayer perceptron, tuning hyperparameters with grid searches in every case. Even though with more complex models we obtain slightly better results than a linear regression (in terms of MSE and Explained-variance) in the test set (while they have much better results in the training set, obviously), this small increase in performance is not so relevant and it is not enough to use a more complex model than the linear one. Once estabilished that a linear regression is enough for our scope, we:

  1. predict the rate of a neuron in an Area given all the other neurons in the same Area, for all Area
  2. predict the rate of a neuron in an area Area1 given all the neurons in another area Area2, for all Area1,Area2, Area1!= Area2
  3. predict the rate of every neurons given all the other neurons (in every area)

Then we try to repeat the same with not only gratings but all the stimuli presented in the session, in order to see how much our measures were biased by the fact that we were considering only one type of stimulus

Disclaimer: for the sake of brevity, we will sometimes refer to 'predictive area' and 'predicted area'. However, one needs to be careful as we are not trying to infer the activity of a global area, rather the the activity of a single neuron within that area. Since the prediction is repeated for all neurons within that area and mean values are computed, this generates an 'abuse of notation'.

Predictions of static gratings¶

1. Area1 given Area1¶

We load the data, using the function get_rates defined before.

In [ ]:
areas = {'APN': 25,   #0
 'CA1': 134,          #1
 'CA2': 2,
   'CA3': 21,
   'DG': 31,
   'Eth': 1,          # 5
   'LGv': 16,
   'LP': 65,
   'MB': 1,
   'NOT': 16,
  'ProS': 9,          # 10
  'SUB': 59,
  'TH': 2,
  'VISal': 89,
  'VISam': 135,
  'VISl': 78,         # 15
  'VISp': 94,
  'VISrl': 47}
In [ ]:
data_areas = {}
for area in areas:
        data_areas[area] = get_rates(['static_gratings'], {area : areas[area]} , dir = '/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/')

We define a function in order to divide between training and test data

In [ ]:
def shuffle_and_split_data(data, test_ratio):
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]

As model we choose a linear model with a standard scaler

In [ ]:
linear_nopca = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_reg", LinearRegression())
])

We define a function that fit and predicts the firing rates of a neuron given all the others, and does it for all the neurons considered. It returns the array of explained variances in the test set, the array of explained variances in the training set, and the same for the mean squared error.

In [ ]:
def prediction_over_all_units(units, model, training_data, test_data):
    n = len(units.index)
    expl_var_training = np.zeros(n)
    expl_var_test = np.zeros(n)
    mse_training = np.zeros(n)
    mse_test = np.zeros(n)
    area = ["" for i in range(n)]
    for neuron in range(n):
        x_training_data = training_data.loc[:, training_data.columns != neuron]
        y_training_data = training_data.iloc[:, neuron]
        x_test_data = test_data.loc[:, test_data.columns != neuron]
        y_test_data = test_data.iloc[:, neuron]
        model.fit(x_training_data, y_training_data)
        predictions_test = model.predict(x_test_data)       #predictions on test set
        predictions_training = model.predict(x_training_data)  #predistions on train set  
        expl_var_training[neuron] = explained_variance_score(y_training_data, predictions_training)  #explained_variance score of training set
        expl_var_test[neuron] = explained_variance_score(y_test_data, predictions_test)         #explained_variance score of training set
        mse_training[neuron] = mean_squared_error(y_training_data, predictions_training)
        mse_test[neuron] = mean_squared_error(y_test_data, predictions_test)
        area[neuron] = units.ecephys_structure_acronym.iloc[neuron]
    return expl_var_test, expl_var_training, mse_test, mse_training, area

For every area (if the number of neurons in that area is greater than 1), we split between training and test set and we apply the fuction prediction_over_all_units, so that:

for every area:

for every unit b in that area:

fit the regressor with the firing rates of the other neurons in the area and predict the rates of b

In [ ]:
dati = {}
for area in areas:
    if len(oursession.units[oursession.units["ecephys_structure_acronym"]==area]) > 1:
        training_data, test_data = shuffle_and_split_data(pd.DataFrame(data_areas[area]),0.2 ) #shuffling and splitting the dataset for each area
        expl_var_test, expl_var_training, mse_test, mse_training, zone = prediction_over_all_units(oursession.units[oursession.units["ecephys_structure_acronym"]== area], linear_nopca, training_data, test_data)
        dati[area] = {}
        dati[area]["expl_var_test"] = expl_var_test
        dati[area]["expl_var_training"] = expl_var_training
        dati[area]["mse_test"] = mse_test
        dati[area]["mse_training"] = mse_training
with open(f"/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/single.pkl", "wb") as tf:
    pickle.dump(dati, tf)

just to avoid re-running, to load the data:

In [ ]:
with open(f'../one_given_the_other/single.pkl', 'rb') as handle:
    dati = pickle.load(handle)

We define a function to print our scores (min, max, mean) across areas

In [ ]:
def analyze_single(name, expl_var_test, expl_var_training, mse_test, mse_training):
    print(name)
    print("\tExplained variance")
    print(f"\t\ttraining\tMax: {round(expl_var_training.max(), 5)}\tMin: {round(expl_var_training.min(), 5)}\tMean: {round(expl_var_training.mean(), 5)}")
    print(f"\t\ttest\t\tMax: {round(expl_var_test.max(), 5)}\tMin: {round(expl_var_test.min(), 5)}\tMean: {round(expl_var_test.mean(), 5)}")
    print("\tMSE")
    print(f"\t\ttraining\tMax: {round(mse_training.max(), 5)}\tMin: {round(mse_training.min(), 5)}\tMean: {round(mse_training.mean(), 5)}")
    print(f"\t\ttest\t\tMax: {round(mse_test.max(), 5)}\tMin: {round(mse_test.min(), 5)}\tMean: {round(mse_test.mean(), 5)}")
 
In [ ]:
for area in areas:
    if areas[area] > 1:
        analyze_single(area, dati[area]["expl_var_test"], dati[area]["expl_var_training"], dati[area]["mse_test"], dati[area]["mse_training"] )
APN
	Explained variance
		training	Max: 0.65161	Min: 0.0222	Mean: 0.26865
		test		Max: 0.64339	Min: 0.00761	Mean: 0.26669
	MSE
		training	Max: 219.24482	Min: 2.4217	Mean: 47.45593
		test		Max: 229.04317	Min: 2.35715	Mean: 48.95882
CA1
	Explained variance
		training	Max: 0.83551	Min: 0.03593	Mean: 0.339
		test		Max: 0.83414	Min: -0.0167	Mean: 0.29516
	MSE
		training	Max: 212.47617	Min: 0.87598	Mean: 38.58232
		test		Max: 236.06089	Min: 0.82168	Mean: 40.43761
CA2
	Explained variance
		training	Max: 0.04383	Min: 0.04383	Mean: 0.04383
		test		Max: 0.03633	Min: 0.03577	Mean: 0.03605
	MSE
		training	Max: 55.34544	Min: 22.63183	Mean: 38.98864
		test		Max: 54.03096	Min: 23.74187	Mean: 38.88642
CA3
	Explained variance
		training	Max: 0.51912	Min: 0.03434	Mean: 0.19914
		test		Max: 0.51399	Min: 0.03996	Mean: 0.18876
	MSE
		training	Max: 421.21742	Min: 3.42764	Mean: 56.3878
		test		Max: 385.30129	Min: 3.41261	Mean: 54.90416
DG
	Explained variance
		training	Max: 0.67721	Min: 0.01028	Mean: 0.25401
		test		Max: 0.71418	Min: -0.01317	Mean: 0.24319
	MSE
		training	Max: 128.84396	Min: 0.39067	Mean: 32.81834
		test		Max: 117.45095	Min: 0.31462	Mean: 32.71476
LGv
	Explained variance
		training	Max: 0.54915	Min: 0.12649	Mean: 0.30679
		test		Max: 0.60002	Min: 0.00101	Mean: 0.29695
	MSE
		training	Max: 420.28142	Min: 1.13222	Mean: 118.8423
		test		Max: 416.93888	Min: 0.91819	Mean: 119.11035
LP
	Explained variance
		training	Max: 0.77447	Min: 0.02682	Mean: 0.18268
		test		Max: 0.75268	Min: 0.00746	Mean: 0.16073
	MSE
		training	Max: 258.08527	Min: 2.63829	Mean: 70.45518
		test		Max: 264.62387	Min: 2.76267	Mean: 73.06763
NOT
	Explained variance
		training	Max: 0.53905	Min: 0.02094	Mean: 0.28126
		test		Max: 0.58242	Min: 0.01021	Mean: 0.26729
	MSE
		training	Max: 350.97802	Min: 0.85883	Mean: 69.68963
		test		Max: 372.39397	Min: 1.13802	Mean: 70.32973
ProS
	Explained variance
		training	Max: 0.25405	Min: 0.01408	Mean: 0.12659
		test		Max: 0.25936	Min: 0.00984	Mean: 0.12506
	MSE
		training	Max: 237.79262	Min: 6.90714	Mean: 53.86658
		test		Max: 222.10879	Min: 6.95017	Mean: 52.30727
SUB
	Explained variance
		training	Max: 0.7363	Min: 0.01891	Mean: 0.27115
		test		Max: 0.70619	Min: -0.00861	Mean: 0.25082
	MSE
		training	Max: 386.15947	Min: 0.5817	Mean: 46.52244
		test		Max: 414.70897	Min: 0.54212	Mean: 47.95195
TH
	Explained variance
		training	Max: 0.07528	Min: 0.07528	Mean: 0.07528
		test		Max: 0.05874	Min: 0.05781	Mean: 0.05828
	MSE
		training	Max: 95.73183	Min: 75.38494	Mean: 85.55839
		test		Max: 101.27358	Min: 84.0766	Mean: 92.67509
VISal
	Explained variance
		training	Max: 0.79992	Min: 0.02338	Mean: 0.33266
		test		Max: 0.81323	Min: -0.0159	Mean: 0.30963
	MSE
		training	Max: 99.65574	Min: 0.09368	Mean: 23.26062
		test		Max: 115.85297	Min: 0.13451	Mean: 24.50839
VISam
	Explained variance
		training	Max: 0.79354	Min: 0.0482	Mean: 0.32836
		test		Max: 0.80018	Min: -0.0348	Mean: 0.28903
	MSE
		training	Max: 102.80685	Min: 0.71906	Mean: 23.71995
		test		Max: 107.96264	Min: 0.79429	Mean: 25.16778
VISl
	Explained variance
		training	Max: 0.67348	Min: 0.03355	Mean: 0.28011
		test		Max: 0.62872	Min: -0.01479	Mean: 0.25432
	MSE
		training	Max: 193.71216	Min: 0.21551	Mean: 24.40096
		test		Max: 199.18708	Min: 0.27845	Mean: 24.69047
VISp
	Explained variance
		training	Max: 0.71641	Min: 0.07248	Mean: 0.35438
		test		Max: 0.73033	Min: 0.0173	Mean: 0.32009
	MSE
		training	Max: 227.5397	Min: 0.79899	Mean: 27.04178
		test		Max: 228.49792	Min: 0.76279	Mean: 29.13979
VISrl
	Explained variance
		training	Max: 0.75699	Min: 0.03595	Mean: 0.23854
		test		Max: 0.76197	Min: -0.02255	Mean: 0.21885
	MSE
		training	Max: 148.15256	Min: 1.7946	Mean: 33.22596
		test		Max: 134.06407	Min: 1.90548	Mean: 33.13624

We note that:

  1. There is no much difference between training set and test set --> we are not overfitting
  2. There is high variability of accuracy across areas
  3. There is high variability of accuracy across neurons of the same area

We will analyze better the data in the next section, predictiong also from neurons of different areas

Area1 given Area2¶

In order to predict, using the units in Area 1, the activity of an unit in Area2, and to do it for all neurons in Area2, we define a new function, similar in spirit to the previous one

In [ ]:
def prediction_over_all_first_units(units, model, training_data, test_data):
    n = len(units.index)
    expl_var_training = np.zeros(n)
    expl_var_test = np.zeros(n)
    mse_training = np.zeros(n)
    mse_test = np.zeros(n)
    area = ["" for i in range(n)]
    for neuron in range(n):
        x_training_data = training_data.loc[:, training_data.columns >= n]
        y_training_data = training_data.iloc[:, neuron]
        x_test_data = test_data.loc[:, training_data.columns >= n]
        y_test_data = test_data.iloc[:, neuron]
        model.fit(x_training_data, y_training_data)
        predictions_test = model.predict(x_test_data)
        predictions_training = model.predict(x_training_data)    
        expl_var_training[neuron] = explained_variance_score(y_training_data, predictions_training)
        expl_var_test[neuron] = explained_variance_score(y_test_data, predictions_test)
        mse_training[neuron] = mean_squared_error(y_training_data, predictions_training)
        mse_test[neuron] = mean_squared_error(y_test_data, predictions_test)
        area[neuron] = units.ecephys_structure_acronym.iloc[neuron]
    return expl_var_test, expl_var_training, mse_test, mse_training, area

We will use the same linear model as before. Similarly, we run the prediction

In [ ]:
for area1 in areas:
    dati = {}
    for area2 in areas:
        training_data, test_data = shuffle_and_split_data(pd.DataFrame(np.concatenate((data_areas[area2],data_areas[area1]), axis = 1)),0.2 )
        expl_var_test, expl_var_training, mse_test, mse_training, zone = prediction_over_all_first_units(oursession.units[oursession.units["ecephys_structure_acronym"]== area], linear_nopca, training_data, test_data)
        dati[area2] = {}
        dati[area2]["expl_var_test"] = expl_var_test
        dati[area2]["expl_var_training"] = expl_var_training
        dati[area2]["mse_test"] = mse_test
        dati[area2]["mse_training"] = mse_training
    with open(f"/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/given_{area1}.pkl", "wb") as tf:
        pickle.dump(dati, tf)

Let's give a look to the mean explained variance for every single combination of areas

We load the data

In [ ]:
expl_variances_mean = np.zeros((len(areas),len(areas)))
for i, area1 in enumerate(areas):
    with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/given_{area1}.pkl', 'rb') as handle:
        b = pickle.load(handle)
        for j, area2 in enumerate(areas):
            if i != j:
                expl_variances_mean[i][j] = np.mean(b[area2]['expl_var_test'])
            elif area2 not in ['Eth', 'MB']:
                with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/single.pkl', 'rb') as handle2:
                    a = pickle.load(handle2)
                    expl_variances_mean[i][j] = np.mean(a[area2]['expl_var_test'])

The mean explained variance is represented through a heat map, which visually allows us to capture the predictive capability of neurons in one area and how weel neurons within an area are predicted by others. On the x-axis, there are neurons from the areas that is predicted, while on the y-axis neurons from the area predicted.

In [ ]:
c =pd.DataFrame(expl_variances_mean, index=list(areas.keys()), columns=list(areas.keys()))
# Calcolo della linkage matrix utilizzando il metodo 'ward'
link = linkage(c, method='ward')

# Ottenimento dell'ordine dei cluster
order = leaves_list(link)

# Riordino delle righe e colonne secondo l'ordine dei cluster
ordered_exp_matrix = c.iloc[order, order]
plt.figure(figsize=(8, 6))
sns.heatmap(ordered_exp_matrix.T, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered Mean Explained Variance Matrix")
plt.xlabel('Area predicting')
plt.ylabel('Area predicted')
plt.show()
No description has been provided for this image

There are some areas for which it seems to be almost impossible to predict from them (eth, mb, ca2, th), because they have only 1 or 2 neurons each. Those areas are located in the corners of the map and a gray value is associated to them.

MOST IMPORTANTLY, the structure is clear: as seen before, there are 2 main groups: the one composed by the "VIS" areas, and the one composed by the "CA" areas together with DG and SUB

Let's do the same with the max instead of the mean, in order to see outliers.

In [ ]:
expl_variances_max = np.zeros((len(areas),len(areas)))
for i, area1 in enumerate(areas):
    with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/given_{area1}.pkl', 'rb') as handle:
        b = pickle.load(handle)
        for j, area2 in enumerate(areas):
            if i != j:
                expl_variances_max[i][j] = np.max(b[area2]['expl_var_test'])
            elif area2 not in ['Eth', 'MB']:
                with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/single.pkl', 'rb') as handle2:
                    a = pickle.load(handle2)
                    expl_variances_max[i][j] = np.max(a[area2]['expl_var_test'])
In [ ]:
from scipy.cluster.hierarchy import dendrogram, linkage, leaves_list

c =pd.DataFrame(expl_variances_max, index=list(areas.keys()), columns=list(areas.keys()))
# Calcolo della linkage matrix utilizzando il metodo 'ward'
link = linkage(c, method='ward')

# Ottenimento dell'ordine dei cluster
order = leaves_list(link)

# Riordino delle righe e colonne secondo l'ordine dei cluster
ordered_exp_matrix = c.iloc[order, order]
plt.figure(figsize=(8, 6))
sns.heatmap(ordered_exp_matrix.T, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered Maximum Explained Variance Matrix")
plt.xlabel('Area predicting')
plt.ylabel('Area predicted')
plt.show()
No description has been provided for this image

With the max we are not able to reconstruct the structures! This means that there are "outliers" that is neurons that can be predicted very well from a given area even if in general from that area is difficult to predict neurons in the area of the outlier. ___________

let's try to grasp the distribution of the explained variances. Let's load the explained variance test set.

In [ ]:
areas_def = {
 'CA1': 134,
 'LP': 65,
 'SUB': 59,
 'VISal': 89,
 'VISam': 135,
 'VISl': 78,
 'VISp': 94,
 'VISrl': 47}
In [ ]:
# explained variances  of the test set 
expl_variances = {}
for i, area1 in enumerate(areas):
    expl_variances[area1] = {}
    with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/given_{area1}.pkl', 'rb') as handle:
        b = pickle.load(handle)
        for j, area2 in enumerate(areas):
            if i != j:
                expl_variances[area1][area2] = b[area2]['expl_var_test']
            elif area2 not in ['Eth', 'MB']:
                with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/single.pkl', 'rb') as handle2:
                    a = pickle.load(handle2)
                    expl_variances[area1][area2] = a[area2]['expl_var_test']
In [ ]:
colors_to_area = mcp.gen_color(cmap="Dark2",n=8)
areas_colors = {}
for i, area in enumerate(areas_def.keys()):
    areas_colors[area] = colors_to_area[i]

In order to study how the explained variance scores are distributed and change as one area is predicting another, we have represented two grids: on the x-axis of each subgraph there are the values of the explained variance, while on the y-axis the number of units that count have that value (i.e. the neural population). We have considered only 8 areas, i.e. those which seemed the most relevant to us (and had a greater amount of units). Together with areas predicting/predicted by another, there is one area predicting/predicted by itself. What does it mean? It implies that all neurons EXCEPT one are employed to infer the activity of the remaining neuron.

  1. The title shows the name of the area that is predicting the other 8
  2. The title shows the name of the area that is predicted by the remaining 8
In [ ]:
fig = plt.figure(figsize = (20,10))
for i, area1 in enumerate(areas_def):
    maximum = np.max(expl_variances[area1][max(expl_variances[area1])])
    minimum = np.min(expl_variances[area1][min(expl_variances[area1])])
    ax = fig.add_subplot(2, 4, i + 1)
    for area2 in areas_def:
        ax = sns.distplot(expl_variances[area1][area2], hist = False, color = areas_colors[area2], kde_kws={'clip': (minimum ,maximum)}, ax = ax)
        ax.set_title(f'{area1}')
        ax.set(xlabel=None)
        ax.legend(areas_colors)
plt.suptitle('Explained Variance Score: how much can a unit in an area predict the values of the same or others areas?', fontsize = 20);    
No description has been provided for this image
In [ ]:
fig = plt.figure(figsize = (20,10))
for i, area1 in enumerate(areas_def):
    maximum = np.max(expl_variances[area1][max(expl_variances[area1])])
    minimum = np.min(expl_variances[area1][min(expl_variances[area1])])
    ax = fig.add_subplot(2, 4, i + 1)
    for area2 in areas_def:
        ax = sns.distplot(expl_variances[area2][area1], hist = False, color = areas_colors[area2], kde_kws={'clip': (minimum ,maximum)}, ax = ax)
        ax.set_title(f'{area1}')
        ax.set(xlabel=None)
        ax.legend(areas_colors)
plt.suptitle('Explained Variance Score: how much can neurons from same or other areas predict activity of one unit in area?', fontsize = 20);    
No description has been provided for this image

Grid 1: Areas seem to better predict themselves rather than others. However, there are few exceptions: VISam and VISl demonstrate good scores when it comes to predicting the activity of a neuron in VISal (pink curve). Grid 2: Areas seem to be better predicted by themselves rather than by the others. Only in the case of VISal, VISl seems to predict better. In general, neural activity within the same area is better predicted than the one across brain zones.

Disclaimer: the curves have been produced through a KDE (kernel density estimator), which tends to smooth the data and to produce heavy tails, which are particolarly problematic for bounded domains. As a consequence, the plots have been 'cut' to their minimum and maximum values (even though the areas below the curve does not account for 1). Nevertheless, the qualitative behavior is not affected.

However, it might still be difficult to capture the patterns (as many areas are involved). Let's reproduce the same graphs, restring to cortical regions, (which are the ones exhibiting an interesting behavior).

In [ ]:
areas_cortex = {
 'VISal': 89,
 'VISam': 135,
 'VISl': 78,
 'VISp': 94,
 'VISrl': 47}
In [ ]:
fig = plt.figure(figsize = (20,10))
for i, area1 in enumerate(areas_cortex):
    maximum = np.max(expl_variances[area1][max(expl_variances[area1])])
    minimum = np.min(expl_variances[area1][min(expl_variances[area1])])
    ax = fig.add_subplot(2, 5, i + 1)
    for area2 in areas_cortex:
        ax = sns.distplot(expl_variances[area1][area2], hist = False, color = areas_colors_cort[area2], kde_kws={'clip': (minimum ,maximum)}, ax = ax)
        ax.set_title(f'{area1}')
        ax.set(xlabel=None)
        ax.legend(areas_colors_cort)
plt.suptitle('Explained Variance Score (visual cortex): how much can a unit in an area predict the values of the same or others areas?', fontsize = 20);    
No description has been provided for this image
In [ ]:
fig = plt.figure(figsize = (20,10))
for i, area1 in enumerate(areas_cortex):
    maximum = np.max(expl_variances[area1][max(expl_variances[area1])])
    minimum = np.min(expl_variances[area1][min(expl_variances[area1])])
    ax = fig.add_subplot(2, 5, i + 1)
    for area2 in areas_cortex:
        ax = sns.distplot(expl_variances[area2][area1], hist = False, color = areas_colors_cort[area2], kde_kws={'clip': (minimum ,maximum)}, ax = ax)
        ax.set_title(f'{area1}')
        ax.set(xlabel=None)
        ax.legend(areas_colors_cort)
plt.suptitle('Explained Variance Score (visual cortex): how much can neurons from same or other areas predict activity of one unit in area?', fontsize = 20);    
No description has been provided for this image

Grid 1: As could have imagined from the above analysis, VISam and VISl predict 'better' VISal than themselves. Grid 2: VISal is better predicted by VISl and VISam rather than itself. The reason why these areas behave in such manner could not only be due to the amount of units at disposal, but also by the hierarchical structure and proximity of these regions.

Finally, here comes the 'ambitious' task, i.e. predicting the activity of ONE unit given ALL others (from all areas). Because of the large number of units at disposal (i.e. 825), in order to speed up computations and extract the best out of the model, we have performed dimensionality reduction (i.e. PCA) by choosing the first 100 components (i.e. those carrying the highest variance). What could one expect? Since now we have rates from several units, we could expect the activity of a single neuron to be better predicted. However, we should also take into account that there are areas which are way less relevant in terms of predictive power, as they count one or two units.

In [ ]:
linear = Pipeline([
    ("scaler", StandardScaler()),
    ('PCA', PCA(n_components = 100)),
    ("linear_reg", LinearRegression())
])
In [ ]:
training_data, test_data = shuffle_and_split_data(data, 0.2)
In [ ]:
expl_var_test, expl_var_training, mse_test, mse_training = prediction_over_all_units_new( linear, training_data, test_data, mappa, areas)
APN
CA1
CA2
CA3
DG
Eth
LGv
LP
MB
NOT
ProS
SUB
TH
VISal
VISam
VISl
VISp
VISrl
In [ ]:
dati = {}
dati["expl_var_training"] = expl_var_training
dati["expl_var_test"] = expl_var_test
dati["mse_training"] = mse_training
dati["mse_test"] = mse_test
with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/all.pkl', "wb") as tf:
        pickle.dump(dati, tf)

In the same manner done above, the graph illustrates how the explained variance score is distributed. For each subgraph, two curves have been defined: one shows the values obtained when all neurons are predicting a single one (All -> Area 1), while the blue one when units from the same area predict the 'left out' one. The reason why the two curves have been jointly represented was to capture imporvements in the explained variance score.

In [ ]:
fig = plt.figure(figsize = (20,10))
for i, area1 in enumerate(areas_def):
    maximum = np.max(expl_variances[area1][max(expl_variances[area1])])
    minimum = np.min(expl_variances[area1][min(expl_variances[area1])])
    ax = fig.add_subplot(2, 4, i + 1)
    ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
    ax = sns.distplot(expl_variances[area1][area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
    ax.set_title(f'{area1}')
    ax.set(xlabel=None)
    ax.legend([ 'All -> Area 1', 'Area1 -> Area 1']);
    
plt.suptitle('Explained Variance Score: the activity of a neuron from one area is predicted from all neurons in an area (same or different)', fontsize = 20);    
No description has been provided for this image

As a matter of fact, a slight imporvement seems to have occurred when the totality of units are exploited (purple). In particular, cortical regions are the ones exhibiting greater scores.

How could the results of the regressions be interpreted? Indeed, some reassuring results have been discovered, since there is an overall satisfying predictive capability within areas. Furthermore, there are regions which seem to play a determinant role in the prediction of single neural activity of different zones. Such behavior might be linked to the existence of positive (or negative) correlations between firing rates (either within or across brain areas).

All the stimuli¶

We can even predict from all the types of stimuli, and not only static gratings. The result is the following (we omit the fitting for brevity)

In [ ]:
with open(f'../one_given_the_other/all.pkl', "rb") as tf:
        dati = pickle.load(tf)
In [ ]:
with open(f'../Risultati_Luca/linear_whole/expl_var_test.pkl', "rb") as tf:
        expl_variances = pickle.load(tf)
In [ ]:
fig = plt.figure(figsize = (20,10))
for i, area1 in enumerate(areas_def):
    ax = fig.add_subplot(2, 4, i + 1)
    ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
    ax = sns.distplot(expl_variances[area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
    ax.set_title(f'{area1}')
    ax.set(xlabel=None)
    ax.legend([ 'Static Gratings', 'All the stimuli']);
    
plt.suptitle('Explained Variance Score: difference between considering only static gratings and all types of stimuli', fontsize = 20);    
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:4: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:5: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(expl_variances[area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:4: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:5: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(expl_variances[area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:4: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:5: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(expl_variances[area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:4: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:5: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(expl_variances[area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:4: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:5: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(expl_variances[area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:4: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:5: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(expl_variances[area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:4: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:5: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(expl_variances[area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:4: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(dati['expl_var_test'][area1], hist = False,  kde_kws={'clip': (-0.2 ,0.8)}, color = 'purple', ax = ax)
C:\Users\lgand\AppData\Local\Temp\ipykernel_20516\2460109830.py:5: UserWarning: 

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

  ax = sns.distplot(expl_variances[area1], hist = False, kde_kws={'clip': (-0.2 ,0.8)}, ax = ax)
No description has been provided for this image

The performces have a dramatic drop. The areas which has mantained the best score is CA1

Limitations¶

Is it enough to confirm the existence of neural connections? In general, correlation might not always be a good measure to grasp such functioning and one should be careful to interpret such result. For instance, there might existence brain regions which exhibit a correlation between spikes emitted by neurons, but the cells are not connected. Furthermore, studying correlations might sometimes be inconvenient, as a large number of data from simoultaneously recorded neurons and are subjects to fluctuations (also due to trial-to-trail variability). Reference : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586814/

Conclusion¶

Our work was motivated by an interest in understanding whether some correlation in neuronal activity was present, whether this measure was enough to capture some relevant information of the underlying dynamics or properties of the biological network and then, reassured by the proven presence of local interactions, whether it was possible to predict the activity of one neuron given the activity of others (selected in various ways). The main takeaways of our analysis are that there seems to be a higher correlation of single areas with themselves and that areas in the visual cortex seem to be more correlated with each other with respect to other brain areas. Furthermore, we can see that some areas are more able to predict others (more specific conclusions are better explained in each particular section of our research).

Lastly, we will briefly mention some general limitation and possible venues for further research, which can be applied to our work as a whole, but more comments and indications regarding specific types of analysis can be found while reading through this notebook.

Clearly one of the main limitations of our work is the scarcity of data available, so surely one could try to merge different sessions to extract more precise and meaningful conclusions. Furthermore, one could repeat the same work with different combinations of brain areas - stimuli to compare results.