Correlation and inference within and across brain areas¶
Our project aims to infer or predict neuronal activity and to study the strength of the correlation measure in depth. We chose different models and methods to perform these analyses and to understand if and how the results change depending on the structure of the brain areas involved.
0. Importing libraries¶
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import classification_report
import seaborn as sns
import matplotlib.pyplot as plt
from mycolorpy import colorlist as mcp
import pickle
import plotly.express as px
import math
from matplotlib.patches import Patch
import networkx as nx
1. The data¶
1.1 Loading the data¶
The first step is to load the data from the Allen Brain Observatory database.
# Change this path to the location of the data in your computer
data_dir = "/Users/isottamagistrali/Library/Mobile Documents/com~apple~CloudDocs/Bocconi/Semestre 3.2/Neurosciencee/allendata"
# Loading the metadata contained in the cache which refers to all experiments
manifest_path = os.path.join(data_dir, "manifest.json")
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)
The available data is divided into sessions. Each session corresponds to an experiment conducted on a different individual, and contains information about the type of experiment performed and about the specimen, as can be seen in the following table.
# Showing the first five entries of the session table
cache.get_session_table().head()
| published_at | specimen_id | session_type | age_in_days | sex | full_genotype | unit_count | channel_count | probe_count | ecephys_structure_acronyms | |
|---|---|---|---|---|---|---|---|---|---|---|
| id | ||||||||||
| 715093703 | 2019-10-03T00:00:00Z | 699733581 | brain_observatory_1.1 | 118.0 | M | Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt | 884 | 2219 | 6 | [CA1, VISrl, nan, PO, LP, LGd, CA3, DG, VISl, ... |
| 719161530 | 2019-10-03T00:00:00Z | 703279284 | brain_observatory_1.1 | 122.0 | M | Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt | 755 | 2214 | 6 | [TH, Eth, APN, POL, LP, DG, CA1, VISpm, nan, N... |
| 721123822 | 2019-10-03T00:00:00Z | 707296982 | brain_observatory_1.1 | 125.0 | M | Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt | 444 | 2229 | 6 | [MB, SCig, PPT, NOT, DG, CA1, VISam, nan, LP, ... |
| 732592105 | 2019-10-03T00:00:00Z | 717038288 | brain_observatory_1.1 | 100.0 | M | wt/wt | 824 | 1847 | 5 | [grey, VISpm, nan, VISp, VISl, VISal, VISrl] |
| 737581020 | 2019-10-03T00:00:00Z | 718643567 | brain_observatory_1.1 | 108.0 | M | wt/wt | 568 | 2218 | 6 | [grey, VISmma, nan, VISpm, VISp, VISl, VISrl] |
There is a lot more material that could be extracted from the metadata contained inside the cache besides the list of all sessions, but given the vastness of the available information, going deeper inside the structure of the data of all sessions combined is beyond the scope of our research.
For our research we are going to focus only on the data for one specific session, although the work that will follow could be extended by including data from multiple sessions. We will discuss at the end our ideas for possible further work.
# Loading data for our chosen session
session_id = 798911424
oursession = cache.get_session_data(session_id, timeout=3000)
1.2 Available data¶
Even for a single session there is a lot of data at our disposal, which can be summarized in the following categories:
- Metadata
- Probes
- Channels
- Units
- Stimuli
- Spikes
Metadata: general information regarding the session and the individual taking part in the experiment
oursession.metadata
{'specimen_name': 'Vip-IRES-Cre;Ai32-421338',
'session_type': 'brain_observatory_1.1',
'full_genotype': 'Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt',
'sex': 'F',
'age_in_days': 110.0,
'rig_equipment_name': 'NP.1',
'num_units': 825,
'num_channels': 2233,
'num_probes': 6,
'num_stimulus_presentations': 70931,
'session_start_time': datetime.datetime(2018, 12, 21, 0, 2, 57, tzinfo=tzoffset(None, -28800)),
'ecephys_session_id': 798911424,
'structure_acronyms': ['LP',
'DG',
'CA1',
'VISam',
nan,
'APN',
'TH',
'Eth',
'CA3',
'VISrl',
'HPF',
'ProS',
'SUB',
'VISp',
'CA2',
'VISl',
'MB',
'NOT',
'LGv',
'VISal'],
'stimulus_names': ['spontaneous',
'gabors',
'flashes',
'drifting_gratings',
'natural_movie_three',
'natural_movie_one',
'static_gratings',
'natural_scenes',
'drifting_gratings_contrast']}
Here we can see some information that will be useful later, such as the acronyms for the brain structures and the list of all stimuli presented during the experiment. We show them again separately for clarity.
oursession.structure_acronyms
['LP', 'DG', 'CA1', 'VISam', nan, 'APN', 'TH', 'Eth', 'CA3', 'VISrl', 'HPF', 'ProS', 'SUB', 'VISp', 'CA2', 'VISl', 'MB', 'NOT', 'LGv', 'VISal']
oursession.stimulus_names
['spontaneous', 'gabors', 'flashes', 'drifting_gratings', 'natural_movie_three', 'natural_movie_one', 'static_gratings', 'natural_scenes', 'drifting_gratings_contrast']
Spontaneous refers to a period in which nothing is shown on the screen. We can also see that the 'Dot motion' stimulus is not present in this particular experiment.

Probes and Channels: information regarding the physical instruments used to collect the data, which consist of probes with numerous channels each. We will ignore this data in our research, but we show it for completeness since it might be used in possible extensions.
print('Number of probes:', len(oursession.probes))
oursession.probes.head()
Number of probes: 6
| description | location | sampling_rate | lfp_sampling_rate | has_lfp_data | |
|---|---|---|---|---|---|
| id | |||||
| 800036196 | probeA | See electrode locations | 29999.965974 | 1249.998582 | True |
| 800036198 | probeB | See electrode locations | 29999.917201 | 1249.996550 | True |
| 800036200 | probeC | See electrode locations | 29999.996048 | 1249.999835 | True |
| 800036202 | probeD | See electrode locations | 29999.919900 | 1249.996662 | True |
| 800036204 | probeE | See electrode locations | 29999.999422 | 1249.999976 | True |
print('Number of channels:', len(oursession.channels))
oursession.channels.head()
Number of channels: 2233
| filtering | probe_channel_number | probe_horizontal_position | probe_id | probe_vertical_position | structure_acronym | ecephys_structure_id | ecephys_structure_acronym | anterior_posterior_ccf_coordinate | dorsal_ventral_ccf_coordinate | left_right_ccf_coordinate | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| id | |||||||||||
| 849854220 | AP band: 500 Hz high-pass; LFP band: 1000 Hz l... | 4 | 43 | 800036198 | 60 | LP | 218.0 | LP | 7647.0 | 3029.0 | 7470.0 |
| 849854224 | AP band: 500 Hz high-pass; LFP band: 1000 Hz l... | 6 | 59 | 800036198 | 80 | LP | 218.0 | LP | 7650.0 | 3011.0 | 7469.0 |
| 849854226 | AP band: 500 Hz high-pass; LFP band: 1000 Hz l... | 7 | 27 | 800036198 | 80 | LP | 218.0 | LP | 7651.0 | 3002.0 | 7469.0 |
| 849854230 | AP band: 500 Hz high-pass; LFP band: 1000 Hz l... | 9 | 11 | 800036198 | 100 | LP | 218.0 | LP | 7654.0 | 2985.0 | 7468.0 |
| 849854234 | AP band: 500 Hz high-pass; LFP band: 1000 Hz l... | 11 | 27 | 800036198 | 120 | LP | 218.0 | LP | 7657.0 | 2967.0 | 7466.0 |
Figures: the probes (left) and the channels within a probe (right).
Units: information about neurons and their general properties across the whole experiment
print('Number of units:', len(oursession.units))
oursession.units.head()
Number of units: 825
| waveform_PT_ratio | waveform_amplitude | amplitude_cutoff | cluster_id | cumulative_drift | d_prime | firing_rate | isi_violations | isolation_distance | L_ratio | ... | ecephys_structure_id | ecephys_structure_acronym | anterior_posterior_ccf_coordinate | dorsal_ventral_ccf_coordinate | left_right_ccf_coordinate | probe_description | location | probe_sampling_rate | probe_lfp_sampling_rate | probe_has_lfp_data | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| unit_id | |||||||||||||||||||||
| 951088679 | 0.620607 | 82.147455 | 0.023654 | 1 | 480.97 | 2.575648 | 7.429131 | 0.053350 | 51.364291 | 0.028062 | ... | 215.0 | APN | 8328.0 | 3018.0 | 7095.0 | probeA | See electrode locations | 29999.965974 | 1249.998582 | True |
| 951088664 | 0.587044 | 78.399165 | 0.001739 | 0 | 427.07 | 3.085334 | 6.843864 | 0.033732 | 40.358293 | 0.051513 | ... | 215.0 | APN | 8328.0 | 3018.0 | 7095.0 | probeA | See electrode locations | 29999.965974 | 1249.998582 | True |
| 951088734 | 0.560996 | 187.504005 | 0.001249 | 5 | 311.01 | 5.152564 | 1.418476 | 0.356931 | 56.950961 | 0.001292 | ... | 215.0 | APN | 8316.0 | 2988.0 | 7099.0 | probeA | See electrode locations | 29999.965974 | 1249.998582 | True |
| 951088721 | 0.474457 | 196.741545 | 0.000059 | 4 | 355.17 | 4.295014 | 11.517381 | 0.000271 | 58.358552 | 0.020218 | ... | 215.0 | APN | 8316.0 | 2988.0 | 7099.0 | probeA | See electrode locations | 29999.965974 | 1249.998582 | True |
| 951088862 | 0.534674 | 92.125020 | 0.089901 | 16 | 239.50 | 4.043348 | 19.953126 | 0.013619 | 79.934237 | 0.011335 | ... | 215.0 | APN | 8304.0 | 2958.0 | 7103.0 | probeA | See electrode locations | 29999.965974 | 1249.998582 | True |
5 rows × 40 columns
Stimuli: information about each stimulus presented during the experiment, including the type of stimulus, its time interval, and stimulus-specific details.
As we saw before, there are only a handful of stimulus categories, but each category includes many different stimuli, which are in turn presented multiple times each.
stimulus_table = oursession.get_stimulus_table()
print('Number of stimuli shown:', len(stimulus_table))
print('Number of unique stimuli:', len(stimulus_table['stimulus_condition_id'].unique()))
print('Number of unique stimulus names:', len(oursession.stimulus_names))
stimulus_table.head()
Number of stimuli shown: 70931
Number of unique stimuli: 5063
Number of unique stimulus names: 9
| stimulus_block | start_time | stop_time | phase | spatial_frequency | stimulus_name | orientation | frame | color | x_position | contrast | temporal_frequency | y_position | size | duration | stimulus_condition_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stimulus_presentation_id | ||||||||||||||||
| 0 | null | 24.875987 | 84.942787 | null | null | spontaneous | null | null | null | null | null | null | null | null | 60.066800 | 0 |
| 1 | 0.0 | 84.942787 | 85.176306 | [3644.93333333, 3644.93333333] | 0.08 | gabors | 90.0 | null | null | 20.0 | 0.8 | 4.0 | -10.0 | [20.0, 20.0] | 0.233519 | 1 |
| 2 | 0.0 | 85.176306 | 85.426505 | [3644.93333333, 3644.93333333] | 0.08 | gabors | 0.0 | null | null | 30.0 | 0.8 | 4.0 | 40.0 | [20.0, 20.0] | 0.250199 | 2 |
| 3 | 0.0 | 85.426505 | 85.676704 | [3644.93333333, 3644.93333333] | 0.08 | gabors | 45.0 | null | null | 40.0 | 0.8 | 4.0 | -30.0 | [20.0, 20.0] | 0.250199 | 3 |
| 4 | 0.0 | 85.676704 | 85.926904 | [3644.93333333, 3644.93333333] | 0.08 | gabors | 90.0 | null | null | 0.0 | 0.8 | 4.0 | -40.0 | [20.0, 20.0] | 0.250199 | 4 |
Spikes: information about the activity of the neurons. Each neuron is associated with an array of timestamps, each signaling that the neuron spiked at that instant.
list(oursession.spike_times.items())[:5]
[(951088862,
array([3.71142233e+00, 3.72135568e+00, 3.73648903e+00, ...,
1.04413344e+04, 1.04414476e+04, 1.04415437e+04])),
(951088823,
array([3.71175567e+00, 3.73015569e+00, 3.95735595e+00, ...,
1.04408445e+04, 1.04415059e+04, 1.04415614e+04])),
(951088939,
array([3.73448903e+00, 3.96052262e+00, 4.44302316e+00, ...,
1.04414054e+04, 1.04415481e+04, 1.04416177e+04])),
(951088968,
array([3.70888900e+00, 3.71578901e+00, 3.74058903e+00, ...,
1.04415843e+04, 1.04416096e+04, 1.04416475e+04])),
(951089020,
array([3.71198900e+00, 3.72525568e+00, 3.75238905e+00, ...,
1.04416045e+04, 1.04416177e+04, 1.04416401e+04]))]
In our study we will primarily use the spiking data. We will also take from the units table the brain structure in which each neuron is located, and use the stimulus information to restrict our study to specific areas and specific stimuli. From now on we only consider this data.
spike_times = oursession.spike_times
unit_area = oursession.units['ecephys_structure_acronym']
stimulus_table = oursession.get_stimulus_table()
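Since the spike-time arrays are sorted, counting how many spikes fall inside a stimulus window can be done efficiently with binary search. A minimal sketch on a toy spike train (the helper name and data are illustrative, not part of the AllenSDK):

```python
import numpy as np

# Hypothetical helper: count how many spikes of one unit fall inside
# a [start, stop) time window. The spike-time arrays are sorted, so
# np.searchsorted gives the count in O(log n).
def spikes_in_window(spike_array, start, stop):
    lo = np.searchsorted(spike_array, start, side="left")
    hi = np.searchsorted(spike_array, stop, side="left")
    return hi - lo

# Toy example with a fake spike train
spikes = np.array([0.5, 1.2, 1.9, 2.4, 3.3, 5.0])
print(spikes_in_window(spikes, 1.0, 3.0))  # spikes at 1.2, 1.9, 2.4 -> 3
```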
1.3 Analyzing the data¶
Before using the data to implement models that address our question, we do a quick analysis in order to gain a deeper understanding and to subdivide the data into smaller portions. This allows us to start with a more focused study, with the hope of finding promising results in at least some subsets of the data, and then progressively move on to the whole dataset.
print('Number of units in each area:')
print(unit_area.value_counts())
Number of units in each area:
VISam    135
CA1      134
VISp      94
VISal     89
VISl      78
LP        65
SUB       59
VISrl     47
DG        31
APN       25
CA3       21
NOT       16
LGv       16
ProS       9
CA2        2
TH         2
MB         1
Eth        1
Name: ecephys_structure_acronym, dtype: int64
#static 3D spatial representation of neurons
units = oursession.units
color1 = mcp.gen_color(cmap="tab20",n=18)
fig = plt.figure(figsize = (15, 8))
ax = fig.add_subplot(projection='3d')
counter = 0
for struct in units.structure_acronym.unique():
units1 = units[units.structure_acronym == struct]
ax.scatter(units1['left_right_ccf_coordinate'], units1['dorsal_ventral_ccf_coordinate'], units1['anterior_posterior_ccf_coordinate'], color = color1[counter], s = 100, edgecolor = color1[counter]);
counter += 1
ax.legend(units.structure_acronym.unique(), loc = (1,0.1), fontsize = 14.5);
ax.set_xlabel('anterior -> posterior')
ax.set_ylabel('left -> right')
ax.set_zlabel('dorsal -> ventral');
ax.view_init(azim=175, elev=30)
#dynamic 3D spatial representation of neurons
fig = px.scatter_3d(x = units['left_right_ccf_coordinate'], y = units['dorsal_ventral_ccf_coordinate'], z = units['anterior_posterior_ccf_coordinate'], color = units['structure_acronym'], color_discrete_sequence= color1);
fig.update_layout(scene=dict(
xaxis_title='left_right',
yaxis_title='dorsal_ventral',
zaxis_title='anterior_posterior'))
fig.update_layout(title='Spatial representation of neurons')
fig.show()
print('Number of total stimuli for each type:')
print(stimulus_table['stimulus_name'].value_counts())
Number of total stimuli for each type:
natural_movie_three           36000
natural_movie_one             18000
static_gratings                6000
natural_scenes                 5950
gabors                         3645
drifting_gratings               630
drifting_gratings_contrast      540
flashes                         150
spontaneous                      16
Name: stimulus_name, dtype: int64
print('Number of unique stimuli for each type:')
print(stimulus_table.groupby('stimulus_name')['stimulus_condition_id'].nunique().sort_values(ascending=False))
Number of unique stimuli for each type:
stimulus_name
natural_movie_three           3600
natural_movie_one              900
gabors                         243
static_gratings                121
natural_scenes                 119
drifting_gratings               41
drifting_gratings_contrast      36
flashes                          2
spontaneous                      1
Name: stimulus_condition_id, dtype: int64
fig = plt.figure(figsize = (10,8))
counts = stimulus_table['stimulus_block'].value_counts().index
sns.set_palette('tab10')
sns.countplot(data = stimulus_table, y = 'stimulus_block', order = counts, hue = 'stimulus_name')
plt.ylabel('');
plt.title('Types of stimuli')
# Change this path to save the figure on your computer
plt.savefig('/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/Types_of_stimuli.png', format = 'png')
We can see that there is great variation in the number of unique stimuli presented in each category, hinting at different degrees of complexity of the stimuli themselves.
print('Number of iterations of the same stimulus:')
print(stimulus_table['stimulus_condition_id'].value_counts().value_counts())
Number of iterations of the same stimulus:
10     3600
20      900
15      319
50      144
49       40
48       29
47       14
46        7
45        4
75        2
194       1
44        1
30        1
16        1
Name: stimulus_condition_id, dtype: int64
We can see that most stimuli are repeated 10 times, with the majority of the remaining ones being shown 20, 15 or 50 times.
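The chained value_counts call above can look cryptic; here is a toy example of the idiom with made-up data:

```python
import pandas as pd

# Toy version of the chained value_counts idiom: the first value_counts
# counts presentations per condition, the second counts how many
# conditions share each repetition count.
s = pd.Series(['a', 'a', 'b', 'b', 'c', 'c', 'c'])
per_condition = s.value_counts()          # a: 2, b: 2, c: 3
repetition_hist = per_condition.value_counts()
print(repetition_hist.to_dict())          # {2: 2, 3: 1}
```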
2. Empirical Correlation of Firing rates¶
2.0. Introduction¶
We start our study by analysing the correlation between neurons.
In this first question we consider the firing rate r(b) of a neuron b during a stimulus s as the number of spikes emitted by b during s, divided by the duration of s. Given two neurons b and c, the empirical correlation between their firing rates is defined in the usual way, treating each stimulus presentation as an individual sample of r(b) and r(c). We proceed as follows:
Correlations of neurons in static Gratings:
- Correlation of neurons in VISam
- Correlation of neurons in VISam + Visal
- Correlation for every pair of areas
- Correlation in all the areas
And finally:
- Comparison with correlation from all the stimuli (not only static gratings)
- Simulating random data to check robustness
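As a reference for the steps above, here is a minimal, self-contained sketch of the firing-rate and correlation computation just described, on toy data (the real analysis uses get_rates; the spike trains and stimulus windows here are illustrative only):

```python
import numpy as np
import pandas as pd

# Sketch of the firing-rate definition above, on toy data:
# r(b) for stimulus s = (# spikes of b during s) / duration of s.
spike_times = {
    'unit_0': np.array([0.1, 0.4, 0.9, 1.5, 2.2]),
    'unit_1': np.array([0.2, 1.1, 1.2, 1.8]),
}
stimuli = pd.DataFrame({'start_time': [0.0, 1.0], 'stop_time': [1.0, 2.0]})

rates = {}
for unit, spikes in spike_times.items():
    # spike arrays are sorted, so searchsorted counts spikes per window
    counts = (np.searchsorted(spikes, stimuli['stop_time'])
              - np.searchsorted(spikes, stimuli['start_time']))
    rates[unit] = counts / (stimuli['stop_time'] - stimuli['start_time'])
rates = pd.DataFrame(rates)  # rows = stimuli, columns = units

# Empirical correlation across stimuli, one sample per presentation
corr = rates.corr()
```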
2.1. Correlation in VISam (Static gratings)¶
We start by looking at the correlation of neurons in VISam
We first load the data using the function get_rates, which was defined to load the data stored in the repository
visam = get_rates(["static_gratings"], {
'VISam': 135,
}, dir = "../")
static_gratings
We calculate the empirical correlation and plot it as a heatmap
plt.figure(figsize=(8,6))
c= pd.DataFrame(visam).corr()
plt.title("Correlation matrix, VISam, Static gratings")
sns.heatmap(c,cmap='coolwarm', center=0 );
To better understand the distribution of correlations, we plot the same matrix but order the units using Ward's method for hierarchical clustering
from scipy.cluster.hierarchy import linkage, leaves_list
link = linkage(c, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c.iloc[order, order]
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered correlation matrix, VISam, Static gratings")
plt.show()
We can see that:
- There is a group of neurons (on the left) which are almost uncorrelated with each other but negatively correlated with the other neurons
- There is a group of neurons (in the middle, approximately between 77 and 6 in the x-axis labels) which are quite strongly correlated with each other
- There is a group of neurons which are not correlated with anything (93-45)
To better understand the structure of the correlations, we plot them as a graph: each node is a neuron, and there is an edge between two nodes if their correlation is greater than a certain threshold (or lower than it, if we want to consider negative correlations).
IMPORTANT DISCLAIMER: this is purely a study of correlation; the edges in the following graphs bear absolutely no relationship to real anatomical connections between neurons.
We plot it using the real coordinates of the neurons.
Threshold 0.5
alpha = 0.5
connections = np.copy(c)
for i in range(len(connections[0])):
connections[i][i] = 0
connections[connections >= alpha] = 1
connections[connections < alpha] = 0
G=nx.from_numpy_array(connections)
units = oursession.units
units_visam = units[units.structure_acronym == 'VISam']
x = units_visam['anterior_posterior_ccf_coordinate']
y = units_visam['dorsal_ventral_ccf_coordinate']
my_di = {}
for i in range(0, len(x)):
my_di[i]=[x.iloc[i], y.iloc[i]]
plt.figure(figsize = (20,10))
nx.draw_networkx(G, node_color = "red", pos = my_di)
G.number_of_edges()
15
8 out of 15 edges run between the two "lines" of neurons, so there is no apparent difference related to distance.
Or we can plot it in 3D. In this case an edge between two neurons indicates a correlation > 0.5 or < -0.25 (again, we stress that this does NOT mean there is a connection, just correlation).
alpha_pos = 0.5
alpha_neg = -0.25
G = nx.Graph()
pos = my_di
w_edges = []
edge_xyz = []
for i in range(0, 135):
G.add_node(i)
for j in range(i, 135):
if i != j:
if c[i][j] > alpha_pos or c[i][j] < alpha_neg:
w_edges.append(c[i][j])
edge_xyz.append((pos[i], pos[j]))
G.add_edge(i, j, weight = c[i][j])
node_xyz = np.array([pos[v] for v in sorted(G)])
edge_xyz = np.array(edge_xyz)
# Create the 3D figure
fig = plt.figure(figsize = (10,8))
ax = fig.add_subplot(111, projection="3d")
# Plot the nodes - alpha is scaled by "depth" automatically
colors = ['darkred', 'darkgreen']
##Plot the edges
for vizedge, weight in zip(edge_xyz, w_edges):
if weight > alpha_pos:
ax.plot(*vizedge.T, color=colors[0], linewidth = weight, label = 'Positive Correlation')
break
for vizedge, weight in zip(edge_xyz, w_edges):
if weight < alpha_neg:
ax.plot(*vizedge.T, color=colors[1], linewidth = - weight, label = 'Negative Correlation')
break
ax.legend(['Positive correlation', 'Negative Correlation'])
for vizedge, weight in zip(edge_xyz, w_edges):
if weight > alpha_pos:
ax.plot(*vizedge.T, color=colors[0], linewidth = weight * 3, label = 'Positive Correlation')
elif weight < alpha_neg:
ax.plot(*vizedge.T, color=colors[1], linewidth = - weight * 3, label = 'Negative Correlation')
ax.scatter(*node_xyz.T, s = 50, ec = 'b');
ax.set_xlabel("left_right")
ax.set_ylabel("dorsal_ventral")
ax.set_zlabel("anterior_posterior")
ax.view_init(azim=30, elev=30)
ax.set_title('3D model, Correlation of neurons in VISam', fontsize = 20)
plt.show()
There is less correlation between neurons lying on two distinct "lines" and separated by a large distance along the dorsal-ventral coordinate (however, decreasing the threshold to 0.4, this is no longer the case).
2.2. Correlation VISam + VISal (Static gratings)¶
We now add another area to our analysis: VISal
visamsal = get_rates(["static_gratings"], {
'VISam': 135,
'VISal': 89
}, dir = "../" )
static_gratings
We calculate the correlation
c = visamsal.corr()
plt.figure(figsize=(8,6))
plt.title("Correlation matrix, VISam + VISal, Static gratings")
sns.heatmap(c,cmap='coolwarm', center=0 )
plt.hlines([135], xmin = 0 , xmax = 224)
plt.vlines([135], ymin = 0 , ymax = 224)
plt.text(70, 70, "VISam" )
plt.text(170, 170, "VISal" )
Text(170, 170, 'VISal')
It is possible to see that neurons are slightly more correlated within the same area
link = linkage(c, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c.iloc[order, order]
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered correlation matrix, VISam + VISal, Static gratings")
plt.show()
There is a clear distinction between groups of neurons
To check correlations between areas, we make a plot as before (note that only nodes with at least one edge are plotted)
Threshold = 0.5
alpha = 0.5
connections = np.copy(c)
for i in range(len(connections[0])):
connections[i][i] = 0
connections[connections >= alpha] = 1
connections[connections < alpha] = 0
G=nx.from_numpy_array(connections)
# Removing isolated nodes
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors = np.array(["red" for i in range(135)] + ["blue" for i in range(89)])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
# Creating the handles for the legend
legend_handles = [
Patch(facecolor='red', label='VISam'),
Patch(facecolor='blue', label='VISal')
]
# Adding the legend to the plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_kamada_kawai(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
Considering only very high correlations, neurons in the same area are favored, even though there are still nodes in different areas with a strong correlation
Same graph but with real coordinates: threshold = 0.5
units_visam = units[units.structure_acronym == 'VISam']
units_visal = units[units.structure_acronym == 'VISal']
x_visam = units_visam['anterior_posterior_ccf_coordinate']
y_visam = units_visam['dorsal_ventral_ccf_coordinate']
x_visal = units_visal['anterior_posterior_ccf_coordinate']
y_visal = units_visal['dorsal_ventral_ccf_coordinate']
my_di = {}
for i in range(0, len(x_visam)):
my_di[i]=[x_visam.iloc[i], y_visam.iloc[i]]
for i in range(len(x_visam), len(x_visam) + len(x_visal)):
my_di[i]=[x_visal.iloc[i - len(x_visam)], y_visal.iloc[i - len(x_visam)]]
node_colors = ["red" for i in range(135)] + ["blue" for i in range(89)]
plt.figure(figsize = (20,10))
legend_handles = [
Patch(facecolor='red', label='VISam'),
Patch(facecolor='blue', label='VISal')
]
# Adding the legend to the plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_networkx(G, node_color = node_colors, pos = my_di, with_labels= False)
"Long-range" correlation is present.
We also note that there is a tail in VISal (blue, lower part) which does not have any high correlation with any neuron in VISam, while the further "up" we go, the more the neurons in VISal are correlated with the ones in VISam. However, this could just be due to randomness.
2.3. Pair-wise correlation between areas (Static Gratings)¶
Now we want to study the correlation between neurons of different areas in general, for every possible pair of areas, including the case in which the two areas coincide.
We load the data and define the dictionary mappa, which will be used later
full_data = get_rates(["static_gratings"], areas, dir = "../")
static_gratings
mappa = {}
temp = 0
for area in areas:
mappa[area] = [i for i in range(temp, temp + areas[area])]
temp += areas[area]
c = full_data.corr()
We define a function that calculates the mean absolute correlation between neurons of area1 and neurons of area2, for all possible pairs of areas (if area1 == area2 we exclude the diagonal).
def mean_pair_corr(c, mappa):
c_new = c.copy()
for i in range(len(c)):
c_new[i][i] = None
k = len(mappa)
result = np.zeros((k,k))
for i,area1 in enumerate(mappa):
for j,area2 in enumerate(mappa):
if j<i:
pass
else:
result[i][j] = np.nanmean(np.nanmean(np.abs(c_new[mappa[area1]].iloc[mappa[area2]])))
if math.isnan(result[i][j]):
result[i][j] = 0
result[j][i] = result[i][j]
return result
mean_pair = mean_pair_corr(c,mappa)
We can plot the result with a heatmap
c3 = pd.DataFrame(mean_pair, index=list(areas.keys()), columns=list(areas.keys()))
link = linkage(c3, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c3.iloc[order, order]
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered mean absolute correlation between areas, Static gratings")
plt.show()
If we exclude TH, it is possible to see a stronger correlation among the neurons on the left, from CA2 to SUB. This represents a macro-group of areas that we will also find later in the analysis. The second group of areas is on the right and is composed mainly of the "VIS" areas. However, the distinction is not very strong.
For each area, we can print, in decreasing order, the areas to which it is most correlated
for area in c3:
print(area)
print(sorted(c3.loc[area].items(), reverse = True, key = lambda x: x[1]))
APN
[('APN', 0.12312738235037245), ('TH', 0.11419636558713973), ('MB', 0.110588140810455), ('NOT', 0.09990303931783591), ('CA2', 0.0916253019265115), ('SUB', 0.09021517524792873), ('ProS', 0.08835269730453453), ('LGv', 0.0837080314288179), ('DG', 0.08249329497818586), ('CA1', 0.07941135079717937), ('CA3', 0.07484984612387481), ('LP', 0.07397366356828472), ('VISam', 0.06929214244012581), ('VISrl', 0.0689161107962807), ('VISl', 0.06787723099316009), ('VISal', 0.0666781981477204), ('Eth', 0.059148395667809765), ('VISp', 0.0582369331240436)]
CA1
[('CA1', 0.09455999381562175), ('CA2', 0.08971390766323684), ('CA3', 0.08503923645907448), ('SUB', 0.08478915261569638), ('ProS', 0.08182610660019125), ('DG', 0.08152907988375972), ('APN', 0.07941135079717937), ('NOT', 0.06943535293489968), ('MB', 0.06695748015795028), ('LGv', 0.06366007490636454), ('TH', 0.060261990607070395), ('VISrl', 0.05743145743228213), ('VISam', 0.05710112509751047), ('LP', 0.05476977339807684), ('VISl', 0.0524322466299936), ('VISal', 0.051571651949880955), ('VISp', 0.04447694580544469), ('Eth', 0.036387534529382475)]
CA2
[('CA2', 0.20561101996301057), ('ProS', 0.1323695184701155), ('TH', 0.09273490623803435), ('APN', 0.0916253019265115), ('CA1', 0.08971390766323684), ('DG', 0.0895544055235609), ('MB', 0.08342086181034475), ('CA3', 0.08322747050432183), ('SUB', 0.06868056382373912), ('NOT', 0.06281772628896598), ('VISrl', 0.058023426988845245), ('VISam', 0.0550821699778706), ('LP', 0.05163057513379939), ('VISal', 0.049343812487993806), ('LGv', 0.04829941948161433), ('Eth', 0.04607934560274251), ('VISp', 0.04563597453607007), ('VISl', 0.04449140876675233)]
CA3
[('CA3', 0.08901693451153883), ('CA1', 0.08503923645907448), ('CA2', 0.08322747050432183), ('SUB', 0.07516681496209354), ('APN', 0.07484984612387481), ('ProS', 0.07214819931033241), ('DG', 0.0707390695432312), ('NOT', 0.0647017654004669), ('MB', 0.06107182419237277), ('LGv', 0.06028012261345254), ('TH', 0.05562981509231202), ('VISrl', 0.054387901283727176), ('VISam', 0.054255055724614525), ('LP', 0.052825685972040105), ('VISl', 0.048400670228927765), ('VISal', 0.047513147712161224), ('Eth', 0.044132658061415864), ('VISp', 0.040538593357979005)]
DG
[('ProS', 0.10197359031404475), ('DG', 0.09915486819237482), ('CA2', 0.0895544055235609), ('MB', 0.08442278059026626), ('APN', 0.08249329497818586), ('CA1', 0.08152907988375972), ('NOT', 0.07818036935217514), ('SUB', 0.07765112282992011), ('CA3', 0.0707390695432312), ('LGv', 0.06959817463281413), ('LP', 0.059007277450778856), ('VISl', 0.057062246032824095), ('VISal', 0.05364378283433595), ('VISam', 0.05343479962675175), ('TH', 0.053184165787395314), ('VISrl', 0.05227008180289224), ('VISp', 0.048260388934753526), ('Eth', 0.02935351667126987)]
Eth
[('APN', 0.059148395667809765), ('CA2', 0.04607934560274251), ('ProS', 0.04492473718942198), ('CA3', 0.044132658061415864), ('TH', 0.040284155126311554), ('CA1', 0.036387534529382475), ('SUB', 0.03413440363286329), ('VISam', 0.03232232810232816), ('DG', 0.02935351667126987), ('VISrl', 0.028423237802596018), ('NOT', 0.027023037789695573), ('LGv', 0.025773403261783952), ('VISp', 0.025281516684373697), ('LP', 0.025068728890092172), ('VISl', 0.024619177877270366), ('VISal', 0.023384932893865), ('MB', 0.005139553051917564), ('Eth', 0.0)]
LGv
[('LGv', 0.12846594074884535), ('MB', 0.10435864124859394), ('NOT', 0.10343315229354078), ('APN', 0.0837080314288179), ('VISp', 0.07872472140640324), ('VISal', 0.0724940307006034), ('SUB', 0.07063417878899635), ('DG', 0.06959817463281413), ('VISl', 0.06956310306360386), ('LP', 0.06752348936525888), ('ProS', 0.06650832565746582), ('CA1', 0.06366007490636454), ('VISrl', 0.06267157185066369), ('VISam', 0.06251509238898431), ('CA3', 0.06028012261345254), ('CA2', 0.04829941948161433), ('TH', 0.046292718801978855), ('Eth', 0.025773403261783952)]
LP
[('MB', 0.08424056824860766), ('NOT', 0.08391890938360556), ('APN', 0.07397366356828472), ('LGv', 0.06752348936525888), ('LP', 0.06421644555871042), ('SUB', 0.06174468431882669), ('ProS', 0.061361642695262966), ('DG', 0.059007277450778856), ('VISal', 0.058307789278875234), ('VISl', 0.055923770663281684), ('CA1', 0.05476977339807684), ('VISrl', 0.05357348795595679), ('VISam', 0.05347872185735793), ('CA3', 0.052825685972040105), ('CA2', 0.05163057513379939), ('VISp', 0.05111286675525842), ('TH', 0.04716566636780038), ('Eth', 0.025068728890092172)]
MB
[('NOT', 0.16774745881700515), ('APN', 0.110588140810455), ('VISl', 0.10589374061772659), ('LGv', 0.10435864124859394), ('VISam', 0.09884997781362934), ('VISp', 0.09410729205314883), ('ProS', 0.09072879643612462), ('VISal', 0.08923250329536997), ('DG', 0.08442278059026626), ('LP', 0.08424056824860766), ('CA2', 0.08342086181034475), ('SUB', 0.08340260980648741), ('VISrl', 0.07181290389095207), ('CA1', 0.06695748015795028), ('CA3', 0.06107182419237277), ('TH', 0.052744524697909564), ('Eth', 0.005139553051917564), ('MB', 0.0)]
NOT
[('MB', 0.16774745881700515), ('NOT', 0.14884011922377574), ('LGv', 0.10343315229354078), ('APN', 0.09990303931783591), ('VISal', 0.08989172741733721), ('ProS', 0.08477779792500018), ('LP', 0.08391890938360556), ('VISl', 0.08089062036231921), ('VISp', 0.0790859533584018), ('SUB', 0.0788819642156339), ('DG', 0.07818036935217514), ('VISam', 0.0774510123930317), ('VISrl', 0.07585926151762301), ('CA1', 0.06943535293489968), ('TH', 0.06526496879747626), ('CA3', 0.0647017654004669), ('CA2', 0.06281772628896598), ('Eth', 0.027023037789695573)]
ProS
[('CA2', 0.1323695184701155), ('DG', 0.10197359031404475), ('ProS', 0.09546224793972732), ('MB', 0.09072879643612462), ('APN', 0.08835269730453453), ('NOT', 0.08477779792500018), ('CA1', 0.08182610660019125), ('SUB', 0.08068734680591692), ('CA3', 0.07214819931033241), ('LGv', 0.06650832565746582), ('TH', 0.062428836234069966), ('LP', 0.061361642695262966), ('VISrl', 0.05834391861374082), ('VISam', 0.05730810710454194), ('VISl', 0.057017527158679526), ('VISal', 0.05493106953833446), ('VISp', 0.04873577487489248), ('Eth', 0.04492473718942198)]
SUB
[('SUB', 0.09124163203451092), ('APN', 0.09021517524792873), ('CA1', 0.08478915261569638), ('MB', 0.08340260980648741), ('ProS', 0.08068734680591692), ('NOT', 0.0788819642156339), ('DG', 0.07765112282992011), ('TH', 0.07541572622000985), ('CA3', 0.07516681496209354), ('LGv', 0.07063417878899635), ('CA2', 0.06868056382373912), ('VISam', 0.062281201429440686), ('LP', 0.06174468431882669), ('VISrl', 0.06109152649128866), ('VISl', 0.05968763977809013), ('VISal', 0.057438484577878784), ('VISp', 0.04855130890804684), ('Eth', 0.03413440363286329)]
TH
[('TH', 0.26792659516327716), ('APN', 0.11419636558713973), ('CA2', 0.09273490623803435), ('SUB', 0.07541572622000985), ('NOT', 0.06526496879747626), ('VISam', 0.06506735698073443), ('VISrl', 0.06419189958564947), ('ProS', 0.062428836234069966), ('CA1', 0.060261990607070395), ('CA3', 0.05562981509231202), ('VISl', 0.05418065531032463), ('VISal', 0.05413859362842249), ('DG', 0.053184165787395314), ('MB', 0.052744524697909564), ('LP', 0.04716566636780038), ('LGv', 0.046292718801978855), ('VISp', 0.04165479405564982), ('Eth', 0.040284155126311554)]
VISal
[('NOT', 0.08989172741733721), ('MB', 0.08923250329536997), ('VISal', 0.07825975111318538), ('LGv', 0.0724940307006034), ('VISl', 0.07112060449171467), ('VISp', 0.06839174011755207), ('VISam', 0.06717159480868491), ('APN', 0.0666781981477204), ('VISrl', 0.06616088599189691), ('LP', 0.058307789278875234), ('SUB', 0.057438484577878784), ('ProS', 0.05493106953833446), ('TH', 0.05413859362842249), ('DG', 0.05364378283433595), ('CA1', 0.051571651949880955), ('CA2', 0.049343812487993806), ('CA3', 0.047513147712161224), ('Eth', 0.023384932893865)]
VISam
[('MB', 0.09884997781362934), ('NOT', 0.0774510123930317), ('VISam', 0.0719858674237911), ('APN', 0.06929214244012581), ('VISal', 0.06717159480868491), ('VISl', 0.06594549611867936), ('TH', 0.06506735698073443), ('LGv', 0.06251509238898431), ('SUB', 0.062281201429440686), ('VISrl', 0.06224861143943695), ('VISp', 0.05741898232869215), ('ProS', 0.05730810710454194), ('CA1', 0.05710112509751047), ('CA2', 0.0550821699778706), ('CA3', 0.054255055724614525), ('LP', 0.05347872185735793), ('DG', 0.05343479962675175), ('Eth', 0.03232232810232816)]
VISl
[('MB', 0.10589374061772659), ('NOT', 0.08089062036231921), ('VISl', 0.07198567305323106), ('VISal', 0.07112060449171467), ('LGv', 0.06956310306360386), ('APN', 0.06787723099316009), ('VISam', 0.06594549611867936), ('VISrl', 0.06350865228706969), ('VISp', 0.06328733532613531), ('SUB', 0.05968763977809013), ('DG', 0.057062246032824095), ('ProS', 0.057017527158679526), ('LP', 0.055923770663281684), ('TH', 0.05418065531032463), ('CA1', 0.0524322466299936), ('CA3', 0.048400670228927765), ('CA2', 0.04449140876675233), ('Eth', 0.024619177877270366)]
VISp
[('MB', 0.09410729205314883), ('VISp', 0.0793190235583615), ('NOT', 0.0790859533584018), ('LGv', 0.07872472140640324), ('VISal', 0.06839174011755207), ('VISl', 0.06328733532613531), ('APN', 0.0582369331240436), ('VISam', 0.05741898232869215), ('VISrl', 0.0538300251793051), ('LP', 0.05111286675525842), ('ProS', 0.04873577487489248), ('SUB', 0.04855130890804684), ('DG', 0.048260388934753526), ('CA2', 0.04563597453607007), ('CA1', 0.04447694580544469), ('TH', 0.04165479405564982), ('CA3', 0.040538593357979005), ('Eth', 0.025281516684373697)]
VISrl
[('NOT', 0.07585926151762301), ('MB', 0.07181290389095207), ('VISrl', 0.07079114152647305), ('APN', 0.0689161107962807), ('VISal', 0.06616088599189691), ('TH', 0.06419189958564947), ('VISl', 0.06350865228706969), ('LGv', 0.06267157185066369), ('VISam', 0.06224861143943695), ('SUB', 0.06109152649128866), ('ProS', 0.05834391861374082), ('CA2', 0.058023426988845245), ('CA1', 0.05743145743228213), ('CA3', 0.054387901283727176), ('VISp', 0.0538300251793051), ('LP', 0.05357348795595679), ('DG', 0.05227008180289224), ('Eth', 0.028423237802596018)]
We can now repeat the same analysis with the maximum correlation, that is, the maximum correlation between any neuron of the first area and any neuron of the second area.
Our goal is to see whether there are neurons in different areas which are strongly correlated.
def max_pair_corr(c, mappa):
    # Zero out the diagonal so self-correlations are ignored
    c_new = c.copy()
    for i in range(len(c)):
        c_new[i][i] = 0
    k = len(mappa)
    result = np.zeros((k, k))
    for i, area1 in enumerate(mappa):
        for j, area2 in enumerate(mappa):
            if j >= i:
                # Maximum correlation over all neuron pairs of the two areas
                result[i][j] = np.max(np.max(c_new[mappa[area1]].iloc[mappa[area2]]))
                result[j][i] = result[i][j]
    return result
max_pair = max_pair_corr(c,mappa)
max_pair.shape
(18, 18)
c1 = pd.DataFrame(max_pair, index=list(areas.keys()), columns=list(areas.keys()))
link = linkage(c1, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c1.iloc[order, order]
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("ordered Max correlation areas, Static gratings")
plt.show()
Indeed there are, with some very high correlations even between different areas. For instance, there is a neuron in CA1 and a neuron in VISrl that are highly correlated.
We can do the same with the minimum (for negative correlations):
def min_pair_corr(c, mappa):
    # Zero out the diagonal so self-correlations are ignored
    c_new = c.copy()
    for i in range(len(c)):
        c_new[i][i] = 0
    k = len(mappa)
    result = np.zeros((k, k))
    for i, area1 in enumerate(mappa):
        for j, area2 in enumerate(mappa):
            if j >= i:
                # Minimum correlation over all neuron pairs of the two areas
                result[i][j] = np.min(np.min(c_new[mappa[area1]].iloc[mappa[area2]]))
                result[j][i] = result[i][j]
    return result
min_pair = min_pair_corr(c,mappa)
c2 = pd.DataFrame(min_pair, index=list(areas.keys()), columns=list(areas.keys()))
link = linkage(c2, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c2.iloc[order, order]
plt.figure(figsize=(8,6))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("ordered Min correlation areas, Static gratings")
plt.show()
VISal, LP and DG are the areas containing the units most negatively correlated with other areas.
2.4. Correlation in all areas (Static Gratings)¶
We now consider all areas together
full_data = get_rates(["static_gratings"], areas, dir = "../")
static_gratings
c = full_data.corr()
We can plot the heatmap ordered by similarity:
link = linkage(c, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c.iloc[order, order]
plt.figure(figsize=(12,9))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered correlation matrix, All_areas, Static gratings")
plt.show()
There is a group of neurons which are highly correlated with each other, and another group which is negatively correlated with the first and mostly uncorrelated internally. It is also possible to spot subgroups of highly correlated neurons that are negatively correlated with other such subgroups.
We proceed by plotting the correlation structure as graphs.
colors = {0 : "#544148",
1: "#8e86a0",
2 : "#25680d",
3: "#bd83f7",
4 : "#cecbd3",
5: "#afd19c",
6: "green",
7: "#704431",
8: "purple",
9: "#932039",
10: "yellow",
11: "#e5967e",
12: "pink",
13:"blue",
14: "red",
15: "#0dbbe2",
16: "#937aa5",
17: "#5ee6ff"}
areas2 = ['APN', 'CA1', 'CA2', 'CA3', 'DG', 'Eth', 'LGv', 'LP', 'MB', 'NOT',
'ProS', 'SUB', 'TH', 'VISal', 'VISam', 'VISl', 'VISp', 'VISrl']
classes = [areas2.index(area) for area in areas for j in range(areas[area])]
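As a tiny illustration of what `classes` contains (with hypothetical miniature area sizes): the comprehension expands each area's unit count into one color-class label per neuron, in column order.

```python
# Hypothetical miniature version of `areas`/`areas2` from the notebook
areas_demo = {"APN": 2, "CA1": 3}
areas2_demo = ["APN", "CA1"]

# One label per neuron, repeated as many times as the area's unit count
classes_demo = [areas2_demo.index(area) for area in areas_demo
                for j in range(areas_demo[area])]
print(classes_demo)  # [0, 0, 1, 1, 1]
```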
Threshold 0.6
alpha = 0.6
connections = np.copy(c)
for i in range(len(connections[0])):
connections[i][i] = 0
connections[connections >= alpha] = 1
connections[connections < alpha] = 0
G=nx.from_numpy_array(connections)
# Removal of isolated nodes
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors = np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
# Create the handles for the legend
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]
# Add the legend to the plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_spring(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
Since there are many connected components, let's plot only the largest one
largest_component = max(nx.connected_components(G), key=len)
node_colors = [colors[classes[node]] for node in largest_component]
subgraph = G.subgraph(largest_component)
plt.figure(figsize=(10, 10))
# Create the handles for the legend
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]
# Add the legend to the plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_kamada_kawai(subgraph, node_color=node_colors)
plt.show()
Let's try with negative correlation
alpha = -0.4
connections = np.copy(c)
for i in range(len(connections[0])):
connections[i][i] = 0
connections[connections >= alpha] = 0
connections[connections < alpha] = 1
G=nx.from_numpy_array(connections)
# Removal of isolated nodes
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors = np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
nx.draw_kamada_kawai(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
The structure is completely different now: there are a few nodes with very high degree.
(Keep in mind that here the threshold is smaller in absolute value.)
alpha = -0.5
connections = np.copy(c)
for i in range(len(connections[0])):
connections[i][i] = 0
connections[connections >= alpha] = 0
connections[connections < alpha] = 1
G=nx.from_numpy_array(connections)
# Removal of isolated nodes
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors = np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
# Create the handles for the legend
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]
# Add the legend to the plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_kamada_kawai(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
About the differences between positive and negative correlations:
- When we consider the strongest positive correlations, there are more nodes with high correlations, but they have lower degrees and are arranged in a sort of chain shape
- When we consider the strongest negative correlations, there are fewer nodes with high negative correlations, but they have very high degrees and are arranged in a sort of star shape
This could simply be due to the fact that there are more neurons with high and similar positive correlations
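One way to make the chain-vs-star observation quantitative is to compare the degree sequences of the two thresholded graphs. Below is a minimal sketch on random stand-in data; in the notebook one would pass the actual correlation matrix `c` and the thresholds used above.

```python
import numpy as np
import networkx as nx

# Random stand-in for the correlation matrix (hypothetical data)
rng = np.random.default_rng(0)
X = rng.poisson(5.0, size=(1000, 50))
corr = np.corrcoef(X, rowvar=False)
np.fill_diagonal(corr, 0)

def threshold_graph(corr, alpha, positive=True):
    # Keep an edge wherever the correlation passes the threshold
    adj = (corr >= alpha) if positive else (corr <= alpha)
    return nx.from_numpy_array(adj.astype(int))

G_pos = threshold_graph(corr, 0.05, positive=True)
# A chain-like structure shows many small degrees; a star-like structure
# shows one large hub degree and many leaves of degree one.
deg = sorted((d for _, d in G_pos.degree()), reverse=True)
print(deg[:5])
```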
2.5. Correlation in all areas (all the stimuli)¶
We load the data
names = ["spontaneous",
"gabors",
"flashes",
"drifting_gratings",
"natural_movie_three",
"natural_movie_one",
"static_gratings",
"natural_scenes",
"drifting_gratings_contrast"]
full_full_data = get_rates(names, areas, dir = "../")
spontaneous gabors flashes drifting_gratings natural_movie_three natural_movie_one static_gratings natural_scenes drifting_gratings_contrast
and compute the correlation as usual
c = full_full_data.corr()
c.shape
(825, 825)
link = linkage(c, method='ward')
order = leaves_list(link)
ordered_corr_matrix = c.iloc[order, order]
plt.figure(figsize=(12,9))
sns.heatmap(ordered_corr_matrix, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered correlation matrix, All_areas, All stimuli")
plt.show()
The correlation drops dramatically. However, there is still some structure.
Let's look at the minimum and the maximum:
# Zero out the diagonal before taking the global extrema
np.fill_diagonal(c.values, 0)
c.max().max()
0.8458585764461427
We still have some neurons with very high correlation!
c.min().min()
-0.3817121948958405
And also some fairly strong negative correlations!
areas2 = ['APN', 'CA1', 'CA2', 'CA3', 'DG', 'Eth', 'LGv', 'LP', 'MB', 'NOT',
'ProS', 'SUB', 'TH', 'VISal', 'VISam', 'VISl', 'VISp', 'VISrl']
k = len(areas2)
classes = [areas2.index(area) for area in areas for j in range(areas[area])]
Just some plots
alpha = 0.40
connections = np.copy(c)
for i in range(len(connections[0])):
connections[i][i] = 0
connections[connections >= alpha] = 1
connections[connections < alpha] = 0
G=nx.from_numpy_array(connections)
# Removal of isolated nodes
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors = np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
# Create the handles for the legend
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]
# Add the legend to the plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_spring(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
Largest connected component
largest_component = max(nx.connected_components(G), key=len)
node_colors = [colors[classes[node]] for node in largest_component]
subgraph = G.subgraph(largest_component)
plt.figure(figsize=(10, 10))
# Create the handles for the legend
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]
# Add the legend to the plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_kamada_kawai(subgraph, node_color=node_colors)
plt.show()
This should be exactly the same component as the one seen before!
This also has a meaning for predictions: it should be simpler to predict the activity of a neuron in CA1 given the rates of the other neurons in CA1.
Indeed, this will be verified in the prediction part.
Negative correlation
alpha = -0.2
connections = np.copy(c)
for i in range(len(connections[0])):
connections[i][i] = 0
connections[connections >= alpha] = 0
connections[connections < alpha] = 1
G=nx.from_numpy_array(connections)
# Removal of isolated nodes
#nodi_isolati = [nodo for nodo in nx.isolates(G)]
nodi_collegati = np.array([nodo for nodo in G.nodes() if G.degree(nodo) > 0])
#G.remove_nodes_from(nodi_isolati)
node_colors = np.array([colors[i] for i in classes])[np.array([G.degree(nodo) > 0 for nodo in G.nodes() ])]
plt.figure(figsize=(11,8))
# Create the handles for the legend
legend_handles = [Patch(color=color, label= areas2[label]) for label, color in colors.items()]
# Add the legend to the plot
plt.legend(handles=legend_handles, title='Areas')
nx.draw_spring(G, with_labels=False, node_color = node_colors, nodelist = nodi_collegati)
The structure, and even the specific nodes, are similar to before
nodi_collegati
array([ 3, 11, 13, 24, 61, 65, 77, 106, 112, 115, 116, 117, 142,
144, 157, 201, 207, 209, 224, 225, 301, 307, 318, 324, 327, 369,
379, 384, 397, 401, 438, 443, 478, 482, 503, 544, 550, 557, 611,
612, 622, 630, 691, 705, 723, 748, 751, 780, 795])
G.edges
EdgeView([(3, 397), (11, 397), (13, 142), (13, 224), (13, 225), (13, 438), (13, 622), (13, 748), (24, 397), (61, 142), (61, 224), (61, 225), (61, 438), (61, 622), (61, 748), (65, 225), (77, 224), (106, 379), (112, 224), (112, 748), (115, 397), (116, 397), (117, 397), (142, 379), (142, 550), (144, 397), (157, 224), (157, 225), (201, 397), (207, 397), (209, 379), (224, 301), (224, 443), (224, 550), (224, 691), (224, 705), (224, 795), (225, 307), (225, 544), (318, 397), (324, 379), (327, 369), (369, 397), (379, 780), (384, 397), (397, 401), (397, 478), (397, 482), (397, 503), (397, 557), (397, 611), (397, 612), (397, 630), (397, 723), (397, 751), (550, 748)])
Conclusions: moving from static gratings to all the stimuli, the overall amount of correlation decreases, but there is still structure in the data. In particular, there are still pairs of neurons with very high positive correlation, and pairs with negative correlations comparable to those obtained with static gratings only. It seems that most neurons were correlated only under the static-gratings stimulus, and this correlation disappears once all stimulus types are included. However, the largest connected component with high positive correlation kept a very similar structure, which was not obvious, since we are considering many more, and different, stimuli. This suggests that the correlation computed on a subset of stimulus types can be a good approximation of the correlation over a larger set.
2.6. Brief analysis on random data¶
This part is meant as a sanity check that our measure of empirical correlation is not driven by randomness alone. We simulate 6000 independent samples (the number of stimuli in static gratings) and look at the maximal correlation achieved.
Let's first look at the distribution of the firing rates. This analysis is done on static gratings.
We often get a shape like this one:
full_data.iloc[:,0].hist(bins = 20)
<Axes: >
Even though sometimes we get a bell-shaped distribution:
full_data.iloc[:,302].hist(bins = 20)
<Axes: >
areas2 = ['APN', 'CA1', 'CA2', 'CA3', 'DG', 'Eth', 'LGv', 'LP', 'MB', 'NOT',
'ProS', 'SUB', 'TH', 'VISal', 'VISam', 'VISl', 'VISp', 'VISrl']
Plot of the distribution of the first neuron for each area:
temp = 0
img, axs = plt.subplots(3, 6, figsize=(20, 14))
# Iterate the areas in the same order as the columns of full_data,
# so that `temp` always points at the first neuron of the current area
for idx, area in enumerate(areas):
    axs[idx // 6][idx % 6].hist(full_data.iloc[:, temp], bins=20)
    axs[idx // 6][idx % 6].set_title(area)
    axs[idx // 6][idx % 6].set_xlabel("Firing rate")
    temp += areas[area]
Now, we approximate each neuron's activity with a Poisson distribution: for each neuron we sample random data from a Poisson distribution whose mean equals the neuron's empirical mean under static gratings.
Since there are 6000 stimuli in static gratings, we sample 6000 rates for each neuron.
from scipy.stats import poisson
def generate_poisson_random_data(n_units, n_samples, mean):
result = np.zeros((n_units, n_samples))
for i in range(n_units):
result[i] = poisson.rvs(mean[i], size = n_samples)
return pd.DataFrame(result.T)
mean = np.array(full_data.mean(axis = 0))
random_full_data = generate_poisson_random_data(825, 6000, mean)
c = random_full_data.corr()
for i in range(825):
c.iloc[i,i] = 0
c.max().max()
0.05588429522922792
c.min().min()
-0.058898055518087036
From a random distribution with 6000 samples, the maximum absolute value of the correlation is about 0.059, which confirms that the correlations studied in the previous sections do not come from randomness.
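The observed maximum is also consistent with a rough back-of-the-envelope bound: for independent data, each pairwise sample correlation is approximately normal with variance 1/n, so the maximum over k pairs concentrates near sqrt(2 ln(k) / n). A quick check under that assumption:

```python
import numpy as np

# Back-of-the-envelope check (assumption: independent data, Gaussian
# approximation of the sample correlation): the maximum over k pairs of
# N(0, 1/n) values concentrates near sqrt(2*ln(k)/n).
n = 6000                       # samples per neuron (static-gratings stimuli)
units = 825
k = units * (units - 1) / 2    # number of distinct neuron pairs
expected_max = np.sqrt(2 * np.log(k) / n)
print(round(expected_max, 3))  # about 0.065, close to the observed 0.059
```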
2.bis Studying correlation between spike trains¶
Motivated by the presence of a large number of spikes inside the spike trains, which carry a lot of information about their distribution in time, we decided to test and analyze another correlation metric, the Spike Time Tiling Coefficient (STTC).
Simply put, the STTC of two spike trains measures their degree of similarity on a spike-by-spike basis, with a value ranging from -1 to 1, as in the classical correlation.
More specifically, we can see in the following image how the coefficient is computed:

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import pickle
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache
data_dir = "/Users/gabri/Library/Mobile Documents/com~apple~CloudDocs/Python/Neuroscience/allendata"
manifest_path = os.path.join(data_dir, "manifest.json")
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)
session_id = 798911424
oursession = cache.get_session_data(session_id, timeout=3000)
stimulus_table = oursession.get_stimulus_table()
spike_times = oursession.spike_times
/Users/gabri/opt/anaconda3/envs/envallen/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm /Users/gabri/opt/anaconda3/envs/envallen/lib/python3.11/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.1.3 because version 1.8.0 is already loaded. return func(args[0], **pargs) /Users/gabri/opt/anaconda3/envs/envallen/lib/python3.11/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'core' version 2.2.2 because version 2.6.0-alpha is already loaded. return func(args[0], **pargs)
Functions to implement the method¶
# Computing T and P for the STTC formula
def get_T(spike_train, dt, T):
    # T term: fraction of the total recording time T lying within
    # +-dt of at least one spike of the train (tiles of width 2*dt)
    current = None
    previous = 0
    area = 0
    for spike in spike_train:
        current = spike
        # Add this spike's tile, without double-counting the overlap
        # with the previous spike's tile
        area += min(2*dt, current - previous)
        # Clip the tile of a spike close to the end of the recording
        if T-spike < dt:
            area -= dt - (T - spike)
        previous = current
    return area/T
def get_P(spike_train_1, spike_train_2, dt):
    # P term: proportion of spikes of train 1 falling within +-dt of
    # at least one spike of train 2 (both trains are assumed sorted)
    if len(spike_train_1) > 0:
        count = 0
        for spike_1 in spike_train_1:
            for i, spike_2 in enumerate(spike_train_2):
                if abs(spike_1-spike_2) <= dt:
                    count += 1
                    # Drop the spikes of train 2 already scanned
                    spike_train_2 = spike_train_2[i:]
                    break
                elif spike_2 - spike_1 > dt:
                    spike_train_2 = spike_train_2[i:]
                    break
        return count / len(spike_train_1)
    else:
        # Degenerate case: an empty train is assigned P = 1 by convention
        return 1
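As a minimal sanity check of the P term, the same quantity can be computed on toy (hypothetical) spike trains with a vectorized one-liner: P is the fraction of spikes in train 1 that fall within ±dt of at least one spike in train 2.

```python
import numpy as np

# Toy (hypothetical) spike trains, in seconds
train_1 = np.array([0.10, 0.50, 0.90])
train_2 = np.array([0.12, 0.80])
dt = 0.05

# Broadcast pairwise distances; a spike of train_1 is "matched" if any
# spike of train_2 lies within +-dt of it
matched = (np.abs(train_1[:, None] - train_2[None, :]) <= dt).any(axis=1)
P = matched.mean()
print(P)  # only the spike at 0.10 is within dt of a spike in train_2 -> 1/3
```

This agrees with `get_P(train_1, train_2, dt)` above on sorted trains, while the scanning version avoids the quadratic memory of broadcasting on long recordings.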
# Getting spikes in a specific interval (stimulus block)
def get_multiple_data(stimulus_block):
start = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['start_time'].min()
stop = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['stop_time'].max()
spike_times_list = oursession.spike_times.copy()
for unit_id in spike_times_list.keys():
spike_times_list[unit_id] = spike_times_list[unit_id][(spike_times_list[unit_id] >= start) & (spike_times_list[unit_id] <= stop)] - start
return spike_times_list
# Computing STTC on the whole spike trains of specified areas
def STTC_all_area(areas, dt = 0.1):
spike_trains = oursession.spike_times.copy()
unit_ids = oursession.units[oursession.units['ecephys_structure_acronym'].isin(areas)].index
spike_trains = {unit_id: spike_trains[unit_id] for unit_id in unit_ids}
start = 0
stop = 10441.665860980896 # max([max(spike_times) for spike_times in oursession.spike_times.values()])
T = stop - start
spike_trains = {unit_id: spike_times - start for unit_id, spike_times in spike_trains.items()}
STTC = np.zeros((len(spike_trains), len(spike_trains)))
index = unit_ids
T_array = dict()
for a in index:
T_array[a] = get_T(spike_trains[a], dt, T)
for i, a in enumerate(index):
print('unit', i+1, 'out of', len(index))
for j, b in enumerate(index):
if j >= i:
Pa = get_P(spike_trains[a], spike_trains[b], dt)
Pb = get_P(spike_trains[b], spike_trains[a], dt)
STTC[i][j] = 1/2 * ((Pa-T_array[b])/(1-Pa*T_array[b]) + (Pb-T_array[a])/(1-Pb*T_array[a]))
print('Done')
STTC = STTC + STTC.T - np.diag(np.diag(STTC))
return STTC
# Computing STTC on spikes trains restricted to an interval corresponding to a single stimulus block of specified areas
def STTC_block_area(stimulus_block, areas, dt = 0.1):
spike_trains = get_multiple_data(stimulus_block)
unit_ids = oursession.units[oursession.units['ecephys_structure_acronym'].isin(areas)].index
spike_trains = {unit_id: spike_trains[unit_id] for unit_id in unit_ids}
start = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['start_time'].min()
stop = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['stop_time'].max()
T = stop - start
STTC = np.zeros((len(spike_trains), len(spike_trains)))
index = unit_ids
T_array = dict()
for a in index:
T_array[a] = get_T(spike_trains[a], dt, T)
for i, a in enumerate(index):
print('unit', i+1, 'out of', len(index))
for j, b in enumerate(index):
if j >= i:
Pa = get_P(spike_trains[a], spike_trains[b], dt)
Pb = get_P(spike_trains[b], spike_trains[a], dt)
STTC[i][j] = 1/2 * ((Pa-T_array[b])/(1-Pa*T_array[b]) + (Pb-T_array[a])/(1-Pb*T_array[a]))
print('Done')
STTC = STTC + STTC.T - np.diag(np.diag(STTC))
return STTC
# STTC of all areas during a single stimulus block
def STTC_multiple(stimulus_block, dt = 0.1):
spike_trains = get_multiple_data(stimulus_block)
start = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['start_time'].min()
stop = stimulus_table[stimulus_table['stimulus_block'] == stimulus_block]['stop_time'].max()
T = stop - start
STTC = np.zeros((len(spike_trains), len(spike_trains)))
index = oursession.units.index
T_array = dict()
for a in index:
T_array[a] = get_T(spike_trains[a], dt, T)
for i, a in enumerate(index):
print('unit', i+1, 'out of', len(index))
for j, b in enumerate(index):
if j >= i:
Pa = get_P(spike_trains[a], spike_trains[b], dt)
Pb = get_P(spike_trains[b], spike_trains[a], dt)
STTC[i][j] = 1/2 * ((Pa-T_array[b])/(1-Pa*T_array[b]) + (Pb-T_array[a])/(1-Pb*T_array[a]))
print('Done')
STTC = STTC + STTC.T - np.diag(np.diag(STTC))
return STTC
# Function that automatically saves/loads and plots STTC of the whole spike train for given areas
def compute_save_plot_all(areas, plot = True, ordered = False):
if os.path.exists(f'/Users/gabri/Desktop/STTC/{areas}_all.pkl'):
matrix = pickle.load(open(f'/Users/gabri/Desktop/STTC/{areas}_all.pkl', 'rb'))
else:
matrix = STTC_all_area(areas)
pickle.dump(matrix, open(f'/Users/gabri/Desktop/STTC/{areas}_all.pkl', 'wb'))
if plot:
if ordered:
sns.clustermap(matrix, cmap='bwr', center=0, figsize=(10, 8), dendrogram_ratio=(0.1,0.1), xticklabels=False, yticklabels=False, method='ward')
else:
plt.subplots(figsize=(10, 8))
sns.heatmap(matrix, cmap='bwr', center=0)
plt.title(f'{areas} all stimuli')
plt.show()
return matrix
# Function that automatically saves/loads and plots STTC of the spikes during a stimulus block for given areas
def compute_save_plot_block_area(block, areas, plot = True, ordered = False):
if os.path.exists(f'/Users/gabri/Desktop/STTC/block_{block}_{areas}.pkl'):
matrix = pickle.load(open(f'/Users/gabri/Desktop/STTC/block_{block}_{areas}.pkl', 'rb'))
else:
matrix = STTC_block_area(block, areas)
pickle.dump(matrix, open(f'/Users/gabri/Desktop/STTC/block_{block}_{areas}.pkl', 'wb'))
if plot:
if ordered:
sns.clustermap(matrix, cmap='bwr', center=0, figsize=(10, 8), dendrogram_ratio=(0.1,0.1), xticklabels=False, yticklabels=False, method='ward')
else:
plt.subplots(figsize=(10, 8))
sns.heatmap(matrix, cmap='bwr', center=0)
plt.title(f'Block {block} {areas}')
plt.show()
return matrix
# Function that automatically saves/loads and plots STTC of the spikes during a stimulus block for all areas
def compute_save_plot_block(block, plot = True, ordered = False):
if os.path.exists(f'/Users/gabri/Desktop/STTC/block_{block}.pkl'):
matrix = pickle.load(open(f'/Users/gabri/Desktop/STTC/block_{block}.pkl', 'rb'))
else:
matrix = STTC_multiple(block)
pickle.dump(matrix, open(f'/Users/gabri/Desktop/STTC/block_{block}.pkl', 'wb'))
if plot:
if ordered:
sns.clustermap(matrix, cmap='bwr', center=0, figsize=(10, 8), dendrogram_ratio=(0.1,0.1), xticklabels=False, yticklabels=False, method='ward')
else:
plt.subplots(figsize=(10, 8))
sns.heatmap(matrix, cmap='bwr', center=0)
plt.title(f'Block {block}')
plt.show()
return matrix
# Function that automatically saves/loads and plots STTC of the whole spike train for given areas, ordered by area
def compute_save_plot_all_ordered(areas, plot = True):
matrix = compute_save_plot_all(areas, plot = False)
area_indeces = pickle.load(open('/Users/gabri/Desktop/STTC/area_indeces.pkl', 'rb'))
# order areas by length of indices (i.e., by area size)
ordered_area_indeces = dict(sorted(area_indeces.items(), key=lambda item: len(item[1]), reverse=True))
indeces = []
for _, area_indeces in ordered_area_indeces.items():
indeces.extend(area_indeces)
reordered_matrix = matrix[indeces]
reordered_matrix_2 = reordered_matrix.copy()
for i, x in enumerate(reordered_matrix):
reordered_matrix_2[i] = x[indeces]
if plot:
plt.subplots(figsize=(100, 100))
sns.heatmap(reordered_matrix_2, cmap='bwr', center=0, square=True)
plt.title(f'STTC computed on the whole spike trains', fontsize=100)
# plt.xticks([])
# plt.yticks([])
area_sizes = oursession.units['ecephys_structure_acronym'].value_counts().values
increments = np.zeros(len(area_sizes))
for i in range(len(area_sizes)):
increments[i] = sum(area_sizes[:i+1])
plt.hlines(increments, *plt.xlim(), colors='black')
plt.vlines(increments, *plt.ylim(), colors='black')
# plot area names in the middle of their section
for i, area in enumerate(ordered_area_indeces.keys()):
try:
plt.text(increments[i] - area_sizes[i]//2, increments[i] - area_sizes[i]//2, area, fontsize=50, color='black')
except:
break
plt.show()
return matrix
# Function that automatically saves/loads and plots STTC of the spikes during a stimulus block for all areas, ordered by area
def compute_save_plot_block_ordered(block, plot=True):
matrix = compute_save_plot_block(block, plot = False)
area_indeces = pickle.load(open('/Users/gabri/Desktop/STTC/area_indeces.pkl', 'rb'))
# order areas by length of indices (i.e., by area size)
ordered_area_indeces = dict(sorted(area_indeces.items(), key=lambda item: len(item[1]), reverse=True))
indeces = []
for _, area_indeces in ordered_area_indeces.items():
indeces.extend(area_indeces)
reordered_matrix = matrix[indeces]
reordered_matrix_2 = reordered_matrix.copy()
for i, x in enumerate(reordered_matrix):
reordered_matrix_2[i] = x[indeces]
if plot:
plt.subplots(figsize=(10, 10))
sns.heatmap(reordered_matrix_2, cmap='bwr', center=0, square=True)
plt.title(f'STTC computed on block {block}')
plt.xticks([])
plt.yticks([])
area_sizes = oursession.units['ecephys_structure_acronym'].value_counts().values
increments = np.zeros(len(area_sizes))
for i in range(len(area_sizes)):
increments[i] = sum(area_sizes[:i+1])
plt.hlines(increments, *plt.xlim(), colors='black')
plt.vlines(increments, *plt.ylim(), colors='black')
# plot area names in the middle of their section
for i, area in enumerate(ordered_area_indeces.keys()):
try:
plt.text(increments[i] - area_sizes[i]//2, increments[i] - area_sizes[i]//2, area, fontsize=12, color='black')
except:
break
plt.show()
return matrix
Visualization of the results¶
As can be seen above, we implemented functions that allow the coefficient to be computed for specific time intervals or for neurons in specific areas. This was done to make the calculations less computationally expensive and to allow specific sections of the data to be analyzed.
However, we were able to compute the metric for the whole dataset (complete spike trains and all areas), so we are going to show the results obtained from this.
compute_save_plot_all_ordered(oursession.structure_acronyms, plot=True)
array([[ 1. , 0.0750257 , 0.15931823, ..., 0.04312051,
0.04811847, -0.031305 ],
[ 0.0750257 , 1. , 0.14971448, ..., -0.01397203,
-0.03893257, -0.04039993],
[ 0.15931823, 0.14971448, 1. , ..., 0.00951153,
0.0338185 , 0.00144769],
...,
[ 0.04312051, -0.01397203, 0.00951153, ..., 1. ,
0.46211766, 0.22177768],
[ 0.04811847, -0.03893257, 0.0338185 , ..., 0.46211766,
1. , 0.21329055],
[-0.031305 , -0.04039993, 0.00144769, ..., 0.22177768,
0.21329055, 1. ]])
Studying the data¶
STTC_matrix = pickle.load(open("/Users/gabri/Desktop/STTC/['LP', 'DG', 'CA1', 'VISam', nan, 'APN', 'TH', 'Eth', 'CA3', 'VISrl', 'HPF', 'ProS', 'SUB', 'VISp', 'CA2', 'VISl', 'MB', 'NOT', 'LGv', 'VISal']_all.pkl", 'rb'))
To make the matrix easier to analyze, we reorganize it so that neurons from the same area are clustered together, with areas ordered by size.
area_indeces = pickle.load(open('/Users/gabri/Desktop/STTC/area_indeces.pkl', 'rb'))
ordered_area_indeces = dict(sorted(area_indeces.items(), key=lambda item: len(item[1]), reverse=True))
indeces = []
for _, area_indeces in ordered_area_indeces.items():
    indeces.extend(area_indeces)
temp = STTC_matrix[indeces]
STTC_matrix_ordered = temp.copy()
for i, x in enumerate(temp):
    STTC_matrix_ordered[i] = x[indeces]
ordered_area_names = list(ordered_area_indeces.keys())[:-2]
ordered_area_names
['VISam', 'CA1', 'VISp', 'VISal', 'VISl', 'LP', 'SUB', 'VISrl', 'DG', 'APN', 'CA3', 'NOT', 'LGv', 'ProS', 'TH', 'CA2', 'Eth', 'MB']
Now we compute the cumulative sizes of the areas, which can be used as indices to restrict the analysis to a specific part of the matrix.
area_sizes = oursession.units['ecephys_structure_acronym'].value_counts().values
increments = np.zeros(len(area_sizes), dtype=int)
for i in range(len(area_sizes)):
    increments[i] = sum(area_sizes[:i+1])
increments
array([135, 269, 363, 452, 530, 595, 654, 701, 732, 757, 778, 794, 810,
819, 821, 823, 824, 825])
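The running-sum loop above can also be written in one line with `np.cumsum`; a minimal sketch using the first three area sizes from the output above:

```python
import numpy as np

area_sizes = np.array([135, 134, 94])  # first three area sizes
increments = np.cumsum(area_sizes)     # increments[i] == sum(area_sizes[:i+1])
```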
From now on (in this section), we will only consider the 8 biggest areas, since the remaining ones contain few neurons and may lead to unreliable results.
ordered_area_names = ordered_area_names[:8]
Average strength of correlation between neurons in the same area¶
We compute the average absolute value of the correlation between neurons of the same area
average_STTC = dict()
for i, area in enumerate(ordered_area_names):
    if i == 0:
        length = increments[i]
        average_STTC[area] = (np.sum(np.abs(STTC_matrix_ordered[:increments[i], :increments[i]])) - length) / (length*(length-1))
    else:
        length = increments[i] - increments[i-1]
        average_STTC[area] = (np.sum(np.abs(STTC_matrix_ordered[increments[i-1]:increments[i], increments[i-1]:increments[i]])) - length) / (length*(length-1))
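The `- length` term in the numerator removes the unit diagonal (each neuron's correlation with itself is 1) before averaging over the off-diagonal entries; a tiny self-contained check on a made-up 3×3 matrix:

```python
import numpy as np

# Toy "correlation" matrix with ones on the diagonal
M = np.array([[ 1.0,  0.2, -0.1],
              [ 0.2,  1.0,  0.3],
              [-0.1,  0.3,  1.0]])
L = M.shape[0]
# Subtract the L diagonal ones, then average over the L*(L-1) off-diagonal entries
avg_offdiag = (np.abs(M).sum() - L) / (L * (L - 1))
```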
average_STTC
{'VISam': 0.08255662344317649,
'CA1': 0.1001113941634092,
'VISp': 0.10531725606406021,
'VISal': 0.07665673167802156,
'VISl': 0.07007841320001308,
'LP': 0.07686991884234151,
'SUB': 0.0842123747695893,
'VISrl': 0.08313645214815685}
We sort them
sorted(average_STTC.items(), key=lambda x: x[1], reverse=True)
[('VISp', 0.10531725606406021),
('CA1', 0.1001113941634092),
('SUB', 0.0842123747695893),
('VISrl', 0.08313645214815685),
('VISam', 0.08255662344317649),
('LP', 0.07686991884234151),
('VISal', 0.07665673167802156),
('VISl', 0.07007841320001308)]
We can see that, on average, neurons are not very correlated.
To better interpret these values, it is also of interest to compute the average correlations between different areas and order them.
def area_STTC(area):
    area_STTCs = dict()
    if area != 'VISam':
        area_start = increments[ordered_area_names.index(area)-1]
    else:
        area_start = 0
    area_end = increments[ordered_area_names.index(area)]
    for i, area_name in enumerate(ordered_area_names):
        if area_name == area:
            area_STTCs[area_name] = average_STTC[area_name]
            continue
        if i == 0:
            area_STTCs[area_name] = np.mean(np.abs(STTC_matrix_ordered[:increments[i], area_start:area_end]))
        else:
            area_STTCs[area_name] = np.mean(np.abs(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end]))
    return area_STTCs
STTC_averages = dict()
for area in ordered_area_names:
    STTC_averages[area] = sorted(area_STTC(area).items(), key=lambda x: x[1], reverse=True)
for area, corrs in STTC_averages.items():
    print(area, ': ', corrs)
VISam : [('VISam', 0.08255662344317649), ('VISp', 0.07246637608277162), ('VISal', 0.07078178263623246), ('VISrl', 0.06918459347278622), ('CA1', 0.06682707142135616), ('SUB', 0.0654369121733156), ('VISl', 0.0634595320180365), ('LP', 0.062225983051985116)]
CA1 : [('CA1', 0.1001113941634092), ('SUB', 0.08039862642473626), ('VISrl', 0.07019368459990617), ('VISam', 0.06682707142135616), ('LP', 0.06499226682240566), ('VISp', 0.06278932375997498), ('VISal', 0.05769961157971161), ('VISl', 0.052464472986628576)]
VISp : [('VISp', 0.10531725606406021), ('VISam', 0.07246637608277162), ('VISal', 0.0723934542401987), ('VISrl', 0.06645831784213206), ('VISl', 0.06537822488422153), ('LP', 0.06317560386805418), ('CA1', 0.06278932375997498), ('SUB', 0.061344691312894585)]
VISal : [('VISal', 0.07665673167802156), ('VISp', 0.0723934542401987), ('VISam', 0.07078178263623246), ('VISrl', 0.06579576204969023), ('VISl', 0.06424429769528353), ('SUB', 0.05887962259519398), ('CA1', 0.05769961157971159), ('LP', 0.054870603464347724)]
VISl : [('VISl', 0.07007841320001308), ('VISp', 0.06537822488422151), ('VISal', 0.06424429769528353), ('VISam', 0.06345953201803652), ('VISrl', 0.06342364346999328), ('SUB', 0.05473712451935701), ('CA1', 0.05246447298662857), ('LP', 0.05024867914900693)]
LP : [('LP', 0.07686991884234151), ('SUB', 0.0660672236065325), ('CA1', 0.06499226682240566), ('VISp', 0.06317560386805417), ('VISam', 0.062225983051985116), ('VISrl', 0.05559671817688621), ('VISal', 0.05487060346434773), ('VISl', 0.05024867914900692)]
SUB : [('SUB', 0.0842123747695893), ('CA1', 0.08039862642473629), ('VISrl', 0.06799762846617104), ('LP', 0.0660672236065325), ('VISam', 0.06543691217331558), ('VISp', 0.06134469131289457), ('VISal', 0.05887962259519398), ('VISl', 0.05473712451935701)]
VISrl : [('VISrl', 0.08313645214815685), ('CA1', 0.07019368459990619), ('VISam', 0.06918459347278622), ('SUB', 0.06799762846617104), ('VISp', 0.06645831784213206), ('VISal', 0.06579576204969023), ('VISl', 0.06342364346999328), ('LP', 0.05559671817688621)]
There are two important results that can be drawn from this. Firstly, every area is most strongly correlated with itself. Secondly, the VIS areas tend to be more correlated with each other than with other areas, while CA1, SUB and LP form a secondary group.
It is hard to extrapolate further results from this, since correlation by no means implies that the neurons are connected. Nonetheless, the fact that this pattern recurs across all areas might point to an underlying process or structure at the neural level.
Looking for maximal and minimal correlation between areas¶
Another interesting thing that we could look at is the maximal and minimal correlation between neurons across areas. This will probably catch outliers, but it is still interesting to see how similar or opposite two spike trains can be.
def area_STTC_max(area):
    area_STTCs = dict()
    if area != 'VISam':
        area_start = increments[ordered_area_names.index(area)-1]
    else:
        area_start = 0
    area_end = increments[ordered_area_names.index(area)]
    for i, area_name in enumerate(ordered_area_names):
        if i == 0:
            if area_name == area:
                area_STTCs[area_name] = np.max(STTC_matrix_ordered[:increments[i], area_start:area_end] - np.diag(np.diag(STTC_matrix_ordered[:increments[i], area_start:area_end])))
            else:
                area_STTCs[area_name] = np.max(STTC_matrix_ordered[:increments[i], area_start:area_end])
        else:
            if area_name == area:
                area_STTCs[area_name] = np.max(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end] - np.diag(np.diag(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end])))
            else:
                area_STTCs[area_name] = np.max(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end])
    return area_STTCs
STTC_maxs = dict()
for area in ordered_area_names:
    STTC_maxs[area] = sorted(area_STTC_max(area).items(), key=lambda x: x[1], reverse=True)
for area, corrs in STTC_maxs.items():
    print(area, ': ', corrs)
VISam : [('CA1', 0.6651158975296159), ('VISam', 0.6350849368523841), ('VISp', 0.5800964689197329), ('LP', 0.5598944905807833), ('SUB', 0.5476996795227088), ('VISal', 0.5454190445814023), ('VISl', 0.536936135487955), ('VISrl', 0.5243718541081331)]
CA1 : [('CA1', 0.7229560496911289), ('VISam', 0.6651158975296159), ('VISp', 0.639178207718907), ('SUB', 0.6080687741427093), ('VISrl', 0.5948133724252141), ('LP', 0.5919274306095716), ('VISl', 0.5347432576074619), ('VISal', 0.4847897527930829)]
VISp : [('VISp', 0.7197770797952999), ('CA1', 0.639178207718907), ('VISal', 0.5850056720337684), ('VISam', 0.5800964689197329), ('VISl', 0.5419931163703515), ('LP', 0.5363600275458325), ('SUB', 0.44749243644458014), ('VISrl', 0.4173519614925355)]
VISal : [('VISal', 0.7116330546720138), ('LP', 0.6033140211296097), ('VISp', 0.5850056720337684), ('VISl', 0.5836344014645951), ('VISam', 0.5454190445814023), ('SUB', 0.5110129379484085), ('VISrl', 0.5016421738810275), ('CA1', 0.4847897527930829)]
VISl : [('LP', 0.6925808517866816), ('VISal', 0.5836344014645951), ('SUB', 0.5637227273834009), ('VISp', 0.5419931163703515), ('VISam', 0.536936135487955), ('CA1', 0.5347432576074619), ('VISl', 0.49897339782199657), ('VISrl', 0.4590807307862382)]
LP : [('LP', 0.805278258710598), ('VISl', 0.6925808517866816), ('VISal', 0.6033140211296097), ('CA1', 0.5919274306095716), ('SUB', 0.5867792358722027), ('VISam', 0.5598944905807833), ('VISp', 0.5363600275458325), ('VISrl', 0.37015812323930314)]
SUB : [('CA1', 0.6080687741427093), ('LP', 0.5867792358722027), ('VISl', 0.5637227273834009), ('VISrl', 0.5493323238278809), ('VISam', 0.5476996795227088), ('SUB', 0.536850745243445), ('VISal', 0.5110129379484085), ('VISp', 0.44749243644458014)]
VISrl : [('VISrl', 0.868529791741372), ('CA1', 0.5948133724252141), ('SUB', 0.5493323238278809), ('VISam', 0.5243718541081331), ('VISal', 0.5016421738810275), ('VISl', 0.4590807307862382), ('VISp', 0.4173519614925355), ('LP', 0.37015812323930314)]
def area_STTC_min(area):
    area_STTCs = dict()
    if area != 'VISam':
        area_start = increments[ordered_area_names.index(area)-1]
    else:
        area_start = 0
    area_end = increments[ordered_area_names.index(area)]
    for i, area_name in enumerate(ordered_area_names):
        if i == 0:
            if area_name == area:
                area_STTCs[area_name] = np.min(STTC_matrix_ordered[:increments[i], area_start:area_end] - np.diag(np.diag(STTC_matrix_ordered[:increments[i], area_start:area_end])))
            else:
                area_STTCs[area_name] = np.min(STTC_matrix_ordered[:increments[i], area_start:area_end])
        else:
            if area_name == area:
                area_STTCs[area_name] = np.min(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end] - np.diag(np.diag(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end])))
            else:
                area_STTCs[area_name] = np.min(STTC_matrix_ordered[increments[i-1]:increments[i], area_start:area_end])
    return area_STTCs
STTC_mins = dict()
for area in ordered_area_names:
    STTC_mins[area] = sorted(area_STTC_min(area).items(), key=lambda x: x[1])
for area, corrs in STTC_mins.items():
    print(area, ': ', corrs)
VISam : [('CA1', -0.430786862411099), ('VISp', -0.3773482044469345), ('VISal', -0.3510943437121914), ('SUB', -0.34147698357597467), ('VISam', -0.3357315653943669), ('VISrl', -0.3141679115156154), ('VISl', -0.31220707100558076), ('LP', -0.25641605367963805)]
CA1 : [('VISrl', -0.4353173805662587), ('VISam', -0.430786862411099), ('CA1', -0.3798163712676609), ('SUB', -0.34876007655892), ('VISp', -0.34655454124303753), ('VISl', -0.33840381880335446), ('VISal', -0.32544032059193484), ('LP', -0.2842342581065946)]
VISp : [('VISam', -0.3773482044469345), ('CA1', -0.34655454124303753), ('SUB', -0.31920061186430604), ('VISrl', -0.3157738865228304), ('VISp', -0.2872294636971585), ('VISal', -0.2861474743007289), ('LP', -0.2751852609753331), ('VISl', -0.23298831816302507)]
VISal : [('VISl', -0.3655834494929514), ('VISam', -0.3510943437121914), ('VISal', -0.34607952046700446), ('VISrl', -0.3411993649531014), ('SUB', -0.32882782700336877), ('CA1', -0.32544032059193484), ('LP', -0.31912271296107664), ('VISp', -0.2861474743007289)]
VISl : [('VISal', -0.3655834494929514), ('LP', -0.3489079364386581), ('CA1', -0.33840381880335446), ('VISam', -0.31220707100558076), ('VISrl', -0.3120568676745754), ('VISl', -0.2696441045757123), ('SUB', -0.25150597571612704), ('VISp', -0.23298831816302507)]
LP : [('LP', -0.38763823869604636), ('VISl', -0.3489079364386581), ('VISal', -0.31912271296107664), ('SUB', -0.2919104816513442), ('CA1', -0.2842342581065946), ('VISp', -0.2751852609753331), ('VISrl', -0.2578374035862638), ('VISam', -0.25641605367963805)]
SUB : [('SUB', -0.3553883995007206), ('CA1', -0.34876007655892), ('VISam', -0.34147698357597467), ('VISal', -0.32882782700336877), ('VISp', -0.31920061186430604), ('LP', -0.2919104816513442), ('VISrl', -0.26457427969283887), ('VISl', -0.25150597571612704)]
VISrl : [('CA1', -0.4353173805662587), ('VISal', -0.3411993649531014), ('VISrl', -0.3192718625720402), ('VISp', -0.3157738865228304), ('VISam', -0.3141679115156154), ('VISl', -0.3120568676745754), ('SUB', -0.26457427969283887), ('LP', -0.2578374035862638)]
It is interesting to see that both the maximal and the minimal correlations give significantly different results. Not only is it no longer possible to distinguish the two groups we found before, but in half of the areas for the maxima, and three quarters for the minima, the strongest correlation between a neuron of that area and another neuron is not with a neuron of the same area. These changes, together with the fact that the values deviate significantly from the mean absolute correlations computed above, lead us to think that studying these extremes would be misleading, since maximal and minimal correlations are, with high probability, outliers.
Additional plot¶
Since we were able to compute the correlation matrix over the whole length of the spike trains, we decided to focus our analysis on that data. In this way we abstract from specific stimulus presentations and obtain the most general results possible.
However, we thought it could be interesting to also show the plot for the second block of stimulus presentations (drifting gabors), which displays the same results described above in a more visible way. In particular, note the higher correlations in the diagonal blocks (correlations within the same area), and the darker region for the correlations among VISam, VISp, VISal and VISl compared to the lighter blocks of correlation between CA1 and the aforementioned areas.
compute_save_plot_block_ordered(2, plot=True);
Final remark¶
In all the correlation matrices we plotted, we noted the presence of some neurons that were correlated with strength around 0.5 with all other neurons. This is an interesting result that we tried to analyze further, but we did not reach any relevant conclusion on why this might be the case. In any case, this did not happen for neurons in the 8 biggest areas we selected, so our results were not biased by the presence of such neurons, which were located primarily in APN and TH.
3. Maximal entropy distribution over configurations of activity¶
The model we will implement in this section aims at capturing patterns of neuronal activity through a probability distribution. It is the contact point between our analysis of the correlation, explained in the sections above, and the last part, which will be about predicting neuronal activity. We use a maximum entropy model, since it returns the least biased probability distribution, and we constrain the mean and correlation of the distribution we want to find to match the empirical mean and correlation computed from the data. More precisely, the structure is the following (reference: https://arxiv.org/pdf/2112.14735):
We select one type of stimulus and construct $T$ binary vectors of length $N$:
$\{x_i\}^\mu = \{x_1, x_2, \dots, x_N\}^\mu, \quad \mu = 1,\dots,T$
where $N$ is the number of neurons we are considering and $T$ is the number of times that stimulus is presented.
Each component takes a value $x_i^\mu \in \{+1, -1\}$ indicating whether the $i$-th neuron spiked or not during presentation $\mu$.
Through these vectors we can compute the values of empirical mean and correlation as follows:
$m_{i}^{emp} = \frac{1}{T}\sum_{\mu}x_i^\mu$
$C_{ij}^{emp} = \frac{1}{T}\sum_{\mu}x_i^\mu x_j^\mu$
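As a sketch, both empirical quantities can be computed in one shot from a $T \times N$ matrix of $\pm 1$ activity (synthetic data here, not the Allen recordings):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 5
X = rng.choice([1, -1], size=(T, N))  # synthetic activity: T presentations x N neurons

m_emp = X.mean(axis=0)   # m_i^emp  = (1/T) sum_mu x_i^mu
C_emp = (X.T @ X) / T    # C_ij^emp = (1/T) sum_mu x_i^mu x_j^mu
```

Since every entry of `X` is $\pm 1$, the diagonal of `C_emp` is exactly 1.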
Now we want to find the probability distribution $P$ that satisfies the following optimization problem:
$\max \; S := - \sum_{\{x_i\}} P(\{x_i\}) \log P(\{x_i\})$
subject to
$\langle x_i \rangle := \sum_{\{x_i\}} x_i P(\{x_i\}) = m_{i}^{emp}$
$\langle x_i x_j \rangle := \sum_{\{x_i\}} x_i x_j P(\{x_i\}) = C_{ij}^{emp}$
We can solve this optimization problem by taking the partial derivatives of the Lagrangian and setting them to zero. Doing so, we obtain the following expression, which corresponds to the Boltzmann distribution in the setting of the spin glass model:
$P(\{x_i\}) = \frac{1}{Z} exp[-E(\{x_i\})]$
where $E(\{x_i\}) = - \sum_{i}h_i x_i - \frac{1}{2} \sum_{i,j}J_{ij} x_i x_j$
and $Z = \sum_{\{x_i\}}exp[-E(\{x_i\})]$
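For a small number of neurons, this distribution can be evaluated exactly by enumerating all $2^N$ configurations; a self-contained sketch with arbitrary made-up $h$ and $J$ (not inferred from data):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N = 4
h = rng.normal(size=N)
J = rng.normal(size=(N, N))
J = (J + J.T) / 2        # symmetric couplings
np.fill_diagonal(J, 0)   # no self-coupling

def energy(x):
    return -h @ x - 0.5 * x @ J @ x

states = [np.array(s) for s in itertools.product([1, -1], repeat=N)]
weights = np.array([np.exp(-energy(x)) for x in states])
Z = weights.sum()   # partition function
P = weights / Z     # P(x) = exp(-E(x)) / Z

# Model averages <x_i> can then be computed by direct summation
model_mean = sum(p * x for p, x in zip(P, states))
```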
$h_i$ can be seen as local magnetic fields and $J_{ij}$ as coupling constants. Thus, we want to infer $h_i$ and $J_{ij}$ to obtain the complete probability distribution.
To do so, the best-known method is the so-called Boltzmann machine, which alternates a Monte Carlo simulation step, done to estimate $\langle x_i \rangle$ and $\langle x_i x_j \rangle$ from given $h_i$ and $J_{ij}$, with a gradient descent step to update the parameters to match our initial equations. Nonetheless, this is quite computationally expensive, so we opted for an approximation method known as mean field learning. Estimating $h_i$ and $J_{ij}$ then becomes just an algebraic computation, following these steps:
- if $C^{emp}$ is not full rank, we do an initial step, which is to consider a matrix $C = (1-\alpha)\cdot C^{emp} + \alpha \cdot I$, where $I$ is the identity matrix and $\alpha \in (0,1)$ is a hyperparameter that can be tuned
- Then, we compute $J = - [C]^{-1}$ and set the diagonal to $0$
- Finally, we can compute $h = - J \cdot m^{emp} + tanh^{-1}[m^{emp}]$
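The three steps above amount to a few lines of linear algebra; a minimal sketch on made-up inputs (a symmetric positive-definite stand-in for $C^{emp}$, not real data):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
m_emp = rng.uniform(-0.5, 0.5, size=N)       # empirical means, safely inside (-1, 1)
A = rng.normal(size=(N, N))
C_emp = A @ A.T / N + np.eye(N)              # stand-in for the empirical correlation matrix

alpha = 0.1
C = (1 - alpha) * C_emp + alpha * np.eye(N)  # step 1: mix with the identity
J = -np.linalg.inv(C)                        # step 2: J = -C^{-1} ...
np.fill_diagonal(J, 0)                       # ... with the diagonal set to 0
h = -J @ m_emp + np.arctanh(m_emp)           # step 3
```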
This is already a pretty good approximation, but the method could be extended to increase precision. A popular way, which could be explored in further implementations, is to use the values obtained with mean field as initialization for the pseudo-likelihood method, and then use its result as initialization for the Boltzmann machine described above.
We will first consider the neurons in one brain area (we chose VISam because it is the most numerous in our dataset, which makes the measures more robust, but the analysis can be repeated on other brain areas), filtering neuronal activation during the presentation of the stimulus "static gratings" with a particular orientation. Doing so, we can investigate whether some effective local interactions might be present and, as we said, whether it is possible to capture them through the correlation measure.
We will also compare our statistical physics model with a random model (i.e. random patterns of activity drawn independently, with the same empirical mean across trials but zero correlation) to see whether there is indeed an improvement.
Lastly, since we are also interested in studying how different areas relate to each other and how results change when moving across them, we will include in our analysis neurons from other brain areas to infer the distribution of configurations in VISam. In particular, we will consider one brain area that is spatially close to VISam and one that is further away, and compare the results to try to extract some meaningful conclusions.
We start by defining some functions to make the described computations easier to perform multiple times.
def create_configuration(n):
    """
    Function to create a random configuration of n spins (neurons)
    """
    configuration = np.random.choice([1, -1], size=n)
    return configuration
def energy_configuration(s, h, J):
    """
    Function to calculate the energy of a configuration s
    """
    single_term = np.dot(h, s)
    double_term = s @ J @ s.T
    E = - single_term - double_term/2
    return E
def delta_E(i, s, delta_s, h, J):
    """
    Function to calculate the difference in energy starting from configuration s

    Parameters
    ----------
    i : position of the spin flip proposed by the algorithm
    s : initial configuration
    delta_s : difference between the initial and final spin value in position i
    h : array of values of local fields
    J : matrix of values of coupling constants
    """
    single_delta = h[i]*delta_s
    double_delta = np.dot(J[i], delta_s*s)
    return single_delta + double_delta
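A quick self-contained sanity check (on made-up values, assuming a symmetric $J$ with zero diagonal, as produced by the mean field step) that this local update equals the full energy difference $E(s') - E(s)$ for a single spin flip:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
h = rng.normal(size=n)
J = rng.normal(size=(n, n))
J = (J + J.T) / 2
np.fill_diagonal(J, 0)
s = rng.choice([1, -1], size=n)

def full_energy(s):
    return -h @ s - 0.5 * s @ J @ s

i = 2
s_new = s.copy()
s_new[i] = -s[i]
delta_s = s[i] - s_new[i]  # -s_f + s_0, as passed in the Metropolis loop below
dE_local = h[i] * delta_s + J[i] @ (delta_s * s)
```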
def monte_carlo_simulation(n, nsteps, h, J, mean_corr=False, configs=False, T=1):
    """
    Function to run Monte Carlo simulations using the Metropolis algorithm

    Parameters
    ----------
    n : number of neurons considered
    h : array of values of local fields
    J : matrix of values of coupling constants
    mean_corr : if set to True it returns the final estimation of mean and correlation
    configs : if set to True it returns all the configurations obtained through sampling
    """
    configuration = create_configuration(n)
    res = dict()
    si = configuration.copy()
    sisj = np.outer(configuration.copy(), configuration.copy())
    spins = []
    spins.append(configuration.copy())
    effective_steps = 1
    for j in range(nsteps):
        i = np.random.randint(n)  ## random sampling of the position of the spin to flip
        s_0 = configuration[i]
        s_f = - s_0
        dE = delta_E(i, configuration, (-s_f+s_0), h, J)  ## calculation of the change in energy with the selected spin flip
        if dE < 0 or np.random.random() < np.exp(-(dE/T)):  ## metropolis acceptance rule
            configuration[i] = s_f
        if (j >= 50000) and (j % 1000 == 0):  ## to make sampling more reliable we start saving configurations after the 50000th step and every 1000 steps
            effective_steps += 1
            config_copy = configuration.copy()
            spins.append(config_copy)
            si += config_copy
            sisj += np.outer(config_copy, config_copy)
        if j % 100000 == 0:
            print(f'Step {j} out of {nsteps}')
    m_e = si/effective_steps  ## sampled estimate of the mean
    c_e = sisj/effective_steps - np.outer(m_e, m_e)  ## sampled estimate of the (connected) correlation
    res['effective_steps'] = effective_steps
    if mean_corr:
        res['m_e'] = m_e
        res['c_e'] = c_e
    if configs:
        res['spin_configs'] = spins
    return res
def mean_field_learning(n, m_real, c_real, alpha, full_rank=False):
    """
    Function to perform mean field learning.
    Used to infer the values of h and J starting from a mean field theory assumption.

    Parameters
    ----------
    n : number of neurons considered
    m_real : empirical values of mean
    c_real : empirical value of correlation
    alpha : weight given to identity matrix in linear combination with correlation matrix (hyperparameter)
    full_rank : bool indicating whether the matrix is full rank, i.e. when there is no need for rescaling with the identity
    """
    if not full_rank:
        c = (1-alpha)*c_real + alpha*np.identity(n)  ## taking the empirical correlation and "making it invertible"
    if full_rank:
        c = c_real
    c_inv = np.linalg.inv(c)  ## inverting the (regularised) correlation matrix
    c_inv -= np.diag(np.diag(c_inv))  ## setting the diagonal to zero
    J = - c_inv  ## changing sign: this is the coupling constants matrix
    ## if m_i real is 1 or -1 the arctanh explodes so we rescale it by a small epsilon
    for i, neuron in enumerate(m_real):
        if np.abs(neuron) == 1:
            m_real[i] -= neuron * 0.01
    h = - (J @ m_real) + np.arctanh(m_real)
    return h, J
def empirical_mean_corr(region_ids, df_stimulus):
    """
    Compute the empirical mean and correlation of neurons across different presentations of the same stimulus

    Parameters
    ----------
    region_ids : oursession.units[oursession.units["ecephys_structure_acronym"]==area].index.values where area can be one of the visual areas
                 (region_ids can be comprehensive of two or more brain areas)
    df_stimulus : oursession.get_stimulus_table("stimulus") where stimulus is the type of stimulus chosen
    """
    si = np.zeros(len(region_ids))
    sisj = np.zeros(shape=(len(region_ids), len(region_ids)))
    for i in range(df_stimulus.shape[0]):
        activity_temp = np.ones(len(region_ids))
        for j, id in enumerate(region_ids):
            spikes = oursession.spike_times[id]
            masking = spikes[(spikes > df_stimulus.iloc[i,1]) & (spikes < df_stimulus.iloc[i,2])]
            if len(masking) == 0:
                activity_temp[j] = -1
        si += activity_temp
        sisj += np.outer(activity_temp, activity_temp)
    m_real = si/(df_stimulus.shape[0])  ## empirical mean
    c_real = sisj/(df_stimulus.shape[0])  ## empirical correlation
    return m_real, c_real
def activity(region_ids, df_stimulus):
    """
    Detect the activity of neurons in a given area, in a small interval of time when shown a particular type of stimulus (1 : spike, -1 : no spike)

    Parameters
    ----------
    region_ids : oursession.units[oursession.units["ecephys_structure_acronym"]==area].index.values where area can be one of the visual areas
                 (region_ids can be comprehensive of two or more brain areas)
    df_stimulus : oursession.get_stimulus_table("stimulus") where stimulus is the type of stimulus chosen; a single variant can be selected (e.g. only positive flashes)
    """
    activity = []
    for i in range(df_stimulus.shape[0]):
        activity_temp = np.ones(len(region_ids))
        for j, id in enumerate(region_ids):
            spikes = oursession.spike_times[id]
            masking = spikes[(spikes > df_stimulus.iloc[i,1]) & (spikes < df_stimulus.iloc[i,2])]
            if len(masking) == 0:
                activity_temp[j] = -1
        activity.append(activity_temp)
    return activity
def calculate_h_J(region_ids, df_stimulus, alpha, full_rank=False):
    """
    Function that returns the values of h and J

    Parameters
    ----------
    region_ids : oursession.units[oursession.units["ecephys_structure_acronym"]==area].index.values where area can be one of the visual areas
                 (region_ids can be comprehensive of two or more brain areas)
    df_stimulus : oursession.get_stimulus_table("stimulus") where stimulus is the type of stimulus chosen; a single variant can be selected (e.g. only positive flashes)
    alpha : weight given to identity matrix in linear combination with correlation matrix (hyperparameter)
    full_rank : bool indicating whether the matrix is full rank, i.e. when there is no need for rescaling with the identity
    """
    m_real, c_real = empirical_mean_corr(region_ids, df_stimulus)
    n = len(region_ids)
    h, J = mean_field_learning(n, m_real, c_real, alpha, full_rank)
    return h, J
We must choose some metric to see how the model performs.
Once we have computed all the parameters of our probability distribution, we can do some parameter-free estimations and compare the model with the data.
We chose to compute (and plot):
- the probability that K out of N neurons are active at the same time (denoted P(K)) together with the same probability but derived empirically from the data
- the probability to have a certain energy configuration (denoted P(E)) together with the same probability but derived empirically from the data
Indeed, the following functions perform the computations and the corresponding plots of histograms and probabilities for points 1. and 2. In particular:
For point 1:
- the histogram displays the value of K (in steps of 2) on the x-axis vs the number of configurations with that number of active neurons on the y-axis
- the probability plot displays the value of K (in steps of 2) on the x-axis vs the probability of having that number of active neurons on the y-axis
For point 2:
- the histogram displays the energy value E (in steps of 2, from the minimum to the maximum of the empirically observed energies) on the x-axis vs the number of configurations with that energy on the y-axis
- the probability plot displays the energy value E (in steps of 2, from the minimum to the maximum of the empirically observed energies) on the x-axis vs the probability of having that energy on the y-axis
def K_outof_N_histogram(s, empirical_activity_K):
    configs = s['spin_configs']
    ## making configurations 0-1 for easier calculations later
    for i, x in enumerate(configs):
        configs[i] = (configs[i]+1)//2
    ## for every configuration sampled, computes the number of active neurons in that configuration
    n_active_neurons = []
    for i, x in enumerate(configs):
        n_active_neurons.append(np.sum(x))
    ## making empirical activity 0-1 for easier calculations later
    for i, x in enumerate(empirical_activity_K):
        empirical_activity_K[i] = (empirical_activity_K[i]+1)//2
    ## for every empirical configuration, computes the number of active neurons in that configuration
    emp_active_neurons = []
    for i, x in enumerate(empirical_activity_K):
        emp_active_neurons.append(np.sum(x))
    ## creating the bins for the histogram: from min of the empirical activity to max, in steps of 2
    ## min of empirical activity is the minimum number of simultaneously active neurons
    ## max of empirical activity is the maximum number of simultaneously active neurons
    bins_K = np.arange(min(emp_active_neurons), max(emp_active_neurons)+2, 2)
    counts_K, edges_K = np.histogram(n_active_neurons, bins=bins_K)
    counts_K_emp, edges_K_emp = np.histogram(emp_active_neurons, bins=bins_K)
    edges_K = edges_K[:-1]
    edges_K_emp = edges_K_emp[:-1]
    return counts_K, counts_K_emp, bins_K, edges_K
def plot_K_outof_N_histogram(counts_K, counts_K_emp, bins_K, area: str, stimulus: str):
    fig, ax = plt.subplots(1, 2, figsize=(15, 5))
    ax[0].stairs(counts_K, bins_K, color='lightblue')
    ax[1].stairs(counts_K_emp, bins_K, color='pink')
    for x in [0, 1]:
        ax[x].set_xlabel('K')
        ax[x].set_ylabel('# configurations with K active neurons')
    ax[0].set_title('Model')
    ax[1].set_title('Data')
    plt.suptitle(f'{area}, {stimulus}')
    plt.show()
def K_outof_N_prob(s, counts_K, counts_K_emp, df_stimulus):
    ## from the counts, computing the probabilities, both in the model and empirical cases
    prob_K = np.array(counts_K)/s['effective_steps']
    prob_K_emp = np.array(counts_K_emp)/df_stimulus.shape[0]
    return prob_K, prob_K_emp
def plot_K_outof_N_prob(prob_K, prob_K_emp, edges_K, area: str, stimulus: str):
    plt.figure()
    plt.plot(edges_K, prob_K, color='lightblue', label='model')
    plt.plot(edges_K, prob_K_emp, color='pink', label='data')
    plt.xlabel('K simultaneous active neurons')
    plt.ylabel('P(K)')
    plt.suptitle(f'{area}, {stimulus}')
    plt.legend()
    plt.show()
def E_histogram(s, h, J, empirical_activity_K):
    configs = s['spin_configs']
    ## the process is the same as before but computing the energies of the configurations
    energies = []
    for config in configs:
        energies.append(energy_configuration(config, h, J))
    emp_energies = []
    for config in empirical_activity_K:
        emp_energies.append(energy_configuration(config, h, J))
    bins_E = np.arange(min(emp_energies), max(emp_energies), 2)
    counts_E, edges_E = np.histogram(energies, bins=bins_E)
    counts_E_emp, edges_E_emp = np.histogram(emp_energies, bins=bins_E)
    edges_E = edges_E[:-1]
    edges_E_emp = edges_E_emp[:-1]
    return counts_E, counts_E_emp, bins_E, edges_E
def plot_E_histogram(counts_E, counts_E_emp, bins_E, area: str, stimulus: str):
    fig, ax = plt.subplots(1, 2, figsize=(15, 5))
    ax[0].stairs(counts_E, bins_E, color='lightblue')
    ax[1].stairs(counts_E_emp, bins_E, color='pink')
    for x in [0, 1]:
        ax[x].set_xlabel('E')
        ax[x].set_ylabel('# configurations with energy E')
    ax[0].set_title('Model')
    ax[1].set_title('Data')
    plt.suptitle(f'{area}, {stimulus}')
    plt.show()
def E_prob(s,counts_E,counts_E_emp,df_stimulus):
prob_E = np.array(counts_E)/s['effective_steps']
prob_E_emp = np.array(counts_E_emp)/df_stimulus.shape[0]
return prob_E,prob_E_emp
def plot_E_prob(prob_E,prob_E_emp,edges_E,area:str,stimulus:str):
plt.figure()
plt.plot(edges_E,prob_E,color='lightblue',label='model')
plt.plot(edges_E,prob_E_emp,color='pink',label='data')
plt.xlabel('Energy E')
plt.ylabel('P(E)')
plt.legend()
plt.title(f'{area}, {stimulus}')
plt.show()
3.1 VISam -> VISam, static gratings¶
We chose to consider the activity of neurons during the presentation of static gratings, since it is the stimulus for which we see the highest correlation of activity. Furthermore, we decided to focus on a particular orientation for the following two reasons:
- the hope, again, of a stronger correlation in the activity, given that the stimuli considered are of the same kind, which is more coherent with the assumptions and constraints of our model
- gratings are presented around 6000 times, so to have a comparable sample of Monte Carlo-simulated configurations we would need many iterations of the algorithm. A single orientation, instead, is seen around 1000 times, which, compared to other stimuli, is still a fair sample for computing empirical probabilities while making computations roughly 6 times shorter
Clearly, this model has two main limitations:
- this number is still not enough to conclude that it is a good "empirical probability", so one could hope for better results by, for example, merging other sessions and thus considering a higher number of observations
- the time bin between two consecutive stimuli is relatively large compared to the duration of a spike. Since in our model we are only interested in whether a neuron spiked (+1) or not (-1), we treat neurons that spike more than once in a bin the same way as neurons that spike exactly once, at the risk of losing important information. Therefore, it would be interesting for further research to repeat the analysis with a time bin small enough to contain at most one spike per neuron, or to directly consider the firing rates instead of binary vectors.
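The binarization just described can be sketched as follows (the spike counts here are hypothetical, not taken from the session):

```python
import numpy as np

# Hypothetical spike counts: rows are stimulus presentations, columns are neurons
spike_counts = np.array([[0, 3, 1],
                         [2, 0, 0]])
# Any neuron that spiked at least once in the bin becomes +1, otherwise -1;
# a neuron with 3 spikes and one with a single spike map to the same +1,
# which is exactly the information loss discussed above
spins = np.where(spike_counts > 0, 1, -1)
assert (spins == np.array([[-1, 1, 1], [1, -1, -1]])).all()
```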
df_stimulus = oursession.get_stimulus_table("static_gratings")
df_stimulus = df_stimulus[df_stimulus['orientation'] == 30]
As we said in the introduction, we restricted our analysis to extracting patterns of activity in VISam, as it is the richest region in our dataset.
area = 'VISam'
region_ids_VISam = oursession.units[oursession.units["ecephys_structure_acronym"]==area].index.values
h,J = calculate_h_J(region_ids=region_ids_VISam, df_stimulus=df_stimulus, alpha=0.7)
print('Local fields: \n\n', h)
print('\n\n Coupling constants: \n\n', J)
Local fields:

 [ 2.59083484e-01  2.66342125e-02  4.86465327e-01 -1.08180469e-01 ...
   3.12783364e-01 -5.44244230e-02  8.73698323e-01]

(output abridged)

Coupling constants:

 [[-0.          0.009083    0.02239585 ...  0.01371031 -0.0195145   0.02083298]
  [ 0.009083   -0.          0.00755833 ...  0.01536652  0.00337205  0.0074803 ]
  [ 0.02239585  0.00755833 -0.         ...  0.01666217 -0.00865041  0.021728  ]
  ...
  [ 0.01371031  0.01536652  0.01666217 ... -0.          0.00111876  0.02686237]
  [-0.0195145   0.00337205 -0.00865041 ...  0.00111876 -0.         -0.01086308]
  [ 0.02083298  0.0074803   0.021728   ...  0.02686237 -0.01086308 -0.        ]]
We choose the number of Monte Carlo steps so as to generate a sample of configurations comparable with the true one coming from our dataset.
mc_steps = 1000000
## sampling spin configurations with monte carlo simulation
s = monte_carlo_simulation(n=len(region_ids_VISam), nsteps=mc_steps, h=h, J=J, configs=True)
## computing the empirical activity with the function defined earlier
empirical_activity_K = activity(region_ids=region_ids_VISam, df_stimulus=df_stimulus)
Step 0 out of 1000000
Step 100000 out of 1000000
Step 200000 out of 1000000
Step 300000 out of 1000000
Step 400000 out of 1000000
Step 500000 out of 1000000
Step 600000 out of 1000000
Step 700000 out of 1000000
Step 800000 out of 1000000
Step 900000 out of 1000000
pickle.dump(s, open('/Users/isottamagistrali/Library/Mobile Documents/com~apple~CloudDocs/Bocconi/Semestre 3.2/Neurosciencee/data_mcs/mcs_VISam_gratings.pkg', 'wb'))
Probability that K out of N neurons are active at the same time:
counts_K,counts_K_emp,bins_K,edges_K = K_outof_N_histogram(s=s, empirical_activity_K=empirical_activity_K)
plot_K_outof_N_histogram(counts_K=counts_K, counts_K_emp=counts_K_emp, bins_K=bins_K, area='VISam', stimulus='gratings')
prob_K,prob_K_emp = K_outof_N_prob(s=s, counts_K=counts_K, counts_K_emp=counts_K_emp, df_stimulus=df_stimulus)
plot_K_outof_N_prob(prob_K=prob_K, prob_K_emp=prob_K_emp, edges_K=edges_K, area='VISam', stimulus='gratings')
From what we can see in the plot above, the model is quite good at capturing the general trend of the empirical probability. As is often the case with Monte Carlo simulations, it oversamples the more probable configurations, giving less importance to the tails of the distribution. Indeed, some probability mass that in the true data lies in the tails is instead concentrated around the mean of the distribution. There are methods that account for this issue, like introducing a "temperature" in the exponent of the probability distribution (which would correspond to the physical distribution), among many others, so this would be an interesting starting point for further research.
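To illustrate the temperature idea mentioned above (a hypothetical modification, not part of our pipeline): with a Boltzmann weight exp(-E/T), a Metropolis rule accepts energy-increasing moves with probability exp(-ΔE/T), so a temperature T > 1 flattens the sampled distribution and pushes more mass into the tails:

```python
import numpy as np

def metropolis_accept(delta_E, T, rng):
    """Accept a proposed spin flip with probability min(1, exp(-delta_E / T))."""
    return delta_E <= 0 or rng.random() < np.exp(-delta_E / T)

rng = np.random.default_rng(0)
# The same energy-increasing move is accepted far more often at higher temperature
accept_cold = np.mean([metropolis_accept(2.0, 0.5, rng) for _ in range(10000)])
accept_hot = np.mean([metropolis_accept(2.0, 2.0, rng) for _ in range(10000)])
assert accept_cold < accept_hot
```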
Probability to have a certain energy configuration:
counts_E,counts_E_emp,bins_E,edges_E=E_histogram(s=s, h=h, J=J, empirical_activity_K=empirical_activity_K)
plot_E_histogram(counts_E=counts_E, counts_E_emp=counts_E_emp, bins_E=bins_E, area='VISam', stimulus='gratings')
prob_E,prob_E_emp=E_prob(s=s, counts_E=counts_E, counts_E_emp=counts_E_emp, df_stimulus=df_stimulus)
plot_E_prob(prob_E=prob_E, prob_E_emp=prob_E_emp, edges_E=edges_E, area='VISam', stimulus='gratings')
The energy distribution suffers a bit more from our approximation, but we believe the trend is still captured fairly well.
We can compare our model with a randomly generated model of neuron activation, which means a model that matches our empirical means but has zero correlation. This is a way to see whether adding the correlation constraint can capture a deeper level of complexity in our biological network.
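The mapping from empirical means to activation probabilities used below follows from the ±1 coding: E[s] = p·(+1) + (1−p)·(−1) = 2p − 1, hence p = (m + 1)/2. A quick numerical check:

```python
import numpy as np

# For +/-1 spins, a mean m corresponds to P(s = +1) = (m + 1) / 2
m = np.array([-1.0, 0.0, 0.5, 1.0])
p = (m + 1) / 2
assert np.allclose(p, [0.0, 0.5, 0.75, 1.0])

# Sampling +/-1 with p = 0.75 recovers a mean close to 2*0.75 - 1 = 0.5
rng = np.random.default_rng(0)
sample = rng.choice([1, -1], size=100_000, p=[0.75, 0.25])
assert abs(sample.mean() - 0.5) < 0.01
```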
m_real,c_real = empirical_mean_corr(region_ids=region_ids_VISam, df_stimulus=df_stimulus)
proba = (m_real + 1)/2
sample = np.zeros((len(region_ids_VISam),mc_steps//1000))
for neuron in range(len(region_ids_VISam)):
sample[neuron] = np.random.choice([0,1], mc_steps//1000, p = [1-proba[neuron], proba[neuron]])
sample = sample.T
s_prime = dict()
s_prime['spin_configs'] = sample
counts_K_prime,counts_K_emp_prime,bins_K_prime,edges_K_prime = K_outof_N_histogram(s_prime, empirical_activity_K)
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_K, bins_K, color='lightblue')
ax[1].stairs(counts_K_emp, bins_K, color='pink')
ax[2].stairs(counts_K_prime, bins_K, color='green')
for x in [0,1,2]:
ax[x].set_xlabel('K')
ax[x].set_ylabel('# configurations with K active neurons')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Random model')
#plt.grid(True)
#plt.legend()
plt.suptitle(f'VISam, gratings')
plt.show()
s_prime['effective_steps'] = mc_steps//1000
prob_K_prime,prob_K_emp_prime = K_outof_N_prob(s_prime, counts_K_prime, counts_K_emp, df_stimulus)
plt.figure()
plt.plot(edges_K,prob_K,color='lightblue',label='model')
plt.plot(edges_K,prob_K_emp,color='pink',label='data')
plt.plot(edges_K,prob_K_prime,color='green',label='random model')
plt.xlabel('K simultaneous active neurons')
plt.ylabel('P(K)')
plt.suptitle(f'VISam, gratings')
plt.legend()
plt.show()
Our model seems to perform better than a random model for two main reasons:
- it is less peaked around the mean and is instead able to distribute some probability mass to the tails (even though not in a perfect way)
- it captures the shape of the empirical distribution more reliably than the random, uncorrelated one
One thing that can be deduced from this is that our model is indeed able to account for some structure of the underlying network, and the correlation measure seems to be informative in this respect.
3.2 VISam + VISal -> VISam, static gratings¶
We now try to include neurons from different areas to infer the activity of VISam. We start by considering an area that is closer to VISam and then move on to a more distant one, as explained in the introduction.
We would like to understand whether this gives more information, improving our results, or instead worsens our analysis because our assumptions get weaker (i.e., the correlation is less strong when considering neurons coming from different areas of the brain).
region_ids_VISal = oursession.units[(oursession.units["ecephys_structure_acronym"] == 'VISal')].index.values
region_ids_visam_visal = np.array(list(region_ids_VISam)+list(region_ids_VISal))
# region_ids_visam_visal.shape, region_ids_visam_visal_old.shape
h2,J2 = calculate_h_J(region_ids_visam_visal, df_stimulus, 0.75)
## sampling spin configurations with monte carlo simulation
s2 = monte_carlo_simulation(len(region_ids_visam_visal), mc_steps, h2, J2,configs=True)
pickle.dump(s2, open('/Users/isottamagistrali/Library/Mobile Documents/com~apple~CloudDocs/Bocconi/Semestre 3.2/Neurosciencee/data_mcs/mcs_VISam_VISal_gratings.pkg', 'wb'))
s_VISam = dict()
s_VISam['spin_configs'] = [x[:len(region_ids_VISam)] for x in s2['spin_configs']]
s_VISam['effective_steps'] = s2['effective_steps']
## computing the empirical activity with the function defined earlier
#empirical_activity_K = activity(region_ids_VISam,flashes)
Step 0 out of 1000000
Step 100000 out of 1000000
Step 200000 out of 1000000
Step 300000 out of 1000000
Step 400000 out of 1000000
Step 500000 out of 1000000
Step 600000 out of 1000000
Step 700000 out of 1000000
Step 800000 out of 1000000
Step 900000 out of 1000000
Probability that K out of N neurons are active at the same time:
counts_K2,counts_K_emp2,bins_K2,edges_K2 = K_outof_N_histogram(s_VISam, empirical_activity_K)
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_K, bins_K, color='lightblue')
ax[1].stairs(counts_K_emp, bins_K, color='pink')
ax[2].stairs(counts_K2, bins_K, color='purple')
for x in [0,1,2]:
ax[x].set_xlabel('K')
ax[x].set_ylabel('# configurations with K active neurons')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Double model (VISal)')
plt.suptitle(f'VISam, gratings')
plt.show()
prob_K2,prob_K_emp2 = K_outof_N_prob(s_VISam, counts_K2, counts_K_emp, df_stimulus)
plt.figure()
plt.plot(edges_K,prob_K,color='lightblue',label='model')
plt.plot(edges_K,prob_K_emp,color='pink',label='data')
plt.plot(edges_K,prob_K2,color='purple',label='double model (VISal)')
plt.xlabel('K simultaneous active neurons')
plt.ylabel('P(K)')
plt.suptitle(f'VISam, gratings')
plt.legend()
plt.show()
Probability to have a certain energy configuration:
configs = s_VISam['spin_configs']
energies = []
for config in configs:
energies.append(energy_configuration(config,h2[:len(region_ids_VISam)],J2[:len(region_ids_VISam),:len(region_ids_VISam)]))
counts_E2, edges_E2 = np.histogram(energies, bins=bins_E)
edges_E2 = edges_E2[:-1]
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_E, bins_E, color='lightblue')
ax[1].stairs(counts_E_emp, bins_E, color='pink')
ax[2].stairs(counts_E2, bins_E, color='purple')
for x in [0,1,2]:
ax[x].set_xlabel('E')
ax[x].set_ylabel('# configurations with energy E')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Double model VISal')
plt.suptitle(f'VISam, gratings')
plt.show()
prob_E2,prob_E_emp2=E_prob(s_VISam, counts_E2, counts_E_emp, df_stimulus)
plt.figure()
plt.plot(edges_E,prob_E,color='lightblue',label='model')
plt.plot(edges_E,prob_E_emp,color='pink',label='data')
plt.plot(edges_E,prob_E2,color='purple',label='double model (VISal)')
plt.xlabel('Energy E')
plt.ylabel('P(E)')
plt.legend()
plt.title(f'VISam, gratings')
plt.show()
While we don't see a big difference in the probability of having a certain number of active neurons, the analysis of the energy configurations is considerably worse here with respect to when we were considering one area alone. In this case, indeed, configurations tend to concentrate around higher values of energy.
3.3 VISam + VISp -> VISam, static gratings¶
## trying to include areas further from each other
region_ids_VISp = oursession.units[(oursession.units["ecephys_structure_acronym"] == 'VISp')].index.values
region_ids_visam_visp = np.array(list(region_ids_VISam)+list(region_ids_VISp))
h2,J2 = calculate_h_J(region_ids_visam_visp, df_stimulus, 0.75)
## sampling spin configurations with monte carlo simulation
s2 = monte_carlo_simulation(len(region_ids_visam_visp),mc_steps,h2,J2,configs=True)
pickle.dump(s2, open('/Users/isottamagistrali/Library/Mobile Documents/com~apple~CloudDocs/Bocconi/Semestre 3.2/Neurosciencee/data_mcs/mcs_VISam_VISp_gratings.pkg', 'wb'))
s_VISam = dict()
s_VISam['spin_configs'] = [x[:len(region_ids_VISam)] for x in s2['spin_configs']]
s_VISam['effective_steps'] = s2['effective_steps']
Step 0 out of 1000000
Step 100000 out of 1000000
Step 200000 out of 1000000
Step 300000 out of 1000000
Step 400000 out of 1000000
Step 500000 out of 1000000
Step 600000 out of 1000000
Step 700000 out of 1000000
Step 800000 out of 1000000
Step 900000 out of 1000000
Probability that K out of N neurons are active at the same time:
counts_K2,counts_K_emp2,bins_K2,edges_K2 = K_outof_N_histogram(s_VISam, empirical_activity_K)
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_K, bins_K, color='lightblue')
ax[1].stairs(counts_K_emp, bins_K, color='pink')
ax[2].stairs(counts_K2, bins_K, color='purple')
for x in [0,1,2]:
ax[x].set_xlabel('K')
ax[x].set_ylabel('# configurations with K active neurons')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Double model (VISp)')
plt.suptitle(f'VISam, gratings')
plt.show()
prob_K2,prob_K_emp2 = K_outof_N_prob(s_VISam,counts_K2,counts_K_emp,df_stimulus)
#plot_K_outof_N_prob(prob_K,prob_K_emp,edges_K,'VISam with VISal','flashes')
plt.figure()
plt.plot(edges_K,prob_K,color='lightblue',label='model')
plt.plot(edges_K,prob_K_emp,color='pink',label='data')
plt.plot(edges_K,prob_K2,color='purple',label='double model (VISp)')
plt.xlabel('K simultaneous active neurons')
plt.ylabel('P(K)')
plt.suptitle(f'VISam, gratings')
plt.legend()
plt.show()
We can already see that we don't have any improvement in our analysis.
Probability to have a certain energy configuration:
configs = s_VISam['spin_configs']
energies = []
for config in configs:
energies.append(energy_configuration(config,h2[:len(region_ids_VISam)],J2[:len(region_ids_VISam),:len(region_ids_VISam)]))
counts_E2, edges_E2 = np.histogram(energies, bins=bins_E)
edges_E2 = edges_E2[:-1]
fig,ax = plt.subplots(1,3,figsize=(15,5))
ax[0].stairs(counts_E, bins_E, color='lightblue')
ax[1].stairs(counts_E_emp, bins_E, color='pink')
ax[2].stairs(counts_E2, bins_E, color='purple')
for x in [0,1,2]:
ax[x].set_xlabel('E')
ax[x].set_ylabel('# configurations with energy E')
ax[0].set_title('Model')
ax[1].set_title('Data')
ax[2].set_title('Double model (VISp)')
plt.suptitle(f'VISam, gratings')
plt.show()
prob_E2,prob_E_emp2=E_prob(s_VISam, counts_E2, counts_E_emp, df_stimulus)
plt.figure()
plt.plot(edges_E,prob_E,color='lightblue',label='model')
plt.plot(edges_E,prob_E_emp,color='pink',label='data')
plt.plot(edges_E,prob_E2,color='purple',label='double model (VISp)')
plt.xlabel('Energy E')
plt.ylabel('P(E)')
plt.legend()
plt.title(f'VISam, gratings')
plt.show()
The energy, again, is shifted towards the right resulting in a worse estimation.
Even though one might expect some improvement when including more areas in the analysis, since we have more information, the results are instead slightly worse. We identified the following possible reasons:
- Neuronal activity is highly noisy
- To infer the parameters of the probability distribution, we used an approximation method which disregards second-order fluctuations; this clearly loses some information about the second moment of the distribution, which might instead turn out to be important
- The most informative point is that the method performs fairly well only in some of the settings considered. When it does, the constraints initially imposed are sufficient to capture the complexity of the biological network we are considering, so correlation is a strong measure; when it does not, correlation might not be enough, and there might be some underlying dynamics or properties that we are not capturing.
Two final important takeaways: first, with this model we have much more information, having inferred a probability distribution over all the possible configurations of neural activity in VISam. Moreover, we have a further confirmation that when restricting ourselves to a single area the correlation becomes stronger, meaning that interactions are stronger when considered locally and they enable us to infer better models of the real patterns of activity.
This analysis could be repeated considering other combinations of area-stimulus, which we do not perform here to avoid redundancy and because of time constraints. Finally, an interesting further analysis could be performed considering a brain area which is intermediate in the hierarchy of the visual path, and see whether results change when considering neurons in earlier or subsequent areas.
We now move on to the use of machine learning models, which might be able to find more hidden patterns and relations.
4. Predictions¶
The ultimate goal of our analysis is to predict neural activity. Take a group of neurons, call it A, to predict the firing rate of a neuron b: the better we are able to predict the activity of neuron b, the more the firing rate of neuron b is correlated with the firing rates of the neurons in group A. Throughout this analysis, we define the firing rate of a neuron during a stimulus in the same way as in the correlation analysis: the number of spikes emitted during the stimulus divided by the length of the stimulus.

As a metric, we will mainly consider explained variance, which tells how much of the total variance is "explained" by the prediction. Therefore, when performing a regression, the higher the explained variance, the more we are able to say about neuron b.

We proceed in the following way. We start by considering rates during static gratings. First, we tried several types of regression on the same neuron, in order to see which one suits our analysis best. For brevity, we do not include this part in the report: summarizing, we tried a linear regressor, a random forest regressor, a support vector machine regressor, polynomial features + an SGD regressor, and a multilayer perceptron, tuning hyperparameters with grid searches in every case. Even though the more complex models obtain slightly better results than a linear regression (in terms of MSE and explained variance) on the test set (while having much better results on the training set, obviously), this small increase in performance is not relevant enough to justify a more complex model than the linear one. Once established that a linear regression is enough for our scope, we:
- predict the rate of a neuron in an area Area given all the other neurons in the same area, for every Area
- predict the rate of a neuron in an area Area1 given all the neurons in another area Area2, for every pair Area1, Area2 with Area1 != Area2
- predict the rate of every neuron given all the other neurons (in every area)
Then we try to repeat the same with not only gratings but all the stimuli presented in the session, in order to see how much our measures were biased by the fact that we were considering only one type of stimulus
Disclaimer: for the sake of brevity, we will sometimes refer to a 'predictive area' and a 'predicted area'. However, one needs to be careful, as we are not trying to infer the activity of a whole area, but rather the activity of a single neuron within that area. Since the prediction is repeated for all neurons within the area and mean values are computed, this generates an 'abuse of notation'.
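As a quick sanity check of the metric, the explained variance can be computed directly from its definition, 1 − Var(y − ŷ)/Var(y), and compared with scikit-learn's implementation on synthetic data:

```python
import numpy as np
from sklearn.metrics import explained_variance_score

# Synthetic regression target and a noisy prediction of it
rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
y_pred = y_true + rng.normal(scale=0.5, size=1000)

# explained variance = 1 - Var(residuals) / Var(target)
manual = 1 - np.var(y_true - y_pred) / np.var(y_true)
sk = explained_variance_score(y_true, y_pred)
assert np.isclose(manual, sk)
```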
Predictions of static gratings¶
1. Area1 given Area1¶
We load the data, using the function get_rates defined before.
areas = {'APN': 25, #0
'CA1': 134, #1
'CA2': 2,
'CA3': 21,
'DG': 31,
'Eth': 1, # 5
'LGv': 16,
'LP': 65,
'MB': 1,
'NOT': 16,
'ProS': 9, # 10
'SUB': 59,
'TH': 2,
'VISal': 89,
'VISam': 135,
'VISl': 78, # 15
'VISp': 94,
'VISrl': 47}
data_areas = {}
for area in areas:
data_areas[area] = get_rates(['static_gratings'], {area : areas[area]} , dir = '/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/')
We define a function to split the data into training and test sets
def shuffle_and_split_data(data, test_ratio):
shuffled_indices = np.random.permutation(len(data))
test_set_size = int(len(data) * test_ratio)
test_indices = shuffled_indices[:test_set_size]
train_indices = shuffled_indices[test_set_size:]
return data.iloc[train_indices], data.iloc[test_indices]
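As an aside, this hand-rolled split behaves essentially like scikit-learn's train_test_split (already imported at the top); a sketch on toy data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame: 10 "presentations" of 2 "neurons" (hypothetical values)
df = pd.DataFrame(np.arange(20).reshape(10, 2))
train_df, test_df = train_test_split(df, test_size=0.2, random_state=0)
assert len(train_df) == 8 and len(test_df) == 2
```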
As our model we choose linear regression preceded by a standard scaler.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
linear_nopca = Pipeline([
("scaler", StandardScaler()),
("linear_reg", LinearRegression())
])
We define a function that fits and predicts the firing rates of a neuron given all the others, and does so for all the neurons considered. It returns the array of explained variances on the test set, the array of explained variances on the training set, and the same for the mean squared error.
from sklearn.metrics import explained_variance_score, mean_squared_error
def prediction_over_all_units(units, model, training_data, test_data):
n = len(units.index)
expl_var_training = np.zeros(n)
expl_var_test = np.zeros(n)
mse_training = np.zeros(n)
mse_test = np.zeros(n)
area = ["" for i in range(n)]
for neuron in range(n):
x_training_data = training_data.loc[:, training_data.columns != neuron]
y_training_data = training_data.iloc[:, neuron]
x_test_data = test_data.loc[:, test_data.columns != neuron]
y_test_data = test_data.iloc[:, neuron]
model.fit(x_training_data, y_training_data)
predictions_test = model.predict(x_test_data) #predictions on test set
predictions_training = model.predict(x_training_data) #predictions on training set
expl_var_training[neuron] = explained_variance_score(y_training_data, predictions_training) #explained variance score on training set
expl_var_test[neuron] = explained_variance_score(y_test_data, predictions_test) #explained variance score on test set
mse_training[neuron] = mean_squared_error(y_training_data, predictions_training)
mse_test[neuron] = mean_squared_error(y_test_data, predictions_test)
area[neuron] = units.ecephys_structure_acronym.iloc[neuron]
return expl_var_test, expl_var_training, mse_test, mse_training, area
For every area (if the number of neurons in that area is greater than 1), we split between training and test set and apply the function prediction_over_all_units, so that:
for every area:
for every unit b in that area:
fit the regressor with the firing rates of the other neurons in the area and predict the rates of b
dati = {}
for area in areas:
if len(oursession.units[oursession.units["ecephys_structure_acronym"]==area]) > 1:
training_data, test_data = shuffle_and_split_data(pd.DataFrame(data_areas[area]),0.2 ) #shuffling and splitting the dataset for each area
expl_var_test, expl_var_training, mse_test, mse_training, zone = prediction_over_all_units(oursession.units[oursession.units["ecephys_structure_acronym"]== area], linear_nopca, training_data, test_data)
dati[area] = {}
dati[area]["expl_var_test"] = expl_var_test
dati[area]["expl_var_training"] = expl_var_training
dati[area]["mse_test"] = mse_test
dati[area]["mse_training"] = mse_training
with open(f"/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/single.pkl", "wb") as tf:
pickle.dump(dati, tf)
To avoid re-running the computation, the saved data can be loaded back with:
with open(f'../one_given_the_other/single.pkl', 'rb') as handle:
dati = pickle.load(handle)
We define a function to print our scores (min, max, mean) for each area
def analyze_single(name, expl_var_test, expl_var_training, mse_test, mse_training):
print(name)
print("\tExplained variance")
print(f"\t\ttraining\tMax: {round(expl_var_training.max(), 5)}\tMin: {round(expl_var_training.min(), 5)}\tMean: {round(expl_var_training.mean(), 5)}")
print(f"\t\ttest\t\tMax: {round(expl_var_test.max(), 5)}\tMin: {round(expl_var_test.min(), 5)}\tMean: {round(expl_var_test.mean(), 5)}")
print("\tMSE")
print(f"\t\ttraining\tMax: {round(mse_training.max(), 5)}\tMin: {round(mse_training.min(), 5)}\tMean: {round(mse_training.mean(), 5)}")
print(f"\t\ttest\t\tMax: {round(mse_test.max(), 5)}\tMin: {round(mse_test.min(), 5)}\tMean: {round(mse_test.mean(), 5)}")
for area in areas:
if areas[area] > 1:
analyze_single(area, dati[area]["expl_var_test"], dati[area]["expl_var_training"], dati[area]["mse_test"], dati[area]["mse_training"] )
| Area | Expl. var. (training) Max / Min / Mean | Expl. var. (test) Max / Min / Mean | MSE (training) Max / Min / Mean | MSE (test) Max / Min / Mean |
|---|---|---|---|---|
| APN | 0.65161 / 0.0222 / 0.26865 | 0.64339 / 0.00761 / 0.26669 | 219.24482 / 2.4217 / 47.45593 | 229.04317 / 2.35715 / 48.95882 |
| CA1 | 0.83551 / 0.03593 / 0.339 | 0.83414 / -0.0167 / 0.29516 | 212.47617 / 0.87598 / 38.58232 | 236.06089 / 0.82168 / 40.43761 |
| CA2 | 0.04383 / 0.04383 / 0.04383 | 0.03633 / 0.03577 / 0.03605 | 55.34544 / 22.63183 / 38.98864 | 54.03096 / 23.74187 / 38.88642 |
| CA3 | 0.51912 / 0.03434 / 0.19914 | 0.51399 / 0.03996 / 0.18876 | 421.21742 / 3.42764 / 56.3878 | 385.30129 / 3.41261 / 54.90416 |
| DG | 0.67721 / 0.01028 / 0.25401 | 0.71418 / -0.01317 / 0.24319 | 128.84396 / 0.39067 / 32.81834 | 117.45095 / 0.31462 / 32.71476 |
| LGv | 0.54915 / 0.12649 / 0.30679 | 0.60002 / 0.00101 / 0.29695 | 420.28142 / 1.13222 / 118.8423 | 416.93888 / 0.91819 / 119.11035 |
| LP | 0.77447 / 0.02682 / 0.18268 | 0.75268 / 0.00746 / 0.16073 | 258.08527 / 2.63829 / 70.45518 | 264.62387 / 2.76267 / 73.06763 |
| NOT | 0.53905 / 0.02094 / 0.28126 | 0.58242 / 0.01021 / 0.26729 | 350.97802 / 0.85883 / 69.68963 | 372.39397 / 1.13802 / 70.32973 |
| ProS | 0.25405 / 0.01408 / 0.12659 | 0.25936 / 0.00984 / 0.12506 | 237.79262 / 6.90714 / 53.86658 | 222.10879 / 6.95017 / 52.30727 |
| SUB | 0.7363 / 0.01891 / 0.27115 | 0.70619 / -0.00861 / 0.25082 | 386.15947 / 0.5817 / 46.52244 | 414.70897 / 0.54212 / 47.95195 |
| TH | 0.07528 / 0.07528 / 0.07528 | 0.05874 / 0.05781 / 0.05828 | 95.73183 / 75.38494 / 85.55839 | 101.27358 / 84.0766 / 92.67509 |
| VISal | 0.79992 / 0.02338 / 0.33266 | 0.81323 / -0.0159 / 0.30963 | 99.65574 / 0.09368 / 23.26062 | 115.85297 / 0.13451 / 24.50839 |
| VISam | 0.79354 / 0.0482 / 0.32836 | 0.80018 / -0.0348 / 0.28903 | 102.80685 / 0.71906 / 23.71995 | 107.96264 / 0.79429 / 25.16778 |
| VISl | 0.67348 / 0.03355 / 0.28011 | 0.62872 / -0.01479 / 0.25432 | 193.71216 / 0.21551 / 24.40096 | 199.18708 / 0.27845 / 24.69047 |
| VISp | 0.71641 / 0.07248 / 0.35438 | 0.73033 / 0.0173 / 0.32009 | 227.5397 / 0.79899 / 27.04178 | 228.49792 / 0.76279 / 29.13979 |
| VISrl | 0.75699 / 0.03595 / 0.23854 | 0.76197 / -0.02255 / 0.21885 | 148.15256 / 1.7946 / 33.22596 | 134.06407 / 1.90548 / 33.13624 |
We note that:
- There is not much difference between the training set and the test set --> we are not overfitting
- There is high variability of accuracy across areas
- There is high variability of accuracy across neurons of the same area
We will analyze the data in more depth in the next section, predicting also from neurons of different areas
2. Area1 given Area2¶
In order to predict, using the units in Area1, the activity of a unit in Area2, and to do so for all neurons in Area2, we define a new function, similar in spirit to the previous one
def prediction_over_all_first_units(units, model, training_data, test_data):
n = len(units.index)
expl_var_training = np.zeros(n)
expl_var_test = np.zeros(n)
mse_training = np.zeros(n)
mse_test = np.zeros(n)
area = ["" for i in range(n)]
for neuron in range(n):
x_training_data = training_data.loc[:, training_data.columns >= n]
y_training_data = training_data.iloc[:, neuron]
x_test_data = test_data.loc[:, test_data.columns >= n]
y_test_data = test_data.iloc[:, neuron]
model.fit(x_training_data, y_training_data)
predictions_test = model.predict(x_test_data)
predictions_training = model.predict(x_training_data)
expl_var_training[neuron] = explained_variance_score(y_training_data, predictions_training)
expl_var_test[neuron] = explained_variance_score(y_test_data, predictions_test)
mse_training[neuron] = mean_squared_error(y_training_data, predictions_training)
mse_test[neuron] = mean_squared_error(y_test_data, predictions_test)
area[neuron] = units.ecephys_structure_acronym.iloc[neuron]
return expl_var_test, expl_var_training, mse_test, mse_training, area
We use the same linear model as before and, similarly, run the prediction over all pairs of areas.
for area1 in areas:
    dati = {}
    for area2 in areas:
        training_data, test_data = shuffle_and_split_data(pd.DataFrame(np.concatenate((data_areas[area2], data_areas[area1]), axis=1)), 0.2)
        expl_var_test, expl_var_training, mse_test, mse_training, zone = prediction_over_all_first_units(oursession.units[oursession.units["ecephys_structure_acronym"] == area2], linear_nopca, training_data, test_data)
        dati[area2] = {}
        dati[area2]["expl_var_test"] = expl_var_test
        dati[area2]["expl_var_training"] = expl_var_training
        dati[area2]["mse_test"] = mse_test
        dati[area2]["mse_training"] = mse_training
    with open(f"/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/given_{area1}.pkl", "wb") as tf:
        pickle.dump(dati, tf)
Let's take a look at the mean explained variance for every combination of areas.
We load the saved results.
expl_variances_mean = np.zeros((len(areas), len(areas)))
for i, area1 in enumerate(areas):
    with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/given_{area1}.pkl', 'rb') as handle:
        b = pickle.load(handle)
    for j, area2 in enumerate(areas):
        if i != j:
            expl_variances_mean[i][j] = np.mean(b[area2]['expl_var_test'])
        elif area2 not in ['Eth', 'MB']:
            with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/single.pkl', 'rb') as handle2:
                a = pickle.load(handle2)
            expl_variances_mean[i][j] = np.mean(a[area2]['expl_var_test'])
The mean explained variance is represented through a heat map, which visually captures both the predictive capability of the neurons in one area and how well the neurons within an area are predicted by the others. The x-axis shows the area whose neurons are doing the predicting, while the y-axis shows the area being predicted.
from scipy.cluster.hierarchy import linkage, leaves_list

c = pd.DataFrame(expl_variances_mean, index=list(areas.keys()), columns=list(areas.keys()))
# Compute the linkage matrix using Ward's method
link = linkage(c, method='ward')
# Get the leaf order of the resulting clustering
order = leaves_list(link)
# Reorder rows and columns according to the cluster order
ordered_exp_matrix = c.iloc[order, order]
plt.figure(figsize=(8, 6))
sns.heatmap(ordered_exp_matrix.T, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered Mean Explained Variance Matrix")
plt.xlabel('Area predicting')
plt.ylabel('Area predicted')
plt.show()
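The reordering trick used above can be seen on a toy matrix (made-up numbers): Ward linkage groups similar rows, and `leaves_list` gives an ordering that places the members of each group next to each other.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list

# Toy similarity matrix with two interleaved blocks: {a, b} and {c, d}
m = pd.DataFrame(
    [[1.0, 0.1, 0.9, 0.2],   # a (similar to b)
     [0.1, 1.0, 0.2, 0.9],   # c (similar to d)
     [0.9, 0.2, 1.0, 0.1],   # b
     [0.2, 0.9, 0.1, 1.0]],  # d
    index=list('acbd'), columns=list('acbd'))

# Cluster the rows and reorder both axes by the leaf order
order = leaves_list(linkage(m, method='ward'))
reordered = m.iloc[order, order]
# After reordering, a sits next to b and c next to d,
# so the block structure becomes visible on the diagonal
```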
There are some areas (Eth, MB, CA2, TH) from which prediction seems almost impossible, because they contain only one or two neurons each. These areas end up in the corners of the map, with gray values associated to them.
Most importantly, the structure is clear: as seen before, there are two main groups, one composed of the "VIS" areas and one composed of the "CA" areas together with DG and SUB.
Let's do the same with the max instead of the mean, in order to see outliers.
expl_variances_max = np.zeros((len(areas), len(areas)))
for i, area1 in enumerate(areas):
    with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/given_{area1}.pkl', 'rb') as handle:
        b = pickle.load(handle)
    for j, area2 in enumerate(areas):
        if i != j:
            expl_variances_max[i][j] = np.max(b[area2]['expl_var_test'])
        elif area2 not in ['Eth', 'MB']:
            with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/single.pkl', 'rb') as handle2:
                a = pickle.load(handle2)
            expl_variances_max[i][j] = np.max(a[area2]['expl_var_test'])
from scipy.cluster.hierarchy import dendrogram, linkage, leaves_list
c = pd.DataFrame(expl_variances_max, index=list(areas.keys()), columns=list(areas.keys()))
# Compute the linkage matrix using Ward's method
link = linkage(c, method='ward')
# Get the leaf order of the resulting clustering
order = leaves_list(link)
# Reorder rows and columns according to the cluster order
ordered_exp_matrix = c.iloc[order, order]
plt.figure(figsize=(8, 6))
sns.heatmap(ordered_exp_matrix.T, cmap='coolwarm', fmt=".2f", center = 0)
plt.title("Ordered Maximum Explained Variance Matrix")
plt.xlabel('Area predicting')
plt.ylabel('Area predicted')
plt.show()
With the max we are no longer able to recover the structure! This means that there are "outliers": neurons that can be predicted very well from a given area even if, in general, that area predicts the outlier's area poorly.
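A toy illustration (with made-up scores) of why the maximum hides the structure the mean reveals: a single well-predicted outlier neuron is enough to dominate the max, while the mean stays close to the bulk of the distribution.

```python
import numpy as np

# Hypothetical explained-variance scores for the neurons of one area:
# most are poorly predicted, one outlier is predicted very well
scores = np.array([0.02, 0.05, 0.03, 0.04, 0.01, 0.85])

print(np.mean(scores))  # stays low: driven by the bulk of the neurons
print(np.max(scores))   # determined entirely by the single outlier
```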
Let's try to grasp the distribution of the explained variances. We load the explained variances of the test set.
areas_def = {
    'CA1': 134,
    'LP': 65,
    'SUB': 59,
    'VISal': 89,
    'VISam': 135,
    'VISl': 78,
    'VISp': 94,
    'VISrl': 47}
# explained variances of the test set
expl_variances = {}
for i, area1 in enumerate(areas):
    expl_variances[area1] = {}
    with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/given_{area1}.pkl', 'rb') as handle:
        b = pickle.load(handle)
    for j, area2 in enumerate(areas):
        if i != j:
            expl_variances[area1][area2] = b[area2]['expl_var_test']
        elif area2 not in ['Eth', 'MB']:
            with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/single.pkl', 'rb') as handle2:
                a = pickle.load(handle2)
            expl_variances[area1][area2] = a[area2]['expl_var_test']
colors_to_area = mcp.gen_color(cmap="Dark2", n=8)
areas_colors = {}
for i, area in enumerate(areas_def.keys()):
    areas_colors[area] = colors_to_area[i]
In order to study how the explained variance scores are distributed and change as one area predicts another, we have produced two grids: the x-axis of each subplot shows the values of the explained variance, while the y-axis shows the density of units attaining that value. We considered only 8 areas, namely those which seemed most relevant to us (and had the largest number of units). Besides areas predicting/predicted by another, each grid also includes the case of an area predicting/predicted by itself. What does that mean? It means that all neurons EXCEPT one are employed to infer the activity of the remaining neuron.
- In the first grid, the title of each subplot names the area that is predicting the 8 areas (itself included)
- In the second grid, the title names the area that is predicted by the 8 areas (itself included)
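The "area predicting itself" case can be sketched on synthetic data: a hypothetical leave-one-out loop in which, for each neuron of an area, the remaining neurons of the same area serve as regressors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score

rng = np.random.default_rng(1)
# Made-up rates for one area: 400 trials x 10 neurons sharing a latent drive
latent = rng.normal(size=(400, 2))
rates = latent @ rng.normal(size=(2, 10)) + 0.5 * rng.normal(size=(400, 10))

scores = []
for neuron in range(rates.shape[1]):
    # All neurons EXCEPT `neuron` predict the left-out one
    X = np.delete(rates, neuron, axis=1)
    y = rates[:, neuron]
    model = LinearRegression().fit(X, y)
    scores.append(explained_variance_score(y, model.predict(X)))

print(len(scores))  # one leave-one-out score per neuron
```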
fig = plt.figure(figsize=(20, 10))
for i, area1 in enumerate(areas_def):
    # Common x-range across all the curves of this subplot
    maximum = max(np.max(expl_variances[area1][a2]) for a2 in areas_def)
    minimum = min(np.min(expl_variances[area1][a2]) for a2 in areas_def)
    ax = fig.add_subplot(2, 4, i + 1)
    for area2 in areas_def:
        ax = sns.kdeplot(expl_variances[area1][area2], color=areas_colors[area2], clip=(minimum, maximum), ax=ax)
    ax.set_title(f'{area1}')
    ax.set(xlabel=None)
    ax.legend(areas_colors)
plt.suptitle('Explained Variance Score: how much can a unit in an area predict the values of the same or other areas?', fontsize=20);
fig = plt.figure(figsize=(20, 10))
for i, area1 in enumerate(areas_def):
    # Common x-range across all the curves of this subplot
    maximum = max(np.max(expl_variances[a2][area1]) for a2 in areas_def)
    minimum = min(np.min(expl_variances[a2][area1]) for a2 in areas_def)
    ax = fig.add_subplot(2, 4, i + 1)
    for area2 in areas_def:
        ax = sns.kdeplot(expl_variances[area2][area1], color=areas_colors[area2], clip=(minimum, maximum), ax=ax)
    ax.set_title(f'{area1}')
    ax.set(xlabel=None)
    ax.legend(areas_colors)
plt.suptitle('Explained Variance Score: how much can neurons from the same or other areas predict the activity of one unit in an area?', fontsize=20);
Grid 1: Areas seem to predict themselves better than they predict others. However, there are a few exceptions: VISam and VISl achieve good scores when predicting the activity of a neuron in VISal (pink curve). Grid 2: Areas seem to be better predicted by themselves than by the others; only in the case of VISal does VISl seem to predict better. In general, neural activity within the same area is better predicted than activity across brain zones.
Disclaimer: the curves have been produced through a KDE (kernel density estimator), which tends to smooth the data and produce heavy tails, which are particularly problematic on bounded domains. As a consequence, the plots have been 'clipped' to their minimum and maximum values (even though the area under the curve then no longer integrates to 1). Nevertheless, the qualitative behavior is not affected.
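The disclaimer can be quantified on made-up scores: a Gaussian KDE leaks probability mass outside the observed range, so the mass remaining inside the clipped interval is below 1.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
# Hypothetical explained-variance scores on a bounded range
scores = rng.uniform(0.0, 0.6, size=200)

kde = gaussian_kde(scores)
# Probability mass the KDE assigns inside the observed range:
# the smoothed tails spill outside [min, max], so this is < 1
mass_inside = kde.integrate_box_1d(scores.min(), scores.max())
```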
However, it might still be difficult to capture the patterns, as many areas are involved. Let's reproduce the same graphs, restricting to the cortical regions (which are the ones exhibiting the most interesting behavior).
areas_cortex = {
    'VISal': 89,
    'VISam': 135,
    'VISl': 78,
    'VISp': 94,
    'VISrl': 47}
# Colors restricted to the cortical areas
areas_colors_cort = {area: areas_colors[area] for area in areas_cortex}

fig = plt.figure(figsize=(20, 10))
for i, area1 in enumerate(areas_cortex):
    # Common x-range across all the curves of this subplot
    maximum = max(np.max(expl_variances[area1][a2]) for a2 in areas_cortex)
    minimum = min(np.min(expl_variances[area1][a2]) for a2 in areas_cortex)
    ax = fig.add_subplot(2, 5, i + 1)
    for area2 in areas_cortex:
        ax = sns.kdeplot(expl_variances[area1][area2], color=areas_colors_cort[area2], clip=(minimum, maximum), ax=ax)
    ax.set_title(f'{area1}')
    ax.set(xlabel=None)
    ax.legend(areas_colors_cort)
plt.suptitle('Explained Variance Score (visual cortex): how much can a unit in an area predict the values of the same or other areas?', fontsize=20);
fig = plt.figure(figsize=(20, 10))
for i, area1 in enumerate(areas_cortex):
    # Common x-range across all the curves of this subplot
    maximum = max(np.max(expl_variances[a2][area1]) for a2 in areas_cortex)
    minimum = min(np.min(expl_variances[a2][area1]) for a2 in areas_cortex)
    ax = fig.add_subplot(2, 5, i + 1)
    for area2 in areas_cortex:
        ax = sns.kdeplot(expl_variances[area2][area1], color=areas_colors_cort[area2], clip=(minimum, maximum), ax=ax)
    ax.set_title(f'{area1}')
    ax.set(xlabel=None)
    ax.legend(areas_colors_cort)
plt.suptitle('Explained Variance Score (visual cortex): how much can neurons from the same or other areas predict the activity of one unit in an area?', fontsize=20);
Grid 1: As one could have imagined from the analysis above, VISam and VISl predict VISal 'better' than they predict themselves. Grid 2: VISal is better predicted by VISl and VISam than by itself. The reason these areas behave this way might not only be the number of units at disposal, but also the hierarchical structure and proximity of these regions.
Finally, here comes the 'ambitious' task, i.e. predicting the activity of ONE unit given ALL the others (from all areas). Because of the large number of units at disposal (i.e. 825), in order to speed up computations and extract the best out of the model, we performed dimensionality reduction (PCA), keeping the first 100 components (those carrying the highest variance). What could one expect? Since we now have rates from many more units, we could expect the activity of a single neuron to be better predicted. However, we should also take into account that some areas are far less relevant in terms of predictive power, as they count only one or two units.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

linear = Pipeline([
    ("scaler", StandardScaler()),
    ("PCA", PCA(n_components=100)),
    ("linear_reg", LinearRegression())
])
training_data, test_data = shuffle_and_split_data(data, 0.2)
expl_var_test, expl_var_training, mse_test, mse_training = prediction_over_all_units_new( linear, training_data, test_data, mappa, areas)
APN CA1 CA2 CA3 DG Eth LGv LP MB NOT ProS SUB TH VISal VISam VISl VISp VISrl
dati = {}
dati["expl_var_training"] = expl_var_training
dati["expl_var_test"] = expl_var_test
dati["mse_training"] = mse_training
dati["mse_test"] = mse_test
with open(f'/Volumes/GIulia/BAI/TERZOANNOOOO/Laboratoriosegreto007agenteperry/one_given_the_other/all.pkl', "wb") as tf:
    pickle.dump(dati, tf)
In the same manner as above, the graph illustrates how the explained variance score is distributed. For each subplot, two curves are drawn: the purple one shows the values obtained when all neurons predict a single one (All -> Area 1), while the blue one shows the values obtained when units from the same area predict the 'left out' one. The two curves are represented jointly in order to capture improvements in the explained variance score.
fig = plt.figure(figsize=(20, 10))
for i, area1 in enumerate(areas_def):
    ax = fig.add_subplot(2, 4, i + 1)
    ax = sns.kdeplot(dati['expl_var_test'][area1], clip=(-0.2, 0.8), color='purple', ax=ax)
    ax = sns.kdeplot(expl_variances[area1][area1], clip=(-0.2, 0.8), ax=ax)
    ax.set_title(f'{area1}')
    ax.set(xlabel=None)
    ax.legend(['All -> Area 1', 'Area1 -> Area 1'])
plt.suptitle('Explained Variance Score: the activity of a neuron from one area is predicted from all neurons in an area (same or different)', fontsize=20);
As a matter of fact, a slight improvement seems to occur when the totality of units is exploited (purple). In particular, the cortical regions are the ones exhibiting the greater scores.
How should the results of the regressions be interpreted? Some reassuring results have emerged, since there is an overall satisfying predictive capability within areas. Furthermore, some regions seem to play a decisive role in the prediction of single-neuron activity in different zones. Such behavior might be linked to the existence of positive (or negative) correlations between firing rates (either within or across brain areas).
All the stimuli¶
We can even predict from all the types of stimuli, not only from static gratings. The result is the following (we omit the fitting for brevity).
with open(f'../one_given_the_other/all.pkl', "rb") as tf:
    dati = pickle.load(tf)
with open(f'../Risultati_Luca/linear_whole/expl_var_test.pkl', "rb") as tf:
    expl_variances = pickle.load(tf)
fig = plt.figure(figsize=(20, 10))
for i, area1 in enumerate(areas_def):
    ax = fig.add_subplot(2, 4, i + 1)
    ax = sns.kdeplot(dati['expl_var_test'][area1], clip=(-0.2, 0.8), color='purple', ax=ax)
    ax = sns.kdeplot(expl_variances[area1], clip=(-0.2, 0.8), ax=ax)
    ax.set_title(f'{area1}')
    ax.set(xlabel=None)
    ax.legend(['Static Gratings', 'All the stimuli'])
plt.suptitle('Explained Variance Score: difference between considering only static gratings and all types of stimuli', fontsize=20);
The performances drop dramatically. The area which has maintained the best score is CA1.
Limitations¶
Is this enough to confirm the existence of neural connections? In general, correlation might not always be a good measure to grasp such functioning, and one should be careful in interpreting such results. For instance, there might exist brain regions which exhibit a correlation between the spikes emitted by their neurons even though the cells are not connected. Furthermore, studying correlations can be inconvenient, as it requires a large amount of data from simultaneously recorded neurons, which are subject to fluctuations (also due to trial-to-trial variability). Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586814/
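The "correlation without connection" caveat can be illustrated with a minimal, entirely synthetic simulation (not from the Allen data): two neurons that never interact but share a common input exhibit strongly correlated rates.

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials = 5000

# A common drive (e.g. a shared stimulus) reaches both neurons,
# but neuron A and neuron B are NOT connected to each other
common_input = rng.normal(size=n_trials)
rate_a = common_input + 0.5 * rng.normal(size=n_trials)
rate_b = common_input + 0.5 * rng.normal(size=n_trials)

# Pearson correlation is high despite the absence of any direct connection
corr = np.corrcoef(rate_a, rate_b)[0, 1]
```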
Conclusion¶
Our work was motivated by an interest in understanding whether some correlation in neuronal activity was present, whether this measure was enough to capture relevant information about the underlying dynamics or properties of the biological network and, once reassured by the proven presence of local interactions, whether it was possible to predict the activity of one neuron given the activity of others (selected in various ways). The main takeaways of our analysis are that single areas seem more highly correlated with themselves, and that areas in the visual cortex seem to be more correlated with each other than with other brain areas. Furthermore, some areas are better able to predict others (more specific conclusions are detailed in each section of our research).
Lastly, we briefly mention some general limitations and possible avenues for further research, which apply to our work as a whole; more comments and indications regarding specific types of analysis can be found throughout this notebook.
Clearly, one of the main limitations of our work is the scarcity of available data, so one could try to merge different sessions to extract more precise and meaningful conclusions. Furthermore, one could repeat the same work with different combinations of brain areas and stimuli to compare results.

