A comprehensive example of how to download the mouse gene expression data (ISH) is given by the Allen Institute on this page. Producing the interactive figure on that page requires the whole gene expression dataset.
In the "Documentation and Resources" section of that page, a Python script, download_data.py, is provided that downloads all gene expression data and averages ('unionizes') the voxel-based expression data over brain structures.
That script does not save the data (it converts it to correlation data), so here we use a modified version that saves the expression matrix to three files:
The csv-files are easy to read:
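A minimal sketch of reading such a file with pandas, assuming the matrix is stored with the structure IDs in the first column and one column per gene. The actual layout of the three files may differ, and the toy IDs, gene names and values below are hypothetical placeholders, not Allen data.

```python
import io

import pandas as pd


def load_matrix(source) -> pd.DataFrame:
    """Read a CSV matrix whose first column holds the row labels."""
    return pd.read_csv(source, index_col=0)


# Toy stand-in for the expression file (hypothetical layout and values):
# one row per brain structure ID, one column per gene.
toy_csv = io.StringIO(
    "structure_id,gene_a,gene_b\n"
    "101,0.5,1.2\n"
    "102,0.0,3.4\n"
)
expr = load_matrix(toy_csv)
print(expr.shape)  # (2, 2)
print(expr.loc[102, "gene_b"])  # 3.4
```

With a real file, pass its path instead of the `StringIO` object.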
It is important to realize that the expression data computed in this way is overcomplete: its rows contain values for all brain sites in the ABA mouse hierarchy, at various levels of granularity. To do something useful with it, one needs to pick specific rows from the matrix that do not overlap. The following example shows how to select only data at the leaves of the structure hierarchy:
The connectivity data published in the Oh et al. 2014 paper is not precise enough to populate a connection matrix at the level of the smallest structures of the mouse atlas. Instead, a set of 295 structures one level up in the hierarchy is used, i.e. layer-specific information is neglected. Some regions were pruned further to ensure that every chosen region was targeted by at least one rAAV injection, resulting in a final set of 213 regions per hemisphere (see column 'RepresentedInLinearModelMatrix' in the .csv file). The following code reduces the expression data to only these 213 structures:
Further documentation on the Allen Brain expression data is available from their API support page.
Using the expression data download script as a template, we created a similar download script for connectivity data. Following this query, there are 2209 injection datasets that are 'not failed' and use 'coronal sectioning' (all of them do). Restricting the age range to P56 +/- 2 days reduces this to 2184 datasets, and additionally requiring that the donor is not transgenic brings it down to 477. That is close to the 469 datasets mentioned by Oh et al. 2014 (doi:10.1038/nature13186). At a later stage, we will remove the 8 datasets that Oh et al. 2014 did not include; they have imageSeries_id=[307558646,307321674,307557934,304585910,307137980,307320960,307297141,304565427]. Note that there are also 10 datasets in product 6, in which two tracer substances are injected (BDA+viral tracer). These are not used in the analysis.
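Removing the 8 image series that Oh et al. 2014 did not include amounts to a simple filter on the dataset IDs. The sketch below assumes the query results are held in a pandas DataFrame with an `imageSeries_id` column (a hypothetical layout); the eight IDs are the ones listed in the text, while the two extra IDs in the toy table are made-up placeholders.

```python
import pandas as pd

# The 8 image series that Oh et al. 2014 did not include (IDs from the text).
EXCLUDED_IDS = [307558646, 307321674, 307557934, 304585910,
                307137980, 307320960, 307297141, 304565427]


def drop_excluded(datasets: pd.DataFrame, id_column: str = "imageSeries_id") -> pd.DataFrame:
    """Return a copy of `datasets` without the rows whose ID is in EXCLUDED_IDS."""
    keep = ~datasets[id_column].isin(EXCLUDED_IDS)
    return datasets.loc[keep].reset_index(drop=True)


# Toy table: two of the excluded IDs plus two hypothetical placeholder IDs.
toy = pd.DataFrame({"imageSeries_id": [307558646, 100000001, 304565427, 100000002]})
filtered = drop_excluded(toy)
print(filtered["imageSeries_id"].tolist())  # [100000001, 100000002]
```

Applied to the 477 non-transgenic P56 datasets, this filter should leave the 469 used by Oh et al. 2014, provided all eight IDs are present in the query result.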
As the measure of connection strength we use the 'normalized projection volume', which was also used by Oh et al. 2014. This yields two projection matrices, one for the left and one for the right hemisphere. Since all injections were made in the right hemisphere, CS_R represents the ipsilateral connectivity strength matrix and CS_L the contralateral one. The data are saved to four files:
The csv-files are easy to read:
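A minimal sketch of loading the two hemisphere matrices with pandas and checking that they line up. It assumes injections are stored as rows and target structures as columns, with the row labels in the first column; the file layout, injection IDs, structure names and values in the toy data are hypothetical.

```python
import io

import pandas as pd


def load_hemispheres(right_src, left_src):
    """Load the right (CS_R, ipsilateral) and left (CS_L, contralateral) matrices."""
    cs_r = pd.read_csv(right_src, index_col=0)
    cs_l = pd.read_csv(left_src, index_col=0)
    # Both matrices should list the same injections (rows) and structures (columns).
    if not (cs_r.index.equals(cs_l.index) and cs_r.columns.equals(cs_l.columns)):
        raise ValueError("CS_R and CS_L are not aligned")
    return cs_r, cs_l


# Toy stand-ins with hypothetical injection IDs, structure names and values.
right = io.StringIO("injection,s1,s2\n11,0.8,0.1\n12,0.0,0.6\n")
left = io.StringIO("injection,s1,s2\n11,0.2,0.0\n12,0.0,0.1\n")
cs_r, cs_l = load_hemispheres(right, left)

# Side-by-side view: ipsilateral targets first, then contralateral targets.
both = pd.concat([cs_r, cs_l], axis=1, keys=["ipsi", "contra"])
print(both.shape)  # (2, 4)
```

Passing `keys` explicitly gives the combined frame a two-level column index, so `both["ipsi"]` and `both["contra"]` recover the individual matrices.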
The CR_213 and CR_295 matrices contain the projection strengths for all injections. The CR_295 matrix is also available for download with the Oh et al. 2014 paper; we converted it to oh_etal_2014_injections.csv. We should now have two copies of the same data, one from the ABA API and one from the publication. Let's check whether they are indeed the same.
From this injection data, Oh et al. 2014 estimated a 213x213 connectivity matrix, merging multiple injections into the same structure and reducing the number of sites to 213 to ensure that the matrix has no holes. We extracted the ipsilateral and contralateral matrices as csv. The following code reads the ipsilateral 213x213 connectivity matrix:
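Both steps in this paragraph, comparing the API copy with the publication copy and reading a square region-by-region matrix, can be sketched with pandas and NumPy. This assumes each file has its row labels in the first column; the toy labels and values are hypothetical. `np.allclose` is used instead of exact equality in case one copy was rounded differently.

```python
import io

import numpy as np
import pandas as pd


def same_matrix(a: pd.DataFrame, b: pd.DataFrame, rtol=1e-5, atol=1e-8) -> bool:
    """True if a and b share row/column labels and hold numerically equal values."""
    if set(a.index) != set(b.index) or set(a.columns) != set(b.columns):
        return False
    # Reorder b to a's row and column order before comparing element-wise.
    b = b.loc[a.index, a.columns]
    return bool(np.allclose(a.to_numpy(dtype=float), b.to_numpy(dtype=float),
                            rtol=rtol, atol=atol, equal_nan=True))


def load_square(source) -> pd.DataFrame:
    """Read a region-by-region matrix and check that rows and columns match."""
    m = pd.read_csv(source, index_col=0)
    if m.shape[0] != m.shape[1] or set(m.index) != set(m.columns):
        raise ValueError(f"expected a square region x region matrix, got {m.shape}")
    return m


# Toy stand-ins: the "paper" copy has its columns in a different order
# and a rounding-level difference.
api_copy = pd.read_csv(io.StringIO("injection,s1,s2\n11,0.8,0.1\n12,0.0,0.6\n"), index_col=0)
paper_copy = pd.read_csv(io.StringIO("injection,s2,s1\n11,0.1,0.8000001\n12,0.6,0.0\n"), index_col=0)
print(same_matrix(api_copy, paper_copy))  # True

# Toy 2x2 stand-in for the 213x213 ipsilateral matrix.
ipsi = load_square(io.StringIO("region,regA,regB\nregA,0.0,0.3\nregB,0.1,0.0\n"))
print(ipsi.shape)  # (2, 2)
```

For the real data, `ipsi.shape` should come out as (213, 213).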