Sherlock is an open-source data platform, developed in the Korcsmaros Group (Earlham Institute, Norwich, UK), to store, analyze and integrate bioinformatics data.
Features
- store all datasets in redundant, organized cloud storage
- convert all datasets to common, optimized file formats
- execute analytical queries on top of data files
- share datasets among different teams / projects
- generate operational datasets for certain services or collaborators
High-level overview and presentations
More technical documentation and examples
- Under the hood: basic components
- Deployment guide
- Backup and restore the metadata
- Loading bioinformatics data into the Data lake
    - Data lake structure and schema initialization
    - Loading interaction data
    - Loading localization data
    - Loading genomic sequence data
    - Loading gene annotation data
    - Loading molecule ID mapping data
- Example queries
    - Query IntAct protein interactions and map them to UniProt IDs
    - Select proteins from a given human tissue
    - Filter human IntAct interactions based on tissue
    - Enrich protein list with internal interactions (and potentially with first neighbours)
    - Build brain-specific molecular network, based on Bgee and IntAct
    - Fetch a sequence region around a point mutation
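
To give a rough idea of what the point-mutation query listed last above is about, here is a minimal plain-Python sketch that cuts a window of bases around a single position. It does not use Sherlock or its query engine. The function name `region_around`, the 1-based position convention, and the `flank` parameter are all assumptions made for this illustration only.

```python
def region_around(sequence: str, position: int, flank: int) -> str:
    """Return the bases within `flank` positions of a 1-based `position`.

    The window is clipped at the sequence ends, so it may be shorter
    than 2 * flank + 1 near either boundary.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    start = max(position - 1 - flank, 0)        # 0-based, inclusive
    end = min(position + flank, len(sequence))  # 0-based, exclusive
    return sequence[start:end]


if __name__ == "__main__":
    # Hypothetical toy sequence; the mutated base is at position 6.
    print(region_around("ACGTACGTAC", position=6, flank=2))
```

Clipping at the boundaries, rather than padding or raising an error, is one possible design choice; the linked example may handle sequence ends differently.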
Your feedback
… is very important, feel free to share it with us! :)
Authors
The people behind the Sherlock project:
Developers
- Balazs Bohar
- Matthew Madgwick
© 2018, 2019 Earlham Institute (License)