With the advances in next-generation sequencing technologies over the past decade, genomics has gradually caught up with big data giants such as YouTube, Amazon, and Twitter in terms of its data storage and computational needs. By 2025[1], the storage requirements for human DNA sequences alone are projected to be 2-40 exabytes (1 exabyte is 10^18 bytes, or 10^9 gigabytes).

In this era of big data, several open-access data initiatives such as TCGA, GEO, and ENCODE have made a large amount of omics data publicly available to the scientific community [2]. Such datasets are extremely valuable for drug discovery because they allow integration of large amounts of data from different sources and give a more robust and comprehensive understanding of diseases. While ample data is available, the real challenge for a scientist now lies in extracting meaningful insights from it. With the advent of ‘omics’, the bottleneck in science has rapidly shifted from data generation to data interpretation, and this is where Elucidata comes into the picture.

Elucidata enables bio-pharma companies to tap into these powerful datasets to answer their research questions. We provide them with the necessary bioinformatics expertise and data analysis support. For instance, if a company is interested in studying the mutational landscape of a particular set of genes across different cancer types, we can query those genes against the TCGA[4] dataset. The Cancer Genome Atlas (TCGA) is an effort to characterize over 10,000 tumor samples across 33 different cancers using different technologies. We analyze the TCGA dataset to identify previously unknown genomic alterations or driver genes responsible for the disease, which helps identify novel targets for therapy and drug discovery.
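As a rough illustration of the kind of query described above, the sketch below tallies, for a gene set of interest, the fraction of samples in each cancer type that carry at least one mutation in each gene. The records are toy, made-up rows and not TCGA data; the record layout, the sample IDs, and the function name `mutation_frequency` are assumptions for illustration only. A real analysis would read actual TCGA mutation calls instead.

```python
# Toy mutation records: (sample_id, cancer_type, gene).
# Hypothetical values for illustration only, NOT real TCGA data.
MUTATIONS = [
    ("S1", "BRCA", "TP53"),
    ("S1", "BRCA", "PIK3CA"),
    ("S2", "BRCA", "PIK3CA"),
    ("S4", "LUAD", "KRAS"),
    ("S4", "LUAD", "TP53"),
    ("S5", "LUAD", "TP53"),
]

# All profiled samples per cancer type, including samples with no
# mutation in the genes of interest (needed for the denominator).
SAMPLES = {
    "BRCA": {"S1", "S2", "S3"},
    "LUAD": {"S4", "S5"},
}


def mutation_frequency(mutations, samples, genes):
    """Return {(cancer_type, gene): fraction of samples mutated}."""
    gene_set = set(genes)
    mutated = {}  # (cancer_type, gene) -> set of mutated sample IDs
    for sample_id, cancer_type, gene in mutations:
        # Skip genes outside the query and samples not in the cohort.
        if gene in gene_set and sample_id in samples.get(cancer_type, ()):
            mutated.setdefault((cancer_type, gene), set()).add(sample_id)
    return {
        (cancer_type, gene): len(mutated.get((cancer_type, gene), ())) / len(cohort)
        for cancer_type, cohort in samples.items()
        if cohort  # avoid dividing by zero for empty cohorts
        for gene in genes
    }


if __name__ == "__main__":
    freqs = mutation_frequency(MUTATIONS, SAMPLES, ["TP53", "KRAS"])
    for (cancer_type, gene), freq in sorted(freqs.items()):
        print(f"{cancer_type}\t{gene}\t{freq:.2f}")
```

Counting each sample at most once per gene keeps a heavily mutated sample from inflating the frequency, and using the full cohort as the denominator keeps unmutated samples in the picture.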

The field of genomics has also seen an increase in the development of sophisticated bioinformatics tools for analysis and interpretation of the data. The majority of these tools are open source, which allows greater reproducibility in research and better-maintained products. We, at Elucidata, embrace open source, which enables us to build innovative software products, contribute to the community, and work with cutting-edge technologies like CRISPR. Going forward, we envision that such open-source tools and pipelines could be hosted on Polly as independent applications, allowing scientists to customize their workflows. The ability to perform integrative multi-omics analyses under a single roof makes Polly a one-stop platform for drug discovery.


  1. Genome researchers raise alarm over big data, Nature
  2. Databases and web tools for cancer genomics study, ScienceDirect
  3. Leveraging big data to transform target selection and drug discovery, Wiley
  4. The Cancer Genome Atlas
  5. Sequencing the genome creates so much data we don’t know what to do with it, The Washington Post
  6. Broad Institute to release Genome Analysis Toolkit 4 as open source resource to accelerate research, Broad Institute
  7. The real cost of sequencing: scaling computation to keep pace with data generation, Genome Biology
  8. The Open Source Software Debate in NGS Bioinformatics, Mass Genomics
  9. How bioinformatics tools are bringing genetic analysis to the masses, Nature

