Elia Brodsky - www.ebrodsky.site
5 min read · Sep 21, 2020

Omics Logic Code: Bioinformatics Playground for R and Python

Life sciences are going through a major transformation: the growing volume, velocity and variability of collected data demand an ever-increasing level of data management, wrangling and analysis skills. The data itself keeps getting more detailed, complex and … interesting! The scale of typical experiments has grown from 3 replicates per condition to tens, hundreds or even thousands of samples per experiment. In many areas, thousands of parameters are measured at once, generating millions of data points per observation.

As a result, we can now appreciate variability between groups, individuals, tissues and even single cells. This is compounded by the complexity of interactions between observed variables, by protocol variation between labs, and by the ultimate challenge: discovering system-level regulation and interpretable signals in data that make sense from a traditional mechanistic point of view.

The falling cost of technologies such as genomic sequencing makes it easier to start an experiment from data, turning whole experiments into “hypothesis-generating” exercises and putting the millions of already collected datasets to use. The process of conducting science has been transformed around data, which makes data science skills central to many areas of research in biomedicine, biotechnology and agriculture.

Collectively, the science of organizing, managing, transforming and analyzing biological data is referred to as “bioinformatics”. The name itself has been around for decades, but what it means has changed significantly. Today, bioinformatics is a part of data science with a specialization in biology, or life sciences in general. It combines computer science, statistical analysis and biology: three domains one needs to be proficient in to analyze and interpret biological data.

Many universities around the world provide degrees in biology, biotechnology and biochemistry. In the US, life science degrees have been on the decline while data science has experienced tremendous growth, being named one of the most sought-after degrees and professions of our time. Many universities are struggling to address this shift: so much was invested in labs, training and staff for biotechnology and life sciences, yet many of these resources have not been adapted to incorporate a data science perspective.

source: Google Trends (https://trends.google.com/trends/explore?hl=en-US&tz=300&cat=958&date=all&geo=US&q=%2Fm%2F0jt3_q3,%2Fm%2F01ftz,%2Fm%2F06q83&sni=3)

That is why our company started to design a comprehensive resource center for Bioinformatics Education. The result is the Omics Logic training programs, which offer a modern approach to teaching big-data bioinformatics:

  1. Projects sourced from high-impact research publications in oncology, biotechnology, neuroscience, agriculture and infectious diseases.
  2. Big data cloud resources for storage, processing, analysis and integration of multi-omics data.
  3. A library of asynchronous training materials for faculty, students and researchers.
  4. A Code Playground designed around visualization, interpretation and machine learning for omics data.
  5. A community of beginners and experts who are actively learning and working on research problems in an interactive environment.

In my previous posts, I introduced Omics Logic elements such as the T-BioInfo platform and our online training portal. Today, I want to focus on the launch of the Omics Logic Code Playground.

Omics Logic Code Playground: Getting started with multivariate analysis of multi-dimensional gene expression data

So what level of coding is appropriate for a data-aware biologist or clinician to be effective at handling omics data? The three languages most commonly used by computational biologists are bash, R and Python. Each has many functions, libraries and packages that make it well suited to the types of data analysis a computational biologist will be expected to perform:

Bash scripting: most useful for handling large datasets (arranging, transforming and “massaging” or “wrangling” them) directly where they are stored, on your machine or on a server.

R scripting: especially useful for statistical analysis and visualization, with specialized libraries designed for omics and other biological datasets.

Python: efficient data processing and machine learning applications, in a language designed for ease of use and readability.
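To give a flavor of the kind of short script such exercises build toward, here is a minimal Python sketch that reads a small expression table and computes a log2 fold change per gene. It is not taken from the Code Playground: the table, the gene names, the `ctrl_`/`treated_` column prefixes and the `log2_fold_changes` helper are all invented for illustration, and only the standard library is used.

```python
import csv
import io
import math
from statistics import mean

# Hypothetical toy expression table (tab-separated): rows are genes, columns are samples.
# All values are invented for illustration only.
TOY_TABLE = """gene\tctrl_1\tctrl_2\ttreated_1\ttreated_2
GENE_A\t10\t12\t40\t48
GENE_B\t100\t110\t95\t105
GENE_C\t7\t9\t1\t3
"""

def log2_fold_changes(table_text, pseudocount=1.0):
    """Return {gene: log2((mean treated + pc) / (mean control + pc))}."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    result = {}
    for row in reader:
        # Group sample columns by their (assumed) naming prefix.
        ctrl = [float(v) for k, v in row.items() if k.startswith("ctrl_")]
        treated = [float(v) for k, v in row.items() if k.startswith("treated_")]
        # The pseudocount avoids division by zero for unexpressed genes.
        ratio = (mean(treated) + pseudocount) / (mean(ctrl) + pseudocount)
        result[row["gene"]] = math.log2(ratio)
    return result

fold_changes = log2_fold_changes(TOY_TABLE)
# Print genes from most up-regulated to most down-regulated.
for gene, lfc in sorted(fold_changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}\t{lfc:+.2f}")
```

On real data one would normally reach for dedicated libraries rather than hand-rolled loops; the point here is only how little code a first wrangling step requires.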

The Code Playground makes it easy to learn these languages in a pre-configured scripting environment. No installation is needed: the code runs directly in the browser, and we have implemented our point system to connect the coding challenges to our asynchronous bioinformatics curriculum.

We are excited to launch today and offer the courses in several pre-configured packages in R and Python: code.omicslogic.com

These include exercises on the very basics (Introduction to Bioinformatics) as well as advanced methods taught in our Omics Logic Data Science Program, such as classification, feature selection and regression, which are useful for large RNA-Seq studies.
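As a rough illustration of what feature selection followed by classification means for expression data, the sketch below ranks genes by the difference between class means and then classifies a new sample by its nearest class centroid. This is a deliberately simple stand-in chosen for readability, not the method taught in the program; the gene names, labels, values and helper functions are all hypothetical.

```python
from statistics import mean

# Invented toy data: each sample holds expression values for four hypothetical genes.
GENES = ["G1", "G2", "G3", "G4"]
TRAIN = [
    ([9.0, 5.0, 1.0, 4.0], "tumor"),
    ([8.0, 5.2, 1.5, 4.1], "tumor"),
    ([2.0, 5.1, 7.0, 4.0], "normal"),
    ([1.0, 4.9, 8.0, 3.9], "normal"),
]

def class_centroids(samples, feature_idx):
    """Mean value of each selected feature, computed per class label."""
    labels = sorted({label for _, label in samples})
    return {
        label: [mean(x[i] for x, lab in samples if lab == label) for i in feature_idx]
        for label in labels
    }

def select_features(samples, k):
    """Rank genes by absolute difference between the two class means; keep the top k."""
    all_idx = list(range(len(samples[0][0])))
    a, b = class_centroids(samples, all_idx).values()  # assumes exactly two classes
    scores = [abs(x - y) for x, y in zip(a, b)]
    return sorted(all_idx, key=lambda i: scores[i], reverse=True)[:k]

def predict(sample, centroids, feature_idx):
    """Nearest-centroid classification using squared Euclidean distance."""
    def dist(centroid):
        return sum((sample[i] - c) ** 2 for i, c in zip(feature_idx, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

selected = select_features(TRAIN, k=2)
centroids = class_centroids(TRAIN, selected)
print("selected genes:", [GENES[i] for i in selected])
print("prediction:", predict([7.5, 5.0, 2.0, 4.0], centroids, selected))
```

In a real RNA-Seq study the feature selection step would have to be run inside cross-validation, so that the held-out samples never influence which genes are chosen.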

To get started, you only have to register and take a free course. If you like the experience, a premium starter package will grant you access to all of the materials for just $10 per month.

Elia Brodsky - www.ebrodsky.site

Healthcare, Life Sciences, Data... In the past, startup co-founder @PineBiotech — big data, bioinformatics, healthcare