This is our first article, so exciting! We will use the special occasion to apply Data Science to a noble cause: deciphering the information contained in genetic code. Here we provide a Python implementation of a paper by Alexander Gorban and Andrei Zinovyev, which is used in MIT's edX Data Science course. We will try to explain the main Data Science concepts for the average curious reader and, most importantly, discuss both the potential and the limitations of these techniques. If you are more interested in the coding and implementation aspects of the exercise, feel free to go directly to the Jupyter Notebook. This page focuses on explaining the approach and discussing the results.

The problem

Nowadays the genome is a mainstream concept. We know that it contains key messages (genes) that regulate life, and that this information is coded in an "alphabet" of 4 letters (A, T, C and G). You might wonder: "how did scientists discover that thos...