
Coding

The following questions check that you have basic coding skills and will be able to follow the practical part of the school.
You may use any programming language you like. We recommend Python or R, as the practical part of NGSchool will be conducted in those.
C1
Given the input text file (available for download), parse each line to form a number by combining the first digit and the last digit in that line (in that order). What is the sum of all the numbers? Note that the same digit can be both the first and the last digit in a line.
For example, if the lines are:
eightg1
4ninejfpd1jmmnnzjdtk5sjfttvgtdqspvmnhfbm
78seven8
6pcrrqgbzcspbd
The numbers for each line are 11, 45, 78 and 66 respectively. The sum of all the numbers would be 200.
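The rule above can be sketched in Python as follows. This is a minimal sketch, not a reference solution: the function names are made up for illustration, and treating a line without any digit as contributing 0 is an assumption, since the question does not say how such lines should be handled.

```python
def line_value(line):
    """Combine the first and last digit found in a line into one number."""
    digits = [ch for ch in line if ch in "0123456789"]
    if not digits:
        return 0  # assumption: a line with no digits adds nothing to the sum
    # With a single digit, digits[0] and digits[-1] are the same character.
    return int(digits[0] + digits[-1])


def total(lines):
    """Sum the per-line values over an iterable of lines."""
    return sum(line_value(line) for line in lines)


example = [
    "eightg1",
    "4ninejfpd1jmmnnzjdtk5sjfttvgtdqspvmnhfbm",
    "78seven8",
    "6pcrrqgbzcspbd",
]
print(total(example))  # 200, matching the worked example
```

For the real input, the same `total` function could be given an open file handle, since iterating over a file yields its lines one at a time.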
C2
Reading a number from left to right, if no digit is smaller than the digit before it (for example 134468), we call it an “increasing number”. Similarly, if no digit is larger than the digit before it (for example 66420), we call it a “decreasing number”. For this question, a positive integer that is neither increasing nor decreasing (for example 155349) is a “bouncy number”. Clearly, there cannot be any bouncy numbers below one hundred, but just over half of the numbers below one thousand (525 exactly) are bouncy. In fact, the lowest number for which the proportion of bouncy numbers first reaches 50% is 538. Surprisingly, bouncy numbers become more and more common, and by the time we reach 21780 the proportion of bouncy numbers is equal to 90%. Find the lowest number for which the proportion of bouncy numbers is exactly 99%.
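The definitions above can be checked against the quoted figures with a short brute-force sketch in Python. The function names are illustrative, and the proportion is assumed to be counted over the integers 1 to n inclusive. The comparison uses integer arithmetic so that "exactly" is not affected by floating-point rounding.

```python
def is_bouncy(n):
    """True if the digits of n are neither non-decreasing nor non-increasing."""
    s = str(n)
    pairs = list(zip(s, s[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    return not increasing and not decreasing


def least_with_proportion(numerator, denominator):
    """Smallest n where exactly numerator/denominator of 1..n are bouncy."""
    bouncy = 0
    n = 0
    while True:
        n += 1
        if is_bouncy(n):
            bouncy += 1
        # Cross-multiply instead of dividing to keep the test exact.
        if bouncy * denominator == n * numerator:
            return n


print(sum(is_bouncy(n) for n in range(1, 1000)))  # 525
print(least_with_proportion(1, 2))                # 538
print(least_with_proportion(9, 10))               # 21780
```

The same function can be called with the ratio 99/100. It visits every integer once, which should remain fast enough for this question.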

Description for C3a-c

The same DNA template can produce different protein/peptide sequences depending on which forward or reverse open reading frame is used. Translate the DNA sequence below into all possible peptide sequences and answer the questions that follow.
Do not count the start-codon methionine as part of the final peptide sequence.

Input non-mitochondrial human DNA sequence in FASTA format:

>myseq
ATGGCACGTTTACGATCGTACTGAAGCGTACTGATGCGTACGATCGTACGTTTAACTGATGCGTAGCTGATGCGTTACTG
ACGTAGCGTAGTTTAGCGTAGCGTATGCTAACGCGTATCGTACGTTGATGCGTACTGATGCGTTTAGCGTACTGTAGCGT
ACTAGCGTACGTAAA
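As a starting point for the translation step, the sketch below translates a sequence in all six reading frames (three forward, three on the reverse complement) using the standard genetic code, which is assumed to apply because the sequence is stated to be non-mitochondrial. The helper names and frame labels are illustrative. The sketch stops at the raw frame translations (`*` marks a stop codon) and assumes an uppercase sequence containing only A, C, G and T. How peptides are then delimited by start and stop codons is left to your reading of the question.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def read_fasta_sequence(text):
    """Join the sequence lines of a single-record FASTA string."""
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    )


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return the translations of the three forward and three reverse frames."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


print(translate("ATGGCACGTTAA"))  # MAR*
```

Calling `six_frames(read_fasta_sequence(fasta_text))` on the record above returns a dictionary with the keys `+1`, `+2`, `+3`, `-1`, `-2` and `-3`.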

C3a
How many different peptide sequences can be created?
C3b
What is the length of the shortest peptide sequence?
C3c
What is the 2nd amino acid of the longest peptide? Provide a one-letter answer.
C4
Efficient parsing of biological databases is an essential skill for computational biologists. Using any approach, retrieve information about the human gene located on chromosome 17 at genomic coordinates chr17:60,149,942-60,179,021 (GRCh38.p13 reference genome). This gene is translated into a protein involved in many pathological and physiological responses in human diseases. However, these processes are mostly studied using mouse models. Which of the statements below is/are correct?