
Summary

Introduction

I have read the Personal Data Processing and Storage policy and agree to have my data processed and stored by the NGSchool Society: NO
I have read the Code of Conduct and declare that I will follow it: NO

Personal

Not filled.

Motivation

M1
Briefly describe your up-to-date scientific interests and achievements
e.g. study programs completed, scientific fields you have worked in, work-related achievements. Every achievement is valid.
Character limit: 1000
M2
Briefly describe your skills relevant to the topic of the summer school and to the computational biology field. Which of those skills would you like to improve during NGSchool2025? What other skills would you like to acquire, and how can NGSchool help you with this?
e.g. programming proficiency, bioinformatics tools used, types of analyses performed
Character limit: 1000
M3
Describe one project you participated in, along with its practical implications for the research field. What was your role? Which challenges (big and small) did you have to overcome and how did you tackle them?
If you are describing your first project and it seems easy and not very impressive, do not worry. As long as you have worked on the problem and learned something, it is important and relevant for this section.
Character limit: 1000
M4
Besides learning domain-specific skills, how will you benefit from attending the school? How will you use and share what you have learned at NGSchool2025?
We want to foster collaboration and maximize the impact and reach of NGSchool, so please tell us how you plan to use the knowledge you gain and how you will share it with others.
Character limit: 1000

Bioinformatics

B1
You performed bulk transcriptomic sequencing of mouse blood samples. However, when you tried to align the reads to a reference genome, you found 0% uniquely mapped reads, 10% multi-mapped reads, and 90% unmapped reads. Which of the following options cannot explain the mapping failure?
B2
In a 2013 paper, Christenson et al. investigated the role of the microRNA miR-638 in Chronic Obstructive Pulmonary Disease (COPD). The figure below is an ECDF (empirical cumulative distribution function) plot showing the effect of miR-638 inhibition on gene expression in COPD fibroblasts. Based on the plot, which of the following statement(s) is/are correct?
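An ECDF plot shows, for each value x, the fraction of observations less than or equal to x. The minimal sketch below computes it in plain Python; the fold-change values are hypothetical toy numbers, not data from Christenson et al. In miRNA-perturbation studies such curves are commonly drawn separately for predicted targets and for other genes, so that a horizontal shift between the curves reveals a shift in expression change.

```python
from bisect import bisect_right

def ecdf(values):
    """Return F(x) = fraction of values <= x (the empirical CDF)."""
    xs = sorted(values)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(xs, x) / n

    return F

# Hypothetical log2 fold changes for two gene groups (toy values only)
targets = [0.4, 0.9, 1.2, 0.1]
others = [-0.3, 0.0, 0.2, -0.1]

F_targets = ecdf(targets)
F_others = ecdf(others)
for x in (0.0, 0.5, 1.0):
    print(x, F_targets(x), F_others(x))
```

A curve lying further to the right (a lower F at the same x) means that group's values are shifted towards higher fold changes.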
B3
You are analyzing genomic data from an organism with no available reference genome. Which of the following genome assembly strategies would be most appropriate in this case?
B4
Ziff et al. (2023) describe the transcriptional landscape of Amyotrophic Lateral Sclerosis (ALS), a disease that causes motor neuron loss and often involves TDP-43 protein abnormalities. The analysis involved transcriptomic profiling of motor neurons generated in vitro from induced pluripotent stem cells derived from ALS patients and non-ALS controls (CTRL). Based on the volcano plot below, which of the following statements is/are false?
B5
To study the role of Dlx5 in mouse pup development, you used CRISPR-Cas9 technology to knock out this gene. The CRISPR-Cas9 system located the target site in the genome, and Cas9 introduced a break there. Given that Dlx5 is a transcription factor that requires the 5'-TAATTA-3' consensus sequence for DNA binding, which DNA repair mechanisms does the cell primarily use after CRISPR-Cas9 activity?

Statistics

S1
Last year, 300 people applied for NGSchool. Their mean age was 27.2 years (standard deviation 4.5), and their registration scores were normally distributed with a mean of 80 and a variance of 20. The six easiest questions were answered correctly by 100% of participants, and the hardest one by only one participant. Assuming that scoring is the same this year, what is the probability of receiving a score higher than 90?
*These are not real participant data; creative license was applied in designing the question above.
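Under the question's assumption of normally distributed scores, only the mean and spread of the scores matter; the age figures and question difficulties do not enter the calculation. The key step is that the question gives the variance, so the standard deviation is its square root. A minimal sketch using only Python's `math` module:

```python
import math

mean = 80.0
variance = 20.0
sd = math.sqrt(variance)  # the question gives the variance, so take its square root

threshold = 90.0
z = (threshold - mean) / sd  # standardized score, z ≈ 2.236

# Upper-tail probability of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2))
p_above = 0.5 * math.erfc(z / math.sqrt(2))
print(f"z = {z:.3f}, P(score > 90) = {p_above:.4f}")
```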
S2
While studying the effects of a new drug on immune cell function, you measured the proliferation rate of T-cells in the presence and absence of the drug. Which of the following statistical tests would be most appropriate for analyzing your data?
S3
A researcher developed a logistic regression model that predicted the rate of lung fibrosis after chest radiotherapy based on patient sex, the percentage of lung tissue exposed to >20 Gy, and single nucleotide polymorphisms in genes encoding metalloproteinases. She validated the model in an independent population and found that, relative to its performance on the initial study population, sensitivity decreased from 98% to 94% and specificity from 73% to 68%. How can she assess model calibration in this scenario?
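Calibration asks whether predicted probabilities match observed event rates, which sensitivity and specificity at a single cutoff do not capture. One common approach is a calibration plot (often accompanied by a Hosmer-Lemeshow test, which groups patients by predicted risk). The sketch below, a simplified version using fixed-width probability bins, tabulates the mean predicted probability against the observed event rate in each bin; the function name and toy inputs are hypothetical.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Group predictions into equal-width probability bins and compare
    the mean predicted probability with the observed event rate in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))

    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows

# Hypothetical predicted fibrosis probabilities and observed outcomes (1 = fibrosis)
probs = [0.05, 0.15, 0.15, 0.95]
outcomes = [0, 0, 1, 1]
for mean_pred, observed, n in calibration_table(probs, outcomes):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  (n={n})")
```

In a well-calibrated model the two columns agree, so the points lie near the diagonal of a calibration plot; real validation cohorts need many more patients per bin than this toy example.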
S4
The prevalence of a metabolic disease in the population is 1 in 40,000. You designed a diagnostic method based on measuring the concentration of amino acid X in the spinal fluid, using a cutoff value of 22 μmol/L. After running some tests, you determine that with your new method 95% of tested healthy individuals receive a negative diagnosis, while 2% of people with the disease are incorrectly diagnosed. If your method were used as a mass screening tool across a hypothetical population of 2 million individuals (with general health metrics in line with national averages for metabolic markers and an age distribution reflecting a typical urban population), approximately how many false positives would you expect?
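False positives come only from healthy people, so the calculation needs the prevalence and the 95% specificity. The 2% figure concerns people with the disease (false negatives), and the details about health metrics and ages do not change the count. The arithmetic as a short Python sketch:

```python
population = 2_000_000
prevalence = 1 / 40_000
specificity = 0.95  # 95% of healthy individuals test negative

diseased = population * prevalence             # 50 people
healthy = population - diseased                # 1,999,950 people
false_positives = healthy * (1 - specificity)  # healthy people wrongly flagged
print(f"{false_positives:,.1f}")
```

This gives about 99,997.5, i.e. roughly 100,000 expected false positives, compared with only 50 people who actually have the disease.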
S5
Which of the following statement(s) is/are correct?
S6
For the Markov chain on the right, what is the probability that if we start in state 1, we return there in exactly three steps?
Please use the decimal format (e.g. 0.12, 0.99)
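For a Markov chain with transition matrix P, the probability of being in state j after n steps when starting in state i is the (i, j) entry of P^n. The sketch below computes it in plain Python. The 3-state matrix is hypothetical (the chain from the figure is not reproduced here), and index 0 stands for state 1. Note that this counts every three-step path ending in state 1, including paths that pass through state 1 at step 1 or 2; a "first return" probability would exclude those paths.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step_probability(P, start, end, steps):
    """Return (P^steps)[start][end]: probability of being in `end` after `steps` moves."""
    result = P
    for _ in range(steps - 1):
        result = mat_mult(result, P)
    return result[start][end]

# Hypothetical chain: row i holds the probabilities of moving from state i+1
# to each state; here 1 -> 2 -> 3 -> 1 with certainty.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
print(n_step_probability(P, 0, 0, 3))
```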
S7a
Which of the illustrated hazard rates (alpha, beta, and gamma) corresponds to each of the survival functions a, b and c?
S7b
Looking at the hazard rates, which profile best fits the description “memoryless”?
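Hazard and survival functions are linked by S(t) = exp(-H(t)), where H(t) is the cumulative hazard. A process is memoryless when the chance of surviving a further t units does not depend on how long it has already survived, i.e. S(s + t) / S(s) = S(t). The sketch below checks this for three hypothetical hazard profiles (constant, rising and falling, the latter two as Weibull-type cumulative hazards); they are not the curves from the figure.

```python
import math

def survival(cum_hazard, t):
    """S(t) = exp(-H(t)), where H is the cumulative hazard function."""
    return math.exp(-cum_hazard(t))

def conditional_survival(cum_hazard, s, t):
    """P(T > s + t | T > s): probability of surviving t more units after reaching s."""
    return survival(cum_hazard, s + t) / survival(cum_hazard, s)

# Hypothetical cumulative hazards H(t) for three hazard shapes
profiles = {
    "constant": lambda t: 0.5 * t,             # h(t) = 0.5
    "increasing": lambda t: (t / 2.0) ** 2,    # Weibull, shape 2: h(t) = t / 2
    "decreasing": lambda t: (t / 2.0) ** 0.5,  # Weibull, shape 0.5: h(t) falls with t
}

s, t = 3.0, 1.0
for name, H in profiles.items():
    print(f"{name:>10}: S(t) = {survival(H, t):.3f}, "
          f"S(s+t)/S(s) = {conditional_survival(H, s, t):.3f}")
```

Only the constant hazard gives equal values in both columns: a rising hazard makes further survival less likely with age, and a falling hazard makes it more likely.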

Coding

C1
Given the input text file provided, find, for each line, the number formed by combining the first digit and the last digit in that line (in that order). What is the sum of all these numbers? Note that the same digit can be both the first and the last digit in a line.
For example, if the lines are:
eightg1
4ninejfpd1jmmnnzjdtk5sjfttvgtdqspvmnhfbm
78seven8
6pcrrqgbzcspbd
The numbers for each line are 11, 45, 78 and 66 respectively. The sum of all the numbers would be 200.
Character limit: 64
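A minimal Python sketch of the procedure, checked against the worked example in the question. As that example shows ("eightg1" gives 11), spelled-out number words are not counted as digits; only the characters 0-9 are.

```python
def first_last_number(line):
    """Combine the first and last digit characters of a line into a two-digit number."""
    digits = [c for c in line if "0" <= c <= "9"]  # ignore words such as "eight"
    return int(digits[0] + digits[-1])  # a lone digit is used twice, e.g. "1" -> 11

def sum_first_last(lines):
    # Skip blank lines (e.g. a trailing empty line at the end of the file)
    return sum(first_last_number(line) for line in lines if line.strip())

example = [
    "eightg1",
    "4ninejfpd1jmmnnzjdtk5sjfttvgtdqspvmnhfbm",
    "78seven8",
    "6pcrrqgbzcspbd",
]
print(sum_first_last(example))  # 200, matching the worked example
```

For the real input, an open file object can be passed directly, e.g. `with open("input.txt") as f: print(sum_first_last(f))`, where `input.txt` is a placeholder for the downloaded file's name.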
C2
Reading any number from left to right, if all the digits are in increasing order (for example 134468) we can call such a number an “increasing number”. Similarly, if all the digits are in decreasing order (for example 66420) we can call it a “decreasing number”. For this question, we shall call a positive integer that is neither increasing nor decreasing (for example 155349) a “bouncy number”. Clearly, there cannot be any bouncy numbers below one hundred, but just over half of the numbers below one thousand (525 exactly) are bouncy. In fact, the lowest number for which the proportion of bouncy numbers first reaches 50% is 538. Surprisingly, bouncy numbers become more and more common and by the time we reach 21780 the proportion of bouncy numbers is equal to 90%. Find the lowest number for which the proportion of bouncy numbers is exactly 99%.
Character limit: 64
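A brute-force sketch: scan upwards through the positive integers, keep a running count of bouncy numbers, and stop when their share first reaches the target percentage. Comparing `100 * bouncy` with `percent * n` keeps the arithmetic in integers, avoiding floating-point rounding. The function reproduces the figures given in the question (525 bouncy numbers below 1000, 538 for 50%, 21780 for 90%); it can be called with 99 as well, although plain Python may take a few seconds for that.

```python
def is_bouncy(n):
    """True if the digits of n are neither all increasing nor all decreasing."""
    s = str(n)
    pairs = list(zip(s, s[1:]))
    increasing = all(a <= b for a, b in pairs)  # e.g. 134468
    decreasing = all(a >= b for a, b in pairs)  # e.g. 66420
    return not (increasing or decreasing)

def least_with_bouncy_percent(percent):
    """Smallest n at which the share of bouncy numbers in 1..n reaches `percent`%."""
    bouncy = 0
    n = 0
    while True:
        n += 1
        if is_bouncy(n):
            bouncy += 1
        # integer comparison: bouncy / n >= percent / 100
        if 100 * bouncy >= percent * n:
            return n

print(least_with_bouncy_percent(50), least_with_bouncy_percent(90))
```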
C3a
How many different peptide sequences can be created?
C3b
What is the length of the shortest peptide sequence?
C3c
What is the 2nd amino acid of the longest peptide? Provide a one-letter answer.
C4
The efficient parsing of biological databases is an essential skill for computational biologists. Using any approach, retrieve information about the human gene located on chromosome 17 at genomic coordinates chr17:60,149,942-60,179,021 (GRCh38.p13 reference genome). This gene encodes a protein involved in many physiological and pathological processes in human disease; however, these processes are mostly studied in mouse models. Which of the following statement(s) is/are correct?
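One possible approach is the public Ensembl REST API, whose `overlap/region` endpoint lists the features overlapping a genomic interval. The minimal sketch below uses only the Python standard library; it assumes network access and that Ensembl's current human assembly is GRCh38 (patch releases such as p13 do not change primary-assembly chromosome coordinates). The helper names are hypothetical, and the mouse ortholog and functional annotations would still need to be looked up separately, for example in Ensembl or NCBI Gene.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"

def overlap_url(chrom, start, end, species="human"):
    """Build an Ensembl REST URL listing genes that overlap chrom:start-end."""
    return f"{ENSEMBL_REST}/overlap/region/{species}/{chrom}:{start}-{end}?feature=gene"

def genes_in_region(chrom, start, end):
    """Query Ensembl (network access required) and return (ID, name, biotype) tuples."""
    request = urllib.request.Request(
        overlap_url(chrom, start, end),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        genes = json.load(response)
    return [(g.get("id"), g.get("external_name"), g.get("biotype")) for g in genes]
```

With the question's coordinates, the call would be `genes_in_region("17", 60149942, 60179021)`; the result may also include overlapping non-coding genes, so check the biotype of each hit.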

Bursary

BU1
Travel grant justification
Character limit: 500
BU2
Fee waiver justification
Character limit: 500