ieeextreme2013 questions


Upload: kanapathipillai-shujeevan

Post on 22-Nov-2015




The IEEE questions submitted in 2013


IEEEXtreme Programming Competition 7.0

1. Lenovo Problem

Lenovo is working hard to help scientists in biology, especially in Bioinformatics, by building computers that can handle big biological computations and huge sets of information. In this problem you will need to help solve one of those tasks.

In Bioinformatics, the term “Sequence Alignment” is used to describe the task of arranging two or more biological sequences (DNA, RNA or protein) in such a way that regions of similarity between the sequences can be easily identified, so that conclusions about the evolutionary relationships between them may be inferred. In order to make the conserved regions more apparent, each sequence is usually represented in a distinct row of a matrix, and gaps are inserted between the sequences' residues so that identical or similar characters are aligned in successive columns [Source: Wikipedia: “Sequence alignment”]. An example of one possible alignment for the sequences Seq1 = ACGCATTCG, Seq2 = ACGAGTGG and Seq3 = CGATTAG is presented in Table 1:

Table 1. One possible alignment for the sequences Seq1 = ACGCATTCG, Seq2 = ACGAGTGG, Seq3 = CGATTAG (gaps are denoted by a dash, “-”)

This alignment indicates that the first residue (A) of Seq 1 is aligned to the first residue (A) of Seq 2 and to nothing (a gap, denoted by a dash) in Seq 3. Similarly, the second column of the alignment makes it clear that the second residue (C) of Seq 1 is aligned to the second residue (C) of Seq 2 and to the first residue (C) of Seq 3.

Obviously, this is only one possible alignment for the aforementioned sequences and many more arrangements are possible. Could you help me create a program that can efficiently generate all possible unique (gapped) alignments for a given set of sequences?

Task

Your task is to develop a program that can efficiently generate all the possible unique alignments for a given set of sequences.

To better understand the task of the present problem, please consider the following two examples:

Example 1: If the input sequences were Seq 1 = AC and Seq 2 = GG, then the following 13 unique alignments are possible:

Example 2: If the input sequences were Seq 1 = ACT and Seq 2 = GC, then the following 25 unique alignments are possible:

Since the number of possible alignments grows exponentially with the number of sequences to be aligned and the number of residues forming each sequence, you can safely assume that only toy examples (i.e. only a few sequences comprised of a limited number of residues) will be considered.
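The growth can be made concrete with a small counting sketch (not part of the original problem). Assuming, as the rules below state, that no column may consist only of gaps, each column advances a non-empty subset of the sequences by one residue, so the count satisfies f(i_1, ..., i_N) = sum over non-empty subsets S of f(i_1 - [1 in S], ..., i_N - [N in S]), with f(0, ..., 0) = 1. For two sequences this yields the Delannoy numbers: 13 for lengths (2, 2) and 25 for lengths (3, 2), matching the two examples. The function name `count_alignments` is hypothetical.

```python
from functools import lru_cache
from itertools import product


def count_alignments(seqs: list[str]) -> int:
    """Count gapped alignments of `seqs` that contain no all-gap column."""
    n = len(seqs)

    @lru_cache(maxsize=None)
    def f(pos: tuple[int, ...]) -> int:
        # Base case: every sequence fully consumed -> exactly one (empty) alignment.
        if not any(pos):
            return 1
        total = 0
        # mask[k] == 1 means sequence k places a residue in the last column.
        for mask in product((0, 1), repeat=n):
            if not any(mask):
                continue  # an all-gap column is not allowed
            prev = tuple(p - m for p, m in zip(pos, mask))
            if min(prev) >= 0:
                total += f(prev)
        return total

    return f(tuple(len(s) for s in seqs))


print(count_alignments(["AC", "GG"]))   # 13
print(count_alignments(["ACT", "GC"]))  # 25
```

Memoising on the position vector keeps the count cheap even when enumerating every alignment would not be.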

As may be inferred from the examples above:

- All the sequences in each alignment have the same length (number of columns), and gaps are inserted between the sequences' residues wherever required. If no gaps are needed, none are inserted (as, for example, in Alignment 1 of Example 1). In other words, columns containing only gaps are considered irrelevant to the alignment task and should not be printed.
- In every alignment the order of the sequences is kept (e.g. in the examples above, Seq 1 is always on the first line of the output, followed by Seq 2).
- The order of the residues forming each sequence is always respected (i.e. if the sequence is ACT, then A always appears first, then C, and finally T).

Regarding the order of the alignments in the final result set, the following rules have to be respected:

- The alignments should be sorted in increasing order of length (number of columns).
- If two or more alignments have the same length, they should be sorted in alphabetical order according to their sequences:
  - Initially, the first sequence of the alignments is used to decide the order.
  - If two or more alignments still have exactly the same length and an identical first sequence, the alphabetical order of their second sequences decides the ranking.
  - If two or more alignments still have exactly the same length and identical first and second sequences, the ordering continues likewise using the third, fourth, etc. sequence.

Please also refer to the provided examples, where the alignments are listed in the proper order according to the above sorting rules.
For instance, in Example 1, Alignment 1 is positioned first in the list of results since its length (2) is smaller than the length of any other alignment in the result set. Also, in compliance with the ordering rules, Alignment 2 of Example 1 is ordered before Alignment 4 since the first sequence of Alignment 2 (AC-) alphabetically precedes the corresponding first sequence of Alignment 4 (A-C). Moreover, the ordering between Alignment 2 and Alignment 3 (which both have length 3 and AC- as their first sequence) is decided by the alphabetical order of their second sequences: the second sequence of Alignment 2 (G-G) alphabetically precedes the corresponding second sequence of Alignment 3 (-GG).

In order to avoid printing huge result sets, instead of outputting all the possible alignments for a given set of sequences, your program should output only the total number of possible alignments, as well as the alignments at some predefined positions that will be given as input to the program.
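One way to reproduce this ordering is sketched below, assuming the toy input sizes promised earlier make brute-force enumeration acceptable. Note that the examples imply the gap character sorts after every residue letter (AC- precedes A-C, and G-G precedes -GG), which is the opposite of plain ASCII ordering, where “-” comes before “A”; the sketch therefore ranks characters with an explicit table. The names `all_alignments` and `RANK` are hypothetical.

```python
from itertools import product

GAP = "-"
# Residues in alphabetical order, with the gap ranked LAST (inferred from the
# examples: "AC-" < "A-C" and "G-G" < "-GG"; plain ASCII would do the opposite).
RANK = {ch: i for i, ch in enumerate("ACGT" + GAP)}


def all_alignments(seqs: list[str]) -> list[tuple[str, ...]]:
    """Enumerate every alignment without all-gap columns, in the required order."""
    results: list[tuple[str, ...]] = []

    def extend(pos: list[int], rows: list[str]) -> None:
        if all(p == len(s) for p, s in zip(pos, seqs)):
            results.append(tuple(rows))
            return
        # Each non-empty mask chooses which sequences emit a residue in this column.
        for mask in product((0, 1), repeat=len(seqs)):
            if not any(mask):
                continue
            if any(m and p == len(s) for m, p, s in zip(mask, pos, seqs)):
                continue  # that sequence has no residues left
            new_rows = [row + (s[p] if m else GAP)
                        for row, s, p, m in zip(rows, seqs, pos, mask)]
            extend([p + m for p, m in zip(pos, mask)], new_rows)

    extend([0] * len(seqs), [""] * len(seqs))

    def sort_key(alignment: tuple[str, ...]):
        # Shorter alignments first, then row-by-row comparison using RANK.
        return (len(alignment[0]), [[RANK[ch] for ch in row] for row in alignment])

    return sorted(results, key=sort_key)


for rows in all_alignments(["AC", "GG"])[:5]:
    print(" / ".join(rows))
```

Every distinct sequence of column masks produces a distinct gap pattern, so no de-duplication is needed. For Example 1 the fifth entry is A-C / -GG, which agrees with position 5 in Sample Output 1.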

Input

Your program will receive the following input, in the following order, from the Standard Input Stream:

- N: a positive integer smaller than 6 representing the number of sequences to be aligned.
- Then N lines follow, each one representing one of the sequences to be aligned. Each sequence is comprised of M residues belonging to the alphabet A = {A, C, G, T}.
- A positive integer K follows, representing the number of alignments that have to be output to the screen.
- Then K lines follow, each one containing an integer value representing a position in the ordered list of alignments that should be printed on the standard output stream. (Note: these positions are 1-based, meaning that if, for example, the position equals 4, then the 4th alignment in the ordered list should be printed.)

Output

Your program should output to the Standard Output Stream the total number of possible alignments for the given set of sequences, as well as the alignments (as contained in the ordered result set) at the positions specified in the input. In particular, the output should be formatted as follows:

- The first line of the output should always be "Possible Alignments: " followed by the total number, e.g. Possible Alignments: 25 (Note: there is a space between the colon and the number.)
- Then K sets of lines should follow, each one printing the alignment at the specified position of the ordered list of results, formatted as follows:
  - If no alignment exists at the given position, then the line "There is no alignment at position: " followed by the position should be printed, e.g. There is no alignment at position: 1000 (Note: there is a space between the colon and the position.)
  - If an alignment exists at the given position, then first the line "Alignment at Position: " followed by the position should be printed, e.g. Alignment at Position: 15 (Note: there is a space between the colon and the position.) Then the alignment at this position of the ordered list should be printed, with the sequences ordered as they were given in the input. If, for example, the input sequences were Seq 1 = ACT and Seq 2 = GC (as in Example 2) and the given position was 15, the program should print:

-ACT
G--C

Note: Each sequence ends with a newline character and does not contain any spaces before or after the sequence.

Sample Input 1

2
AC
GG
6
-9
1
13
5
100
8

Sample Output 1

Possible Alignments: 13
There is no alignment at position: -9
Alignment at Position: 1
AC
GG
Alignment at Position: 13
--AC
GG--
Alignment at Position: 5
A-C
-GG
There is no alignment at position: 100
Alignment at Position: 8
AC--
--GG

Sample Input 2

2
ACT
GC
5
1
9
22
14
32

Sample Output 2

Possible Alignments: 25
Alignment at Position: 1
ACT
GC-
Alignment at Position: 9
AC-T
--GC
Alignment at Position: 22
-ACT-
G---C
Alignment at Position: 14
-ACT
G-C-
There is no alignment at position: 32
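Input parsing and output formatting can be sketched as follows, assuming the ordered list of alignments has already been computed (for example by brute-force enumeration). The helper names `parse_input` and `format_report` are hypothetical. The main subtlety is the bounds check: Sample Input 1 asks for position -9, which must produce the "no alignment" message rather than being treated as an index counted from the end of the list.

```python
def parse_input(text: str) -> tuple[list[str], list[int]]:
    """Split the whitespace-separated input into sequences and query positions."""
    tokens = text.split()
    n = int(tokens[0])
    seqs = tokens[1:1 + n]
    k = int(tokens[1 + n])
    positions = [int(t) for t in tokens[2 + n:2 + n + k]]
    return seqs, positions


def format_report(alignments: list[tuple[str, ...]], positions: list[int]) -> list[str]:
    """Render the required output lines for an already-ordered result set."""
    lines = [f"Possible Alignments: {len(alignments)}"]
    for p in positions:
        # Explicit 1-based bounds check: in Python, alignments[-9 - 1] would
        # silently return an element instead of reporting a missing position.
        if 1 <= p <= len(alignments):
            lines.append(f"Alignment at Position: {p}")
            lines.extend(alignments[p - 1])  # one row per sequence, input order
        else:
            lines.append(f"There is no alignment at position: {p}")
    return lines
```

A complete program would call `parse_input(sys.stdin.read())`, build the ordered alignments, and print `"\n".join(format_report(...))`.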

2. Acadox Problem

Acadox's vision is to provide innovative and modern Learning Management technologies that empower faculty and students to engage and collaborate in a simple and efficient manner.

One professor who loves Acadox posted a teaser problem on his page to help his students prepare for their programming exam. The problem posted was as follows:

Develop a program that emulates a simple hexadecimal calculator that uses postfix notation. It should perform the operations of addition, subtraction, logical AND, logical OR, logical NOT, and logical exclusive OR.

Description

Since Acadox is a social learning environment, the professor posted the following description on his page: The programmer's calculator accepts a string of hexadecimal digits and operators in reverse Polish (postfix) notation, then produces the result. Input digits represent 16-bit unsigned binary bit strings.

Input

The program must accept a sequence of operators and hexadecimal digits in postfix form, as follows.

Digits: Leading zeros are optional and letters are not case sensitive: {[0-9 | A-F | a-f]} repeated 1 to 4 times, i.e. one to four hexadecimal digits per number.

Operators:

Each input item is delimited by white space. An input stream is terminated with a new-line. No more than 20 items are accepted.

Output

The program must display the result of evaluating the entire postfix expression as a single hexadecimal string, with leading zeros and upper-case letters. If any input is invalid, the string ERROR is displayed.

All operations are bitwise; there is no representation of negative quantities. An overflow (x + y > FFFF) results in FFFF. An underflow (x - y < 0000) results in 0000.

Sample Input 1: 1 1 +

Sample Output 1: 0002

Sample Input 2: F 1 -

Sample Output 2: 000E

Sample Input 3: F - 1

Sample Output 3: ERROR
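A minimal stack-based evaluator is sketched below. The operator table is missing from this copy of the problem, so the symbols used for AND, OR, XOR and NOT (`&`, `|`, `^`, `~`) are assumptions; only `+` and `-` are confirmed by the samples. The sketch also assumes that a missing or leftover operand, an unknown token, an empty line, or more than 20 items all count as invalid input and yield ERROR. The names `evaluate`, `BINARY_OPS` and `UNARY_OPS` are hypothetical.

```python
import re

HEX_TOKEN = re.compile(r"[0-9A-Fa-f]{1,4}")  # 1-4 hex digits, case-insensitive
MASK = 0xFFFF                                # 16-bit unsigned values

# ASSUMED operator symbols: the original operator table is not reproduced here.
BINARY_OPS = {
    "+": lambda x, y: min(x + y, MASK),  # saturate on overflow
    "-": lambda x, y: max(x - y, 0),     # clamp on underflow
    "&": lambda x, y: x & y,
    "|": lambda x, y: x | y,
    "^": lambda x, y: x ^ y,
}
UNARY_OPS = {
    "~": lambda x: ~x & MASK,            # 16-bit bitwise NOT
}


def evaluate(line: str) -> str:
    tokens = line.split()
    if not tokens or len(tokens) > 20:
        return "ERROR"
    stack: list[int] = []
    for tok in tokens:
        if HEX_TOKEN.fullmatch(tok):
            stack.append(int(tok, 16))
        elif tok in BINARY_OPS:
            if len(stack) < 2:
                return "ERROR"           # e.g. "F - 1": operator without two operands
            y = stack.pop()
            x = stack.pop()
            stack.append(BINARY_OPS[tok](x, y))
        elif tok in UNARY_OPS:
            if not stack:
                return "ERROR"
            stack.append(UNARY_OPS[tok](stack.pop()))
        else:
            return "ERROR"               # unknown or over-long token
    if len(stack) != 1:
        return "ERROR"                   # leftover operands
    return f"{stack[0]:04X}"             # leading zeros, upper-case letters


for sample in ("1 1 +", "F 1 -", "F - 1"):
    print(evaluate(sample))
```

The three sample lines print 0002, 000E and ERROR, as in the samples above.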

3. IEEE Computer Society Problem

IEEE Computer Society was working on a little secret project in one of its labs around the world: building little Robbie, a robot with human-like senses and lots of intelligence. Having been in the lab all the time, little Robbie decided to break out and search for a new adventure. While looking for one, he decided to enter an ancient Inca maze located nearby. As little Robbie is too young, he has not yet learned any sophisticated ways to get out of a maze, but fortunately he was taught at school that the right-hand wall follower algorithm is always guaranteed to lead to an exit in the case of a simply connected maze. This maze, however, is a dynamic one: every time Robbie makes a move, the maze topology changes. Will Robbie be able to finally get out of the maze?

Task

Your task is to develop a program that can efficiently simulate the scenario of little Robbie being trapped in a dynamic maze.

a) The Linear Congruential Generator

Since randomness will be necessary for this problem, your first subtask is to develop a custom pseudorandom number generator that will be used in all cases where a random number has to be drawn.

A Linear Congruential Generator (LCG) is one of the oldest and best-known pseudorandom number generator algorithms. The generator is defined by the recurrence relation:

$$X_{n+1} = (a \cdot X_n + c) \bmod m$$

where:

- X is the sequence of pseudorandom values
- m is the modulus, m > 0
- a is the multiplier, 0 < a < m
- c is the increment, 0 < c < m
- X_0 is the seed or start value, 0 < X_0 < m

[Source: Wikipedia]

For the case of our LCG, we will assume that the X, m, a, c and X_0 values are of integer type.
Also, we will always use m = 2^32, a = 1664525 and c = 1013904223, and only the start seed value will change (it will be an input parameter provided on the standard input stream). Whenever a random positive integer value within a specific range [min, max] is required, such a value may be generated using the absolute value of our custom LCG output and the modulus operator. For example, if we want to draw a random integer in the range [1, 100] (i.e. between 1 and 100 inclusive), then the following formula can be used:

$$\mathit{randnum} = \lvert X_{n+1} \rvert \bmod 100 + 1$$

For example, the first 15 randomly generated values for X_0 = 999 (with m = 2^32, a = 1664525, c = 1013904223), alongside their corresponding [1, 100] mapping, are presented in Table 1.
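A sketch of the generator follows, under one stated assumption: because the problem applies abs() to the generated value, it appears to expect values held in a signed 32-bit integer (as with int overflow in C or Java). The sketch therefore keeps the state modulo 2^32 and reinterprets it as a signed 32-bit value before mapping it into a range. The general [lo, hi] mapping `abs(X) mod (hi - lo + 1) + lo` is an extrapolation from the [1, 100] example, and the class name `LCG` is hypothetical; compare its output against Table 1 of the original problem before relying on it.

```python
class LCG:
    """X(n+1) = (a * X(n) + c) mod m with the problem's constants.

    ASSUMPTION: returned values are reinterpreted as signed 32-bit integers,
    mimicking int overflow in C/Java, which is why abs() is needed.
    """

    M = 2 ** 32
    A = 1664525
    C = 1013904223

    def __init__(self, seed: int) -> None:
        self.state = seed % self.M  # unsigned 32-bit state

    def next_raw(self) -> int:
        self.state = (self.A * self.state + self.C) % self.M
        # Two's-complement view of the 32-bit state.
        return self.state - self.M if self.state >= 2 ** 31 else self.state

    def next_in_range(self, lo: int, hi: int) -> int:
        # For [1, 100] this is abs(X(n+1)) mod 100 + 1, as in the formula above.
        return abs(self.next_raw()) % (hi - lo + 1) + lo


gen = LCG(999)
print([gen.next_in_range(1, 100) for _ in range(15)])
```

Keeping the internal state unsigned and converting only on output avoids relying on Python integers overflowing, which they never do.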

b) The Maze

The maze is composed of M rows and N columns (M and N being positive integer values) and consequently of M*N cells. An example 10x10 maze is depicted in Figure 1.

Regarding the maze, please also consider the following definitions, as they will be used throughout the problem description:

- Corner Cells: The four cells located at the corners of the maze (highlighted in gray in Figure 1). These cells always accommodate the four pillars of the maze and are inaccessible to the robot. They also cannot be used as start or exit positions.
- Border Cells: The cells positioned at the four edges of the maze, excluding the Corner Cells (highlighted in red in Figure 1). These are the only cells that can be used as start and exit positions.
- Inner Cells: All the remaining cells of the maze, excluding the Corner and Border Cells (highlighted in yellow in Figure 1). These cells cannot be used as start or exit positions, but may be accessible by the robot.
- Start Position: The position where the robot starts when it first enters the maze. This can be any of the Border Cells.
- Maze Exit: The cell leading to the exit of the maze. This can be any of the Border Cells, but it must be different from the start position (i.e. the exit and the start position cannot be located on the same Border Cell).

Each one of the maze cells may or may not have a wall in each of the 4 directions: North, South, East, and West (please refer to the compass in Figure 1). Walls (and equivalently openings) that are shared by adjacent cells are accounted for in both cells. For example, the East side wall of the rightmost cell in Figure 2b should also be considered a West side wall of the middle cell. Equivalently, the East opening of the middle cell should also be considered an opening on the West side of the leftmost cell.
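The cell categories and the shared-wall rule map naturally onto a small grid structure. In this sketch (an illustration, not the official data model) North is assumed to point towards row 0 and East towards increasing column indices, since the compass of Figure 1 is not reproduced here; setting a wall on one side of a cell also sets the opposite side of its neighbour, so both cells always agree. The `right_hand_heading` helper only illustrates the preference order of a right-hand wall follower (right, straight, left, back) and ignores the maze's dynamic changes. All names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Dir(Enum):
    # (row delta, column delta); North is assumed to point towards row 0.
    NORTH = (-1, 0)
    EAST = (0, 1)
    SOUTH = (1, 0)
    WEST = (0, -1)


OPPOSITE = {Dir.NORTH: Dir.SOUTH, Dir.SOUTH: Dir.NORTH,
            Dir.EAST: Dir.WEST, Dir.WEST: Dir.EAST}
RIGHT_OF = {Dir.NORTH: Dir.EAST, Dir.EAST: Dir.SOUTH,
            Dir.SOUTH: Dir.WEST, Dir.WEST: Dir.NORTH}


class Maze:
    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols
        # walls[r][c] is the set of sides of cell (r, c) that currently have a wall.
        self.walls = [[set() for _ in range(cols)] for _ in range(rows)]

    def cell_kind(self, r: int, c: int) -> str:
        on_row_edge = r in (0, self.rows - 1)
        on_col_edge = c in (0, self.cols - 1)
        if on_row_edge and on_col_edge:
            return "corner"  # pillar: never accessible
        if on_row_edge or on_col_edge:
            return "border"  # the only candidates for start and exit
        return "inner"

    def set_wall(self, r: int, c: int, d: Dir, present: bool = True) -> None:
        """Add or remove a wall, keeping the neighbouring cell consistent."""
        sides = [(r, c, d)]
        nr, nc = r + d.value[0], c + d.value[1]
        if 0 <= nr < self.rows and 0 <= nc < self.cols:
            sides.append((nr, nc, OPPOSITE[d]))  # the same wall, seen from next door
        for sr, sc, sd in sides:
            if present:
                self.walls[sr][sc].add(sd)
            else:
                self.walls[sr][sc].discard(sd)

    def has_wall(self, r: int, c: int, d: Dir) -> bool:
        return d in self.walls[r][c]

    def right_hand_heading(self, r: int, c: int, heading: Dir) -> Optional[Dir]:
        """Next heading by the right-hand rule: right, straight, left, back."""
        right = RIGHT_OF[heading]
        for d in (right, heading, OPPOSITE[right], OPPOSITE[heading]):
            if not self.has_wall(r, c, d):
                return d
        return None  # walled in on all four sides
```

Storing walls per cell (rather than per edge) matches the problem's wording that shared walls are accounted for in both cells, at the cost of having to update two entries on every change.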
The decision about whether a specific cell will have a wall on each one of the four possible directions will be based on a given positive integer Probability value1