Algorithms and Computability

5.11.2019 - first milestone: documentation, 33% of the grade
19.11.2019 - second milestone: implementation, 33% of the grade
3.12.2019 - third milestone: experimentation, 34% of the grade

5.11.2019 - first milestone, 33% of the grade
19.11.2019 - second milestone, 33% of the grade
3.12.2019 - third milestone: experimentation, 34% of the grade
The first and the second milestone must cover documentation and implementation. Which one comes first is up to you.

## Late delivery:

It is allowed to submit one stage with one week delay without negative points (one-time). In all other cases, a delay causes a reduction of the final grade by 10% (for example, instead of 4.5 you will get 4).

## The scope of the documentation:

detailed problem and detailed solution description. Form: you shall print out the document and bring it in person, documentation must be pieced together (no loose sheets of paper!) The expected level of detail: pseudocode and detailed text-based description. The whole team must be present to discuss the documentation.

### Easy task, initial documentation content:

1. Introduction. 2. Problem description. 3. Solution. 3.1. Solution description. 3.2. Solution pseudocode. 3.3. Pseudocode description. 3.4. Solution correctness proof. 4. Conclusion.

### Easy task, a few remarks on the implementation:

allow the user to read in their set of bricks, if not have a default set of bricks; display bricks that will be processed and the board, use colours; launch the algorithm; allow (an option, not the default setting) a readable text-based output to track computations; after computations are done display the board (the backpack) with the bricks, use colours, display close to the backpack bricks which were not placed in the backpack; display computations time; if multiple threads are used for computations, allow the user to define the degree of parallelism; we can assume an upper limit of, say 25 bricks;

### Easy task, final documentation content (parts will be taken from the initial documentation):

1. Introduction. 2. Problem description. 3. Solution. 3.1. Solution description. 3.2. Solution pseudocode. 3.3. Pseudocode description. 3.4. Solution correctness proof. 3.5. Applied optimisation tricks. 3.6. Threads-based implementation (here the description how - precisely - parallel execution is organised and limits of the parallelism, such as overhead) 4. Brief technical documentation 5. Theoretical complexity analysis. 6. Experiments 6.1. Used data sets (divided by types) 6.2. Results (detailed, by problem size, by data set type, by the degree of parallelism) 6.3. Empirical complexity analysis. 7. Conclusion and discussion on what could be done more efficiently.

## Implementation stage:

Form: presentation using lab computers (obligatory). The whole team must be present to discuss the code and how it works. Reminder: the code must compile and run on lab computers. Windows OS is preferred. A detailed and precise analysis of complexity must be prepared and printed out.

## The scope of the final milestone:

possible are touch-ups in the code. Final documentation: report from experiments. Form: CD. The CD must be appropriately labeled. Content (directories): doc, exe, src.

There are two types of data set that you will process: multivariate time series and fuzzified time series. We will discuss the input data during labs - you will need to preprocess it to obtain training data for the defined model.

### Multivariate time series:

Cricket: 12 classes (k=12), 6 dimensions (c=6), best ACC = 82.69%, base ACC (Naive Bayes) = 37.80%, base ACC (C45 tree) = 34.55%, base ACC (logistic regression) = 21.13%
NonInvasiveFetalECGThorax: 42 classes (k=42), 2 dimensions (c=2), best ACC = 95.39%, base ACC (Naive Bayes) = 82.94%, base ACC (C45 tree) = 79.10%, base ACC (logistic regression) = 74.72%
SonyAIBORobotSurface: 2 classes (k=2), 2 dimensions (c=2), best ACC = 95.98%, base ACC (Naive Bayes) = 88.76%, base ACC (C45 tree) = 75.85%, base ACC (logistic regression) = 88.11%
ToeSegmentation: 2 classes (k=2), 2 dimensions (c=2), best ACC = 95.97%, base ACC (Naive Bayes) = 56.18%, base ACC (C45 tree) = 56.21%, base ACC (logistic regression) = 51.76%
UWaveGestureLibrary: 8 classes (k=8), 3 dimensions (c=3), best ACC = 96.83%, base ACC (Naive Bayes) = 87.94%, base ACC (C45 tree) = 78.24%, base ACC (logistic regression) = 70.43%
You can download these files from here. I would like to acknowledge that the original files were selected from the repository under this link. 22.10.2019: please note that I have crossed out Sony and Toe sets!!! These two files have incorrect format! You can skip those, or you can replace them with, say GestureMidAir or AllGestureWiimote (on the webpage).

### Fuzzified time series:

CinC_ECG_torso: 4 classes (k=4), best ACC = 98.27%, base ACC (Naive Bayes) = 84.73%, base ACC (C45 tree) = 60.44%, base ACC (logistic regression) = 37.91%
FaceFour: 4 classes (k=4), best ACC = 99.56%, base ACC (Naive Bayes) = 78.65%, base ACC (C45 tree) = 64.83%, base ACC (logistic regression) = 81.03%
HandOutlines: 2 classes (k=2), best ACC = 92.39%, base ACC (Naive Bayes) = 83.02%, base ACC (C45 tree) = 84.81%, base ACC (logistic regression) = 83.62%
Haptics: 5 classes (k=5), best ACC = 51.69%, base ACC (Naive Bayes) = 42.72%, base ACC (C45 tree) = 35.88%, base ACC (logistic regression) = 33.64%
ItalyPowerDemand: 2 classes (k=2), best ACC = 97.03%, base ACC (Naive Bayes) = 90.01%, base ACC (C45 tree) = 94.81%, base ACC (logistic regression) = 93.89%
Lighting2: 2 classes (k=2), best ACC = 83.70%, base ACC (Naive Bayes) = 67.77%, base ACC (C45 tree) = 69.59%, base ACC (logistic regression) = 60.52%
Meat: 3 classes (k=3), best ACC = 99.38%, base ACC (Naive Bayes) = 97.07%, base ACC (C45 tree) = 93.95%, base ACC (logistic regression) = 99.28%
OliveOil: 4 classes (k=4), best ACC = 90.13%, base ACC (Naive Bayes) = 86.37%, base ACC (C45 tree) = 76.37%, base ACC (logistic regression) = 88.37%
RefrigerationDevices: 3 classes (k=3), best ACC = 78.46%, base ACC (Naive Bayes) = 36.68%, base ACC (C45 tree) = 45.59%, base ACC (logistic regression) = 33.90%
ScreenType: 3 classes (k=3), best ACC = 67.61%, base ACC (Naive Bayes) = 43.58%, base ACC (C45 tree) = 40.40%, base ACC (logistic regression) = 36.61%
You can download these files from here, here and here. I would like to acknowledge that the original files were selected from the repository under this link.

1. Introduction. 2. Problem description. 3. Solution. 3.1. Solution description. 3.2. Solution pseudocode. 3.3. Pseudocode description. 3.4. Solution correctness justification (no proof, references to literature can be made). 4. Conclusion.

### Advanced task, final documentation content (parts will be taken from the initial documentation):

1. Introduction. 2. Problem description. 3. Solution. 3.1. Solution description. 3.2. Solution pseudocode. 3.3. Pseudocode description. 3.4. Solution correctness justification. 4. Brief technical documentation (1 to max. 2 pages) 5. Theoretical complexity analysis. 6. Experiments 6.1. Used data sets - comment on their statisitcal properties (just one paragraph of text, there are two types of sets "fuzzified" and "multivariate") 6.2. Results - the influence of window (detailed analysis how the change of window size matters for different data sets, this concerns both multivariate and fuzzified sets) 6.3. Results - the influence of dimensionality (detailed analysis how the dimensionality of data sets influences the accuracy, this concerns only fuzzified sets) 6.4. Results - the influence of stride on the outcome (concerns both dataset types, detailed analysis) 6.5. Results - the comparison between accuracy on data "chunks" (made by applying the window to each row) and accuracy as a whole (computed for each line separately via voting strategy) 6.6. Results - the influence of other parameters (tell if your code allows other parameters and what is their role and how they influenced the computation outcomes, for example if you implemented simmulated annealing or if you play with different activation functions or if you added extra layers, etc.), by problem size, by data set type) 6.7. Results: best models for each data set (take set by set and tell which model was the best: tell which window, which stride, print confusion matrix for test, compute accuracy, attach printout of each best network with visible weights) 6.8 Empirical complexity analysis. 7. Conclusion and discussion on what could be done more efficiently. (In conclusion give a table where you show base acc (from my page), best acc (from my page), your average accuracies and best accuracies.

### Progress (per team):

SI,KM,JB - code shown and accepted, documentation accepted, waiting for the CD
AA,TN,WC - code shown and accepted, documentation 80%
JK,FC,UB - documentation shown, 95%, code shown, GUI needs corrections
DK,SM,YI - 90%, code passed,
ŁN,PP,KK - code passed, first documentation submitted, 100%, we were discussing some issues on the order of tiles
PO,IO,AS - 85%, code passed
TH,IY - code passed, documentation 100%
KC,KL,BS - code passed with one week delay (no minus points), introductory documentation passed 100%
MM,HH,MA,BS - documentation passed 100% but with 2 weeks delay one week for "free" one week with penalty points that will be subtracted at the end

SI,KM,JB
ŁN,PP,KK 4.0