
Commit a787e80

lab6 finished
1 parent 276635c commit a787e80

File tree

1 file changed (+9, -2 lines)


lab6.ipynb

Lines changed: 9 additions & 2 deletions
@@ -640,7 +640,7 @@
 "\n",
 "### Additional question regarding ReLU gradient implementation\n",
 "\n",
-"ReLU function may have differences in gradient implementation - some may set gradient to 0 for negative values, some may set it as 0.1 for example (called Leaky ReLU).\n",
+"ReLU function may have differences in gradient implementation - some may set gradient to 0 for negative values, some may set it as 0.01x for example (called Leaky ReLU).\n",
 "\n",
 "### Comparison of speed and accuracy of networks with different number of hidden layers and activation functions\n",
 "\n",
@@ -650,10 +650,17 @@
 "\n",
 "Linear activation function is useless as no matter how many layers and neurons, the output will always be a linear function.\n",
 "\n",
-"It's also worth noticing that ReLU and tanh activation functions are faster than the sigmoid activation function. The cause of this is that ReLU and tanh activation functions are implemented in numpy, which turns out to be much faster than a vectorized Python function implemented for sigmoid.\n",
+"It's also worth noticing that ReLU and tanh activation functions are faster than the sigmoid activation function. The cause of this is that ReLU and tanh activation functions are implemented in numpy, which \n",
+"turns out to be much faster than a vectorized Python function implemented for sigmoid.\n",
+"\n",
+"In terms of learning speed, the more layers and neurons, the more time it takes to train the network. It's especially visible for networks with tanh and sigmoid activation functions.\n",
 "\n",
 "### Performance on other datasets\n",
 "\n",
+"For regression tasks, the same architecture and activation functions were used as in the comparison above.\n",
+"\n",
+"For the classification tasks, all networks had the same architecture of 3 hidden layers with 20 neurons each. The output layer had a tanh activation function to map the output to values closer to 0. Hidden layers used either ReLU or tanh activation functions, as the sigmoid implementation is slower.\n",
+"\n",
 "#### Regression - steps-large\n",
 "\n",
 "Model 1 - 1x30 neurons, tanh activation function (on output layer linear activation)\n",

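The Leaky ReLU variant mentioned in the first hunk can be illustrated with a short numpy sketch. This is a minimal illustration of the idea, not code from the notebook; the function names are my own.

```python
import numpy as np

def relu_grad(x):
    # Standard ReLU: gradient is 1 for positive inputs and 0 otherwise.
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    # Leaky ReLU: a small slope (here 0.01, matching the "0.01x" in the text)
    # replaces the zero gradient for negative inputs.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu_grad(x))        # [0. 0. 1. 1.]
print(leaky_relu_grad(x))  # [0.01 0.01 1.   1.  ]
```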
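To make the speed claim in the second hunk concrete, a small timing sketch could compare numpy-ufunc activations with a sigmoid wrapped in np.vectorize. The np.vectorize wrapper is an assumption about what "a vectorized python function" means in the notebook; this is not the notebook's actual benchmark.

```python
import numpy as np
import timeit

x = np.random.randn(1_000_000)

# Activations expressed as numpy ufuncs run as vectorized C loops.
relu = lambda a: np.maximum(a, 0.0)
tanh = np.tanh

# np.vectorize only wraps a scalar Python function in a Python-level loop,
# which is the kind of implementation the notebook blames for the slowdown.
sigmoid_scalar = lambda v: 1.0 / (1.0 + np.exp(-v))
sigmoid_vectorized = np.vectorize(sigmoid_scalar)

for name, fn in [("relu", relu), ("tanh", tanh), ("sigmoid (np.vectorize)", sigmoid_vectorized)]:
    t = timeit.timeit(lambda: fn(x), number=5)
    print(f"{name:>25}: {t:.3f} s")
```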
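The classification setup described in the second hunk (three hidden layers of 20 neurons each, tanh on the output layer, ReLU or tanh in the hidden layers) could be sketched as a forward pass with random weights. This is a hypothetical sketch of that topology, not the notebook's network class.

```python
import numpy as np

def make_mlp(n_in, n_out, hidden=(20, 20, 20), seed=0):
    # Random weights for the described topology: 3 hidden layers of 20 neurons each.
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, n_out]
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x, hidden_act=np.tanh):
    # Hidden layers use ReLU or tanh; the output layer uses tanh, as in the text.
    for w, b in layers[:-1]:
        x = hidden_act(x @ w + b)
    w, b = layers[-1]
    return np.tanh(x @ w + b)

relu = lambda a: np.maximum(a, 0.0)
net = make_mlp(n_in=2, n_out=1)
print(forward(net, np.random.randn(5, 2), hidden_act=relu).shape)  # (5, 1)
```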