Sorry for the confusing title.
I’m a student trying to establish myself in STEM.
I interned on a team doing ML for a while, and when designing networks we’d run into hyperparameters like batch size, learning rate, or the number and width of layers. We had to eyeball these values: we needed something sane that worked, but didn’t have the time to play around with them.
Then I spent a while on a team doing cellular biology. Again, we’d face choices like which medium to grow the cells in, how long to incubate, etc., and I’d have had no idea what to pick if it had been up to me.
Since I’m trying to get a grip on these fields, I’d like to understand why the people I was shadowing chose these values, because to me they seemed completely arbitrary. We didn’t get to vary them while completing the project, so I never had the opportunity to build an intuition for how they influence the result or why those particular values were chosen.
What should I do? Should I look for the original research papers that investigated these things?
I think in practice you start with something arbitrary and hope that training or updates converge to something reasonable. There is a school of thought that says to start with uninformative priors, i.e. those that encode the least prior information. There is a famous book by E. T. Jaynes arguing for this, and lots of people swear by it. I tried to read it once and it didn’t make much sense to me, but maybe I should try again sometime.
https://omega0.xyz/omega8008/JaynesBookPdf.html
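To make the “start with something arbitrary” part concrete, here’s a minimal random-search sketch in Python. The `train_and_evaluate` function is a hypothetical stand-in for whatever actually trains your model and returns a validation loss, and the search ranges are assumptions for illustration, not recommendations:

```python
import math
import random

def train_and_evaluate(learning_rate, batch_size):
    """Hypothetical stand-in: train a model with these hyperparameters
    and return a validation loss. Here it's a toy function whose
    minimum sits near lr=1e-3, batch=64."""
    return (math.log10(learning_rate) + 3) ** 2 \
         + (math.log2(batch_size) - 6) ** 2

random.seed(0)
best_loss, best_config = float("inf"), None
for _ in range(50):
    # Sample the learning rate log-uniformly: its effect spans orders
    # of magnitude, so sampling in log-space covers the range better.
    lr = 10 ** random.uniform(-5, -1)
    # Batch sizes are conventionally powers of two.
    bs = 2 ** random.randint(3, 9)
    loss = train_and_evaluate(lr, bs)
    if loss < best_loss:
        best_loss, best_config = loss, (lr, bs)

print(f"best loss {best_loss:.3f} at lr={best_config[0]:.2e}, batch={best_config[1]}")
```

Random search like this is a common baseline precisely because nobody has a principled way to pick these values up front: you sample, keep the best, and move on.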
Hmm it looks like ML uses the term “hyperparameter” differently from how it is used in statistics. TIL.
https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)
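In the statistics sense, a hyperparameter is a parameter of a prior distribution rather than a training knob, which ties back to the uninformative-priors comment above. A minimal sketch with the textbook Beta-Bernoulli example, where alpha and beta are the hyperparameters of the prior over a coin’s bias:

```python
# Beta-Bernoulli conjugate update: the prior Beta(alpha, beta) has
# hyperparameters alpha and beta; the coin's bias theta is the
# (ordinary) parameter being inferred.
alpha, beta = 1.0, 1.0   # uniform ("uninformative") prior
heads, tails = 7, 3      # observed coin flips

# Conjugacy makes the posterior another Beta, updated by counting.
post_alpha = alpha + heads
post_beta = beta + tails

# Posterior mean estimate of the coin's bias: 8 / 12 = 0.666...
print(post_alpha / (post_alpha + post_beta))
```

The uniform Beta(1, 1) here is one common “uninformative” choice; Jeffreys’ prior Beta(1/2, 1/2) is another. In the ML sense, by contrast, “hyperparameter” covers any configuration fixed before training, like the learning rate and batch size above.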