> The paper itself is enough to reproduce all the results.
No, this is almost never the case. It should be, but in practice it rarely is: there are always more details in the code than in the paper.
Note that even the code itself might not be enough to reproduce the results. Many other things can matter, like the environment, software or library versions, the hardware, etc. Ideally you should also publish log files with all such information so people could try to use at least the same software and library versions.
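A minimal sketch of what logging such information could look like in Python (the filename and structure here are just an example, not a standard):

```python
# Dump interpreter, platform, and installed-package versions to a JSON file
# that can be published alongside the experiment's results and logs.
import json
import platform
import sys
from importlib import metadata

env = {
    "python": sys.version,
    "platform": platform.platform(),
    # exact version of every installed package, like a machine-readable pip freeze
    "packages": {
        d.metadata["Name"]: d.version
        for d in metadata.distributions()
        if d.metadata["Name"]
    },
}

with open("environment.json", "w") as f:
    json.dump(env, f, indent=2)
```

Even this is not a full substitute for a pinned environment (e.g. a lock file or container image), but it costs almost nothing and answers "which versions did you run with?" years later.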
And random seeds. Make sure this part is at least deterministic by specifying the seed explicitly (and make sure you have that in your log as well).
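For example, in Python this can be as simple as the following sketch (the seed value is arbitrary; the point is that it is set explicitly and logged):

```python
# Seed the RNG explicitly and log which seed was used,
# so the same run can be repeated exactly later.
import logging
import random

logging.basicConfig(level=logging.INFO)

SEED = 42  # arbitrary; what matters is recording it, not the value
random.seed(SEED)
logging.info("random seed: %d", SEED)

first_run = [random.random() for _ in range(3)]

# Reseeding with the same value reproduces the exact same draws.
random.seed(SEED)
second_run = [random.random() for _ in range(3)]
assert first_run == second_run
```

The same applies to every other RNG in play (NumPy, PyTorch, etc.) — each one needs its own explicit seed.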
Unfortunately, in some cases (e.g. deep learning, where some GPU operations are non-deterministic) your algorithm might not be deterministic anyway, so even in your own environment you cannot exactly reproduce some results. So make sure the result is reliable (e.g. stable across different random seeds).
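One way to check that kind of reliability is to rerun the whole experiment over several seeds and report the spread rather than a single number. A toy sketch (`run_experiment` is a stand-in for a real training run):

```python
# Check that a reported metric is stable across random seeds:
# run the experiment several times and look at mean and spread.
import random
import statistics

def run_experiment(seed: int) -> float:
    """Stand-in for a full training/evaluation run; returns some metric."""
    rng = random.Random(seed)
    return 0.9 + rng.gauss(0, 0.01)  # toy: accuracy ~0.9 with small noise

scores = [run_experiment(s) for s in range(10)]
mean = statistics.mean(scores)
spread = statistics.stdev(scores)
print(f"metric: {mean:.3f} +/- {spread:.3f} over {len(scores)} seeds")
```

If the conclusion of the paper survives that spread, it does not hinge on one lucky seed.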
> In my field many scientists tend to not publish the code nor the data.
This is bad. But that is no reason for you to follow the same practice.
> clean and organize the code for publishing
This does not make sense. You should publish exactly the code as you used it, not a restructured or cleaned-up version. It should not be changed in any way. Otherwise you would also need to redo all your experiments to verify it still produces the same results.
If you did rerun everything after cleaning up, fine. But this extra effort is really not needed. Sure, it is nicer for others, but your hacky and crappy code is still infinitely better than no code at all.
> it will increase the surface for nitpicking and criticism
If there is no code at all, this is a much bigger criticism.
> publishing the code will be removing the competitive advantage
This is a strange take. Science is not about competing against other scientists. Science is about working together with other scientists to advance the state of the art. You should do everything to accelerate the process of advancement, not try to slow it down. If such behavior is common in your field of work, I would seriously consider changing fields.
I agree with almost all of this, however I believe that publishing random seeds is dangerous in its own way.
Ideally, if your code has a random component (MCMC, bootstrapping, etc), your results should hold up across many random seeds and runs. I don’t care about reproducing the exact same figure you had, I want to reproduce your conclusions.
In a sense, when a laboratory experiment gets reproduced, you start off with a different “random state” (equipment, environment, experimenter - all these introduce random variance). We still expect the conclusions to reproduce. We should expect the same from “computational studies”.
The thing is, if you want to ignore someone's random seed, you can if it's provided. If it's not provided and you need it to chase down why something isn't working, you're SOL.
I think being able to re-run code with a paper is great, but I think we should be sure to distinguish it from scientific replication.
When replicating physics or chemistry, you build the relevant apparatus from scratch, demonstrating that the paper has communicated the ideas sufficiently and that the result is robust not just to the noise introduced by the "random state" you discuss, but also to the variation introduced by a trip through human communication.
I acknowledge that this is substantially an aside, but it's something I like to surface from time to time and this seemed a reasonable opportunity.
> And random seeds. Make sure this part is at least deterministic by specifying the seed explicitly (and make sure you have that in your log as well).
> Unfortunately, in some cases (e.g. deep learning) your algorithm might not be deterministic anyway, so even in your own environment, you cannot exactly reproduce some result. So make sure it is reliable (e.g. w.r.t. different random seeds).
Publishing the weights of a trained model allows verification (and reuse) of results even before going to the effort of reproducing it. This is especially useful when training the model is prohibitively expensive.
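If you do publish weights, it helps to also publish a checksum so people can verify they got the exact file you trained. A small sketch (the filename and the published hash are hypothetical placeholders):

```python
# Verify a downloaded weights file against a checksum published with the paper.
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# EXPECTED_SHA256 would be the digest published alongside the weights:
# assert sha256_of("model_weights.bin") == EXPECTED_SHA256
```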
To some extent Science is a big project to understand how the universe works. We should hope to understand the phenomena that we investigate to the point where library versions and random seeds don't matter so much -- assuming the code is not buggy, and the statistics are well done, those factors shouldn't come into play.
However, sometimes chemists find out that the solvents they use to clean their beakers are leaving trace amounts of residue, which accidentally contribute to later reactions.
> Ideally you should also publish log files with all such information so people could try to use at least the same software and library versions.
looks to me like a result that requires borrowing a particular lab's set of beakers. Not what we're looking for.