Collaboration on Open Data Science Projects: Glossary

Key Points

The reproducibility crisis and open science
  • Open Science practices aims to transform research by making it more reproducible, transparent, reusable, collaborative, accountable, and accessible to society

Become a champion of open (data) science
  • Make your data and code available to others

  • Make your analyses reproducible

  • Make a sharp distincion between exploratory and confirmatory research

Intro to Version Control with Git
  • Git provides a way to track changes in your project.

  • Git is a software for version control, and is separate from GitHub.

Using GitHub
  • You can use GitHub to store your project online where you or others can access it from a central repository.

  • You can use GitHub to store your projects so you can work on them from multiple computers.

Code Collaboration using GitHub
  • To contribute to someone else’s project, you should fork their repository.

  • All development work should be done on a new branch. Each branch should implement one feature.

  • Once you’ve implemented a new feature, push to your repository and create a pull request on the original repo.

Licensing and citation
  • People who incorporate General Public License (GPL’d) software into their own software must make their software also open under the GPL license; most other open licenses do not require this.

  • The Creative Commons family of licenses allow people to mix and match requirements and restrictions on attribution, creation of derivative works, further sharing, and commercialization.

  • People who are not lawyers should not try to write licenses from scratch.

  • It’s highly recommended to get a digital object identifier (DOI) for your dataset or code

Glossary