Git, Github, Source Control, and You

Think about the thousands upon thousands of completely free and openly available source code packages for professional-level software that you can install and tinker with on your own computer for fun and profit (remember, license terms vary, so always check before installing). It’s hard to estimate, but surely the average computer user has immediate access to software worth many millions of dollars if priced in the old conventional way. That’s good fortune in every sense of the phrase.

Not only can we explore an endless variety of software, we can also collaborate with people in different time zones, in different parts of the world, all without even speaking a word to them (although I highly recommend good communication with far-flung collaborators). When you create an accessible place to put software project files and provide clear tasks to be done, anyone with access can do them and commit their changes to the repository. Collaborators help you move the project forward, without having to mail floppy disks around.

How to store software projects, publicly or privately

Although there are numerous ways that such open source software can be stored and distributed, one of the most popular ways to do this is to create a repository on Github.

Github offers two basic options: you can upload your source files publicly and use their tools and space for free, or you can pay a rather small fee to keep your source code private to you or your group. You need to decide if you are okay with the Terms of Service, but for an increasing set of developers, Github is the place to store their work. If you make your files public on Github, anyone can clone a copy of your project. If you are fine with that, proceed and enjoy the openness.

Happily, you need not be a traditional software developer to use Github. Writers, often working in Markdown or HTML or some other text-based format, store their files in Github repositories as well.

My philosophy, not original to me but increasingly common, is that documentation and source code should live together in the same repository. Often, docs and source code are hooked together, through context-sensitive help for example. But keeping even standalone documentation in the same place as the other project files just makes sense. Source code isn’t required: you could just as easily put your crowdsourced novel on Github, even if there’s zero code behind it.

Github is not the only site where projects can be stored. You may want to check into bitbucket.org, powered by Atlassian, to find one that works for you.

Github and git

Hadoop is a framework of primary importance in the Big Data world, and its open source code is available on GitHub to anyone who wants to take it for a spin.

Git, the source control program originally developed for the Linux kernel project, powers Github and predates it by a few years. With git, any user can clone your repository and then do whatever they want with their copy of your files. However, your original repository remains untouched unless you agree to accept their changes into it. Github calls a request to merge such changes into a repository a pull request.

If you are a technical writer feeling a bit overwhelmed by using git and Github, I highly recommend the tutorials that Sarah Kiniry has developed here: http://technicolorwriter.com/why-agile-writers-need-to-use-git/

If you follow this philosophy of keeping docs and code together, you may want to use text-based formats for your documentation. Source control tools can store binary files, but few can show a meaningful comparison between two versions of one. With text, you can easily do a diff between different versions (commits) of the file and see exactly which lines changed.
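To make the idea of a text diff concrete, here is a minimal Python sketch using the standard library's difflib module. It is not how git computes diffs internally; it simply shows the kind of line-by-line comparison that text formats make possible. The file name install.md and both versions of the text are made up for illustration.

```python
import difflib

# Two hypothetical versions of the same Markdown file, as lists of lines.
old_version = [
    "# Installing the widget",
    "",
    "Run the installer and restart your computer.",
]
new_version = [
    "# Installing the widget",
    "",
    "Run the installer, then restart your computer.",
    "Administrator rights are required.",
]


def text_diff(old_lines, new_lines, name="install.md"):
    """Return a unified diff between two versions of a text file."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"{name} (previous commit)",
            tofile=f"{name} (current commit)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Removed lines start with "-", added lines start with "+".
    print("\n".join(text_diff(old_version, new_version)))
```

Lines that start with a minus sign were removed and lines that start with a plus sign were added, which is the same convention you will see in most source control and code review tools.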

For images, purists often use the text-based SVG (Scalable Vector Graphics) format. But you can use standard formats such as PNG or JPG and just accept you will have to manually compare different versions of the image should the need arise.

Sarah’s session at STC Summit 2015 covered how git allows you to do archeology in your project. By following the commit trail, you can see the work that developers have done. If some developer forgot to tell you about a small but crucial detail, you can find it in these logs. If the developers use verbose commit comments (which some implementations force), your chances of figuring things out increase even further. In my experience, a combination of reading source-control comments, participating in code reviews, and following JIRA makes it pretty hard to hide software changes if the technical writer is paying attention at all.
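As a rough sketch of this kind of archeology, the Python below searches commit subject lines for a keyword. The function names, the tab-separated log format I chose, and the sample log are all my own assumptions for illustration. The read_log function assumes git is installed and that you point it at a real repository; find_commits works on any text in that format, so you can try it on the made-up sample first.

```python
import subprocess

# Tab-separated fields: short hash, author name, commit subject line.
LOG_FORMAT = "%h%x09%an%x09%s"


def read_log(repo_path="."):
    """Return the commit log of the repository at repo_path as text.

    Assumes git is installed and repo_path is a git repository.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_commits(log_text, keyword):
    """Return (hash, author, subject) tuples whose subject mentions keyword."""
    matches = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        commit_hash, author, subject = line.split("\t", 2)
        if keyword.lower() in subject.lower():
            matches.append((commit_hash, author, subject))
    return matches


# Made-up sample log, for illustration only.
SAMPLE_LOG = "\n".join([
    "a1b2c3d\tDana\tRename timeout setting to request_timeout",
    "d4e5f6a\tLee\tFix typo in README",
    "0f9e8d7\tDana\tChange default timeout from 30 to 60 seconds",
])

if __name__ == "__main__":
    for commit_hash, author, subject in find_commits(SAMPLE_LOG, "timeout"):
        print(commit_hash, author, subject)
```

In a real project you would pass read_log() to find_commits instead of the sample, then follow up on any commit that touches a feature you document.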

The psychology of “commits”

As writers, many of us want our work to be perfect before others see it. It’s painful for us to reveal rough, choppy work. Yet, in order to commit your work, and follow a typical Agile workflow, you must get used to not being perfect. The culture of your workplace and specific project type (regulatory or RFP work for instance) influence the workflow, but experience indicates that the team would prefer to see an imperfect something early rather than see nothing at all for weeks and then have to tell you that you were wrong all along. When you start with something imperfect, the feedback process allows you to make your work better. Keep in mind that you should find a way to be comfortable with the final state of the document before it is released to a wider audience, even when it’s less perfect than you’d like. At a 2015 Write the Docs session, a writer from Atlassian stated bluntly that with their continuous delivery process, they had to accept the docs would sometimes not be completely up to date. Be honest and communicate with your team and your managers about what you can do and what they expect.

Git versus SVN (Subversion)

SVN (Subversion) and git are both source control systems, two ways of creating repositories to which you can commit changes, and keep exact track of everything that has ever happened to your codebase. But the two programs have some implementation differences, and somewhat different philosophies. You may want to start with git, because having Github freely available makes it very easy to learn.

I am more familiar with SVN, having used it since 2008 at jobs where the software is not open source. We want the SVN trunk to become the releasable product, and every developer works in a branch. Often several developers (writers or coders) work in the same branch. In my experience, once technical writers and programmers work together in this way, they both will end up doing some of each (coding and documentation), in a way that fits their culture. SVN allows a user to lock down individual files or an entire branch, if it’s important to prevent others from making changes immediately. SVN is a good fit for a centralized culture, whether with small or large teams. SVN also has a reputation for being easier to learn initially, with numbered commits that make the logs easier to follow.

However, git has plenty of advantages of its own. It promotes a decentralized view of the world, with no central repository taking a privileged position. This is a view that works well on many open source projects. With git, you can commit regularly to your branch even if you do not have network access. With SVN, each commit requires that you have network access, which might discourage you from committing as often as you should. With a large decentralized team, network problems that might prevent developers from being able to connect to the central repository are more likely.

The Git Bash shell is one way to interact with git, but other shell and GUI options are available too.

JIRA, an Atlassian tool which has become indispensable at many companies for its bug-tracking, feature-tracking and sprint-planning capabilities, can work with either git or SVN. Code review, conducted in an open manner, is an important part of the development process (and if docs are stored as code, code review works for docs too).

For code review, I use Code Collaborator, which works very nicely with SVN, and my research indicates it can work with git as well. Many other code review tools are available.

I am neglecting other source control programs such as Perforce and Mercurial. However, the same general principles apply. In general, if you want to start right now with a source control system for your own projects, use git. If you are working for a company that uses SVN or another source control system, and it is properly implemented, you will be fine. Just make sure to commit following your team’s rules, and do so frequently. You cannot keep track of changes if you are not committing these changes, and you could lose your work if you do not commit regularly. How often? Your team may have its own rules, but my opinion is at least once per day on any day you work on the file.

SVN and git each have commands that scare even experienced users. For me, that command in SVN is switch, because it is so easy to mess up a branch using it. In git, so I have read, a scary command is rebase. But you have to try pretty hard to make an irrecoverable change. The surest way to lose work is to not commit at all.

The Advantage of Text-based Tools for Code Reviews

Once you get used to the easy diff comparisons possible with different text files, working with binary formats can be very frustrating. That said, you can commit binary files as you continue to work on them, and you can always go back to a previous version, even if you will have to do manual examination to determine if you have the old version you want. But code review tools for binary files are much harder to use.
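A small aid for that manual examination: even when you cannot diff a binary file, you can at least check whether two copies are byte-for-byte identical. The Python sketch below does that with a SHA-256 checksum. The function names are my own, and this is not a feature of git or SVN. It tells you only whether two files match, not what changed between them.

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    """Return the SHA-256 checksum of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def file_digest(path) -> str:
    """Return the SHA-256 checksum of a file's contents."""
    return digest(Path(path).read_bytes())


def same_version(path_a, path_b) -> bool:
    """Return True if the two files are byte-for-byte identical."""
    return file_digest(path_a) == file_digest(path_b)
```

For example, after checking out an old commit of a hypothetical guide.fm into a scratch folder, same_version("scratch/guide.fm", "backup/guide.fm") would tell you whether it matches the backup copy you were hoping to recover.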

Binary formats are typically proprietary. For example, if using unstructured FrameMaker to make .fm files, only those employees who have FrameMaker can view the original files. You can hand around your PDFs to SMEs, and perhaps there are other review tools, but I find the clarity of the reviews I get with a code review tool to be superior.

Github supports its own version of Markdown, a very simple text-formatting system that works well for technical writers who integrate their docs with the code. Developers love Markdown because it is human-readable, easy to write with, and can be composed in any text editor, whether Emacs or Oxygen or Notepad.

Github also makes it easy for a group with multiple operating systems to work together. One developer can have a Mac, another might use Linux, and another Windows, but that is irrelevant to git and Github. In my experience, there can be some difficulty mixing SVN on Linux with SVN on Windows due to subtle program differences.

Given that DITA uses XML files, which are text, DITA could work well with a workflow that integrates docs into code. However, anyone unfamiliar with DITA will not be able to edit the files, which makes DITA somewhat less open than a human-readable format like Markdown or reStructuredText.

Your Github site as a portfolio

I have frequently read about hiring managers who say that they want to see a software developer’s work on Github. Of course, many of us work for companies that do not have open-source software, so we certainly cannot put that work on Github for public viewing. Nonetheless, Github is an excellent place to store projects that you can make publicly available. If you have done the work yourself, your commits will tell the story to anyone who examines the logs. Other than for outright plagiarism, you cannot fake the work that you do in a Github (or SVN) repository.

Github for collaboration

Many writers find Google Docs to be an excellent solution for collaborative writing. That’s what I used when Tom Johnson edited the September 2014 issue of STC Intercom about API documentation, to which I contributed an article.

But Github is also a great way to collaborate. While you cannot see each other typing in real time, you can easily go back and forth to see each other’s changes. If you and your collaborator each do work that conflicts with the other’s, Github guides you on how to resolve this conflict.

What’s available on Github?

Many open-source projects have moved to Github. Just do a search on a topic that interests you, such as “music”, and you will find at least one project that relates. When you visit an open-source project website, you likely can click a link to its Github repository.

Also of interest to technical writers, the Github API documentation and the general Github documentation are models of clear, simple documentation literally used by millions. Don’t fear commitment: take a look and decide for yourself how collaborating on Github can work for you.
