
The Community for Data Integration is experimenting with group learning on online platforms.

We've heard that a lot of people would like to start with the basics of Git. Maybe you know how to populate a repository on GitLab, submit an issue, and sometimes get the command line to do what you want with a lot of copying and pasting, but it would be nice to start from the very beginning.

However, sometimes it's hard to self-motivate to finish an entire online course. Let's do it together with a peer group and a set deadline!


What:  DataCamp free course: Introduction to Git for Data Science

Where: https://www.datacamp.com/courses/introduction-to-git-for-data-science

When: Between August 8 and September 12, 2018 (approximately a 4-hour commitment)

Who: You, and Leslie, and peers from the Community for Data Integration.

How:

  1. Sign up with your email on the list at goo.gl/JmSiUZ
  2. Every Wednesday from August 14-Sept 5, 2018, you'll get a reminder to do the next section.
  3. Every Friday from August 16-Sept 7, 2018, Leslie will post her progress, any questions or comments she had, and will encourage you to do the same. (Probably as comments on this wiki page.)
  4. We will all complete the module before the next CDI monthly meeting and celebrate our new Git knowledge.


Questions? Send them to lhsu@usgs.gov or post them here. (You must be a CDI member and signed in to post a comment here. Email cdi@usgs.gov to become a member.)



I want this badge. 


27 Comments

  1. Part 1: Basic Workflow, completed! 

    It took me about 40 minutes (including snack breaks, etc.).

    Impressions: I ended up having more impressions than I thought I would - feel free to simply comment (or virtually pat yourself on the back) if you completed the exercise - long, wordy comments are not mandatory.

    1. Overall, I liked it, but I wish that it started even farther back at the beginning. i.e., Have us create our own directories and files, make that directory a repo, then start with where this exercise starts. I spent years editing databases in SQL but to this day do not know exactly how to create a database from scratch, and that irks me.
    2. I noted this intro statement and thought it was a bit opinionated but that is okay, I'll go with it for now: "Version control isn't just for software: books, papers, parameter sets, and anything that changes over time or needs to be shared can and should be stored and shared using something like Git."
    3. Raw diff syntax is hard to grasp immediately. I've seen things in (for example) GitHub Desktop with the green and red coloring, and that is more intuitive. However, it has been hinted to me that power users don't mess with the GUIs.
    4. Using nano was okay, but I wish we had a choice of text editors if we were more familiar with one of the more popular ones. (My first and only computer science course forced vi upon us and the commands have stuck with me since.) Can we make nano wrap text?
    5. This might be a bit of a harsh intro to someone who hasn't taken a course or had real life experience that requires the command line. Hopefully it wouldn't discourage anyone since there are explicit instructions and hints, lots of respect to anyone starting cold. 
    6. Real-time learning in a browser is cool, but if I didn't take any notes, I'd forget everything immediately. I took notes on the back of my water bill envelope. 
    7. There are two exercises that I "passed" but still might not completely get. Interpreting the diff syntax was a bit of a guess (the supplied info did not answer all of my questions: 3 lines removed? 4 lines added? I'm still a bit confused), and when we were supposed to type a longer commit message in nano, what was the appropriate message? I wasn't sure how the few sentences given related to what message was appropriate.
    8. I definitely disliked the advertisement pop-up to upgrade/enroll at the end of the course, with no X button to close it, but Esc did the trick. I guess they are trying to run a business.
    9. I wish it were a bit more obvious how to exit out after the first section - it jumped right into the second section and I probably just missed my cue.
    10. The point system - I have mixed feelings about this. I do like accumulating and seeing points, but I don't like getting points subtracted for typos in filenames that I could easily fix. Perhaps I just don't like getting points subtracted in general.
    1. You're in luck on point number 1! They go over creating new repos in one of the later modules!

  2. Part 1: Basic Workflow, completed!

    I agree with most of Leslie's comments. Here are a few thoughts:

    1. Some of the tutorial guidance text proved either inaccurate, or I just didn't get it. The primary example was the "What is in a diff?" exercise. It says "Here, the diff output shows that 4 lines from line 1 are being removed and replaced with new lines," referring to the sample text above it, but that sample text only appears to show a single line that was replaced instead of the 4. Maybe I just misunderstand the lesson :o|
    2. I like the keyboard shortcuts. Mouse over any area that is shortcut enabled and it shows you the command to go there. 
    3. I'm happy they went straight to command line. I've done some limited Git stuff, almost entirely using GUI interfaces. It works, but learning what's happening behind the scenes is nice. I'll be happy when I can become a true power user and drop the GUI crutch that's holding me back!
    4. Point system is pointless, for me. I've done other similar online training and never felt compelled, encouraged, or discouraged by them. They help some people with focus and critical thinking so I'm glad they are part of the application, but I just ignore them :)
    5. Looking forward to the next lesson.
  3. Completed Part 1.

    As noted, I also didn't see any official end to section 1 and was part way through 2 before I realized that I could stop!

    I hate the command line - always have, always will.  GUIs are great.

    It's nice to have all the detail in the output messages explained.

  4. Completed Part 1.

    Enjoyed it.

    I was slightly frustrated by the presentation of the advertisement at the end. Fumbled around trying to get back to the Free course. But, made me come here and read all your great comments and learn I was not alone.

    I attempted accessing the course in Chrome on an Android phone and iPad. Almost success except for Control characters in Nano. iPad had a Logitech Slim Folio keyboard but Control characters did not work. Did not spend much time trying. Feels like full computer OS required?

  5. From Chris Sherwood:

    (Chris was having some trouble saving a comment, which included a URL to a Github repo, and I am having trouble saving it as well when the URL is included! I'm trying to figure out what is the issue, stay tuned!)

    Hi all

    These are all great comments.

    I toiled away on a long comment, and Confluence could not save it. It is reproduced here:

    On GitHub: 

    https://github.com/csherwood-usgs/Demo_repository/blob/master/Git_workflow.md

    (Looks like comments are having issues when there is a live URL hyperlink in the text, the text above is not hyperlinked, please copy and paste in your browser. I'll try to find out if we can change that. - LH)

  6. Completed Part 1, and then before I knew it, I completed Part 2.  I guess I am caught up now.

    I have used Git a fair amount and I agree that someone without any experience might have a difficult time on a couple of the questions. 

    For example, take the question about how many lines changed given -22,3 and +22,4. I thought the explanation of the number pairs for blocks was misleading: in the explanation's example, -1,4 and +1,4 had four lines change, which suggests that the second number of the pair is the number of lines changed, but it is not. The second number of the pair is the number of lines in the block being shown for that version of the file, including unchanged context lines. I do not think that any special math with the number pairs will produce the number of lines changed. Does anyone else know a way to determine the number of lines changed other than counting the lines that start with - and +?

    It might be my reading skills, but there was an exercise where we were supposed to know the other file that changed: data/eastern.csv. I recall it saying to use git add to stage the other changed file. There was no additional information about which file had changed. Maybe it did say to use git status, but I missed that part, and I was afraid of doing the right thing and losing points, so I followed what I thought the instructions were, did the wrong thing, and actually lost points doing that too. That was when I decided not to worry about a perfect score...and to slow down and read more carefully.

    1. Thanks for explaining the bit about the number of lines changing. That really confused me when I went through the exercise! So, basically, it's telling you the line where to start looking for changes and then the second number tells you how many lines after that you'll want to look through?

      I was also confused by the exercise about adding the other file that changed. I don't think they did a particularly good job of explaining what they expected you to do. I think I recall running git status and finding that there were four files that changed. 

      1. Agreed. Calling both numbers in the pair lines is confusing. The exercise though had -22,3 +22,4 and the answer was 1. This left me wondering if that was because there was only one start line changed (both were started at 22) or if within those lines, three were removed and four added for a sum difference of 1. Honestly, I wanted to call that 7 changed lines but that wasn't even an option to lose points for.

        Mark, my points are a mess because I keep checking my work to see what looks different after I follow the first step. When I enter anything but what the next prompt calls for, I get dinged. I'm going to 'fail' but it's helping my recall of previous commands.

        Leslie thanks so much for this, I'm really enjoying the course and the sharing.

        * I haven't used git at all before.

      2. Madison, that is correct: the second number is the number of lines from each file that are in the block below, which you will want to look at to see the change. Below is the example from the exercise. I added line numbers starting at "22:" and an annotation in parentheses to the right of the line contents. In this file, data/northern.csv, the original version had 24 lines and the revised file had a total of 25 lines. In the block of lines given by git diff there are 3 lines that are the same in the original file (--- a/data/northern.csv) and the revised file (+++ b/data/northern.csv), and 1 line that was added in the revised file. Again, git provides context for the change by showing surrounding lines. In this example, the revision is simply an addition of one record to the end of the file. Git could have returned only the added line 25, but in other cases context is important to understand the change.

        $ git diff data/northern.csv
        diff --git a/data/northern.csv b/data/northern.csv
        index 5eb7a96..5a2a259 100644
        --- a/data/northern.csv
        +++ b/data/northern.csv
        @@ -22,3 +22,4 @@ Date,Tooth
        22: 2017-08-13,incisor (same in both - + files)
        23: 2017-08-13,wisdom (same in both - + files)
        24: 2017-09-07,molar (same in both - + files)
        25: +2017-11-01,incisor (added in + revised file)
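
On the question above about reading the number pairs: a minimal Python sketch, assuming the unannotated hunk from data/northern.csv, that parses the hunk header and counts the added and removed lines directly. The names `parse_hunk_header` and `count_changes` are hypothetical helpers for illustration, not part of Git. For real repositories, `git diff --numstat` reports the added and removed line counts per file without any hand counting.

```python
import re

# Matches a unified-diff hunk header such as "@@ -22,3 +22,4 @@ Date,Tooth".
# A missing ",length" means a length of 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_length, new_start, new_length) for a hunk header."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError("not a hunk header: " + line)
    old_start, old_length, new_start, new_length = match.groups()
    return (
        int(old_start),
        int(old_length) if old_length is not None else 1,
        int(new_start),
        int(new_length) if new_length is not None else 1,
    )


def count_changes(hunk_body):
    """Count lines that start with '+' (added) and '-' (removed) in a hunk body."""
    added = sum(1 for line in hunk_body if line.startswith("+"))
    removed = sum(1 for line in hunk_body if line.startswith("-"))
    return added, removed


# The hunk from the exercise, without the added line-number annotations.
hunk = [
    "@@ -22,3 +22,4 @@ Date,Tooth",
    " 2017-08-13,incisor",
    " 2017-08-13,wisdom",
    " 2017-09-07,molar",
    "+2017-11-01,incisor",
]

header = parse_hunk_header(hunk[0])
added, removed = count_changes(hunk[1:])
print(header)          # (22, 3, 22, 4)
print(added, removed)  # 1 0
```

Each length is the number of context lines plus that side's changes (3 = 3 + 0 for the original, 4 = 3 + 1 for the revision), so the difference between the two lengths gives only the net change, not the number of lines touched.
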
  7. I was actually working in Git today and became frustrated with my lack of knowledge on working with multiple branches and collaborating with others. I kept trying to check out branches before committing my local work, or trying to push my local work to remote repositories that were out of sync with my local repo. Needless to say, I was inspired to start and finish this whole course. The last two chapters were super helpful for me! I wish I had taken this course months ago! Thanks, Leslie Hsu, for finding the course and inspiring the rest of us to do some learning! Now, I just need to remember what I learned the next time I'm using Git.

  8. Finished modules 2 and 3.  Thanks for the tip with the escape key, Leslie!  I got my first ad screen.

    I'm looking forward to the next section on branching.  I could use a good tutorial on that.

  9. Part 2: Repositories, completed.

    Now I have a bunch more git commands written down on various post-it notes that I should digitize.

    Things I looked up/noted/wondered about:

    1. Hash function definition in Wikipedia: https://en.wikipedia.org/wiki/Hash_function
    2. Why did all of the commits for the examples have the same timestamp? I was hoping they would have different timestamps so that I could check that I was looking at the most recent one. Is it reasonable to have three different commits in the same second?
    3. I had to look up a command from Part 1 on my notes written on the back of an envelope.
    4. Finally it is explained to me what kind of files would be included in .gitignore. I always wondered what files would I want to ignore? "Temporary or intermediate files" was the example explanation given here. I’ve also heard the example of files with secure/internal info that you don’t want to be placed in the public repository.
    5. What do the -n and -f flags stand for after git clean? Explaining that would help me to remember the flags. OK, I looked it up at https://git-scm.com/docs/git-clean, -n is for --dry-run: don’t actually remove anything, just show what would be done; -f is for --force: sort of a check that you don’t clean things by accident, as I understand it.
    6. What is the difference between two-dash flags and one-dash flags? I was familiar with one-dash flags previously, but why --list?
      1. Answer one: Usually - options can be chained together, like pacman -Syu being equivalent to pacman -S -y -u, and -- options generally take a parameter as in ./configure --prefix=/usr (https://askubuntu.com/questions/492544/what-are-the-differences-between-and-in-commands)
      2. Answer two, with more answers at this site: A single hyphen can be followed by multiple single-character flags. A double hyphen prefixes a single, multicharacter option. https://serverfault.com/questions/387935/whats-the-difference-betwen-the-single-dash-and-double-dash-flags-on-shell-comm 
    7. Why is the example email rep.loop - what does that stand for? Can't you have a better example email that I don't wonder about?

    You now have insight into all of the questions I have, and why I was always really confused in school.

    1. Note: We submitted a ticket/question and it looks like the issues with posting URLs and other things with special characters in comments is fixed now. It was a firewall thing.
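
On the question above about one-dash and two-dash flags: a minimal sketch using Python's `argparse`, which follows the same convention. The parser below is hypothetical and only mimics the two `git clean` flags discussed; it is not Git's own option parser. It shows that single-character flags can be chained behind one dash, while each long name gets its own double dash.

```python
import argparse

# Hypothetical demo parser: each option has a short (-x) and a long (--name) spelling.
parser = argparse.ArgumentParser(prog="clean-demo")
parser.add_argument("-n", "--dry-run", action="store_true",
                    help="only show what would be removed")
parser.add_argument("-f", "--force", action="store_true",
                    help="confirm that files should really be removed")

chained = parser.parse_args(["-nf"])                       # short flags chained
separate = parser.parse_args(["-n", "-f"])                 # short flags one by one
spelled_out = parser.parse_args(["--dry-run", "--force"])  # long flags

print(chained)  # Namespace(dry_run=True, force=True)
print(chained == separate == spelled_out)  # True
```

All three spellings produce the same result, which is why the long names are easier to remember and the short ones are quicker to type.
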

  10. I just finished the course and overall I found it very helpful in connecting some of the dots in my previous understanding of Git.

    I too struggled with the documentation and comprehension of the diff command. I am also still working on my understanding of checkout. It's weird how it is used to revert changes as well as create new branches.

    Despite those gripes I found the class did a good job presenting material that is confusing at its core. I plan to follow up with another course or two in the future.

    And, I really enjoyed having the group incentive to keep me on track - Thanks Leslie for wrangling us together!

  11. I've finished through module 3 and so far so good. However, being new to Git (other than downloading files) I feel like I'm missing some stuff, like when it talks about editing a file in my text editor and then doing something with it on Git. I am going to actually make a Git account, and am hoping that will clear up the disconnect I feel between working with files locally and console commands in Git. 

    1. I agree...just talking about Git is only part of the story...how you use Git with GitHub or GitLab in your daily workflow is the bigger part. Since I had problems with this wiki, I posted my personal workflow on my GitHub account here: https://github.com/csherwood-usgs/Demo_repository/blob/master/Git_workflow.md


      If you make a GitHub account and want to use it for work, make your username bennion-usgs, and let the guys at gs_github@usgs.gov know.


      1. I agree too. Today I created a new folder on my Desktop and ran git init. Then I added a Jupyter Notebook file. After adding and committing, I created the remote connection to GitHub, but instead of calling it 'origin' like all the tutorials I've done, I changed it to 'github'. Now when I push or pull I see git push github master. That makes much more sense to me than git push origin master - small change but helps me.

        Then I created a README file through GitHub. This forced me to pull my new GitHub repo back to my working folder on my desktop before I could push anything else. After completing these simple steps on my own I feel much more confident in my overall understanding of WHEN and WHY to use Git and GitHub, not just the HOW.

  12. Thank you Leslie. Your efforts are pulling me deeper into Git than previous learning attempts.

    Thank you Chris. Those personal details are exactly what I need.

  13. Parts 3 “Undo” and 4 “Working with Branches” completed.

    In general I can follow the prompts and get through the exercises, but I don’t understand all of the concepts 100%.

    1. Here’s some documentation on git checkout, which may or may not clear up some confusion about how it is used to “discard changes” and “create new branches.” https://git-scm.com/docs/git-checkout

    2. Do NOT attempt to use the nano editor if your browser window width is not large enough (80 characters?) to display a whole line. That really frustrated me when editing the file. Solution: expand width of browser window.

    3. When the tutorial says to include “a few” of the hash characters to identify the hash, I wonder, “How many exactly are necessary? Just enough to identify the hash from all of the options in the repo? So it could theoretically be one or two characters?” I did not do testing, as going “off road” is not really encouraged in these tutorials.

    4. It was nice to complete a full example of merge, stage, commit, but I think I still have issues understanding the “conflicts” for a merge. It seems like it should be very straight-forward, but I completely did not comprehend one of the examples where the answer was that line B was the only one that changed in both files. Do they mean literally “line 2” of the file? When line A (line 1) was deleted, does it matter if line B shifts to line 1, or stays in place in line 2? Something about my brain does not get these explanations. Embarrassingly, I would prefer a GUI with red and green highlighting. Or, I just need more examples of what can and cannot be merged without resolution.
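
On the question above about how many hash characters are needed: Git accepts any leading part of a hash that is unique within the repository, with a minimum of four characters, so one or two characters would not be enough. Below is a minimal sketch of that idea, assuming a made-up list of hashes and a hypothetical helper named `shortest_unique_prefix`. Real Git checks the prefix against every object in the repository, not only the commits.

```python
MIN_ABBREV = 4  # Git does not accept abbreviated hashes shorter than this


def shortest_unique_prefix(target, all_hashes, minimum=MIN_ABBREV):
    """Return the shortest prefix of `target` that no other hash starts with."""
    others = [h for h in all_hashes if h != target]
    for length in range(minimum, len(target) + 1):
        prefix = target[:length]
        if not any(h.startswith(prefix) for h in others):
            return prefix
    return target


# Made-up hashes for illustration only; the first two share their first four characters.
hashes = [
    "5eb7a96c0f1d2e3a4b5c6d7e8f90112233445566",
    "5eb7b01c0f1d2e3a4b5c6d7e8f90112233445566",
    "5a2a259c0f1d2e3a4b5c6d7e8f90112233445566",
]

print(shortest_unique_prefix(hashes[0], hashes))  # 5eb7a
print(shortest_unique_prefix(hashes[2], hashes))  # 5a2a
```

The first hash needs five characters because another hash shares its first four, while the third is already unique at the four-character minimum. As a repository grows, longer prefixes become necessary.
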

    Overall, as mentioned by others, I would benefit from a real-life-relevant-to-me example. Maybe a Jupyter notebook that multiple people are editing? Or a metadata file? Or a simple R script that someone else edits to improve? But I do think my comprehension has increased since I’ve started. Totally worth it.

    I’m “saving” part 5 for next week.

    1. I propose we all try to collaborate on a repo...for practice and maybe to produce something useful.

      In my Demo repository https://github.com/csherwood-usgs/Demo_repository, I have a markdown document called "Cheat_sheet.md". This is where I plan to put my growing list of crib notes on how to use git. Maybe we could all collaborate on that repo, adding to Cheat_sheet and/or adding other documents.

      There are two ways I know how to do this: 1) I can make you collaborators, in which case you would be able to push commits to the repo, or 2) You can fork the repo, which will give you  a copy that is all yours. You can submit pull requests to me, and I can (at my discretion) merge the changes into the repo. I believe this is the more common model...repos are administered by one or a few people, and the fork / revise / pull request / merge workflow is used to incorporate contributions by others.

      I have not done very much of this, so it is the blind leading the blind. You will have to read about forking on GitHub. It is easy to fork a repo, and get a copy, but it is harder to keep that copy up to date with the original. Here are the instructions: https://gist.github.com/Chaser324/ce0505fbed06b947d962

      I am looking forward to getting some pull requests!

  14. I completed parts 3 and 4.

    This has been a good tutorial.  I have used git a fair amount, but in each lesson I learn something new.  I have not used the stage in the past as a level of saving as it was described in lesson 3.  I can see how that would be useful.

    After lesson 4, I was wondering how best to preserve a definite version of a project while moving forward with a new version.  If version 1 was something determined to have been complete at one time and was then used operationally to produce products (and maybe even shared with partners), but then version 2 was needed when the project expanded, what is the best way to preserve version 1? 

    Alternative approaches:

    1. Make version 1 a branch within the project that will no longer change.
    2. Start a new repository for version 2 that begins by cloning version 1 repository.
    3. Don't worry about it and move forward with version 2 because version 1 can be identified, if needed, by commits within the repository.

    I lean toward #1 but I am not sure of the best practice.

    1. Hi Mark,

      I think the answer to this is tagging. I did a USGS data release, and before moving from the GitLab site to the code.usgs.gov site, I made a branch called release_1.0, and tagged it. So it is easy to go back to that touchpoint.

  15. Chris,

    Thanks for that answer.  Tagging looks like a great function for formal versions.  Here is some documentation https://git-scm.com/book/en/v2/Git-Basics-Tagging

  16. Completed part 5: Collaboration

    Very quick section. Maybe it’s a result of doing this on a Friday afternoon, but I’m still feeling a bit lost. However - I'm very happy to have completed the course!

    This week’s comments:

    1. I am 90% sure, but not completely sure, that what is referred to in this lesson as “git push” is the same as a “pull request” to a GitHub repository.

    2. The example of a dental data directory is not compelling to me. Is my dentist using git?

    3. An exercise asks “Please enter a commit message to explain why this merge is necessary.” It would be nice to see more examples of good commit messages to make sure mine was not completely off.

    4. A lot of text shows up in the terminal that is not explained. For example “Merge made by the ‘recursive’ strategy.” I guess a web search or more than an “Intro to Git” could cover that…

    Hope that many of you reach the satisfying end!
  17. I also completed this section. I was a little miffed that I got dinged for typing "git push" instead of "git push origin master"...because origin is the default, and the branch in question is your current branch, so when on master, the short version is the same as the longer explicit version.

    I really like the idea of working on these lessons as a group...I think it works both to motivate, and to add valuable info through the comments. I am less enamored of this set of lessons...as a barely competent git user, I found most of the new stuff to be kind of out of the mainstream, and some of the valuable and basic stuff missing, like tags and simple but complete workflows for various scenarios: working alone, working in collaboration with a small group, working on someone else's project (fork, branch, revise, send pull request).

    I *never* use the editor for commit messages. I always use

    git commit -a -m "some message"

    My messages are frequently along the lines of "still broken" or "almost working".

    My dentist may have started using Git. We just got a bill with revised charges for the last two years, asking for $88.50 more. I think I would fall in ./data/northern.csv.

  18. Completed part 5.

    This was a good idea to go through the course as a group.  Leslie, thank you for organizing this group and keeping us going through completion.  It helped and I learned something from each lesson.