Big Data Forum Collection: [scm-user] Git for Source Control

I am wondering whether anyone is using Git for source control in conjunction with Cloudera Manager (CM) configuration changes to services. It would be nice if configuration changes could follow the same development practices as a typical SDLC (code, test, review, deploy). The best-case scenario would be for CM to read master XML config files for each service from Git, update its stored configs once all changes for a release are merged, and finally deploy those changes through a script that uses the CM API to restart the cluster and deploy client configurations.

If anyone has done this or can suggest a similar solution, please let me know.

I know some people use the /cm/deployment endpoint and check that into source control.
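A minimal sketch of that export-and-commit step, using only the Python standard library. It assumes CM listens on the default port 7180 with HTTP Basic auth and that `v6` is an API version your CM supports; the `timestamp` field name and the `cm-deployment.json` file name are assumptions too. The point is to write the JSON deterministically so that Git diffs show only real configuration changes.

```python
import base64
import json
import urllib.request


def fetch_deployment(host, user, password, api_version="v6", port=7180):
    """GET the full configuration state from CM's /cm/deployment endpoint.

    api_version and port are assumptions; match them to your CM install.
    """
    url = f"http://{host}:{port}/api/{api_version}/cm/deployment"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def normalize_deployment(deployment):
    """Return a deterministic JSON string so Git diffs only show real changes."""
    snapshot = dict(deployment)  # shallow copy; the caller's dict is untouched
    # The export carries a timestamp that changes on every call (assumed
    # field name); drop it so an unchanged cluster yields an identical file.
    snapshot.pop("timestamp", None)
    return json.dumps(snapshot, indent=2, sort_keys=True) + "\n"


def write_snapshot(deployment, path="cm-deployment.json"):
    # Hypothetical file name: this is the file you would commit to Git.
    with open(path, "w", encoding="utf-8") as f:
        f.write(normalize_deployment(deployment))
```

`sort_keys` only orders dictionary keys. If CM returns list entries (hosts, services, config items) in a different order between calls, you may also want to sort those lists before writing.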

Conceptually, the /cm/deployment API is a representation of all the configuration state of the cluster. It includes the hosts you have, the clusters they're assigned to, all the services and roles, and all the configurations for those services and roles. (Note that it doesn't include whether or not those roles are running, the command history, or the commands currently running: that's runtime state as opposed to configuration state.) I'm aware of people spinning up clusters reproducibly with the API (including the deployment PUT) and also of people using this endpoint to notice that folks have gone in and changed settings. It would take a bit of work to drive configuration changes from this representation, and it's not obviously desirable: it's very easy to get yourself into an unhappy state. (For example, enabling HA has a lot of steps that affect more than configuration and must be done in a specific order.)
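Noticing that someone has changed settings can be as simple as comparing the committed snapshot with a fresh export. The sketch below assumes service-level settings appear under `clusters[].services[].config.items[]` as name/value pairs; that layout is an assumption about the export format, and role and role-config-group settings are deliberately not covered.

```python
def service_configs(deployment):
    """Flatten service-level config into {(cluster, service, key): value}.

    Assumes the layout clusters[].services[].config.items[] with name/value
    pairs; role-level settings are ignored in this sketch.
    """
    flat = {}
    for cluster in deployment.get("clusters", []):
        for service in cluster.get("services", []):
            for item in service.get("config", {}).get("items", []):
                key = (cluster["name"], service["name"], item["name"])
                flat[key] = item.get("value")
    return flat


def config_drift(committed, live):
    """Return (added, removed, changed) settings between two snapshots."""
    old, new = service_configs(committed), service_configs(live)
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    # Settings present in both snapshots but with different values.
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, changed
```

Load `committed` from the JSON file in Git and `live` from a fresh GET of /cm/deployment; any non-empty result means the cluster has drifted from what was reviewed.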

That's very good to know. For a brand-new cluster, this would be useful for quickly bringing one up already set up. And if we only want to track config changes, having these files under source control would also be useful. But if config changes are driven by these files, we run the risk that certain service functions won't work properly because of dependencies and the order in which the changes must be applied. Are my assumptions right?

If so, it makes sense that driving configuration from these files is not supported and not publicly advertised.

This would mean the steps are:

1. Use Git to checkout and pull latest

2. Use CM to make config changes and save them

3. Use the CM API to export the config changes and overwrite the files checked out from Git

4. Use Git to commit and push changes

5. Use Git to merge after review of config changes

6. Lastly, use CM API to restart and deploy changes to the cluster
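Step 6 could be scripted roughly as follows. This is a sketch, not a tested tool: it builds POST requests for the cluster-level `restart` and `deployClientConfig` commands of the CM REST API, while the host, port 7180, API version `v6`, and credentials are placeholder assumptions.

```python
import base64
import urllib.parse
import urllib.request


def cluster_command(host, cluster, command, user, password,
                    api_version="v6", port=7180):
    """Build a POST request for a cluster-level CM API command.

    api_version, port and credentials are placeholders for your install.
    """
    # Cluster names such as "Cluster 1" contain spaces, so quote them.
    url = (f"http://{host}:{port}/api/{api_version}/clusters/"
           f"{urllib.parse.quote(cluster)}/commands/{command}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"})


def post_merge_requests(host, cluster, user, password, **kwargs):
    """Step 6: restart the cluster, then push client configurations."""
    return [cluster_command(host, cluster, cmd, user, password, **kwargs)
            for cmd in ("restart", "deployClientConfig")]
```

Send each request with `urllib.request.urlopen(req)`. CM runs these commands asynchronously and returns a command object with an `id`, so a real script should poll that command until it is no longer active before sending the next request.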

How does this sound?

Your workflow sounds all right. I worry, however, that your other users will be confused by the rigmarole. We also built the "configuration changes" view to show the materialized configuration files (rather than what's changed in CM), because we felt that those made sense. You'll be lacking that facility.

The whole purpose of this exercise is to appease the Java dev managers. They are very comfortable with JSON and have viewers for it; in fact, they use JSON almost everywhere. The steps I outlined are basically their dev process adapted to CM config changes. I know they'll never make any config changes themselves, but they want the ability to review diffs in Git Stash and ask questions. Plus, they want the reassurance that we can version, track, and visibly roll back to a past config. I think this solution satisfies them for now.

By the way, we already use CM's built-in config diff view before restarting and deploying cluster changes. I definitely prefer this handy tool to what the Java dev folks will be using. It's cleaner.

Tuesday, December 30, 2014