Should I store generated code in source control?

big-blog 2020. 8. 24. 21:05

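This is a debate I'm taking part in. I would like to get more opinions and points of view.

We have some classes that are generated at build time to handle DB operations (in this specific case with SubSonic, but I don't think that matters much for the question). The generation is set up as a pre-build step in Visual Studio, so every time a developer (or the official build process) runs a build, these classes are generated and then compiled into the project.

Now some people are claiming that having these classes saved in source control could cause confusion if the code you get doesn't match what would have been generated in your own environment.

I would like to have a way to trace back the history of the code, even if it is usually treated as a black box.

Any arguments or counter-arguments?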


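UPDATE: I asked this question because I really believed there was one definitive answer. Looking at all the responses, I can say with a high level of certainty that there is no such answer. The decision should be based on more than one parameter. Reading the answers below could provide a very good guideline to the types of questions you should be asking yourself when deciding on this issue.

I won't select an accepted answer at this point, for the reasons mentioned above.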


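Saving it in source control is more trouble than it's worth.

You have to do a commit every time you do a build for it to be of any value.

Generally we leave generated code (IDL, JAXB output, etc.) outside source control where I work, and it has never been a problem.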


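Every time I want to show changes to a source tree in my own personal repo, all the 'generated files' show up as changed and need committing.

I would prefer a cleaner list of modifications that includes only the real updates that were made, not auto-generated changes.

Leave them out, and then after a build, add an 'ignore' for each of the generated files.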
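To illustrate what such ignore rules buy you, here is a minimal Python sketch that filters a change list down to hand-written files. The patterns and file names are hypothetical, not taken from the question's project, and a real setup would use the ignore mechanism of your VCS rather than a script like this.

```python
from fnmatch import fnmatch

# Hypothetical patterns naming generated files; adjust to the project layout.
GENERATED_PATTERNS = ["*.generated.cs", "Generated/*"]

def hand_written_changes(changed_paths, patterns=GENERATED_PATTERNS):
    """Return only the changed paths that match no generated-file pattern."""
    return [
        path for path in changed_paths
        if not any(fnmatch(path, pattern) for pattern in patterns)
    ]

# Hypothetical change list as a VCS status command might report it.
changed = [
    "Models/Order.cs",
    "Generated/OrderTable.cs",
    "Dal/Customer.generated.cs",
]
print(hand_written_changes(changed))
```

Only the hand-written file survives the filter, which is the "cleaner list of modifications" described above.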


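Put it in source code control. The advantage of having the history of everything you write available to future developers outweighs the minor pain of occasionally rebuilding after a sync.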


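Look at it this way: do you check your object files into source control? Generated source files are build artifacts, just like object files, libraries and executables. They should be treated the same. Most would argue that you shouldn't check generated object files and executables into source control. The same arguments apply to generated source.

If you need to look at a historical version of a generated file, you can sync to the historical version of its sources and rebuild.

Checking generated files of any sort into source control is analogous to database denormalization. There are occasionally reasons to do this (typically performance), but it should be done only with great care, because it becomes much harder to maintain correctness and consistency once the data is denormalized.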


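I would say that you should avoid adding any generated code (or other artifacts) to source control. If the generated code is the same for a given input, then you can just check out the versions you want to diff and generate the code for comparison.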
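A minimal Python sketch of that "regenerate both versions and diff" idea follows. The `generate` function is a toy stand-in for a real generator, and the schemas are hypothetical; the only property that matters is that the output is deterministic for a given input.

```python
import difflib

def generate(schema):
    """Toy deterministic generator (hypothetical): one class stub per table."""
    lines = []
    for table, columns in sorted(schema.items()):
        lines.append(f"class {table}:")
        for column in columns:
            lines.append(f"    {column} = None")
    return lines

# Two hypothetical revisions of the generator input.
old_schema = {"Order": ["id", "total"]}
new_schema = {"Order": ["id", "total", "currency"]}

# Regenerate from each revision and compare the results.
diff = list(difflib.unified_diff(
    generate(old_schema), generate(new_schema),
    fromfile="generated@old", tofile="generated@new", lineterm="",
))
print("\n".join(diff))
```

This only works if the generator version is held fixed as well, since a newer generator may produce different output for the same input.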


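I invoke the DRY principle. If the "source files" used to generate these code files at build time are already in the repository, there is no need to have the same code committed "twice".

Also, you might avert some problems this way, for example if the code generation breaks someday.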


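I really don't think you should check them in.

Surely any change in the generated code is either going to be noise (changes between environments) or the result of something else changing, e.g. a change in your DB. If your DB's creation scripts (or any other dependencies) are in source control, then why do you need the generated scripts as well?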


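No, for three reasons.

  1. Source code is everything necessary and sufficient to reproduce a snapshot of your application as of some current or previous point in time, nothing more and nothing less. Part of what this implies is that someone is responsible for everything checked in. Generally I'm happy to be responsible for the code I write, but not for the code that's generated as a consequence of what I write.

  2. I don't want anyone to be tempted to shortcut a build from primary sources by using intermediate code that may or may not be current (and, more importantly, that I don't want to accept responsibility for). It's also too tempting for some people to get caught up in a meaningless process of debugging conflicts in intermediate code based on partial builds.

  3. Once it's in source control, I accept responsibility for a. it being there, b. it being current, and c. it being reliably integrable with everything else in there. That includes removing it when I'm no longer using it. The less of that responsibility, the better.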


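The general rule is no, but if it takes time to generate the code (because of DB access, web services, etc.) then you might want to save a cached version in source control and spare everyone the pain.

Your tooling also needs to be aware of this and handle checking out from source control when needed. Too many tools decide to check out from source control for no reason.
A good tool will use the cached version without touching it (and without modifying the file's timestamps).

You also need to put a big warning inside the generated code telling people not to modify the file. A warning at the top is not enough; you have to repeat it every dozen lines.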
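As a rough sketch of the repeated-warning idea, a generator's output stage could interleave a comment like this. The warning text, the `//` comment syntax and the function name are assumptions for illustration only.

```python
# Hypothetical warning line; use the comment syntax of the generated language.
WARNING = "// GENERATED FILE - DO NOT EDIT; change the generator input instead."

def with_repeated_warning(lines, every=12, warning=WARNING):
    """Return lines with the warning at the top and again before every
    `every`-th line of generated code after that."""
    out = []
    for index, line in enumerate(lines):
        if index % every == 0:
            out.append(warning)
        out.append(line)
    return out

# 30 lines of toy generated code get a warning before lines 0, 12 and 24.
body = [f"int field{i};" for i in range(30)]
result = with_repeated_warning(body)
print(result.count(WARNING))
```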


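We don't store generated DB code either: since it is generated, you can get it at will for any given version from the source files. Storing it would be like storing bytecode or the like.

Now, you do need to ensure that the code generator used at a given version is still available! Newer versions can generate different code.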


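Leave it out.

If you're checking in generated files, you're doing something wrong. What's wrong may differ: it could be that your build process is inefficient, or something else, but I can't see it ever being a good idea. History should be associated with the source files, not the generated ones.

It just creates a headache for people who then end up trying to resolve differences, find the files that are no longer generated by the build and delete them, and so on.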

A world of pain awaits those who check in generated files!


There is a special case where you want to check in your generated files: when you may need to build on systems where tools used to generate the other files aren't available. The classic example of this, and one I work with, is Lex and Yacc code. Because we develop a runtime system that has to build and run on a huge variety of platforms and architectures, we can only rely on target systems to have C and C++ compilers, not the tools necessary to generate the lexing/parsing code for our interface definition translator. Thus, when we change our grammars, we check in the generated code to parse it.


arriving a bit late ... anyway ...

Would you put a compiler's intermediate files under version control? In the case of code generation, the source code is by definition the input of the generator, while the generated code can be considered an intermediate file between the "real" source and the built application.

So I would say: don't put generated code under version control, but the generator and its input.

Concretely, I work with a code generator I wrote: I never had to maintain the generated source code under version control. I would even say that since the generator reached a certain maturity level, I didn't have to observe the contents of generated code although the input (for instance model description) changed.


In some projects I add generated code to source control, but it really depends. My basic guideline is that if the generated code is an intrinsic part of the compiler, then I won't add it. If the generated code comes from an external tool, such as SubSonic in this case, then I would add it to source control. If you periodically upgrade the component, then I want to know the changes in the generated source in case bugs or issues arise.

As far as generated code needing to be checked in, the worst-case scenario is manually diffing the files and reverting them if necessary. If you are using svn, you can add a pre-commit hook to deny a commit if the file hasn't really changed.
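The "hasn't really changed" test is the interesting part of such a hook. Below is a hedged Python sketch of just that comparison, assuming (hypothetically) that the only noise in regenerated files is a timestamp line starting with `// Generated on`. It does not use any svn hook API; a real hook script would call something like this on the old and new file contents.

```python
def really_changed(old_text, new_text, volatile_prefixes=("// Generated on",)):
    """Compare two versions of a generated file, ignoring lines that start
    with a volatile prefix (assumed here to be a timestamp header)."""
    def stable_lines(text):
        return [
            line for line in text.splitlines()
            if not line.startswith(volatile_prefixes)
        ]
    return stable_lines(old_text) != stable_lines(new_text)

# Hypothetical file contents that differ only in the timestamp header.
old = "// Generated on 2020-08-01\nclass Order {}\n"
same = "// Generated on 2020-08-24\nclass Order {}\n"
new = "// Generated on 2020-08-24\nclass Order { int id; }\n"

print(really_changed(old, same))  # only the timestamp differs
print(really_changed(old, new))   # the class body differs
```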


It really depends. Ultimately, the goal is to be able to reproduce what you had if need be. If you are able to regenerate your binaries exactly, there is no need to store them. But remember that in order to recreate your output you will probably need the exact configuration you built it with in the first place. That means not only your source code, but also your build environment, your IDE, and maybe other libraries and generators, all in the exact versions you used.

I have run into trouble in projects where we upgraded our build environment to newer versions, or even to another vendor's, and were then unable to recreate the exact binaries we had before. This is a real pain when the binaries to be deployed depend on some kind of hash, especially in a secured environment, and the recreated files somehow differ because of compiler upgrades or whatever.

So, would I store generated code? I would say no. What I would store is the binaries or deliverables that are released, together with the tools used to produce them. And there is no need to keep those in source control; just make a good backup of the files.
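A quick way to find out whether a rebuild really reproduces a release is to compare digests. Here is a minimal Python sketch using SHA-256; the byte strings stand in for the contents of real deliverables and are made up for illustration.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a build output."""
    return hashlib.sha256(data).hexdigest()

def is_reproducible(released: bytes, rebuilt: bytes) -> bool:
    """True if the rebuilt artifact is byte-for-byte identical to the release."""
    return digest(released) == digest(rebuilt)

# Hypothetical artifact contents; in practice, read the files in binary mode.
released = b"binary built with compiler 1.0"
rebuilt_same_toolchain = b"binary built with compiler 1.0"
rebuilt_new_toolchain = b"binary built with compiler 1.1"

print(is_reproducible(released, rebuilt_same_toolchain))
print(is_reproducible(released, rebuilt_new_toolchain))
```

If the second kind of mismatch shows up after a toolchain upgrade, that is the situation described above, and the backed-up deliverables are what you fall back on.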


The job of configuration management (of which version control is just one part) is to be able to do the following:

  • Know which changes and bug fixes have gone into every delivered build.
  • Be able to reproduce exactly any delivered build, starting from the original source code. Automatically generated code does not count as "source code" regardless of the language.

The first one ensures that when you tell the client or end user "the bug you reported last week is fixed and the new feature has been added" they don't come back two hours later and say "no it hasn't". It also makes sure they don't say "Why is it doing X? We never asked for X".

The second one means that when the client or end user reports a bug in some version you issued a year ago, you can go back to that version, reproduce the bug, fix it, and prove that it was your fix that eliminated the bug rather than some perturbation of the compiler or other fixes.

This means that your compiler, libraries etc also need to be part of CM.

So now to answer your question: if you can do all the above then you don't need to record any intermediate representations, because you are guaranteed to get the same answer anyway. If you can't do all the above then all bets are off because you can never guarantee to do the same thing twice and get the same answer. So you might as well put all your .o files under version control as well.


The correct answer is "It Depends". It depends upon what the client's needs are. If you can roll back code to a particular release and stand up to any external audits without it, then you're still not on firm ground. As devs we need to consider not just 'noise', pain and disk space, but the fact that we are tasked with the role of generating intellectual property and there may be legal ramifications. Would you be able to prove to a judge that you're able to regenerate a web site exactly the way a customer saw it two years ago?

I'm not suggesting you save or don't save gen'd files. Whichever way you decide, if you're not involving the Subject Matter Experts in the decision, you're probably wrong.

My two cents.


There are good arguments both for and against presented here. For the record, I build the T4 generation system in Visual Studio and our default out-of-the-box option causes generated code to be checked in. You have to work a bit harder if you prefer not to check in.

For me the key consideration is diffing the generated output when either the input or generator itself is updated.

If you don't have your output checked in, then you have to take a copy of all generated code before upgrading a generator or modifying input in order to be able to compare that with the output from the new version. I think this is a fairly tedious process, but with checked in output, it's a simple matter of diffing the new output against the repository.

At this point, it is reasonable to ask "Why do you care about changes in generated code?" (Especially as compared to object code.) I believe there are a few key reasons, which come down to the current state of the art rather than any inherent problem.

  1. You craft handwritten code that meshes tightly with generated code. That's not the case on the whole with obj files these days. When the generated code changes, it's sadly quite often the case that some handwritten code needs to change to match. Folks often don't observe a high degree of backwards compatibility with extensibility points in generated code.

  2. Generated code simply changes its behavior. You wouldn't tolerate this from a compiler, but in fairness, an application-level code generator is targeting a different field of problem with a wider range of acceptable solutions. It's important to see if assumptions you made about previous behavior are now broken.

  3. You just don't 100% trust the output of your generator from release to release. There's a lot of value to be had from generator tools even if they aren't built and maintained with the rigor of your compiler vendor. Release 1.0 might have been perfectly stable for your application, but maybe 1.1 has a few glitches for your use case now. Alternatively, you change input values and find that you are exercising a new piece of the generator that you hadn't used before, and potentially you get surprised by the results.

Essentially all of these things come down to tool maturity - most business app code generators aren't close to the level that compilers or even lex/yacc-level tools have been for years.


Both sides have valid and reasonable arguments, and it's difficult to agree on common ground. Version control systems (VCSs) track the files developers put into them, and they assume that those files are hand-crafted by developers who are interested in the history of, and the changes between, any revisions of the files. This assumption equates two concepts: "I want to get this file when I do a checkout" and "I am interested in the changes to this file."

Now, the arguments from both sides could be rephrased like this:

  • "I want to get all these generated files when I do a checkout, because I don't have the tool to generate them on this machine."
  • "I should not put them into the VCS, since I am not interested in the changes to these files."

Fortunately, the two requirements do not seem to conflict fundamentally. With some extension of current VCSs, it should be possible to have both. In other words, it's a false dilemma. If we ponder a while, it's not hard to see that the problem stems from the assumption VCSs hold. VCSs should distinguish files that are hand-crafted by developers from files that are not hand-crafted but just happen to be inside the VCS. For the first category, which we usually call source files (code), VCSs already do a great job. For the latter category, VCSs have no such concept yet, as far as I know.

Summary

I will take git as one example to illustrate what I mean.

  • git status should not show generated files by default.
  • git commit should include generated files as snapshot.
  • git diff should not show generated files by default.

PS

Git hooks could be used as a workaround, but it would be great if git supported this natively. gitignore doesn't meet our requirement, because ignored files won't go into the VCS.


I would argue for. If you're using a continuous integration process that checks out the code, modifies the build number, builds the software and then tests it, then it's simpler and easier to just have that code as part of your repository.

Additionally, it's part and parcel of every "snapshot" that you take of your software repository. If it's part of the software, then it should be part of the repository.


I would say that yes, you want to put it under source control. From a configuration management standpoint, EVERYTHING that is used to produce a software build needs to be controlled so that it can be recreated. I understand that generated code can easily be recreated, but an argument can be made that it is not the same, since the date/timestamps will differ between the two builds. In some areas, such as government, this is often required.


In general, generated code need not be stored in source control because the revision history of this code can be traced by the revision history of the code that generated it!

However, it sounds like the OP is using the generated code as the data access layer of the application instead of manually writing one. In this case, I would change the build process and commit the code to source control, because it is a critical component of the runtime code. This also removes the dependency on the code generation tool from the build process, in case the developers need to use different versions of the tool for different branches.

It seems that the code only needs to be generated once instead of every build. When a developer needs to add/remove/change the way an object accesses the database, the code should be generated again, just like making manual modifications. This speeds up the build process, allows manual optimizations to be made to the data access layer, and history of the data access layer is retained in a simple manner.


I (regretfully) wind up putting a lot of derived sources under source control because I work remotely with people who either can't be bothered to set up a proper build environment or who don't have the skills to set it up so that the derived sources are built exactly right. (And when it comes to GNU autotools, I am one of those people myself! I can't work with three different systems, each of which works with a different version of autotools, and only that version.)

This sort of difficulty probably applies more to part-time, volunteer, open-source projects than to paid projects where the person paying the bills can insist on a uniform build environment.

When you do this, you're basically committing to building the derived files only at one site, or only at properly configured sites. Your Makefiles (or whatever) should be set up to notice where they are running and should refuse to re-derive sources unless they know they are running at a safe build site.
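The guard described here does not have to live in a Makefile. As a sketch, assuming a hypothetical `SAFE_BUILD_SITE` environment variable that is set only on properly configured machines, a build script could decide like this:

```python
import os

def may_regenerate(env=None, marker="SAFE_BUILD_SITE"):
    """True only where the (hypothetical) marker variable is set to "1"."""
    env = os.environ if env is None else env
    return env.get(marker) == "1"

def derive_sources(env=None):
    """Pick between regenerating and reusing the checked-in derived sources."""
    if not may_regenerate(env):
        # Anywhere else, never re-derive: use what is in the repository.
        return "using checked-in derived sources"
    return "regenerating derived sources"

print(derive_sources({"SAFE_BUILD_SITE": "1"}))  # the designated build site
print(derive_sources({}))                        # any other machine
```

The environment is passed in as a plain mapping so the decision can be exercised without touching the real process environment.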


Absolutely have the generated code in source control, for many reasons. I'm reiterating what a lot of people have already said, but some reasons I'd do it are:

  1. With code files in source control, you'll potentially be able to compile the code without using your Visual Studio pre-build step.
  2. When you're doing a full comparison between two versions, it would be nice to know if the generated code changed between those two tags, without having to manually check it.
  3. If the code generator itself changes, then you'll want to make sure that the generated code changes appropriately, i.e. if your generator changes but the output isn't supposed to change, then when you go to commit your code there will be no differences between what was previously generated and what's in the generated code now.


I would leave generated files out of a source tree, but put them in a separate build tree.

e.g. workflow is

  1. checkin/out/modify/merge source normally (w/o any generated files)
  2. At appropriate occasions, check out source tree into a clean build tree
  3. After a build, checkin all "important" files ("real" source files, executables + generated source file) that must be present for auditing/regulatory purposes. This gives you a history of all appropriate generated code+executables+whatever, at time increments that are related to releases / testing snapshots, etc. and decoupled from day-to-day development.

There are probably good ways in Subversion/Mercurial/Git/etc. to tie the history of the real source files in both places together.


If it is part of the source code then it should be put in source control regardless of who or what generates it. You want your source control to reflect the current state of your system without having to regenerate it.


Looks like there are very strong and convincing opinions on both sides. I would recommend reading all the top voted answers, and then deciding what arguments apply to your specific case.


Reference URL: https://stackoverflow.com/questions/893913/should-i-store-generated-code-in-source-control
