The Cancer Genome Atlas



National Cancer Institute National Human Genome Research Institute


Questions & Answers
RFA-CA-09-010 Genome Characterization Centers
and Genome Data Analysis Centers for
The Cancer Genome Atlas Research Network (TCGA) [U24]

January 29, 2009

  • Is there overlap in Type 4 data analysis between the GCCs and GDAC A?
    There may or may not be a small overlap. The GCCs will analyze their own data and submit levels 1-4 to the DCC. GDAC A can use the level 1-4 results from each GCC in its integrative analysis, or it may analyze the data with a different approach. Any overlap could potentially foster communication between the GCCs and GDAC A so that the two groups support each other. The bioinformaticists, cancer biologists and genomicists in GDAC A may need the support of the cancer biologists/genomicists in the GCCs to develop the tools for the pipeline, while the GCCs may benefit from the bioinformatic insight of GDAC A. It is important to remember that GDAC A will spend 80% of its time working on its primary goal of generating an analysis pipeline. It is also important to remember that this is a collaborative project and everyone working together will make the project stronger.

  • Can you give us a better idea/picture of the TCGA once the NHGRI component is put in? Can you speak about the NHGRI RFA? What is the involvement of NHGRI?
    The NCI cannot specifically address NHGRI’s plans to issue an RFA. The NHGRI staff and their grantees are involved in the project and are providing funding for the sequencing component of TCGA. They are working to develop “next gen” sequencing and implement it in TCGA.

  • Can you explain the scenario you mentioned where one GCC might do 50% of a given tumor type?
    Centers that are capable of doing full analysis will be encouraged to do so, and for centers that can characterize an entire tumor type there will be no overlap or distribution of samples to multiple GCCs. The 50% is relevant in scenarios where a center has high-throughput capability but, due to various limitations, can handle less than 100% of the caseload. These proposals will still be considered, as NCI is trying to incorporate as many groups as possible.

  • Is a parent applicant of both GCC and GDAC A limited to its own data usage?
    GDAC A will be required to integrate and analyze data from all GCCs and genome sequencing centers (GSCs).

  • If you submit an application for both GCC and GDAC A, is it possible that they will not both be funded?
    Yes, it is possible that only one or the other would be funded. These require separate applications, one for each type of center. GCC and GDAC A have separate functions, and as such it is possible you may get funding for one but not the other.

  • I notice that the last tumor type has yet to be determined. Any thoughts on when that might be?
    No, not at this time. That is probably going to be the decision of the steering committee once the project begins.

  • In the past, the genome centers have been physically localized, as the GCCs are by their nature. However, it sounds like a GDAC B could be a distributed network of collaborating, interacting people, and that this would be a working model even if they are physically separated. Is that the idea? How is it arranged for this RFA?
    Yes. We want to foster a network environment. We want interaction. GDAC Bs can bring their favorite methods for data analysis, but the focus is on the network, the development of tools and an analysis pipeline.

  • Would that for example include intramural NCI as investigators?
    No, intramural NCI investigators are not included.

  • In GDAC A, is the emphasis on additional analyses to replace/supplement GCC analysis or on providing tools that external users can use to analyze the data by themselves?
    Both to augment GCC analysis and to provide a pipeline for external users to analyze the data by themselves.

  • Can you provide some details on the GDAC A versus B distinction? I got the impression that the linear pathway analysis might fall under GDAC A. Are you saying that the analyses that are in the GBM paper are basically all under GDAC A and we are looking for something beyond the GBM paper?
    Yes. The Nature paper that TCGA published last year, “Comprehensive genomic characterization defines human glioblastoma genes and core pathways” (http://www.nature.com/nature/journal/v455/n7216/abs/nature07385.html), is similar to this and is exactly what we want the GDAC As to be doing. They need to generate a pipeline for these types of analyses.

  • Is that true for the type B centers as well? For example, if type B proposed to do novel informatics analysis on one or two platforms. Is the vision to have type B centers develop novel informatics analytical approaches across all platforms or one or two?
    GDAC B is the next level of analysis. It is data driven and adds an additional level of information about the potential for a mutation to be disease causing or relevant to the biology in some way. One cannot predict what kind of data the GDAC Bs will find the most informative for generating biological conclusions. They will integrate all the different TCGA data types available. Through the DCC, they will have access to all the data types generated by the GCCs, the clinical data and all of the analyses developed by the GDAC As to build upon.

  • If two or more GDAC As are funded, is it anticipated that they will each focus on a subset of the analyses or a subset of the tumor types that are being processed?
    If more than one GDAC A is funded, the hope is that they will work together to coordinate analytical tool development and analyses. They would develop things that complement each other and do not overlap in functionality. They would potentially be looking at all the tumor types, but in the end everyone needs to work together to get the most out of the data that is being generated.

  • Obviously the GDAC Bs may get to the point where they want to worry about biological function validation. Is there any thought of using wet-lab model systems to do some of the validation work?
    This is not included in the RFA at this time. R01 grants would be the mechanism for dealing with that. This grant does not cover biological validation.

  • Are the samples provided to the awardees in de-identified form?
    The samples come coded from the tissue source site to the BCR. The BCR then assigns a TCGA ID number. All genomic characterization and clinical data are linked through this number. The tissue source site is the only group that has the ability to link TCGA ID numbers to patient samples, but only to their own patient samples.

  • Is the patient coding only contained in the BCR and not available to the GCC awardees?
    Under HIPAA regulations only the coded data is available to the GCCs and the public. Only the tissue source site that provided the tissue knows the codes that identify their patient samples.

  • Are the subcontracts considered indirect or direct costs for the submitting institution?
    They do become direct costs. This is relevant only if there are caps under this funding opportunity, but there are no caps under this RFA.

  • What is the expected size of grants for the different types of grants? I imagine that data generation grants will have a bigger apportionment of the budget than the processing or the analysis and method development. Yet all speakers highlight the importance of data integration.
    Right now there are no absolute rules, but the goal is 2-4 GCCs and 2-6 GDACs. Respondents will have to justify their budget proposals.

  • Can you speak further on developing a new analytical pipeline and the possibility of making the analytical capability available through a GDAC?
    The software for the pipeline as well as the analyses will be deposited to the DCC for download. If you wanted to make the analytical capability available through your center, you would need to include a plan and the cost for that in your application.

  • Can you distinguish further between the GDAC type A and type B? In the RFA it appears that the type ‘A’ analyses were very much biology-agnostic, unsupervised analyses that did not take prior biological knowledge into consideration, whereas the type ‘B’ analyses were more biologically driven and informed. But at the same time we are hearing that existing tools belong in type ‘A’ and type ‘B’ is meant to fund the development of new tools. So there is a little bit of conflict between those two. If you have an existing pathway analysis tool, does that mean that we should be putting that into a type ‘A’ application?
    If you plan to use an existing tool to analyze data, then you should submit an application for type A GDAC. For example, the type of analysis that was done for the GBM paper published by the TCGA research network last year was a fairly direct way of using known pathway analysis. On the other hand if you’re going to use the data and identify new pathways, new interactions, new approaches that add biological significance to the analysis, then that could be a type B GDAC activity. The more linear types of analysis fall into the type A GDAC.

  • So let’s say you want to perform an unsupervised integration of multiple data types to identify categories of cancers for which we don’t have a prior biological understanding. So would that kind of method be more like type ‘B’ method or type ‘A’ method?
    Type ‘A’ method. With the type B GDACs, the NCI is trying to respond to what is perceived as an emerging need in the community to develop new bioinformatics tools and approaches that don’t exist yet. The NCI is trying to move the community towards integrating bioinformatics and computational approaches, making them a much more organic part of this process. So in answer to the question specifically, it will depend a great deal on how you decide you want to position this more advanced center. Arguably you could start from the cancer biology perspective, and if you have the bioinformatics and computational tools to justify your approach, then that would be an extremely strong application. Another approach could come from a computational or bioinformatics background, and you might drive your type B GDAC from that standpoint. In other words, use your expertise in computation and work with a co-PI from the oncology community. The NCI anticipates bringing organizations into TCGA from the computation and bioinformatics community that are not classically from the oncology field. Either application could be quite strong.

  • I understand that you are encouraging partnerships between for-profit/commercial and academic groups. In the area of software, we might be using proprietary software, but the guidelines require it to be posted to caBIG®. Do you have any thoughts on how to reconcile those potentially competing interests?
    If it is commercially developed, it’s not software that you developed. So it’s not incumbent upon you to make it caBIG® compatible. If you are using government funds to develop software, then there is an expectation that the software would be caBIG® compatible.

  • If you have access to next generation sequencing technology that is not validated in the high-throughput format, how would you incorporate that? Could you validate it by comparison with an existing technology?
    Yes, you would provide information on how it compares to existing technology in efficiency, accuracy, throughput and cost.

  • What if it has not been tested yet and you can’t determine that?
    If you know of a technology that has not been used yet, then it does not fall under the rubric of validated technology. You can submit preliminary data, which do not have to be published, to make a case for your technology.

  • How much of a priority for the GDACs will be creating other ways for users to interact with the data, as in a GDAC-specific data portal?
    GDAC As are putting 80% of their efforts towards their goal and GDAC Bs are putting 70% of their efforts towards their goal. Both goals are related to generating a pipeline, performing analyses and making them available to the community. If 80% of A’s goal is to develop the pipeline and make it accessible, then their proposal needs to show how much of their effort is going to be towards supporting that pipeline. If 70% of B’s goal is towards novel discovery, then they have fewer funds at their discretion to support other activities, so they would not have the same capabilities early on to support that. But later on in their funding period, GDAC Bs may have the capability to support the pipeline.

  • Is there a plan to have specific portals for the GDAC?
    There is nothing in the RFA about portals for GDACs.

  • For Type A, I understand the data processing pipeline should be made available by end of 6 months. For Type B if we are developing and submitting new algorithms and software, is there an expectation on when the first software submission should be made?
    Yes, between 2.5 – 3 years, and that is specifically addressed in the RFA. Ideally the information would be available as soon as possible; earlier submission will be encouraged and viewed as an advantage.

  • Can you clarify the outputs of the GDAC A? Is the development of the pipeline itself an output of the GDAC A, and will it be sent to the DCC? What about e.g. sample annotations or sample cohorts used for analytical experiments – are those also considered outputs?
    Yes to all.

  • Will each GDAC have its own interface to users, or will everything go through DCC and some other portal?
    The DCC will not be hosting/running the analytic software developed by the GDACs; however, the DCC certainly needs to receive the analytical results.

  • Question is about software or analytical pipelines. How much support will be provided by caBIG®?
    There are no specific caBIG® funds available concurrent with this RFA. But caBIG® currently has knowledge centers that are available.

  • If you have PIs from different institutions joining forces, does that count as an application from each institution in terms of the overall count?
    Yes.

  • Can the Biospecimen Core Resource provide cells for experimental interrogation for which DNA and RNA alone is not sufficient? What limitations on cell quantities are there if any?
    Because the NCI doesn’t know what kind of centers there will be, this is a difficult question to answer. At this point sites are asked to provide at least 100 milligrams of tissue. The NCI is trying to move to a smaller amount because it is very hard to meet the 100 milligram criterion. The tissue that we get will need to provide DNA and RNA to the characterization centers. If there is anything left over, we may be able to provide some cells. For purposes of this RFA, assume 10% of the starting material will be available for cell-based studies.

  • Can a GDAC keep the software and pipeline that they develop on their website and have access routed through the DCC?
    Everything that a GDAC develops has to be deposited to the DCC; however, a GDAC could also house a mirror on its own website. The GDAC would have to propose a mechanism and work with the NCI to make it work, but in principle it is possible.

  • If access is through a GDAC, how would the GDAC ensure that a user is an authorized user with approval to access data from the NCI Data Use Committee?
    Again, a GDAC would have to propose a mechanism for how to grant access only to approved users. One possible scenario is for the GDAC to work with the NCI to create a system where the user must log into the DCC in order to gain access to the GDAC’s site. For example, SAML/Shibboleth protocols (see www.incommon.org for more info) could be used to have authentication and authorization managed by the NCI.

  • What level of detail is expected in the design document outline? Is there a template that the NCI would like the GDACs to use or should a standard design document format be used?
    The design document and the design document outline are two different things. When responding to the RFA, a GDAC applicant should include a design document outline in a standard format that would, during the first months of the award period, be turned into a design document in a standard format. The NCI does not have a preferred format. The level of detail for a design document outline would be different for the two types of GDACs. One might expect more detail in a GDAC A outline, as the tools to be used are existing tools and it really is an exercise in how the tools might be used together to generate the required analysis. For GDAC Bs an outline might be less well defined. Both documents should provide enough information to demonstrate that the applicant has a well-thought-out plan for how to proceed with the development required under the RFA.

  • Should the design document be part of section N2, a separate section or as an appendix?
    The design document should be included in section.

  • The RFA uses the terminology “MAF format.” Can you clarify what is meant by “MAF format?”
    MAF format is used by at least two groups, UCSC and TCGA. Unfortunately, the same acronym means two different things. UCSC uses “.maf” as an acronym for multiple alignment format, http://genome.ucsc.edu/FAQ/FAQformat#format5. TCGA uses “MAF” to describe mutations detected by the genome sequencing centers. These files contain information such as: Hugo Symbol, Entrez Gene ID, GSC Center, NCBI Build, Chromosome, Start Position, End Position, etc. For a full list of information provided by a “MAF” file, please refer to the supplemental information provided in TCGA’s Nature publication, http://www.nature.com/nature/journal/v455/n7216/abs/nature07385.html.

  • Is the structure outlined in the RFA for GCCs and GDACs flexible or is this pretty much how it will work? Can a center propose a different discovery strategy that would include both a GCC function and GDAC function in a single proposal that would be used to study tumor types for which fewer than 500 cases are available for study?
    The structure as outlined in the RFA is not flexible with regard to how applications must be submitted. If your center wants to perform both GCC and GDAC functions, then the center must submit two applications. The center must also work on the tumor types designated by TCGA.
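
Illustration for the “MAF format” answer above: the short Python sketch below shows one way a pipeline might read a TCGA-style mutation file and count mutations per gene. It is only a sketch under stated assumptions. The column spellings (Hugo_Symbol, Entrez_Gene_Id, and so on) are modeled on the fields named in that answer, the file is assumed to be tab-delimited with a header row, and the sample rows are made up. The authoritative field list is the one in the supplemental information of the Nature publication.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names modeled on the fields named in the Q&A answer;
# the exact header spelling and the full field list are defined by TCGA.
COLUMNS = [
    "Hugo_Symbol", "Entrez_Gene_Id", "Center", "NCBI_Build",
    "Chromosome", "Start_Position", "End_Position",
]

# Made-up rows for illustration only (not real TCGA data).
SAMPLE_ROWS = [
    ["GENE_A", "1001", "center1.example", "36", "7", "100", "100"],
    ["GENE_A", "1001", "center2.example", "36", "7", "250", "251"],
    ["GENE_B", "1002", "center1.example", "36", "17", "500", "500"],
]
SAMPLE_MAF = "\n".join("\t".join(row) for row in [COLUMNS] + SAMPLE_ROWS) + "\n"


def read_maf(handle):
    """Read a tab-delimited mutation file into a list of dict records."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in COLUMNS if name not in header]
    if missing:
        raise ValueError("missing expected columns: " + ", ".join(missing))
    records = []
    for row in reader:
        # Coordinates are converted to integers so they can be compared.
        row["Start_Position"] = int(row["Start_Position"])
        row["End_Position"] = int(row["End_Position"])
        records.append(row)
    return records


def mutations_per_gene(records):
    """Count how many mutation records were reported for each gene symbol."""
    counts = defaultdict(int)
    for record in records:
        counts[record["Hugo_Symbol"]] += 1
    return dict(counts)


records = read_maf(io.StringIO(SAMPLE_MAF))
counts = mutations_per_gene(records)
print(counts)
```

To use the sketch on a real file, pass an open file handle to read_maf instead of the in-memory sample, and adjust COLUMNS to match the actual header.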
