
Tailorx @ MindSay 
A very large clinical trial (~10,000 women) is underway to assess the usefulness of the Oncotype Dx gene profile for selecting treatment for early stage estrogen receptor positive breast cancer: the TAILORx Trial. The trial sponsors will ask women with breast cancer to undergo Oncotype Dx testing (for which they will be billed $3650) and then assign them to three risk groups based on their recurrence score. (The groups are defined using different cut-offs for this trial than those used for the study done on the NSABP B-20 tumors.) Women in the low risk group will get hormonal treatment (tamoxifen or one of the newer “aromatase inhibitor” class of drugs). Women in the high risk group will get chemotherapy and hormone therapy. The women in the intermediate group will be randomly assigned to receive either hormone treatment alone or a combination of hormone and chemotherapy.
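The assignment logic just described can be sketched in a few lines. This is only an illustration, not the trial's actual procedure: the intermediate range of 11-25 is the "Group 2" range quoted later in this post, while treating everything below 11 as low risk and everything above 25 as high risk is my assumption about the boundaries. The function name and treatment labels are made up for the example.

```python
import random

# Intermediate range taken from the "Group 2" definition quoted later in this post.
# How the protocol handles scores exactly at the boundaries is an assumption here.
INTERMEDIATE_LOW, INTERMEDIATE_HIGH = 11, 25

HORMONE_ONLY = "hormone therapy alone"
CHEMO_PLUS_HORMONE = "chemotherapy plus hormone therapy"


def assign_treatment(recurrence_score, rng):
    """Return (risk_group, treatment) for one patient (hypothetical helper)."""
    if recurrence_score < INTERMEDIATE_LOW:
        return "low", HORMONE_ONLY
    if recurrence_score > INTERMEDIATE_HIGH:
        return "high", CHEMO_PLUS_HORMONE
    # Only the intermediate group is randomized between the two options.
    return "intermediate", rng.choice([HORMONE_ONLY, CHEMO_PLUS_HORMONE])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for score in (5, 23, 40):
    print(score, assign_treatment(score, rng))
```

Note that the only randomized decision is the one for the intermediate group; everything else is determined by the score alone.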
At least four concerns make this trial seem premature. First, the chemotherapeutic "validation" of Oncotype Dx is based on one small, old chemotherapy trial. Second, TAILORx does not incorporate many other old and new prognostic indicators that might trump Oncotype Dx in some circumstances. Third, not enough information exists about how the Oncotype Dx score might interact in a confounding way with specific hormone treatments, chemotherapy drugs, and regimens. Finally, far too many non-random, non-blinded choices are afforded to patients and physicians in TAILORx to make for decent "science".
TAILORx is based on a retrospective analysis of a small, non-random sample of assays performed on preserved tissue. The ability to use this old material to obtain reproducible gene profiles is an amazing technological feat, to be sure. But only 651 of about 2300 patient results were available. The chemotherapy used in the NSABP B-20 dates to the 1970s. Substantial evidence supports a conclusion that these regimens improve prognosis for many patients. But two of the agents (methotrexate and 5-FU) are not often employed today. The third agent, Cytoxan, has lost popularity over concerns that it may lead to late complications like second malignancies. Meta-analysis of the many thousands of patients treated with chemotherapy indicates that the B-20 drugs are not as effective as newer drugs like the taxols and Adriamycin.
So in spite of the fact that the paper describing the results for the 651 patients uses the word "prospective" five times, it is actually a retrospective survey, not a clinical trial. This might matter for a woman who is enrolled in TAILORx because she cannot be sure that the Oncotype Dx group studied really is representative of the group she is in. And remember that much of the power of the argument for Oncotype Dx rests on one subgroup that contained only 47 patients.
A second concern revolves around the lack of consideration of many other older and newer prognostic variables when deciding whether a patient falls into "Group 2" (the intermediate risk group) as defined here:
Group 2 (Primary study group; ODRS 11-25): Patients are stratified according to tumor size (≤ 2.0 cm vs ≥ 2.1 cm), menopausal status (postmenopausal vs premenopausal vs perimenopausal), planned chemotherapy (taxane-containing [i.e., paclitaxel, docetaxel] vs nontaxane-containing), and planned radiotherapy (whole breast with no boost planned vs whole breast with boost planned vs partial breast irradiation planned vs no planned radiation therapy [for patients who have had a mastectomy]). Patients are then randomized to receive either hormonal therapy alone or combination chemotherapy and hormonal therapy.
For example, tumor size, tumor DNA ploidy, quantitative level of estrogen receptor, tumor lymphovascular invasion, per cent of dividing cells ("S phase"), Ki-67 expression and many other features and measurements have been used to stratify patients for risk of treatment failure. (Some of these features can make a woman with a very small tumor (<1 cm) eligible for inclusion in the trial, an acknowledgement of their possible importance.) And of course, clinical features on presentation may predict risk of failure. Mammographically discovered cancers may have a different prognosis compared to those discovered by the patient feeling a mass, for example.
Under TAILORx rules, a 48-year-old woman with a palpable 4.9 cm cancer that is high grade with lymphatic invasion, weakly estrogen receptor positive, with Ki-67 expression and an S phase of 20 and an Oncotype Dx score of 23 (intermediate risk) could be randomly assigned to tamoxifen alone.
Some will argue that the hypothetical patient described above would be unlikely to have an Oncotype Dx score as low as 23. (The B-20 Oncotype Dx analysis used 18 to 31 to define the intermediate risk group. The range was changed to 11-25 for TAILORx, possibly because HER2+ patients are excluded - more on this below.) This is probably true, since the Oncotype Dx score does correlate to some extent with many "traditional" indicators. However, if the above hypothetical patient exists, few cancer physicians would advise tamoxifen alone. This would result in one of two outcomes for our hypothetical patient.
First, the woman would not likely be offered the TAILORx trial. Or, if she were offered the trial and the randomization resulted in tamoxifen alone, she would be advised to withdraw from the trial and receive chemotherapy. Either way, the results from TAILORx will be compromised. If patients on the "edges" of the "groups" are manipulated to get the "right" treatment, the trial results will be of little use, because the "borderline" cases are the very ones that present the most difficult decisions, and the Will Rogers phenomenon will be in play to distort the results.
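For readers who have not met the Will Rogers phenomenon, here is a small numerical sketch. The recurrence risks below are invented purely for illustration and come from no trial. The point is only the arithmetic: if the worst borderline cases are quietly moved from one group to the next, the average of both groups shifts even though no individual patient's risk has changed.

```python
def mean(values):
    """Plain arithmetic mean, returned as a float."""
    return sum(values) / len(values)


# Hypothetical 10-year recurrence risks in per cent (made-up numbers).
intermediate = [8, 10, 12, 18, 20]
high = [25, 30, 35, 40]

# Patients at the upper "edge" of the intermediate group are steered
# into the high risk group so that they get the "right" treatment.
moved = [risk for risk in intermediate if risk >= 18]
intermediate_after = [risk for risk in intermediate if risk < 18]
high_after = high + moved

print("intermediate before/after:", mean(intermediate), mean(intermediate_after))
print("high before/after:        ", mean(high), mean(high_after))
print("everyone before/after:    ", mean(intermediate + high),
      mean(intermediate_after + high_after))
```

With these made-up numbers the intermediate group's average risk falls from 13.6 to 10.0 and the high group's from 32.5 to 28.0, while the average over all nine patients stays at 22.0. Both groups look better, yet nobody's prognosis has improved.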
A third concern that may make TAILORx premature is the paucity of information about how the predictive power of Oncotype Dx might be affected by the choice of hormone and chemotherapy agents. The TAILORx protocol allows each physician to choose a chemotherapy regimen and/or hormone agent. Some might still choose the B-20 CMF and tamoxifen regimen for patients perceived to be at lower risk (within the intermediate group), while others might choose aggressive "dose dense" Taxol-containing treatment for a patient like the hypothetical one described above. There are many reasons why this may lead to errors.
The gene for glutathione S-transferase (GST) GSTM1 is one of the 16 predictor genes in Oncotype Dx. The presence of this gene tends to improve prognosis. The problem is that this gene also affects the metabolism of some cancer drugs. Anticancer drugs that have been shown to be substrates for GSTs are, for example, chlorambucil, melphalan, Cytoxan metabolites, and steroids. Indirect evidence for a role of GSTs in modulating drug effects through deactivation of drug-generated hydroperoxides or other reactive oxygen species exists for Adriamycin, mitomycin C, carboplatin, and cisplatin, but not Taxol. Some have postulated for other malignancies like acute leukemia in children that GSTM1 confers a favorable prognosis because it changes chemotherapy metabolism. Oncotype Dx does not identify which patients have one copy of GSTM1 (a null polymorphism) and which have two copies. Remember that B-20 chemotherapy often included Cytoxan but never Taxol.
Another gene in Oncotype Dx is HER2. Patients with tumors that express HER2 are excluded from TAILORx. This gene is a prototype marker for chemotherapy selection since the drug Herceptin works very well when it is expressed and not at all if it isn't. B-20 contained a number of HER2 positive patients and Herceptin was not available in that era. So the Oncotype Dx used in TAILORx is a different one than the one tenuously "validated" in B-20. If the HER2 gene is a totally independent predictor, the change might not matter. But HER2 does interact with other cancer genes, for example the Src family. Src is a family of proto-oncogenic tyrosine kinases originally discovered by J. Michael Bishop and Harold E. Varmus, for which they won the Nobel Prize.
BAG1 is another gene in Oncotype Dx and it interacts with the estrogen receptor (alpha) mechanisms which can control cancer growth. Does it interact with tamoxifen (studied in B-20) the same way as with the aromatase inhibitors included in TAILORx (but not in B-20)? BAG1 seems to predict response to tamoxifen but how it predicts benefits from other hormones or chemotherapy is less clear.
TAILORx is constructed on a tenuous foundation based on a very small number of observations (remember the 47-patient subgroup) and on many extrapolations based on assumptions. Some of these may seem niggling, such as the fact that B-20 used age 70 and tumor size 4 cm as cut-offs and TAILORx uses 75 and 5 cm. Yet B-20 found a suggestion that age and chemotherapy effectiveness interact and that tumor size affects prognosis.
And, one more thing: how can "partial breast radiation" be allowed in a non-random way? This would imply that partial breast radiation has become an acceptable "standard of care". Where are the appropriately powered randomized trials that support this implication? Don't be fooled into thinking that an analysis of this non-randomly assigned "stratification" can answer any useful question.
Stratification (i.e., prospective randomization within smaller, rigidly predefined clinical subgroups) is often employed in clinical trials (though purists might argue that this is unnecessary in trials with a large number of outcome events). But TAILORx is not stratified by rigidly predefined criteria. Rather it will permit thousands of patients and physicians to choose among a rich buffet of local and systemic treatment options: type of node sampling (any type among many permitted), type of mastectomy (any type among many), lumpectomy (any type among many), radiation (type of radiation, partial/whole radiation), chemotherapy (literally thousands of combinations and permutations of drugs and doses) and hormones (SERMs and aromatase inhibitors).
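To make the contrast concrete, here is a minimal sketch of what stratification in the strict sense looks like: permuted-block randomization run separately inside each predefined stratum. This is a generic textbook technique, not the TAILORx randomization procedure, and the function name, arm labels, strata labels and block size are all assumptions chosen for the example.

```python
import random
from collections import defaultdict


def make_stratified_randomizer(arms, block_size, seed=0):
    """Return an assign(stratum) function using permuted blocks per stratum.

    Hypothetical helper: each stratum gets its own sequence of shuffled
    blocks, so the arms stay balanced within every stratum.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> assignments left in current block

    def assign(stratum):
        if not pending[stratum]:
            # Start a new block with equal numbers of each arm, in random order.
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            pending[stratum] = block
        return pending[stratum].pop()

    return assign


assign = make_stratified_randomizer(["hormone", "chemo+hormone"], block_size=4)
# A stratum is fixed by objective facts known before randomization,
# for example tumor size category and menopausal status.
stratum = ("<=2.0 cm", "postmenopausal")
print([assign(stratum) for _ in range(8)])
```

The stratum here is determined before the coin is tossed, by facts that nobody chooses. A "stratum" defined by which chemotherapy or radiation the physician plans to give is a different animal, because the plan itself reflects a judgment about the patient's risk.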
Stratification might make sense if it were based on close to totally objective (not really ever possible in the real world) and independent classifications. Stratification makes no sense if it is based on thousands of differing views and biases related to the risks and benefits of a multitude of differing therapies. Those decisions will be largely based on the very risk assessment considerations that the TAILORx trial is supposed to answer.
"Although technological advances will further improve our understanding of breast cancer and will contribute to tailoring treatment to the individual patient, our experience with adjuvant CMF over 30 years confirms that the effects of such a regimen are long lasting and may benefit patients with favourable and unfavourable prognostic indicators, at the cost of minimal long term sequelae." This is how Dr. Bonadonna himself described, in 2005 (emphasis added), the results from the chemotherapy regimen that is still often called "Bonadonna CMF". "Tailoring treatment" is the holy grail but TAILORx is designed too clumsily to trump 30 years of better designed clinical trials.
Monks from the Order of the Brothers of the Statistic will study the scripture that flows from TAILORx and will be able to divine all the potential biases and confounders by utilizing probabilistic testing based on dubious underpinnings that may well result in some very "significant" and small "p" numbers. Cultists who worship at that particular altar of "evidence based" medicine will travel about with their PowerPoint slides of life table graphs and p values carried to the fourth decimal place, Genomic Health will turn a profit for the first time and its shares will skyrocket, and another level of the temple known to heretics as the "House of Cards" will have been constructed.
Or maybe the results will look so powerful that even a skeptic like me will be convinced (or fooled).
TAILORx may be another example of the technological imperative in action. TAILORx reflects the fervent need of patients and doctors for a simple "black box" method for making difficult choices. The admonition of H.L. Mencken bears remembering: "For every complex problem, there is a solution that is simple, neat, and wrong."
