Understanding coxph output in R

I am attempting to fit a Cox proportional hazards model to my data. I think I have the formula correct, but I am having trouble understanding the output. I have tried looking through the documentation, but it is hard for me to understand. Any help would be greatly appreciated, thank you!

The formula:

library(survival)  # provides coxph() and Surv()
coxfit1 <- coxph(Surv(days, status) ~ GENE1, data = dataset1)
summary(coxfit1)

Where "days" is the number of days until an event occurred (or until the last known follow-up if there was no event), "status" indicates whether the event (recurrence) occurred, and GENE1 is the expression level of a gene whose effect on recurrence I am testing.

The output:

Call:
coxph(formula = Surv(days, status) ~ GENE1, data = dataset1)

n= 34, number of events= 22 

            coef exp(coef) se(coef)     z Pr(>|z|)   
GENE1    0.6370    1.8908   0.2362 2.697  0.00699 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

          exp(coef) exp(-coef) lower .95 upper .95
GENE1        1.891     0.5289      1.19     3.004

Concordance= 0.618  (se = 0.068 )
Rsquare= 0.166   (max possible= 0.98 )
Likelihood ratio test= 6.17  on 1 df,   p=0.01298
Wald test            = 7.27  on 1 df,   p=0.006993
Score (logrank) test = 7.81  on 1 df,   p=0.005198

Now, this is one that is obviously significant, but what do the different parts of this output mean? Where is the hazard ratio??? And which of this information is appropriate for reporting?

Answers


  • exp(coef) is the hazard ratio $\frac{\lambda_T(t\;|\;x+1)}{\lambda_T(t\;|\;x)} \left[=\exp(\beta) \right]$, where $\lambda_T$ is our hazard function and x is the covariate, here GENE1. If GENE1 is a continuous expression level, exp(coef) = 1.891 means the hazard of recurrence is multiplied by about 1.89 for each one-unit increase in expression. If GENE1 were coded 1 for samples that express the gene and 0 for samples that do not, it would be the ratio $\frac{\lambda_T(t\;|\;\textrm{gene expression})}{\lambda_T(t\;|\;\textrm{no gene expression})}$.
  • exp(-coef) is therefore the inverse hazard ratio $\frac{\lambda_T(t\;|\;x)}{\lambda_T(t\;|\;x+1)} \left[=\exp(-\beta) \right]$, i.e. 1/exp(coef).
  • coef is the estimated coefficient $\hat \beta$ from the model, i.e. the log hazard ratio (see below).
  • se(coef) is the standard error $\sqrt{\mathrm{Var}(\hat \beta)}$
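  • z is the Wald statistic $\frac{\textrm{coef}}{\textrm{se(coef)}}$, i.e. how many standard errors $\hat \beta$ is away from $0$.
  • Pr(>|z|) is the two-sided p-value for the null hypothesis $\beta = 0$: the probability, if the true $\beta$ were $0$, of getting a $|z|$ at least as large as the one observed. It is not the probability that $\beta$ is $0$.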

  • lower .95 and upper .95 are the 95% confidence interval for the hazard ratio exp(coef), computed as $\exp(\textrm{coef} \pm 1.96 \cdot \textrm{se(coef)})$.

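  • Concordance is the fraction of comparable pairs of subjects in which the subject with the higher predicted risk has the event first; 0.5 is no better than chance and 1 is a perfect ordering.
  • Rsquare is $1 - \exp(-\textrm{LR}/n)$, where LR is the likelihood ratio statistic below (here $1 - \exp(-6.17/34) \approx 0.166$).
  • The last three lines are tests of the global null hypothesis $\beta = 0$: the likelihood ratio test, the Wald test and the score test (for a single categorical covariate the score test is essentially the log-rank test, hence its label). With one covariate the Wald statistic is $z^2$ ($2.697^2 \approx 7.27$), so its p-value equals Pr(>|z|).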
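As a check on the definitions above, the sketch below recomputes the printed columns from just coef and se(coef), using base R only. The two inputs are copied from the printout in the question and the formulas are the standard normal-approximation (Wald) ones; small differences from the printout come from the inputs being rounded to four decimals.

```r
# coef and se(coef) copied (rounded) from the coxph printout in the question
b  <- 0.6370   # coef: estimated log hazard ratio per unit of GENE1
se <- 0.2362   # se(coef)

hr     <- exp(b)                                # exp(coef): the hazard ratio
inv_hr <- exp(-b)                               # exp(-coef) = 1 / exp(coef)
z      <- b / se                                # Wald z statistic
p      <- 2 * pnorm(-abs(z))                    # two-sided p-value, Pr(>|z|)
wald   <- z^2                                   # "Wald test" line: chi-square on 1 df
ci     <- exp(b + c(-1, 1) * qnorm(0.975) * se) # lower .95 and upper .95

print(round(c(hr = hr, inv_hr = inv_hr, z = z, p = p, wald = wald,
              lower = ci[1], upper = ci[2]), 4))
```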

Some details on the model

The Cox model is a linear transformation model of the form $\mathbb{P}(T > t \;|\; x) = \exp\left(-\exp\left(g(t)+\tilde x^{T}\beta\right)\right)$, where $g(t)$ is an unspecified monotone increasing function; only the covariate part $\tilde x^{T}\beta$ is linear.
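To see where $g(t)$ comes from, one can start from the hazard form $\lambda_T(t \;|\; x) = \lambda_0(t) \cdot \exp(\tilde x^T\beta)$ and integrate it. This sketch writes $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$ for the cumulative baseline hazard:

$$\mathbb{P}(T > t \;|\; x) = \exp\left(-\int_0^t \lambda_0(s)\,e^{\tilde x^T\beta}\,ds\right) = \exp\left(-\Lambda_0(t)\,e^{\tilde x^T\beta}\right) = \exp\left(-\exp\left(\log\Lambda_0(t) + \tilde x^T\beta\right)\right)$$

so $g(t) = \log\Lambda_0(t)$, which is increasing in $t$ but otherwise left unspecified.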

The cool thing is that this unknown $g(t)$ is determined entirely by the baseline hazard $\lambda_0(t)$, which does not depend on $x$ or $\beta$. This allows us to estimate $\hat \beta$ without estimating the baseline hazard (we learn the hazard ratio, but not the absolute hazard).
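One way to see why the baseline hazard drops out is Cox's partial likelihood. Assuming no tied event times, and writing $\delta_i$ for the event indicator of subject $i$ and $R(t_i) = \{j : t_j \ge t_i\}$ for the set of subjects still at risk at $t_i$ (notation introduced here for illustration), each event contributes the probability that it was subject $i$, rather than another subject at risk, who had the event at $t_i$:

$$L(\beta) = \prod_{i:\,\delta_i = 1} \frac{\lambda_0(t_i)\,e^{\tilde x_i^T\beta}}{\sum_{j \in R(t_i)} \lambda_0(t_i)\,e^{\tilde x_j^T\beta}} = \prod_{i:\,\delta_i = 1} \frac{e^{\tilde x_i^T\beta}}{\sum_{j \in R(t_i)} e^{\tilde x_j^T\beta}}$$

The factor $\lambda_0(t_i)$ cancels in every term, so $\hat\beta$ can be found without knowing $\lambda_0$. The same cancellation gives the constant hazard ratio $\lambda_T(t \;|\; x+1)/\lambda_T(t \;|\; x) = e^{\beta}$ reported as exp(coef).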

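Leaving out calculations, the hazard function has the form $\lambda_T(t \;|\; x) = \lambda_0(t) \cdot \exp(\tilde x^T\beta)$. To estimate $\hat \beta$, coxph maximizes (not minimizes) Cox's partial log-likelihood, in which $\lambda_0$ cancels out. Equivalently, when there are no tied event times, one can treat the cumulative baseline hazard as a step function that jumps only at event times and profile it out of the full likelihood.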
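To make this concrete, here is a minimal sketch on simulated data (the data, the true $\beta = 0.5$ and the helper `partial_loglik` are all made up for illustration and have nothing to do with the asker's dataset). It maximizes the partial log-likelihood directly with base R's `optimize()` and compares the result with `coxph()`. The simulated event times are continuous, so there are no ties and coxph's default Efron tie handling coincides with the simple formula used here.

```r
library(survival)  # coxph(), Surv()

set.seed(42)
n      <- 200
x      <- rnorm(n)                               # one continuous covariate
t_true <- rexp(n, rate = 0.1 * exp(0.5 * x))     # hazard 0.1 * exp(0.5 * x): true beta = 0.5
t_cens <- rexp(n, rate = 0.05)                   # independent censoring times
time   <- pmin(t_true, t_cens)
status <- as.integer(t_true <= t_cens)           # 1 = event, 0 = censored

# Hypothetical helper: Cox partial log-likelihood for a single covariate.
# Each event i contributes eta_i - log(sum of exp(eta_j) over its risk set),
# where the risk set is everyone still under observation at time[i].
# The baseline hazard never appears.
partial_loglik <- function(beta, time, status, x) {
  eta    <- beta * x
  events <- which(status == 1)
  sum(vapply(events, function(i) {
    eta[i] - log(sum(exp(eta[time >= time[i]])))
  }, numeric(1)))
}

# Maximize (not minimize) the partial log-likelihood over beta
opt <- optimize(partial_loglik, interval = c(-5, 5), maximum = TRUE,
                time = time, status = status, x = x)

fit <- coxph(Surv(time, status) ~ x)
s   <- summary(fit)

print(c(manual = opt$maximum, coxph = unname(coef(fit))))    # beta-hat
print(c(manual = opt$objective, coxph = fit$loglik[2]))      # log-likelihood at beta-hat

# The "Likelihood ratio test" line is 2 * (loglik at beta-hat - loglik at beta = 0)
print(c(lrt_by_hand = 2 * diff(fit$loglik),
        lrt_summary = unname(s$logtest["test"])))
```

The manual and coxph values should agree up to `optimize()`'s tolerance. The last line also shows that the likelihood ratio statistic in the summary is twice the gain in partial log-likelihood between $\beta = 0$ and $\hat\beta$; `summary()` stores it in its `logtest` component.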

I hope this helps other people for future reference.

