backtransform `scale()` for plotting

I have an explanatory variable that is centered using scale() and then used to predict a response variable:

d <- data.frame(
  x=runif(100),
  y=rnorm(100)
)

d <- within(d, s.x <- scale(x))

m1 <- lm(y~s.x, data=d)

I'd like to plot the predicted values, but using the original scale of x rather than the centered scale. Is there a way to sort of backtransform or reverse scale s.x?

Thanks!

Answers


Take a look at:

attributes(d$s.x)

You can use the attributes to unscale:

d$s.x * attr(d$s.x, 'scaled:scale') + attr(d$s.x, 'scaled:center')

For example:

> x <- 1:10
> s.x <- scale(x)

> s.x
            [,1]
 [1,] -1.4863011
 [2,] -1.1560120
 [3,] -0.8257228
 [4,] -0.4954337
 [5,] -0.1651446
 [6,]  0.1651446
 [7,]  0.4954337
 [8,]  0.8257228
 [9,]  1.1560120
[10,]  1.4863011
attr(,"scaled:center")
[1] 5.5
attr(,"scaled:scale")
[1] 3.02765

> s.x * attr(s.x, 'scaled:scale') + attr(s.x, 'scaled:center')
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
attr(,"scaled:center")
[1] 5.5
attr(,"scaled:scale")
[1] 3.02765
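
To connect this to the plotting question, here is a minimal sketch that predicts over a grid of x values on the original scale. The seed, the grid, and the names ctr, scl, x.grid and new are my own assumptions for illustration; s.x is stored in the data frame as a plain numeric vector to keep predict() simple.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
d <- data.frame(x = runif(100), y = rnorm(100))

s.x <- scale(d$x)            # keep the scaled object: it carries the attributes
d$s.x <- as.numeric(s.x)     # plain numeric copy used for modelling
m1 <- lm(y ~ s.x, data = d)

ctr <- attr(s.x, "scaled:center")
scl <- attr(s.x, "scaled:scale")

# Grid on the ORIGINAL scale, pushed through the same centering and scaling
x.grid <- seq(min(d$x), max(d$x), length.out = 50)
new <- data.frame(s.x = (x.grid - ctr) / scl)
pred <- predict(m1, newdata = new)

# Plot on the original x axis
plot(d$x, d$y, xlab = "x (original scale)", ylab = "y")
lines(x.grid, pred)
```

The model only ever sees scaled values, while the plot axis uses x.grid, so no separate back-transformation of the predictions is needed.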

For a data frame or matrix:

set.seed(1)
x = matrix(sample(1:12), ncol= 3)
xs = scale(x, center = TRUE, scale = TRUE)

x.orig = t(apply(xs, 1, function(r)r*attr(xs,'scaled:scale') + attr(xs, 'scaled:center')))

print(x)
     [,1] [,2] [,3]
[1,]    4    2    3
[2,]    5    7    1
[3,]    6   10   11
[4,]    9   12    8

print(x.orig)
     [,1] [,2] [,3]
[1,]    4    2    3
[2,]    5    7    1
[3,]    6   10   11
[4,]    9   12    8

Be careful when using functions like identical(), because floating-point rounding can leave tiny differences after the round trip:

print(x - x.orig)
     [,1] [,2]         [,3]
[1,]    0    0 0.000000e+00
[2,]    0    0 8.881784e-16
[3,]    0    0 0.000000e+00
[4,]    0    0 0.000000e+00

identical(x, x.orig)
# FALSE
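
A tolerance-based comparison is usually what you want here. The sketch below rebuilds the same x and x.orig as above and compares them with all.equal(). Note that identical() would also fail for a second reason in this example: x is an integer matrix, while x.orig is double.

```r
set.seed(1)
x <- matrix(sample(1:12), ncol = 3)
xs <- scale(x, center = TRUE, scale = TRUE)

# Same back-transformation as in the answer above
x.orig <- t(apply(xs, 1, function(r) {
  r * attr(xs, "scaled:scale") + attr(xs, "scaled:center")
}))

# Exact comparison: fails on storage type and on rounding error
same.exact <- identical(x, x.orig)

# Comparison within a numerical tolerance, ignoring attributes
same.approx <- isTRUE(all.equal(x, x.orig, check.attributes = FALSE))

print(c(exact = same.exact, approx = same.approx))
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a character description of the differences, not FALSE, when the objects differ.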

I felt this should be a proper function. Here is my attempt at it:

#' Reverse a scale
#'
#' Computes x = sz+c, which is the inverse of z = (x - c)/s 
#' provided by the \code{scale} function.
#' 
#' @param z a numeric matrix(like) object
#' @param center either NULL or a numeric vector of length equal to the number of columns of z  
#' @param scale  either NULL or a numeric vector of length equal to the number of columns of z
#'
#' @seealso \code{\link{scale}}
#' @examples
#'  mtcs <- scale(mtcars)
#'  
#'  all.equal(
#'    unscale(mtcs), 
#'    as.matrix(mtcars), 
#'    check.attributes=FALSE
#'  )
#'  
#' @export
unscale <- function(z, center = attr(z, "scaled:center"), scale = attr(z, "scaled:scale")) {
  if(!is.null(scale))  z <- sweep(z, 2, scale, `*`)
  if(!is.null(center)) z <- sweep(z, 2, center, `+`)
  structure(z,
    "scaled:center"   = NULL,
    "scaled:scale"    = NULL,
    "unscaled:center" = center,
    "unscaled:scale"  = scale
  )
}
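
Because center and scale are ordinary arguments, the function can also back-transform values that never went through scale() themselves, such as a grid of points expressed in scaled units. A minimal sketch, with the helper copied in compact form so the block runs on its own; the grid values and the choice of the wt and hp columns are hypothetical:

```r
# Compact copy of the unscale() helper from this answer
unscale <- function(z, center = attr(z, "scaled:center"),
                    scale = attr(z, "scaled:scale")) {
  if (!is.null(scale))  z <- sweep(z, 2, scale, `*`)
  if (!is.null(center)) z <- sweep(z, 2, center, `+`)
  structure(z, "scaled:center" = NULL, "scaled:scale" = NULL,
            "unscaled:center" = center, "unscaled:scale" = scale)
}

cols <- c("wt", "hp")
mtcs <- scale(mtcars[, cols])

# Hypothetical grid in scaled units: -1 SD, the mean, +1 SD for each column
grid <- cbind(wt = c(-1, 0, 1), hp = c(-1, 0, 1))

# Borrow the centering and scaling values from the scaled data
back <- unscale(grid,
                center = attr(mtcs, "scaled:center"),
                scale  = attr(mtcs, "scaled:scale"))
print(back)
```

The middle row of back should match the column means of the original data, which is a quick sanity check that the right attributes were supplied.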

tl;dr:

unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')
  • where xs is a scaled object created by scale(x)

Just for those trying to make a bit of sense about this:

How R scales:

The scale function performs both scaling and centering by default.

  • Of the two, the function performs centering first.

Centering is achieved by default by subtracting the mean of all !is.na input values from each value:

data - mean(data, na.rm = T)

Scaling is achieved via:

sqrt( sum(x^2) / (n - 1) )

where x is the set of all !is.na values to scale and n = length(x).

  • Importantly, though, when center =T in scale, x is not the original set of data, but the already centered data.

    So if center = T (the default), the scaling function is really calculating:

    sqrt( sum( (data - mean(data, na.rm = T))^2 ) / (n - 1) )
    
    • Note: [when center = T] this is the same as taking the standard deviation: sd(data).
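
The equivalence with the standard deviation can be checked numerically. A minimal sketch on a made-up vector containing one NA; the data and the names ok, centered and manual.sd are my own:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)   # hypothetical data with a missing value
xs <- scale(x)                       # defaults: center = TRUE, scale = TRUE

ok <- !is.na(x)
n <- sum(ok)                         # number of non-missing values
centered <- x[ok] - mean(x[ok])      # step 1: centering

# Step 2: root of the sum of squares divided by (n - 1)
manual.sd <- sqrt(sum(centered^2) / (n - 1))

matches.attr <- isTRUE(all.equal(manual.sd, attr(xs, "scaled:scale")))
matches.sd   <- isTRUE(all.equal(manual.sd, sd(x, na.rm = TRUE)))
print(c(attr = matches.attr, sd = matches.sd))
```

Note the parentheses around n - 1: dividing by n and then subtracting 1 gives a different number.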

How to Unscale:

Explanation:

  1. First multiply the scaled values z by the scaling factor, which is computed from the original data x:

    y = z * sqrt( ( sum( (x - mean(x , na.rm = T))^2) ) / (length(x) - 1))
    
  2. Then add back the mean of the original data:

    y + mean(x , na.rm = T)
    

Obviously you need to know the mean of the original set of data for this manual approach to truly be useful, but I place it here for conceptual sake.

Luckily, as previous answers have shown, both the centering value (i.e., the mean) and the scaling value are stored in the attributes of a scale object, so this approach can be simplified to:

How to do in R:

unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')
  • where xs is a scaled object created by scale(x).

I came across this problem and I think I found a simpler solution using linear algebra.

# create matrix like object
a <- rnorm(1000,5,2)
b <- rnorm(1000,7,5) 

df <- cbind(a,b)

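# get center and scaling values
# (named col.mean / col.sd so they do not mask the base functions mean() and sd())
col.mean <- apply(df, 2, mean)
col.sd <- apply(df, 2, sd)

# scale data
s.df <- scale(df, center = col.mean, scale = col.sd)

# unscale data: after t(), the per-column vectors recycle along the right dimension
us.df <- t((t(s.df) * col.sd) + col.mean)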
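
The double transpose works because R recycles a vector down the columns of a matrix, so after t() each row (one per original column) meets its own scale and center. A minimal sketch that checks the round trip and compares it with an equivalent sweep() version; the seed and the names col.mean, col.sd, us.t and us.sweep are my own:

```r
set.seed(123)  # assumption: seed added only for reproducibility
df <- cbind(a = rnorm(1000, 5, 2), b = rnorm(1000, 7, 5))

col.mean <- colMeans(df)
col.sd <- apply(df, 2, sd)
s.df <- scale(df, center = col.mean, scale = col.sd)

# Transpose approach from the answer
us.t <- t(t(s.df) * col.sd + col.mean)

# Equivalent approach with sweep(), which names the margin explicitly
us.sweep <- sweep(sweep(s.df, 2, col.sd, `*`), 2, col.mean, `+`)

round.trip <- isTRUE(all.equal(us.t, df, check.attributes = FALSE))
same.result <- isTRUE(all.equal(us.sweep, us.t, check.attributes = FALSE))
print(c(round.trip = round.trip, same.result = same.result))
```

Which form to use is mostly a matter of taste; sweep() is more explicit about operating on columns.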

Old question, but why wouldn't you just do this:

plot(d$x, predict(m1, d))
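
This works because predict(m1, d) returns one fitted value per row of d, in row order, so each prediction lines up with the untransformed d$x from the same row. If you want a line instead of points, sort by x first. A minimal sketch, assuming the setup from the question but with s.x stored as a plain numeric vector; the seed and the names fit and ord are my own:

```r
set.seed(1)  # assumption: seed added only for reproducibility
d <- data.frame(x = runif(100), y = rnorm(100))
d$s.x <- as.numeric(scale(d$x))
m1 <- lm(y ~ s.x, data = d)

fit <- predict(m1, d)   # one fitted value per row of d
ord <- order(d$x)       # row order that sorts x ascending

plot(d$x, d$y, xlab = "x (original scale)", ylab = "y")
lines(d$x[ord], fit[ord])
```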

As an easier way than manually using the attributes from the scaled object, DMwR has a function for this: unscale. It works like this:

d <- data.frame(
  x=runif(100)
)

d$y <- 17 + d$x * 12

s.x <- scale(d$x)

m1 <- lm(d$y~s.x)

library(DMwR)
unsc.x <- unscale(s.x, s.x)  # first argument: the scaled values; second: the object carrying the scale() attributes
plot(unsc.x, predict(m1, d))

Importantly, the second argument of unscale must be an object that carries the 'scaled:scale' and 'scaled:center' attributes, i.e. the result of a call to scale().

