Monday, December 21, 2009

Normalizing Affymetrix Data with bioconductor


#!/usr/bin/env Rscript
# Normalize a set of CEL files

## Note: To use biocLite packages (i.e. 'affy,' 'rma()', ...), it may be
# necessary to do a one-time [re-]install of the related packages. This can be
# done with the following sequence:

# > source("http://bioconductor.org/biocLite.R"); # Tell R where to download package info from
# > biocLite(); # Update biocLite package grounds. This turns out to be necessary 1-time step before a call to rma(...), which requires up-to-date biocLite library functions
# > biocLite("affy"); # d/l, install 'affy' package (1-time req. only)

library("affy");
# We'll be using this library

# Read CEL files
filenames <- list.files("path/to/CEL", full.names=TRUE);
affy.data <- ReadAffy(filenames=filenames, verbose=TRUE);

## Normalize using RMA
write.table( exprs( rma(affy.data, verbose=TRUE)), file="rma.matrix", sep='\t', quote=FALSE); # file="" to write to stdout, but watch for verbose output from rma(...) and/or ReadAffy(...)

## Using Quantile Norm (this post is edited--the old way didn't collapse measurements per probe)
library("affyPLM")
eset <- threestep (affy.data, background.method="IdealMM", normalize.method="quantile", summary.method="tukey.biweight" )
write.table(eset, file="qnorm.matrix", sep='\t', quote=FALSE)
# for some reason, this transposes the data and adds an "X" in front of every numeric probe ID?

1 comment:

  1. Depending on the contents of the CEL file(s), you may need to reinstall some packages; e.g.,

    biocLite('hgu133acdf')

    ReplyDelete

Followers