PROBE ARRAY DESCRIPTIONS
Human Genome
U95Av2
U95B-Ev2
The Human Genome U95 Set is composed of 5 arrays and represents
approximately 60,000 full-length genes and EST clusters. The sequences
represented are derived from sequence clusters in Build 95 of the
UniGene database (sequences in UniGene Build 95 are from GenBank
113 and dbEST/10-02-99). UniGene clusters are represented by one
or more consensus sequences derived directly from cluster members.
HG-U133 A-B
The HG-U133 Set comprises two microarrays that represent
more than 33,000 of the best-characterized human genes. The
HG-U133 Array represents the vast majority of the genes featured
on its predecessor, the HG-U95Av2 Array. Similarly, a large percentage
of the HG-U95B, HG-U95C, HG-U95D and HG-U95E arrays is represented
on the HG-U133 Set.
Yeast Genome
S98 Array
This probe array offers complete gene expression monitoring capability
for the entire yeast genome on a single array. Approximately 6,400
well-recognized Open Reading Frames of the yeast Saccharomyces cerevisiae are represented.
Rat Genome
U34A or B or C
This set of 3 GeneChip probe arrays contains probes interrogating
approximately 24,000 mRNA transcripts and EST clusters from Build #34
of the UniGene database. The A array of the Rat Genome U34 Set analyzes
approximately 7,000 full-length sequences and approximately 1,000
EST clusters. The B and C arrays each analyze approximately 8,000
EST clusters. All sequences analyzed by the Rat Toxicology U34 Array
are also analyzed by the A array of this set.
Rat Toxicology Probe Array
This GeneChip probe array interrogates approximately 850 mRNA
transcripts and EST clusters from Build #34 of the UniGene database. All
sequences represented on the Rat Toxicology U34 Array are also
represented on the Rat Genome U34A Array.
Rat Neurobiology
U34
This probe array contains over 1,200 sequences relevant to neurobiology
research. Affymetrix teamed up with neurobiology researchers from
both academic and pharmaceutical laboratories to select the sequences.
Mouse Genome
U74Av2
The Murine Genome U74 Set contains probe sets for approximately
6,000 full-length genes and approximately 30,000 EST clusters. All
full-length genes are on the A array of the set. In addition, all
sequences represented in the Mu11K Set are also on the A array. Representation
of the Mu19K Set is split between the A and the B arrays.
Arabidopsis Genome
Sequences used to develop this array were obtained from GenBank
in collaboration with Novartis Agriculture Discovery Institute
(NADI). Eighty percent of the genes represented on the GeneChip
Arabidopsis Genome Array are predicted coding sequences from genomic
BAC entries. Twenty percent are high-quality cDNA sequences. The
array also contains approximately EST clusters, sharing homology
with the predicted coding sequences from BAC clones.
E. coli Genome
The E. coli Genome Array contains probes for more than 4,200 known
Open Reading Frames of E. coli. Sequence information for probes
on the array corresponds to the M54 version of the E. coli Genome
Project database at the University of Wisconsin. Sequences corresponding
to both characterized genes and those whose function is unknown
can also be identified according to the numbering convention used
by the E. coli Genome Project database.
Drosophila Genome
Sequences used to develop this array are accessible through FlyBase. Greater
than 13,500 gene sequences predicted from the annotation of the
Drosophila genome are represented on the array. This includes genes
for which confirming EST or full-length cDNA evidence is available
(over 8,000 genes have at least 1 EST/cDNA match).

DEFINITIONS OF BASIC TERMS
- Array: A collection of probes on
glass that make up a GeneChip.
- Comparative Analysis: The analysis
of an experimental array relative to a baseline array.
- Feature: A collection of probes
in a defined area called a “cell”. Each probe cell contains millions
of copies of a specific nucleotide probe and is a defined square
area on the array.
- Homomeric Mismatch: A mismatch in which
the changed base is replaced by its complement (e.g., A→T).
- Hybridization Spikes: Controls
that are added to the sample before the cDNA synthesis and hybridization
reactions.
- Metrics: The calculated results of the
mathematical equations used by the GeneChip algorithm software.
- Probe: A single-stranded DNA oligonucleotide
(usually 25 bases long) complementary to a specific sequence.
- Probe Cell: A single square-shaped
feature on an array containing one type of probe. Each probe cell
contains millions of probe molecules.
- Perfect Match (PM): Probes
that are designed to be complementary to a reference sequence.
- Mismatch (MM): Probes that are
designed to be complementary to the reference sequence except
for a homomeric base mismatch at the central (13th)
position. Mismatch
probes serve as a control for cross-hybridization.
- Probe Pair: Two probe cells, a
PM and its corresponding MM. On the probe array, a probe pair
is arranged with a PM cell directly above the MM cell.
- Probe Set: A set of probes designed
to detect one transcript. A probe set usually consists of 16-20
probe pairs. For example, a 20 probe pair set is made up of 20
PM and 20 MM for a total of 40 probe cells.
- SAPE: Streptavidin-phycoerythrin
dye used to bind to the biotinylated nucleotide incorporated
into the transcript of interest during the in vitro transcription
(IVT) reaction.
- Target: Fragmented, biotinylated
anti-sense cRNA prepared from the mRNA to be analyzed. Target molecules
are hybridized to the probe array, and the levels of hybridization
are measured with the HP GeneArray scanner after the array is
stained with streptavidin-phycoerythrin (SAPE).
- Probe Array Tiling: T probe cells
into probe sets and probe pairs.
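The PM/MM and probe set definitions above can be illustrated with a minimal sketch. It assumes that a homomeric mismatch swaps the central base for its complement (A↔T, G↔C); the function names and the 25-mer sequence are made up for illustration and are not part of any Affymetrix software.

```python
# Homomeric substitution: the changed base is replaced by its complement.
HOMOMERIC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm: str, position: int = 13) -> str:
    """Return the MM probe for a PM probe.

    `position` is 1-based; 13 is the central base of a 25-mer.
    """
    pm = pm.upper()
    if not 1 <= position <= len(pm):
        raise ValueError("position lies outside the probe")
    i = position - 1
    return pm[:i] + HOMOMERIC[pm[i]] + pm[i + 1:]


def probe_cells(probe_pairs: int) -> int:
    """Each probe pair contributes one PM cell and one MM cell."""
    return 2 * probe_pairs


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-mer, not a real probe
mm = mismatch_probe(pm)
print(pm)
print(mm)
print(probe_cells(20))  # a 20 probe pair set
```

The two printed sequences differ only at the 13th base, and a 20 probe pair set gives 40 probe cells, matching the Probe Set definition.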
RUNNING A COMPARISON ANALYSIS
Note: A comparison analysis examines
the hybridization intensity data (*.cel file) from an experiment
and baseline probe array (of the same probe array type) and identifies
relative changes in the expression level of each transcript represented
on the arrays. An absolute analysis of the baseline
must be run before the comparison analysis of the experiment and
baseline.
- Select “file” from the menu
bar
- Select “open” from the GeneChip
menu and open the desired experiment data file from the *.dat
files listed in the Open dialog box.
- Select “run” from the menu
bar
- Select “analysis” from the GeneChip menu.
The “Save Results As” dialog box automatically
opens and displays the analysis output file (*.chp) default
name, which is the same as the experiment name specified during
experiment set up.
- Enter a new name (the name you want your comparison file to
be).
- Select “OK”. The Expression Call Settings dialog box automatically
opens.
- Select the “scaling” tab from the expression call settings
window. Make sure that “all probe sets” is selected and that “target
signal” is set to 500. Also, under “algorithm parameters”, both “prompt
for output filename” and “display settings when analyzing data” should
be selected.
- Select the “normalization” tab. Make sure “user defined” is chosen
and that “normalization value” is one. Also, under “algorithm
parameters”, both “prompt for output filename” and “display settings
when analyzing data” should be selected.
- Select the “probe mask” tab. When running a comparison analysis
you can select different types of default probe mask as well
as design your own. However, until you become familiar with the
software we recommend that “NO” mask be selected at this time.
Both “prompt for output filename” and “display settings when
analyzing data” should be selected.
- Select the “baseline” tab and place a check mark next to “Use
Baseline Comparison File”.
- Select “browse”, select the baseline file from the Baseline
Comparison File dialog box, and double-click the appropriate
file name.
- Select “OK”. The comparison will take a few minutes. After the
analysis is finished, the Expression Analysis window (EAW) opens
and displays the output analysis results. If the EAW is already
open, the results are added to the open window and it may be
necessary to scroll down to see the newly added results.
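To give a feel for what the “target signal” of 500 in the scaling tab does, here is a rough sketch of global scaling. It assumes the scale factor is the target signal divided by a trimmed mean of all probe set signals, with the lowest and highest 2% excluded; the trimming fraction and the function name are assumptions for illustration and are not taken from this page.

```python
def scale_factor(signals, target=500.0, trim=0.02):
    """Target signal divided by the trimmed mean of the raw signals.

    `trim` is the fraction dropped from each end (assumed 2%).
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)  # number of values dropped per end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return target / (sum(kept) / len(kept))


raw = [100.0, 200.0, 300.0, 400.0]  # made-up probe set signals
sf = scale_factor(raw)
scaled = [s * sf for s in raw]
print(sf, scaled)
```

After scaling, the (trimmed) mean signal of the array equals the target, which is what makes an experiment array and a baseline array comparable.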
SETTING UP DATA TABLE FOR A COMPARISON ANALYSIS
- Select “analysis” from the menu bar (must be in Expression
Analysis Window)
- Select “options”
- Select the “pivot” tab
- Select the following under the Statistical Comparison Results:
Stat Common Pairs
Signal Log Ratio
Signal Log Ratio Low
Signal Log Ratio High
Change
Change p-value
SAVE “.CHP” FILE AS A MICROSOFT EXCEL DOCUMENT
- Select “File” from the menu bar.
- Choose “save as”.
- Give the pivot data file a name.
- Press “save” button
- Open Microsoft Excel Program
- Open the appropriate file (Text File). You may have to change
the file type to show all files.
- A delimiting “wizard” will start; simply press the “next” button
to input the data into the spreadsheet.
- Press “next” button
- Press “finish” button
- Adjust columns.
- Press “X” button in the upper right hand corner of Microsoft
Excel Window
- Press “yes” button
- Select “Save As Type”
- Choose “Microsoft Excel 97 & 5.0/98 Workbook”.
- Press “save” button.
- Delete .txt file once the Microsoft Excel File has been created.
SORTING THROUGH A COMPARISON ANALYSIS DATA FILE
- Remove all “NC” (No Change) calls from the
report. All other probe sets remain and are sorted from “increase” to “decrease”.
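The same filtering and sorting can also be done on the saved tab-delimited text file outside Excel. The sketch below is only an illustration: the “Probe Set” column name, the example rows, and the change-call codes (I, MI, MD, D) are assumptions, so adjust them to match the columns in your own pivot file.

```python
import csv
import io

# Assumed change-call codes, ordered from increase to decrease.
CALL_ORDER = {"I": 0, "MI": 1, "MD": 2, "D": 3}


def sort_changes(text, call_column="Change"):
    """Drop "NC" rows and sort the rest from increase to decrease."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    kept = [r for r in rows if r[call_column] != "NC"]
    # Unknown codes are placed after the known ones.
    return sorted(kept, key=lambda r: CALL_ORDER.get(r[call_column], len(CALL_ORDER)))


# Made-up pivot data standing in for the saved .txt file.
example = (
    "Probe Set\tSignal Log Ratio\tChange\n"
    "set_a\t-1.2\tD\n"
    "set_b\t0.1\tNC\n"
    "set_c\t2.0\tI\n"
    "set_d\t0.8\tMI\n"
)
result = [(r["Probe Set"], r["Change"]) for r in sort_changes(example)]
print(result)
```

To use a real file, read its contents with `open(path, newline="").read()` and pass the text to `sort_changes`.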
Expression Report Parameters
- Probe Pair Threshold: The minimum
number of probe pairs a probe set must have in order for the
probe set data to be included in the calculation of the report
statistics.
- Alpha1: The significance level
for the detection p-value in an absolute analysis. Alpha1 is a user-modifiable
parameter that is set in the Parameters tab of the Expression
Analysis Settings. If the probe set detection p-value < alpha1,
the call is present. Default = 0.04
- Alpha2: The second significance
level for the detection p-value in an absolute analysis. Alpha2
is a user-modifiable parameter set in the Parameters tab of the
Expression Analysis Settings.
If the probe set detection p-value ≥ alpha2, the call is absent. If alpha1 ≤ detection p-value < alpha2, the call is marginal. Default = 0.06
- Tau: Tau is a user-modifiable
parameter that is set in the Parameters tab of the Expression
Analysis Settings. Ideally, tau should be set to a value that
is a little larger than the median of the discrimination scores
of the probe sets whose targets are absent, to avoid false detected
calls. Default = 0.015
- Noise (Raw Q): The degree of
pixel-to-pixel variation among the probe cells used to calculate
the background.
- Scale Factor: The scale factor
specified in the Scaling tab of the Expression Analysis Settings
dialog box or computed by the algorithm.
- TGT Value: The user-specified
target signal for scaling of the experiment probe array. The target
signal is set in the Scaling tab of the Expression Analysis Settings
dialog box.
- Norm Factor (NF): The normalization
factor specified in the Normalization tab of the Expression Analysis
Settings dialog box or computed by the algorithm.
- Gamma1 H: The small significance
level for the change calls at high intensities. Gamma1 H is a
user-modifiable parameter that is set in the Parameters tab of
the Expression Analysis Settings. Default = 0.0025
- Gamma2 H: The large significance
level for the change calls at high intensities. Gamma2 H is a
user-modifiable parameter that is set in the Parameters tab of
the Expression Analysis Settings. Default = 0.003
- Gamma1 L: The small significance level
for the change calls at low intensities. Gamma1 L is a user-modifiable parameter
that is set in the Parameters tab of the Expression Analysis Settings. Default
= 0.0025
- Gamma2 L: The large significance
level for the change calls at low intensities. Gamma2 L is a user-modifiable
parameter that is set in the Parameters tab of the Expression
Analysis Settings. Default = 0.003
- Perturbation: A user-modifiable
expression algorithm parameter that is set in the Parameters
tab of the Expression Analysis Settings. Perturbation influences
the p-value computed for a probe set in a comparison analysis. Default
= 1.1
- Baseline Noise (Raw Q): The
degree of pixel-to-pixel variation among the probe cells used
to calculate the background in the baseline probe array.
- Baseline Scale Factor (SF): The
scale factor specified for the baseline probe array in the Scaling
tab of the Expression Analysis Settings dialog box or computed
by the algorithm.
- Background: The minimum, maximum,
average, and standard deviation of the background intensity calculated
for the probe array.
- Noise: The minimum, maximum,
average, and standard deviation of the noise calculated for the
probe array.
- Corner +: The average cell intensity
for the sense probe cells used in the grid alignment process.
- Corner -: The average cell intensity
for the antisense probe cells used in the grid alignment process.
- Central +: The average cell
intensity for the nine probe cells that compose the cross at
the center of a sense probe array.
- Central -: The average cell
intensity for the nine probe cells that compose the cross at
the center of the antisense probe array.
- Total Probe Sets: The number
of probe sets on the array that exceed the probe pair threshold
and are not called No Call.
- Average Signal: The average
signal for all probe sets that exceed the probe pair threshold
and are not called No Call.
- Controls: The expression report
includes the signal and call data for the probe sets that correspond
to the housekeeping or spike control transcripts. Separate signal
and call data are reported for the probe pairs specific to the
5', middle (M'), and 3' regions of the control transcripts.
- Sig(all): The average signal
for all control probe sets.
- Sig(3'/5'): For a probe set,
Sig(3')/Sig(5').
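The Alpha1 and Alpha2 thresholds and the Sig(3'/5') ratio described above reduce to a few lines of logic. This is a minimal sketch using the default values listed in this section; the function names are hypothetical and the software's own implementation may differ in detail.

```python
def detection_call(p_value, alpha1=0.04, alpha2=0.06):
    """Map a detection p-value to a Present / Marginal / Absent call."""
    if p_value < alpha1:
        return "Present"
    if p_value < alpha2:  # alpha1 <= p_value < alpha2
        return "Marginal"
    return "Absent"  # p_value >= alpha2


def ratio_3_to_5(sig_3prime, sig_5prime):
    """Sig(3'/5') for a control probe set: Sig(3') divided by Sig(5')."""
    return sig_3prime / sig_5prime


for p in (0.01, 0.05, 0.20):
    print(p, detection_call(p))
print(ratio_3_to_5(300.0, 200.0))
```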
Description of File Types
AFFYMETRIX DATA FILES
- *.exp: Experimental information file. Information about the
experiment name, sample and probe array is stored in this
file. The experiment name then becomes the file name for subsequent
files generated in the analysis.
- *.dat: Data file. The image of the scanned probe array
is stored in this file.
- *.cel: Cell intensity file. The software derives the
*.cel file from a *.dat file and automatically creates it upon
opening a *.dat file. It contains a single intensity value for
each probe cell delineated by the grid (calculated by the Cell
Analysis algorithm).
- *.chp: Analysis output file (chip file). This is the
output generated by the analysis of a *.dat or *.cel file.
- *.rpt: Output file (report file). The report generated
from the analysis output file (*.chp). See Expression Report Parameters.
PROBE INFORMATION FILES
- *.cif: Chip information file. Contains grid size and parameters
for analysis and scanner settings. DO NOT change
any information in this file.
- *.cdf: Chip description file. All 50mm arrays have
an encrypted .cdf file, which contains names and coordinates
for each gene represented on the chip. All 24mm arrays have
unencrypted, probeless .cdf files.
- *.msk: Mask file. This is a user-defined file, which permits the
user to exclude a subset of probes from analysis or to select a subset
of genes for normalization and/or scaling.
Gene expression is the process by
which messenger RNA (mRNA) and subsequently protein is synthesized
from the DNA template of each gene. Although protein concentrations
ultimately dictate the functional state of the cell, mRNA levels
serve as a readily accessible intermediate by which gene expression
can be monitored. For most genes, steady-state mRNA levels approximate
protein levels, and therefore quantitation of mRNA levels provides
important clues with regard to cellular processes. Expression
levels of genes are altered by a combination of environmental and
genetic factors. The hypothesis that these factors and ultimately
the resulting changes in gene expression levels dictate the occurrence
of human diseases has generated much interest in the area of functional
genomics. As a result of the need for high throughput analysis of
gene expression, microarray technology has emerged as the foremost
technology utilized in functional genomics.
Microarray assays borrow traditional hybridization techniques utilized
on flexible membranes and apply them to a solid surface such as glass. Non-porous
surfaces allow for the deposition of small amounts of biochemical material in
a precise location, thus providing highly dense arrays. It is this miniaturization
of the array that allows for high throughput gene expression analysis of hundreds
of genes in a single experiment. In addition to the change in surface compared
to traditional methods, fluorescently labeled probes have replaced radioactive
labeling.
Fabrication
Although several technologies for printing arrays have been developed,
the core utilizes contact printing. Microspotting pins, free-floating in
a printhead that is affixed to a robot capable of XYZ motion, take up
material from a source plate (0.25 µl) and deposit it on multiple glass
slides (0.6 nl).
Biochemical reactions
Total RNA is isolated from a biological source and is used to generate
cDNA. Oligo-dT-primed reverse transcription reactions incorporate fluorescently
labeled nucleotides into the synthesized cDNA. This labeled antisense target
can now be hybridized to sense probes, which have been affixed to a substrate
as described above.
Detection and data analysis
The hybridized array is subsequently washed and scanned on a confocal laser
scanner. The laser emits light in the excitation range of the fluorescent
dye, which in turn emits light at its specific emission range. The
resulting light is captured by a photomultiplier tube (PMT) and converted
into an electronic signal, which can be quantitated to generate a corresponding
signal intensity.
Overview of GeneChip Technology
GeneChip hybridization technology makes it possible
to assess the relative mRNA expression level of thousands of genes
and Expressed Sequence Tags (ESTs) simultaneously. Various
types of GeneChips are currently available and more are being developed. Human
gene arrays, as well as comparable murine, rat, yeast and E. coli arrays,
are available (see below for current chips we can provide). GeneChip
Array technology is best used for comparative studies and has been
successfully employed in genomics discovery programs to highlight
differences in gene expression patterns between normal and cancer
cells, to detect polymorphisms, to facilitate genotyping and to assist
in disease management. Quantitative and reproducible
detection of transcripts over a wide range of mRNA expression levels
is possible.
Definitions of Basic Terms
Description of the Methodology
Thousands of target genes are probed by single-stranded oligonucleotides
constructed on the GeneChips. Each gene is represented
on the array by multiple probe pairs, ranging from 12 to 16, depending
on which array is used. Each probe pair consists of a perfect
match oligonucleotide probe and a single-base mismatch oligonucleotide. The
difference in hybridized signal between members of the probe
pair is used to identify non-specific hybridization and background
signal. Probes are chosen from unique regions at the 3'-end
of the gene, which allows for detection of individual transcripts
within gene families. Each gene array also contains probes
for reference genes, which may be spiked into the RNA samples as
standards, allowing for comparisons between experiments. For
a description of Affymetrix GeneChips, click
here.
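As a rough illustration of how the PM and MM signals of a probe set can be compared, the sketch below computes a per-pair discrimination score, (PM - MM) / (PM + MM), and counts the pairs whose score exceeds a threshold tau. Using this particular score and the value 0.015 for tau is an assumption made for illustration; the intensities are made up and the function names are hypothetical.

```python
def discrimination_scores(pairs):
    """Per-pair score (PM - MM) / (PM + MM); near 0 when PM and MM are alike."""
    return [(pm - mm) / (pm + mm) for pm, mm in pairs]


def pairs_above(scores, tau=0.015):
    """Count probe pairs whose score exceeds the threshold tau."""
    return sum(1 for s in scores if s > tau)


# Made-up (PM, MM) intensities for three probe pairs of one probe set.
pairs = [(1200.0, 300.0), (800.0, 800.0), (500.0, 700.0)]
scores = discrimination_scores(pairs)
print(scores)
print(pairs_above(scores))
```

A pair whose PM signal is well above its MM signal scores close to 1, while a pair dominated by non-specific hybridization scores near zero or below.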