Core Overview Custom Arrays Affymetrix Gene Chips Sequencing Forms Equipment Page Navigator
FAQ-Gene Chip
Can you give me an overview of the Technology?
Can you give me a description of the Methodology?
Where can I find probe array descriptions?
Can I have a copy of the protocol used? (large pdf file)
What are the definitions of basic terms?
How do I run a comparison analysis?
How do I run an absolute analysis?
What are the expression report parameters?
What are the different file types I am receiving?

FAQ-Custom Array
What is the purpose of a custom array?
Understanding GenePix output

PROBE ARRAY DESCRIPTIONS

Human Genome

U95Av2

U95B-Ev2

The Human Genome U95 Set is composed of 5 arrays and represents approximately 60,000 full-length genes and EST clusters. The sequences represented are derived from sequence clusters in Build 95 of the UniGene database (sequences in UniGene Build 95 are from GenBank 113 and dbEST/10-02-99). UniGene clusters are represented by one or more consensus sequences derived directly from cluster members.

HG-U133 A-B

The HG-U133 Set comprises two microarrays that represent more than 33,000 of the best-characterized human genes. The HG-U133 Array represents the vast majority of the genes featured on its predecessor, the HG-U95Av2 Array. Similarly, a large percentage of the HG-U95B, HG-U95C, HG-U95D and HG-U95E arrays is represented on the HG-U133 Set.

Yeast Genome

S98 Array

This probe array offers complete gene expression monitoring capability for the entire yeast genome on a single array. Approximately 6,400 well-recognized open reading frames of the yeast Saccharomyces cerevisiae are represented.

Rat Genome

U34A or B or C

This set of 3 GeneChip probe arrays contains probes interrogating approximately 24,000 mRNA transcripts and EST clusters from Build 34 of the UniGene database. The A array of the Rat Genome U34 Set analyzes approximately 7,000 full-length sequences and approximately 1,000 EST clusters. The B and C arrays each analyze approximately 8,000 EST clusters. All sequences analyzed by the Rat Toxicology U34 Array are also analyzed by the A array of this set.

Rat Toxicology Probe Array

This GeneChip probe array interrogates approximately 850 mRNA transcripts and EST clusters from Build 34 of the UniGene database. All sequences represented on the Rat Toxicology U34 Array are also represented on the Rat Genome U34A Array.

Rat Neurobiology

U34

This probe array contains over 1,200 sequences relevant to neurobiology research. Affymetrix teamed up with neurobiology researchers from both academic and pharmaceutical laboratories to select the sequences.

Mouse Genome

U74Av2

The Murine Genome U74 Set contains probe sets for approximately 6,000 full-length genes and approximately 30,000 EST clusters. All full-length genes are on the A array of the set. In addition, everything represented on the Mu11K Set is also on the A array. Representation of the Mu19K Set is split between the A and the B arrays.

Arabidopsis Genome

Sequences used to develop this array were obtained from GenBank in collaboration with Novartis Agriculture Discovery Institute (NADI). Eighty percent of the genes represented on the GeneChip Arabidopsis Genome Array are predicted coding sequences from genomic BAC entries. Twenty percent are high-quality cDNA sequences. The array also contains approximately EST clusters, sharing homology with the predicted coding sequences from BAC clones.

E. coli Genome

The E. coli Genome Array contains probes for more than 4,200 known open reading frames of E. coli. Sequence information for probes on the array corresponds to the M54 version of the E. coli Genome Project database at the University of Wisconsin. Sequences corresponding to both characterized genes and those whose function is unknown can also be identified according to the numbering convention used by the E. coli Genome Project database.

Drosophila Genome

Sequences used to develop this array are accessible through FlyBase. More than 13,500 gene sequences predicted from the annotation of the Drosophila genome are represented on the array. This includes genes for which confirming EST or full-length cDNA evidence is available (over 8,000 genes have at least 1 EST/cDNA match).


DEFINITIONS OF BASIC TERMS

  • Array: A collection of probes on glass that makes up a GeneChip.
  • Comparative Analysis: The analysis of an experimental array relative to a baseline array.
  • Feature: A collection of probes in a defined area called a “cell”. Each probe cell contains millions of copies of a specific nucleotide probe and is a defined square area on the array.
  • Homomeric Mismatch: A mismatch that is the complement of the base changed (e.g., A→T).
  • Hybridization Spikes: Controls that are added to the sample before the cDNA synthesis and hybridization reactions.
  • Metrics: The calculated results of the mathematical equations used by the GeneChip algorithm software.
  • Probe: A single-stranded DNA oligonucleotide complementary to a specific sequence (usually 25 bases long).
  • Probe Cell: A single square-shaped feature on an array containing one type of probe. Each probe cell contains millions of probe molecules.
  • Perfect Match (PM): Probes that are designed to be complementary to a reference sequence.
  • Mismatch (MM): Probes that are designed to be complementary to the reference sequence except for a homomeric base mismatch at the central (13th) position. Mismatch probes serve as a control for cross-hybridization.
  • Probe Pair: Two probe cells, a PM and its corresponding MM. On the probe array, a probe pair is arranged with a PM cell directly above the MM cell.
  • Probe Set: A set of probes designed to detect one transcript. A probe set usually consists of 16-20 probe pairs. For example, a 20 probe pair set is made up of 20 PM and 20 MM for a total of 40 probe cells.
  • SAPE: Streptavidin-phycoerythrin dye used to bind to the biotinylated nucleotide incorporated into the transcript of interest during the in vitro transcription (IVT) reaction.
  • Target: Fragmented, biotinylated antisense cRNA prepared from the mRNA to be analyzed. Target molecules are hybridized to the probe array, and the levels of hybridization are measured with the HP GeneArray scanner after the array is stained with streptavidin-phycoerythrin (SAPE).
  • Probe Array Tiling: The arrangement of probe cells into probe sets and probe pairs.
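
The relationship between a perfect match probe and its mismatch partner can be sketched in a few lines of Python. This is only an illustration of the definitions above, assuming a 25-base probe with the homomeric mismatch at position 13; the function names and the example sequence are made up for this sketch and do not come from any Affymetrix software.

```python
# Homomeric mismatch: the changed base is replaced by its complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm_probe, position=13):
    """Return the MM probe for a PM probe by complementing one base.

    `position` is 1-based; 13 is the central base of a 25-base probe.
    """
    index = position - 1
    if not 0 <= index < len(pm_probe):
        raise ValueError("position is outside the probe")
    base = pm_probe[index].upper()
    return pm_probe[:index] + COMPLEMENT[base] + pm_probe[index + 1:]


def probe_cells_in_set(probe_pairs):
    """Each probe pair contributes one PM cell and one MM cell."""
    return 2 * probe_pairs


pm = "ACGTACGTACGTACGTACGTACGTA"  # made-up 25-base sequence
mm = mismatch_probe(pm)
print(pm)
print(mm)
print(probe_cells_in_set(20))
```

The two printed sequences differ only at the 13th base, and a 20 probe pair set gives 40 probe cells, matching the Probe Set definition.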

RUNNING A COMPARISON ANALYSIS

Note: A comparison analysis examines the hybridization intensity data (*.cel file) from an experiment probe array and a baseline probe array (of the same probe array type) and identifies relative changes in the expression level of each transcript represented on the arrays. An absolute analysis of the baseline must be run before the comparison analysis of the experiment and baseline.
  1. Select “file” from the menu bar.
  2. Select “open” from the GeneChip menu and open the desired experiment data file from the *.dat files listed in the Open dialog box.
  3. Select “run” from the menu bar.
  4. Select “analysis” from the GeneChip menu.

    The “Save Results As” dialog box automatically opens and displays the default name of the analysis output file (*.chp), which is the same as the experiment name specified during experiment setup.
  5. Enter a new name (the name you want your comparison file to have).
  6. Select “OK”. The Expression Call Settings dialog box automatically opens.
  7. Select the “scaling” tab from the Expression Call Settings window. Make sure that “all probe sets” is selected and that “target signal” is set to 500. Also, under “algorithm parameters”, both “prompt for output filename” and “display settings when analyzing data” should be selected.
  8. Select the “normalization” tab. Make sure “user defined” is chosen and that “normalization value” is one. Also, under “algorithm parameters”, both “prompt for output filename” and “display settings when analyzing data” should be selected.
  9. Select the “probe mask” tab. When running a comparison analysis you can select different types of default probe mask as well as design your own. However, until you become familiar with the software, we recommend that no mask be selected. Both “prompt for output filename” and “display settings when analyzing data” should be selected.
  10. Select the “baseline” tab and place a check mark next to “Use Baseline Comparison File”.
  11. Select “browse”, select the baseline file from the Baseline Comparison File dialog box, and double-click the appropriate file name.
  12. Select “OK”. The comparison will take a few minutes. After the analysis is finished, the Expression Analysis window (EAW) opens and displays the output analysis results. If the EAW is already open, the results are added to the open window and it may be necessary to scroll down to see the newly added results.
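
To give a feel for what the “target signal” of 500 in step 7 does, here is a minimal Python sketch of scaling an array to a target signal. It assumes the scale factor is the target signal divided by a trimmed mean of the probe set signals, and that the lowest and highest 2% of values are excluded from that mean; both the formula and the 2% trim are assumptions made for illustration, since this page does not state how the GeneChip software computes the scale factor.

```python
def scale_factor(signals, target_signal=500.0, trim_fraction=0.02):
    """Return target_signal divided by a trimmed mean of the signals.

    trim_fraction is the share of values dropped from each end before
    averaging (an assumed value, not taken from the software).
    """
    if not signals:
        raise ValueError("at least one signal value is required")
    ordered = sorted(signals)
    n_trim = int(len(ordered) * trim_fraction)
    kept = ordered[n_trim:len(ordered) - n_trim] if n_trim else ordered
    trimmed_mean = sum(kept) / len(kept)
    return target_signal / trimmed_mean


def scale_signals(signals, target_signal=500.0, trim_fraction=0.02):
    """Multiply every signal by the scale factor for this array."""
    factor = scale_factor(signals, target_signal, trim_fraction)
    return [value * factor for value in signals]


example = [100.0, 200.0, 300.0]  # made-up signal values
print(scale_factor(example))
print(scale_signals(example))
```

Scaling the experiment and baseline arrays to the same target signal is what makes their signals comparable before relative changes are called.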

SETTING UP DATA TABLE FOR A COMPARISON ANALYSIS

  1. Select “analysis” from the menu bar (must be in Expression Analysis Window)
  2. Select “options”
  3. Select the “pivot” tab
  4. Select the following under the Statistical Comparison Results:
    Stat Common Pairs
    Signal Log Ratio
    Signal Log Ratio Low
    Signal Log Ratio High
    Change
    Change p-value

SAVE “.CHP” FILE AS A MICROSOFT EXCEL DOCUMENT

  1. Select “File” from the menu bar.
  2. Choose “save as”.
  3. Give the pivot data file a name.
  4. Press the “save” button.
  5. Open Microsoft Excel.
  6. Open the appropriate file (text file). You may have to change the file type to show all files.
  7. A delimiting “wizard” will start. Press the “next” button to input the data into the spreadsheet.
  8. Press the “next” button.
  9. Press the “finish” button.
  10. Adjust the columns.
  11. Press the “X” button in the upper right-hand corner of the Microsoft Excel window.
  12. Press the “yes” button.
  13. Select “Save as type”.
  14. Choose “Microsoft Excel 97 & 5.0/98 Workbook”.
  15. Press the “save” button.
  16. Delete the .txt file once the Microsoft Excel file has been created.

SORTING THROUGH A COMPARISON ANALYSIS DATA FILE

  1. Remove all “NC” (no change) calls from the report. All other probe sets remain and are sorted from “increase” to “decrease”.
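
The same filtering and sorting can be done outside Excel on the saved tab-delimited text file. The Python sketch below is a rough illustration only: it assumes the file has “Change” and “Signal Log Ratio” columns as selected in the pivot tab, that the Change column uses the codes I, MI, MD, D and NC, and that there is a “Probe Set” name column. The codes and the name column are assumptions, so adjust them to match your own export.

```python
import csv
import io

# Assumed change codes, ordered from increase to decrease.
CHANGE_ORDER = {"I": 0, "MI": 1, "MD": 2, "D": 3}


def sort_comparison_rows(tab_text):
    """Drop NC rows, then sort the rest from increase to decrease.

    Rows with the same change call are ordered by descending
    signal log ratio. Unknown change codes are placed last.
    """
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    kept = [row for row in reader if row["Change"] != "NC"]
    return sorted(
        kept,
        key=lambda row: (
            CHANGE_ORDER.get(row["Change"], len(CHANGE_ORDER)),
            -float(row["Signal Log Ratio"]),
        ),
    )


# Made-up rows standing in for an exported pivot data file.
example_text = (
    "Probe Set\tSignal Log Ratio\tChange\n"
    "set_a\t-1.2\tD\n"
    "set_b\t0.1\tNC\n"
    "set_c\t2.0\tI\n"
    "set_d\t0.8\tI\n"
)

for row in sort_comparison_rows(example_text):
    print(row["Probe Set"], row["Change"], row["Signal Log Ratio"])
```

To read a real export, pass the contents of the text file (for example via `open(path).read()`) instead of `example_text`.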

Expression Report Parameters

  • Probe Pair Threshold: The minimum number of probe pairs a probe set must have in order for the probe set data to be included in the calculation of the report statistics.
  • Alpha1: The significance level for the detection p-value in an absolute analysis. Alpha1 is a user-modifiable parameter that is set in the Parameters tab of the Expression Analysis Settings. If the probe set detection p-value < alpha1, the call is present. Default = 0.04
  • Alpha2: The second significance level for the detection p-value in an absolute analysis. Alpha2 is a user-modifiable parameter set in the Parameters tab of the Expression Analysis Settings. If the probe set detection p-value ≥ alpha2, the call is absent. If alpha1 ≤ detection p-value < alpha2, the call is marginal. Default = 0.06
  • Tau: Tau is a user-modifiable parameter that is set in the Parameters tab of the Expression Analysis Settings. Ideally, tau should be set to a value that is a little larger than the median of the discrimination scores of the probe sets whose targets are absent, to avoid false detected calls. Default = 0.015
  • Noise (Raw Q): The degree of pixel-to-pixel variation among the probe cells used to calculate the background.
  • Scale Factor: The scale factor specified in the Scaling tab of the Expression Analysis Settings dialog box or computed by the algorithm.
  • TGT Value: The user-specified target signal for scaling of the experiment probe array. The target signal is set in the Scaling tab of the Expression Analysis Settings dialog box.
  • Norm Factor (NF): The normalization factor specified in the Normalization tab of the Expression Analysis Settings dialog box or computed by the algorithm.
  • Gamma1 H: The small significance level for the change calls at high intensities. Gamma1 H is a user-modifiable parameter that is set in the Parameters tab of the Expression Analysis Settings. Default = 0.0025
  • Gamma2 H: The large significance level for the change calls at high intensities. Gamma2 H is a user-modifiable parameter that is set in the Parameters tab of the Expression Analysis Settings. Default = 0.003
  • Gamma1 L: The small significance level for the change calls at low intensities. Gamma1 L is a user-modifiable parameter that is set in the Parameters tab of the Expression Analysis Settings. Default = 0.0025
  • Gamma2 L: The large significance level for the change calls at low intensities. Gamma2 L is a user-modifiable parameter that is set in the Parameters tab of the Expression Analysis Settings. Default = 0.003
  • Perturbation: A user-modifiable expression algorithm parameter that is set in the Parameters tab of the Expression Analysis Settings. Perturbation influences the p-value computed for a probe set in a comparison analysis. Default = 1.1
  • Baseline Noise (Raw Q): The degree of pixel-to-pixel variation among the probe cells used to calculate the background in the baseline probe array.
  • Baseline Scale Factor (SF): The scale factor specified for the baseline probe array in the Scaling tab of the Expression Analysis Settings dialog box or computed by the algorithm.
  • Background: The minimum, maximum, average, and standard deviation of the background intensity calculated for the probe array.
  • Noise: The minimum, maximum, average, and standard deviation of the noise calculated for the probe array.
  • Corner +: The average cell intensity for the sense probe cells used in the grid alignment process.
  • Corner -: The average cell intensity for the antisense probe cells used in the grid alignment process.
  • Central +: The average cell intensity for the nine probe cells that compose the cross at the center of a sense probe array.
  • Central -: The average cell intensity for the nine probe cells that compose the cross at the center of the antisense probe array.
  • Total Probe Sets: The number of probe sets on the array that exceed the probe pair threshold and are not called No Call.
  • Average Signal: The average signal for all probe sets that exceed the probe pair threshold and are not called No Call.
  • Controls: The expression report includes the signal and call data for the probe sets that correspond to the housekeeping or spike control transcripts. Separate signal and call data are reported for the probe pairs specific to the 5', middle (M'), and 3' regions of the control transcripts.
  • Sig(all): The average signal for all control probe sets.
  • Sig(3'/5'): For a probe set, Sig(3')/Sig(5').
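
The alpha1 and alpha2 rules above can be written out as a small Python function. This is just a restatement of the thresholds described on this page, with the listed defaults; the function name is made up for the sketch and it is not the GeneChip software's own code.

```python
def detection_call(p_value, alpha1=0.04, alpha2=0.06):
    """Turn a detection p-value into a present/marginal/absent call.

    p < alpha1            -> present
    alpha1 <= p < alpha2  -> marginal
    p >= alpha2           -> absent
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if alpha1 > alpha2:
        raise ValueError("alpha1 must not be larger than alpha2")
    if p_value < alpha1:
        return "present"
    if p_value < alpha2:
        return "marginal"
    return "absent"


for p in (0.01, 0.04, 0.05, 0.06, 0.5):
    print(p, detection_call(p))
```

Note that a p-value exactly equal to alpha1 is called marginal and one exactly equal to alpha2 is called absent, because of where the inequalities are strict.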

Description of File Types

AFFYMETRIX DATA FILES
*.exp experimental information file. Information about the experiment name, sample and probe array is stored in this file. The experiment name then becomes the file name for subsequent files generated in the analysis.
*.dat data file. The image of the scanned probe array is stored in this file.
*.cel cell intensity file. The software derives the *.cel file from a *.dat file and automatically creates it upon opening a *.dat file. It contains a single intensity value for each probe cell delineated by the grid (calculated by the Cell Analysis algorithm).
*.chp analysis output file (chip file). This is the output generated by the analysis of a *.dat or *.cel file.
*.rpt output file (report file). The report generated from the analysis output file (*.chp). See Expression Report Parameters.

PROBE INFORMATION FILES
*.cif chip information file. Contains the grid size and the parameters for analysis and scanner settings. DO NOT change any information in this file.
*.cdf chip description file. All 50mm arrays have an encrypted .cdf file, which contains names and coordinates for each gene represented on the chip. All 24mm arrays have unencrypted, probeless .cdf files.
*.msk mask file. This is a user-defined file that permits the user to mask a subset of probes from analysis or to select a subset of genes for normalization and/or scaling.
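
If you receive a folder of files and want a quick reminder of what each one is, a lookup by extension is enough. The Python sketch below simply encodes the descriptions listed above; the function name and the example file names are made up.

```python
from pathlib import PurePath

# Extension -> short description, taken from the lists above.
FILE_TYPES = {
    ".exp": "experimental information file",
    ".dat": "data file (image of the scanned probe array)",
    ".cel": "cell intensity file",
    ".chp": "analysis output file (chip file)",
    ".rpt": "report file",
    ".cif": "chip information file",
    ".cdf": "chip description file",
    ".msk": "mask file",
}


def describe_file(filename):
    """Return the description for a file name based on its extension."""
    suffix = PurePath(filename).suffix.lower()
    return FILE_TYPES.get(suffix, "unknown file type")


for name in ("liver_01.EXP", "liver_01.dat", "liver_01.cel", "notes.txt"):
    print(name, "->", describe_file(name))
```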

Gene expression is the process by which messenger RNA (mRNA) and subsequently protein are synthesized from the DNA template of each gene. Although protein concentrations ultimately dictate the functional state of the cell, mRNA levels serve as a readily accessible intermediate by which gene expression can be monitored. For most genes, steady-state mRNA levels approximate protein levels, and therefore quantitation of mRNA levels provides important clues with regard to cellular processes. Expression levels of genes are altered by a combination of environmental and genetic factors. The hypothesis that these factors, and ultimately the resulting changes in gene expression levels, dictate the occurrence of human diseases has generated much interest in the area of functional genomics. As a result of the need for high-throughput analysis of gene expression, microarray technology has emerged as the foremost technology utilized in functional genomics.

  Microarray assays borrow traditional hybridization techniques utilized on flexible membranes and apply them to a solid surface such as glass. Non-porous surfaces allow for the deposition of small amounts of biochemical material in a precise location, thus providing highly dense arrays. It is this miniaturization of the array that allows for high throughput gene expression analysis of hundreds of genes in a single experiment. In addition to the change in surface compared to traditional methods, fluorescently labeled probes have replaced radioactive labeling. 

Fabrication

Although several technologies for printing arrays have been developed, the core utilizes contact printing. Microspotting pins, free-floating in a printhead that is affixed to a robot capable of XYZ motion, take up material from a source plate (0.25 µl) and deposit it on multiple glass slides (0.6 nl).

Biochemical reactions

Total RNA is isolated from a biological source and is used to generate cDNA. Oligo(dT)-primed reverse transcription reactions incorporate fluorescently labeled nucleotides into the synthesized cDNA. This labeled antisense target can now be hybridized to sense probes, which have been affixed to a substrate as described above.

Detection and data analysis

The hybridized array is subsequently washed and scanned on a confocal laser scanner. The laser emits light in the excitation range of the fluorescent dye, which in turn emits light in its specific emission range. The resulting light is captured by a photomultiplier tube (PMT) and converted into an electronic signal, which can be quantitated to generate a corresponding signal intensity.

Overview of Gene Chip Technology

GeneChip hybridization technology makes it possible to assess the relative mRNA expression level of thousands of genes and Expressed Sequence Tags (ESTs) simultaneously. Various types of GeneChips are currently available and more are being developed. Human gene arrays, as well as comparable murine, rat, yeast and E. coli arrays, are available (see below for current chips we can provide). GeneChip Array technology is best used for comparative studies and has been successfully employed in genomics discovery programs to highlight differences in gene expression patterns between normal and cancer cells, to detect polymorphisms, to facilitate genotyping and to assist in disease management. Quantitative and reproducible detection of transcripts over a wide range of mRNA expression levels is possible.

Description of the Methodology

Thousands of target genes are probed by single-stranded oligonucleotides constructed on the GeneChips. Each gene is represented on the array by multiple probe pairs, ranging from 12 to 16, depending on which array is used. Each probe pair consists of a perfect match oligonucleotide probe and a single-base mismatch oligonucleotide. The difference in hybridized signal between members of the probe pair is used to identify non-specific hybridization and background signal. Probes are chosen from unique regions at the 3'-end of the gene, which allows for detection of individual transcripts within gene families. Each gene array also contains probes for reference genes, which may be spiked into the RNA samples as standards, allowing for comparisons between experiments. For a description of Affymetrix GeneChips, see Probe Array Descriptions.
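
As a rough illustration of how the perfect match and mismatch signals of a probe set can be combined, the Python sketch below averages the PM minus MM difference over the probe pairs. This is a deliberately simplified stand-in and not the algorithm used by the GeneChip software; the function name and the intensity values are made up.

```python
def average_difference(probe_pairs):
    """Average the (PM - MM) intensity difference over a probe set.

    probe_pairs is a list of (pm_intensity, mm_intensity) tuples.
    A pair where MM is close to or above PM suggests that the signal
    is mostly non-specific hybridization or background.
    """
    if not probe_pairs:
        raise ValueError("a probe set needs at least one probe pair")
    differences = [pm - mm for pm, mm in probe_pairs]
    return sum(differences) / len(differences)


# Made-up intensities for three probe pairs of one probe set.
example_pairs = [(1200.0, 300.0), (900.0, 250.0), (150.0, 160.0)]
print(average_difference(example_pairs))
```

In the example, the third pair contributes a slightly negative difference because its mismatch signal is higher than its perfect match signal.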

This site published by Sealy Center for Molecular Medicine
Questions or comments: genomics@scms.utmb.edu
Copyright ©2004  The University of Texas Medical Branch. Please review our privacy policy and Internet guidelines.