cDNA libraries were constructed from polyA+ mRNA in pSport 1 and 2 vectors as indicated. Subtractive hybridization was performed according to the manufacturer's protocol (Life Technologies).
This frequency histogram shows the number of independent clones sequenced for each putative transcript. In all, the 6028 randomly picked clones represent 4014 unique (non-overlapping) groups.
Using Poisson statistics, it is possible to roughly estimate the total complexity of the library, that is, how many unique transcripts would be found if the library were sequenced to completion. We fit our observed distribution to the Poisson distribution, which gives the probability P(k) as:
$$P(k) = \frac{m^{k} e^{-m}}{k!}$$
where k is the number of times a gene transcript is found and m is the Poisson mean. The mean m can be estimated because we know the values of k empirically. If we assume that every clone has an equal chance of being sequenced, then the total complexity of this library is near 11,500. In other words, the 4014 groups found so far amount to about 35% of the library. The assumption that all clones are equally likely to be sequenced is not strictly true, because the library is not normalized, but the estimate still serves as a rough approximation of the true library complexity.
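The kind of estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fitting procedure used for this library, which is not given here. It assumes a zero-truncated Poisson model (transcripts found zero times are never observed), and the function names and the toy histogram are hypothetical. Because the fitting method may differ, its output should not be expected to match the figure quoted above.

```python
import math


def estimate_poisson_mean(histogram, tol=1e-10):
    """Estimate the Poisson mean m from a histogram of observed counts.

    histogram maps k (times a transcript was found, k >= 1) to the number
    of transcripts found exactly k times. Transcripts with k = 0 are unseen,
    so the observed average of k equals m / (1 - exp(-m)); we solve that
    equation for m by bisection.
    """
    groups = sum(histogram.values())
    clones = sum(k * n for k, n in histogram.items())
    mean_observed = clones / groups
    if mean_observed <= 1.0:
        raise ValueError("every transcript was seen once; m cannot be estimated")
    lo, hi = 1e-9, mean_observed  # the root always lies below mean_observed
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid / -math.expm1(-mid) < mean_observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def estimate_complexity(histogram):
    """Return (m, estimated total transcripts, fraction already observed)."""
    m = estimate_poisson_mean(histogram)
    groups = sum(histogram.values())
    fraction_seen = -math.expm1(-m)  # 1 - P(0)
    return m, groups / fraction_seen, fraction_seen


# Hypothetical toy histogram, NOT the data from this library.
toy = {1: 300, 2: 100, 3: 20}
m, total, fraction = estimate_complexity(toy)
print(f"m = {m:.3f}, estimated complexity = {total:.0f}, sequenced = {fraction:.0%}")
```

Under this model the estimated complexity is simply the number of observed groups divided by 1 - P(0). A non-normalized library violates the equal-probability assumption, so any such figure is a rough guide at best.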
For this analysis, each individual sequence was compared (by BLAST) against the mouse UniGene, human UniGene, and StemCellDB/StroCDB contig databases. Sequences showing a very significant match to the same member of the mouse, human, or StemCellDB/StroCDB databases (in that order) were considered to be part of the same cluster. These analyses were performed using software designed and written by Robert Phillips and modified by Jason A. Hackney.
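The grouping rule described above can be illustrated with a minimal sketch. It is not the software written by Phillips and Hackney; it assumes the BLAST hits have already been filtered for significance, and the database labels, function name, and identifiers are hypothetical.

```python
from collections import defaultdict

# Priority order taken from the text; the label strings are hypothetical.
DB_PRIORITY = ("mouse_unigene", "human_unigene", "stemcelldb")


def cluster_by_best_hit(best_hits):
    """Group sequence IDs by the first database, in priority order, with a hit.

    best_hits maps a sequence ID to {database label: matched entry ID}.
    Only matches already judged significant are assumed to be present.
    """
    clusters = defaultdict(list)
    for seq_id, hits in best_hits.items():
        for db in DB_PRIORITY:
            entry = hits.get(db)
            if entry:
                clusters[(db, entry)].append(seq_id)
                break
        else:
            # No significant match in any database: keep as its own group.
            clusters[("unmatched", seq_id)].append(seq_id)
    return dict(clusters)


# Hypothetical example input.
example = {
    "clone1": {"mouse_unigene": "Mm.1"},
    "clone2": {"mouse_unigene": "Mm.1", "human_unigene": "Hs.9"},
    "clone3": {"human_unigene": "Hs.9"},
    "clone4": {},
}
for key, members in cluster_by_best_hit(example).items():
    print(key, members)
```

In this sketch a sequence is assigned using only its highest-priority match, so clone2 joins the mouse cluster even though it also matches a human entry. Whether the original software merged such cases is not stated here.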