Using California’s Public Records Act, Tim and his colleague, Richard Sander, obtained a data set about UCLA’s admissions decisions. In his book, Cheating: An Insider’s Report on the Use of Race in Admissions at UCLA, Tim gives more detail about the data and how he and Sander were able to obtain them. As Tim notes in the book, UCLA was not entirely cooperative in handing over the data, and the university would not always answer his questions about them. Accordingly, if you have questions about the data beyond the notes here and what Tim describes in the book, he probably cannot help you. You are probably better off asking UCLA admissions officials.
That said, here is an Excel file, holistic.xlsx, containing data for the first three years after UCLA implemented a “holistic” admissions system. It contains admissions information about students applying to UCLA’s 2007-09 freshman classes.
The file contains the following variables:
The first three columns are the applicant’s scores on the three parts of the SAT exam: math, reading, and writing. These scores were “fuzzed” to protect the privacy of applicants (please see Tim’s book for more details). A “770” indicates a score of 760 or above. A “380” indicates a score of 390 or below. All other numbers represent the midpoint of a 30-point band. E.g., if the file indicates a “710” for the applicant, this really means that he or she scored 700, 710, or 720.
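The banding rule above can be sketched in a few lines of Python. This is only an illustration: the function name `sat_band_range` is hypothetical, the 200 and 800 bounds are the standard SAT section minimum and maximum rather than anything stated in UCLA's notes, and blank cells are not handled.

```python
def sat_band_range(banded: int) -> tuple[int, int]:
    """Return the (lowest, highest) actual section score consistent with a fuzzed value."""
    if banded == 770:
        # Top-coded: 760 or above. 800 is the maximum SAT section score (assumed cap).
        return (760, 800)
    if banded == 380:
        # Bottom-coded: 390 or below. 200 is the minimum SAT section score (assumed floor).
        return (200, 390)
    # Otherwise the value is the midpoint of a 30-point band; scores move in steps of 10.
    return (banded - 10, banded + 10)


print(sat_band_range(710))  # (700, 720)
```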
The next 21 variables indicate the applicant’s scores on the SAT’s subject tests. Usually, UCLA applicants take three such tests. The scores were “fuzzed” in the same manner as the SAT scores. Neither Tim nor Sander was told what the abbreviations stand for. Some are obvious (e.g., “sbj_phys_band” stands for the applicant’s score on the physics subject test), but others are not (e.g., neither Tim nor Sander knows what “sbj_whis_band” stands for).
The next variable indicates the applicant’s score on the ACT test, if he or she reported scores to UCLA. A “15” indicates that the applicant scored 16 or below. All other scores indicate a four-point band, where the listed score indicates the second-highest possible score in the band. E.g., if the file lists a “31,” this means that the applicant scored a 29, 30, 31, or 32 on the test. If the file lists a “23,” this means that the applicant scored a 21, 22, 23, or 24 on the test.
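As a rough sketch of the ACT rule just described: the listed value is the second-highest score in a four-point band, so the band runs from two below the listed value to one above it. The function name `act_band_range` is hypothetical, and the floor of 1 for the bottom-coded case is simply the lowest possible ACT composite score, not something UCLA specified.

```python
def act_band_range(banded: int) -> tuple[int, int]:
    """Return the (lowest, highest) actual ACT score consistent with a fuzzed value."""
    if banded == 15:
        # Bottom-coded: 16 or below. 1 is the lowest ACT composite score (assumed floor).
        return (1, 16)
    # The listed value is the second-highest score in a four-point band.
    return (banded - 2, banded + 1)


print(act_band_range(31))  # (29, 32)
print(act_band_range(23))  # (21, 24)
```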
The next variable indicates a range for the applicant’s “unweighted” high school GPA. These scores indicate the applicant’s GPA on a 4-point scale.
The next variable indicates the applicant’s weighted GPA. Here, if the applicant took an honors or AP class, an A is counted as 5 points. Here again, the data are fuzzed. A “4.3” indicates that the applicant’s weighted GPA was at least 4.25. A “3.5” indicates that the applicant’s weighted GPA was 3.55 or less. All other scores indicate the midpoint of a 0.1 range. E.g., if the file lists “4.1” as the applicant’s weighted GPA, this means that his or her actual weighted GPA was between 4.05 and 4.15.
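The weighted-GPA bands can be decoded in the same spirit. In this sketch the name `weighted_gpa_range` is hypothetical, and an open-ended bound is returned as `None` because the notes do not give a maximum or minimum for the top-coded and bottom-coded values.

```python
from typing import Optional


def weighted_gpa_range(banded: float) -> tuple[Optional[float], Optional[float]]:
    """Return (low, high) bounds on the actual weighted GPA; None marks an open end."""
    key = round(banded, 1)  # guard against floating-point noise from the spreadsheet
    if key == 4.3:
        return (4.25, None)   # top-coded: at least 4.25
    if key == 3.5:
        return (None, 3.55)   # bottom-coded: 3.55 or less
    # Otherwise the value is the midpoint of a 0.1-wide range.
    return (round(key - 0.05, 2), round(key + 0.05, 2))


print(weighted_gpa_range(4.1))  # (4.05, 4.15)
```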
The next variable indicates the range that the applicant reported as his or her family’s income.
The next variable indicates the education level of the applicant’s parents. Specifically, it notes the level of the more-educated parent. E.g., an “H” indicates that the more-educated parent was a high school graduate – thus, it means that neither parent attended college. The following abbreviations indicate the highest education level of the more-educated parent:
N = no high school
S = some high school
H = high school graduate
C = some college
T = two-year-college graduate
B = bachelor’s degree
G = graduate degree
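For analysis, the codes above can be turned into a lookup table. The sketch below is illustrative only: the names `PARENT_EDUCATION`, `NO_COLLEGE_CODES`, and `neither_parent_attended_college` are hypothetical. Because the code records the more-educated parent, an “N,” “S,” or “H” implies that neither parent attended college.

```python
# Codes for the highest education level of the more-educated parent.
PARENT_EDUCATION = {
    "N": "no high school",
    "S": "some high school",
    "H": "high school graduate",
    "C": "some college",
    "T": "two-year-college graduate",
    "B": "bachelor's degree",
    "G": "graduate degree",
}

# If the more-educated parent stopped at or before high school, neither parent attended college.
NO_COLLEGE_CODES = {"N", "S", "H"}


def neither_parent_attended_college(code: str) -> bool:
    """Return True when the code implies that neither parent attended college."""
    code = code.strip().upper()
    if code not in PARENT_EDUCATION:
        raise ValueError(f"unknown parent-education code: {code!r}")
    return code in NO_COLLEGE_CODES


print(neither_parent_attended_college("H"))  # True
print(neither_parent_attended_college("C"))  # False
```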
The next variable, “ethnic_code”, is mostly self-explanatory, but the following notes may be useful: “Chi” stands for “Chicano,” and “Lat” stands for “Latino.” UCLA did not explain the difference between the two terms. “Vietnamese/Filipino” also includes other Southeast Asians, including Cambodians and Laotians. “Other Asian” means Japanese, Korean, Chinese, Indian, or Pakistani.
The next variables should be self-explanatory: the college (within UCLA) to which the applicant applied, outcome, California resident, and enrolled. (For some ethnicities, UCLA did not report whether the applicant was a California resident or not. This was due to the “publicly knowable” problem in maintaining the privacy of the applicants. Please see Tim’s book for more details.)
The next variable, “outreach,” indicates whether the applicant participated in one or more special programs that help him or her prepare for college. According to a document that UCLA sent Richard Sander and Tim, they are: ATDP (Academic Talent Development Program), Cal-SOAP, CAMP (California Alliance for Minority Participation in Science), CBOP (Career Based Outreach Program), COSMOS, Early Academic Outreach Program (EAOP), Educational Guidance Center (EGC), Educational Talent Search, UCSF Internships, GEAR UP, MESA (Mathematics, Engineering, Science Achievement) including MESA Schools Programs, SUCCESS through Collaboration and California Community College Program, PDP (Professional Development Program), Puente Program, UC College Prep Initiative (UCCP), UCSC SAGE (Students Acquiring “A-G” Expectations), Upward Bound, Young Entrepreneurs at Haas.
The next variable, “api,” stands for academic performance index. The state government rates California public high schools on certain factors that indicate the quality of the high school. Not all high schools receive a rating; hence, the variable is blank for many applicants. The api number indicates the decile of the high school in terms of its quality. A “10” represents a very high-quality high school, and a “1” indicates a very low-quality high school.
The next three variables are identifiers of the different readers of the applicants’ files. Please see Tim’s book for more information on this.
“sr_refer” indicates whether at least one of the two initial readers of the application file recommended the applicant for Supplemental Review. (Please see Tim’s book for an explanation of the latter phrase.)
The next three variables indicate the holistic scores that Readers A, B, and C gave to the applicant. According to the instructions UCLA gave admissions readers, the score must be a 1, 2, 2.5, 3, 4, or 5. (Neither Tim nor Richard Sander knows why a score is sometimes not one of these numbers.) A “1” indicates a highly qualified applicant, and a “5” indicates a poorly qualified applicant.
“holistic_rank” indicates the final holistic score for the applicant. Often this is the average of the scores that Reader A and Reader B gave to the applicant. However, sometimes an applicant qualifies for a “second chance” round. (Please see Tim’s book for details about these rounds.) If so, the holistic_rank is the score that a senior admissions staff member unilaterally gives to the applicant. As Tim discusses in his book, it is impossible for an applicant to receive a “0” holistic rank. However, many applicants in the data set have a 0. As Tim discusses, it appears that these are cases of recruited athletes. A “99” indicates a missing score.
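Because 0 and 99 are special values rather than real scores, it helps to separate them out before averaging or plotting holistic_rank. A minimal sketch, with hypothetical helper names (`is_valid_reader_score`, `classify_holistic_rank`); the “recruited athlete” label reflects Tim’s inference, not anything UCLA confirmed:

```python
# Scores that UCLA's instructions allowed readers to give.
VALID_READER_SCORES = {1, 2, 2.5, 3, 4, 5}


def is_valid_reader_score(score: float) -> bool:
    """Return True if a reader's holistic score is one of the permitted values."""
    return score in VALID_READER_SCORES


def classify_holistic_rank(rank: float) -> str:
    """Separate the special values of holistic_rank from real scores."""
    if rank == 99:
        return "missing"
    if rank == 0:
        return "likely recruited athlete"  # Tim's inference, per his book
    return "scored"


print(is_valid_reader_score(2.5))      # True
print(is_valid_reader_score(3.5))      # False
print(classify_holistic_rank(99))      # missing
print(classify_holistic_rank(2.25))    # scored
```

Values such as 2.25 count as “scored” here because holistic_rank is often an average of two reader scores.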
The pre-holistic file contains data for the three years just prior to UCLA’s adoption of the “holistic” system. It contains information about students who applied to UCLA’s 2004-06 freshman classes.
The first 36 variables, listed in columns A through AJ, are the same as those described in the holistic.xlsx file. Please see the above for a description of these variables.
The next three variables, rdra_id, rdrb_id, and rdrc_id, are identification numbers for Reader A, Reader B, and Reader C. These readers judge the applicant on academic factors. Usually, the applicant receives reads from only two readers. However, if the two readers differ in their academic score by more than a point, the applicant may receive a read from a third reader.
The next three variables, rdra_score, rdrb_score, and rdrc_score, are the scores that the applicant receives from Readers A, B, and C. These readers judge the applicant only on academic factors, such as high school grades and SAT scores. A “0” indicates a missing score. However, as Tim discusses in his book, all of these appear to be cases of recruited athletes. A “1” to “6” indicates the academic quality of the applicant, where 1 indicates a highly qualified candidate, and a 6 indicates a poorly qualified candidate. A “7” indicates a “probably ineligible” applicant, and an “8” indicates “problematic to score (may have missing test scores, etc.)”.
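The coding of the academic reader scores can be summarized as a small lookup. This is a sketch only; the function name `describe_academic_score` and the returned labels are hypothetical, chosen to mirror the description above.

```python
def describe_academic_score(score: int) -> str:
    """Translate an academic reader score from the pre-holistic file into a label."""
    if score == 0:
        return "missing (apparently a recruited athlete)"
    if 1 <= score <= 6:
        return "scored"  # 1 = highly qualified ... 6 = poorly qualified
    if score == 7:
        return "probably ineligible"
    if score == 8:
        return "problematic to score"
    raise ValueError(f"unexpected academic score: {score!r}")


print(describe_academic_score(3))  # scored
print(describe_academic_score(7))  # probably ineligible
```

Only scores 1 through 6 measure academic quality, so 0, 7, and 8 should be excluded before computing averages.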
“par_score” indicates the applicant’s personal achievement score, which is based on things like extracurricular activities. A “0” indicates a missing score, a “1” indicates a high-achieving applicant, and a “5” indicates a low-achieving applicant.
“lcl_score” indicates the life-challenge score of the applicant. It is based on things like whether the applicant came from a poor family, lived in a poor neighborhood, went to a low-quality high school, was raised by a single parent, faced a physical disability, etc. A “0” indicates a missing score. A “1” indicates that the applicant faced many life challenges, and a “5” indicates that the applicant faced few life challenges.
“academic_rank” indicates the final academic score of the applicant. Usually it is the average academic score that he or she received from Readers A and B.
“rdr_parlcl_id” is the ID number of the reader who gave the par_score and lcl_score to the applicant.
(In early February 2023, I replaced the pre-holistic file. The old file did not contain all the data. Specifically, it seems that all data after row 65,535 in the old file were omitted. [I suspect that the problem was that I uploaded or transferred the data using an old version of Excel. Note that 65,535 = 2^16 – 1. Numbers of the form 2^n – 1 are common limits in software programs.] If you downloaded the data before February 2023, please check the number of rows in the file. The correct file should have 132,742 rows.)
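A quick way to run that check, once you have the row count from whatever tool you use to open the file, is sketched below. The function name `check_row_count` is hypothetical. Two assumptions are built in: the old .xls format holds at most 2^16 = 65,536 rows per sheet, and a difference of one row from 132,742 is tolerated because the notes do not say whether that figure counts the header row.

```python
EXPECTED_ROWS = 132_742          # row count of the corrected pre-holistic file
LEGACY_XLS_ROW_LIMIT = 2 ** 16   # 65,536 rows per sheet in the old .xls format


def check_row_count(n_rows: int) -> str:
    """Classify a pre-holistic file by its row count."""
    # Allow a difference of one, since the header row may or may not be counted.
    if abs(n_rows - EXPECTED_ROWS) <= 1:
        return "complete"
    if n_rows <= LEGACY_XLS_ROW_LIMIT:
        return "probably truncated"
    return "unexpected"


print(check_row_count(132_742))  # complete
print(check_row_count(65_535))   # probably truncated
```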