Multiple-Translation Arabic - a new corpus

Multiple-Translation Arabic (MTA) Part 1 supports the development of automatic means for evaluating translation quality. The corpus contains 10 sets of human translations for a single set of Arabic source materials. It also includes translations from commercial off-the-shelf (COTS) systems, covering both commercial Machine Translation (MT) systems and MT systems available on the Internet. There are two sets of COTS output in total, plus one output set from a TIDES 2002 MT Evaluation participant, which is representative of state-of-the-art research systems.

To determine whether automatic evaluation metrics such as BLEU track human judgments, human assessments were performed on the two COTS outputs and on the TIDES research system output. The corpus includes the assessment results for one of the two COTS systems, the assessment results for the TIDES research system, and the specifications used for conducting the assessments.

A total of 141 journalistic Arabic text files from the Xinhua and AFP news services were selected for Multiple-Translation Arabic (MTA) Part 1.

The corpus is available via FTP transfer. For further information, including online documentation, please visit:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T18

Institutions that are members of the LDC during the 2003 Membership Year can receive this corpus free of charge. Nonmembers may license this publication for $600.

If you need additional information before placing your order, or would like to inquire about membership in the LDC, please send email to <ldc@ldc.upenn.edu> or call (215) 573-1275.

-------------------------------------------------------------------------------
Linguistic Data Consortium          Phone: (215) 573-1275
University of Pennsylvania          Fax:   (215) 573-2175
3600 Market Street, Suite 810       email: ldc@ldc.upenn.edu
Philadelphia, PA 19104-2653         www:   http://www.ldc.upenn.edu