Multiple-Translation Arabic - a new corpus:



Multiple-Translation Arabic (MTA) Part 1 supports the development of 
automatic means for evaluating translation quality. The corpus contains 10 
sets of human translations for a single set of Arabic source materials. 
It also includes translations from commercial off-the-shelf (COTS) systems, 
covering both commercial Machine Translation (MT) systems and MT systems 
available on the Internet. There are 2 sets of COTS outputs in total, plus 
one output set from a TIDES 2002 MT Evaluation participant that is 
representative of state-of-the-art research systems.

To determine whether automatic evaluation methods, such as BLEU, track 
human assessment, human assessments were performed on the two COTS outputs 
and on the output of the TIDES research system. The corpus includes the 
assessment results for one of the two COTS systems, the assessment results 
for the TIDES research system, and the specifications used for conducting 
the assessments.

A total of 141 journalistic Arabic text files from the Xinhua and AFP news 
services were selected for Multiple-Translation Arabic (MTA) Part 1. The 
corpus is available via FTP.

For further information, including online documentation, please visit:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T18

Institutions that have membership in the LDC during the 2003 Membership 
Year will be able to receive this corpus free of charge. Nonmembers may 
license this publication for $600.


If you need additional information before placing your order, or would like 
to inquire about membership in the LDC, please send email to 
<ldc@ldc.upenn.edu> or call (215) 573-1275.

------------------------------------------------------------------------------- 

Linguistic Data Consortium       Phone: (215) 573-1275
University of Pennsylvania       Fax:   (215) 573-2175
3600 Market Street Suite 810     email: ldc@ldc.upenn.edu
Philadelphia, PA 19104-2653      www:   http://www.ldc.upenn.edu