Automatic Text Simplification. Horacio Saggion

as well as for texts of peculiar characteristics such as web pages.

      Readability formulas and studies have been proposed for many different languages. For Basque, an agglutinative language with rich morphology, Gonzalez-Dios et al. [2014] recently proposed using a number of Basque-specific features to separate documents of two different readability levels, achieving over 90% accuracy (note that this is only a binary classification task). The Lix index, a readability formula developed for Swedish that uses word length and sentence length as difficulty factors, has been applied to many other languages [Anderson, 1981]. There has been considerable research on readability in Spanish [Anula Rebollo, 2008, Rodríguez Diéguez et al., 1993, Spaulding, 1956] and on its application to the evaluation of automatic text simplification [Štajner and Saggion, 2013a].
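      To make the two difficulty factors of the Lix index concrete, the following is a minimal sketch. It assumes the conventional definition, which the text above does not spell out: the average number of words per sentence plus the percentage of words longer than six letters. The function name `lix` and the naive punctuation-based sentence splitting are illustrative choices, not part of any cited work.

```python
import re


def lix(text: str) -> float:
    """Approximate Lix score: words per sentence + percentage of long words.

    Assumptions (not taken from the source text): a "long" word has more
    than six letters, and sentences end at '.', '!', '?' or ':'.
    """
    # Naive sentence segmentation on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?:]+", text) if s.strip()]
    # Runs of letters only (Unicode-aware); digits and underscores excluded.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100 * long_words / len(words)


sample = "The cat sat. The extraordinary elephant wandered away."
print(lix(sample))
```

      For the sample, eight words in two sentences with three long words give 8/2 + 100 × 3/8. A real implementation would likely need a proper sentence splitter and tokenizer for the language at hand.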

       1 http://www.weeklyreader.com

       2 http://literacynet.org/

       3 https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

       4 http://www.helsinki.fi/varieng/CoRD/corpora/FLOB/

       5 http://www.dueparole.it/default_.asp

       6 http://www.repubblica.it/

       7 http://www.corestandards.org/ELA-Literacy/

       8 http://wing.comp.nus.edu.sg/downloads/mwc/

       9 http://www.onestopenglish.com/

      CHAPTER 3

       Lexical Simplification

      Lexical simplification aims at replacing difficult words with easier-to-read (or easier-to-understand) expressions while preserving the meaning of the original text segments. For example, the sentence “John composed these verses in 1995” could be lexically simplified into “John wrote the poem in 1995” without greatly altering the sense of the initial sentence. Lexical simplification requires solving at least two problems: first, finding a set of synonym candidates for a given word, generally by relying on a dictionary or a lexical ontology; and second, replacing the target word with a synonym that is easier to read and understand in the given context. For the first task, lexical resources such as WordNet [Miller et al., 1990] can be used. For the second task, different strategies for word sense disambiguation (WSD) and simplicity computation are required. A number of works rely on word frequency as a measure of both word complexity and word simplicity [Carroll et al., 1998, Lal and Rüger, 2002]; others argue that length is a word complexity factor [Bautista et al., 2011], while some use a combination of frequency and length [Bott et al., 2012a, Keskisärkkä, 2012].
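      The two steps described above (candidate generation, then selection by simplicity) can be sketched as follows. Everything here is a toy assumption: the `SYNONYMS` dictionary stands in for a resource such as WordNet, the `FREQUENCY` counts are invented for illustration, and the score `frequency / length` is just one possible way to combine the two factors, not the formula of any cited system. No word sense disambiguation is performed, so context is ignored.

```python
# Hypothetical synonym dictionary; a real system might query WordNet instead.
SYNONYMS = {
    "composed": ["wrote", "authored"],
    "verses": ["lines", "stanzas"],
}

# Invented corpus frequencies, for illustration only.
FREQUENCY = {
    "composed": 40, "wrote": 900, "authored": 15,
    "verses": 30, "lines": 500, "stanzas": 5,
}


def simplicity(word: str) -> float:
    """Assumed score: frequent and short words count as simpler."""
    return FREQUENCY.get(word.lower(), 0) / len(word)


def simplify(sentence: str) -> str:
    """Replace each token with its simplest candidate, if any exist."""
    result = []
    for token in sentence.split():
        # The original word competes with its synonyms.
        candidates = [token] + SYNONYMS.get(token.lower(), [])
        result.append(max(candidates, key=simplicity))
    return " ".join(result)


print(simplify("John composed these verses in 1995"))
```

      Because the sketch substitutes word for word, it cannot produce a rewrite such as “the poem” for “these verses”; handling agreement and choosing the right sense in context are exactly the problems that the WSD and simplicity strategies mentioned above must address.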
