Esme Manandise
Intuit Futures, Mountain View, California, USA
Published in: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland
Linköping Electronic Conference Proceedings 165:5, p. 33-41
NEALT Proceedings Series 40:5, p. 33-41
Published: 2019-09-30
ISBN: 978-91-7929-997-2
ISSN: 1650-3686 (print), 1650-3740 (online)
The present study contributes to the literature on the language of the tax-and-regulation domain in the context of the highly formatted tax forms published by a federal agency. The content and form analyses rely on a methodology that looks for meaning and patterns in connection with the main purpose of income-tax filing, i.e., performing the calculations that determine whether taxes were overpaid to, or are owed to, the United States Internal Revenue Service. Profiling the income-tax forms by spelling out the language regularities across the set has at least two advantages. First, profiling contributes to an understanding of how, if at all, the 2010 Plain Writing Act mandate of ‘clear and simple’ writing is being achieved. Second, profiling a small, unannotated corpus can help determine which Natural Language Processing approach is best suited to automatically extracting, representing, and executing tax calculations expressed as arithmetic word problems.
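To make the notion of "tax calculations expressed as arithmetic word problems" concrete, the following is a minimal rule-based sketch in Python. It turns a few form-line instructions into functions over earlier line values and runs them in order. The instruction templates, line numbers, and dollar amounts are hypothetical illustrations, and the names `compile_instruction` and `run_worksheet` are invented for this example. It is not presented as the author's approach.

```python
import re
from typing import Callable, Dict, List, Tuple

# Maps form line numbers to dollar amounts, e.g. {1: 50000.0}.
Lines = Dict[int, float]


def compile_instruction(text: str) -> Callable[[Lines], float]:
    """Turn one hypothetical form-line instruction into a function of earlier lines."""
    floor_at_zero = "If zero or less, enter -0-" in text
    # Keep only the arithmetic sentence; drop the trailing floor clause and period.
    core = text.split("If zero or less")[0].strip().rstrip(".")

    base: Callable[[Lines], float]
    if m := re.fullmatch(r"Add lines (.+)", core):
        nums = [int(n) for n in re.findall(r"\d+", m.group(1))]
        if "through" in m.group(1):
            # "Add lines 1 through 3" covers every line in the inclusive range.
            nums = list(range(nums[0], nums[1] + 1))
        base = lambda v: sum(v[i] for i in nums)
    elif m := re.fullmatch(r"Subtract line (\d+) from line (\d+)", core):
        sub, frm = int(m.group(1)), int(m.group(2))
        base = lambda v: v[frm] - v[sub]
    elif m := re.fullmatch(r"Multiply line (\d+) by ([\d.]+)", core):
        src, factor = int(m.group(1)), float(m.group(2))
        base = lambda v: v[src] * factor
    else:
        # Hypothetical templates only: anything else is reported, not guessed.
        raise ValueError(f"no template matches: {text!r}")

    if floor_at_zero:
        # "If zero or less, enter -0-" clamps negative results to zero.
        return lambda v: max(0.0, base(v))
    return base


def run_worksheet(steps: List[Tuple[int, str]], start: Lines) -> Lines:
    """Evaluate each (line number, instruction) step in order, without mutating start."""
    values = dict(start)
    for line_no, instruction in steps:
        values[line_no] = compile_instruction(instruction)(values)
    return values


if __name__ == "__main__":
    entered = {1: 50000.0, 2: 1200.0, 3: 300.0, 5: 60000.0}
    steps = [
        (4, "Add lines 1 through 3."),
        (6, "Subtract line 5 from line 4. If zero or less, enter -0-."),
        (7, "Multiply line 4 by 0.25."),
    ]
    print(run_worksheet(steps, entered))
```

Hand-written templates like these only cover the phrasings someone has listed in advance. How far they generalize across a set of forms depends on exactly the kind of language regularities that profiling is meant to reveal.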
tax-and-regulation domain, automatic annotation, raw text preprocessing, linguistic-feature-based classification, Plain Writing Act mandate, text compression, tabular content structuring, readability