Ledolter/VanderVelde
Analyzing Textual Information
SAGE Publications, Inc.
Chapter 4: Word Distributions: Document-Term Matrices of Word Frequencies and the "Bag of Words" Representation
- combine39.txt: Original text/data source of the Speeches of the 39th Congress
- Chapter4LastNameCorrection1.docx and Chapter4LastNameCorrection2.docx: Background materials explaining the correction of the last names in the 39th Congress data set
- last_name.RData: R data file with names of all speakers (needed for correcting metadata)
- Members of 39th Congress.docx: List of names of the members of the 39th Congress
- Chapter4RCode1.docx: R program for processing the speeches of the 39th Congress. It reads in the text data, strips off the metainformation, corrects the metavariables, and separates Senate and House speakers who share the same last name. The output is stored in PrelimData.RData
- PrelimData.RData: Raw R data for the 39th Congress
- Chapter4RCode2.docx: R code for the analysis in Sections 4.2 and 4.3
- Chapter4MoreOnZipf.docx: Details on Zipf’s Law
- ZipfRCode.docx: Zipf’s Law: R code for simulation and estimation of parameters