My initial workflow was based on this blog post from the Data Sitters Club.
Identify the Problem
Identify Objectives
Identify goals and criteria for success
Create a set of questions for identifying correct dataset
Use WorldCat to find physical copies of the books and arrange to visit them, observing the rules of the various institutions.
Explore Cookbooks
Use https://voyant-tools.org/ to find initial word trends
Use AntConc to start analyzing data such as common word clusters and word frequencies. Keep notes and start documenting your workflow.
Start comparing cookbooks in AntConc.
Acquiring the Data
Identify the "right" data sets
Visit multiple copies of the book and observe evidence of use, understanding that library copies are generally in nice condition. Note the physical elements of the book and any annotations, and take reference photographs if possible. Understand that you may not be able to revisit the book and this may be your only opportunity to work with the physical object.
Import data and set up local or remote data structure
Carefully name your folders and files.
Develop a folder system for the reference photographs and PDFs.
Determine the most appropriate tools to work with the data
Use Tropy to manage reference photographs and PDFs, taking time to input the necessary metadata
Look for a .txt version of the book through HathiTrust
If only a PDF version exists, use Adobe Acrobat or another program to create a .txt version of the book, then clean up the resulting file. This takes a long time, so don’t panic.
Carefully name your txt files.
Clean up the text
Download AntConc
Import file to AntConc and start looking for errors in the txt document.
Correct the errors in the .txt document and reopen it in AntConc.
Carefully copy and paste the word list and frequencies from the AntConc Word List tab into Excel, creating two columns. This cannot be done as one action; each column needs to be copied and pasted separately. This might be a better time to filter out stop words.
At this stage each cookbook had its own sheet in an Excel workbook. It was a challenge to keep the column headings consistent across the worksheets.
Create a controlled vocabulary to populate details about the words in the Excel Sheets and develop drop-down menus for the columns.
Work on the Excel spreadsheet with the AntConc application open and a PDF of the cookbook on separate screens if possible
Start cleaning the data by combining duplicate words such as “apple” and “apples.” Make sure to update the frequency column as words are combined.
Start coming up with descriptors to use to code the data. I found that using multiple columns to define each word was helpful. Use the same column headings on each sheet and develop a dropdown menu for each column.
Develop a system for managing terms that consist of two or more words, such as “corn bread” or “Mrs A.F. Judd.” This may involve using the Clusters tab in AntConc.
At this stage, I combined the sheets into one sheet and developed a metrics sheet to track my progress.
Add an additional cookbook to the analysis to test the workflow and controlled vocabulary.
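The word list, stop word, and “apple”/“apples” steps above can also be sketched in a few lines of Python. This is only a rough illustration, not what AntConc does internally and not the method I used: the function names, the sample sentence, the stop word set, and the variant mapping are all made up for the example, and the simple pattern used to split words will not match AntConc's token settings exactly.

```python
import csv
import io
import re
from collections import Counter


def word_frequencies(text, stop_words=frozenset()):
    """Count lowercase word tokens, skipping any word in stop_words."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def merge_variants(counts, variants):
    """Fold variant forms into one headword and sum their frequencies.

    `variants` maps each variant to its headword, e.g. {"apples": "apple"}.
    """
    merged = Counter()
    for word, freq in counts.items():
        merged[variants.get(word, word)] += freq
    return merged


def to_csv(counts):
    """Return the word list as CSV text with 'word' and 'frequency' columns,
    most frequent first and alphabetical within ties."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "frequency"])
    for word, freq in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        writer.writerow([word, freq])
    return buffer.getvalue()


# Hypothetical sample text and settings, for illustration only.
sample = "Pare the apples. Slice one apple and add the sugar."
counts = word_frequencies(sample, stop_words={"the", "and", "one"})
counts = merge_variants(counts, {"apples": "apple"})
print(to_csv(counts))
```

Because the merged frequency is summed automatically, there is no separate frequency column to update by hand, and the CSV text can be saved to a file and opened in Excel as two columns in one step.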
Parse the data
Decide what text you want to include in your analysis, such as chapter headers or the index. Delete the unwanted text before saving the file.
As you are exploring other cookbooks, download and save the files as .txt files whenever possible.
Allow yourself to feel occasionally frustrated throughout this whole experience
Be careful with version control of the Excel workbook
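Deleting an unwanted part of the text, such as the index, can be sketched in Python as well. This is a minimal example under a big assumption: that the section starts with a heading that sits alone on its own line (for example “INDEX”). Real OCR output is often messier than that. The function name and the sample text are hypothetical.

```python
def drop_section(text, start_heading, end_heading=None):
    """Remove the lines from start_heading up to (not including) end_heading.

    Headings are matched as whole lines, ignoring case and surrounding spaces.
    If end_heading is None or never appears, everything after start_heading
    is removed as well.
    """
    kept = []
    skipping = False
    for line in text.splitlines():
        label = line.strip().lower()
        if not skipping and label == start_heading.lower():
            skipping = True  # drop the heading line itself
            continue
        if skipping and end_heading is not None and label == end_heading.lower():
            skipping = False  # keep the end heading and what follows
        if not skipping:
            kept.append(line)
    return "\n".join(kept)


# Hypothetical sample text, for illustration only.
book = "SOUPS\nBoil the fish.\nINDEX\nBrewis, 12\nSoups, 3"
print(drop_section(book, "INDEX"))
```

Saving the trimmed text under a new file name, rather than over the original .txt file, keeps an untouched copy to go back to if you later decide to include the section after all.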
Mine the Data
Determine sampling methodology and sample data
Format, clean, slice, and combine data in Excel
Learn about and decide how to handle Stop Words
Develop a method to describe words you plan to exclude from the analysis.
Document the additional questions you have of the text as you clean up the workbook
Create necessary derived columns from the data (new data)
Take a break and exercise daily.
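One way to picture the stop word, exclusion, and derived column steps above is the sketch below. It marks excluded words with a reason instead of deleting them, and adds one derived column: frequency per 1,000 words. The column names (“excluded_reason”, “per_1000”), the sample rows, and the choice of per 1,000 words as the derived measure are assumptions for the example, not columns from my workbook.

```python
def annotate_rows(rows, exclusions):
    """Add 'excluded_reason' and 'per_1000' columns to word-frequency rows.

    rows: list of dicts with 'word' and 'frequency' keys.
    exclusions: dict mapping a word to the reason it is left out.
    The per_1000 rate is based on the total of all rows, excluded or not.
    """
    total = sum(row["frequency"] for row in rows)
    annotated = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are left untouched
        new_row["excluded_reason"] = exclusions.get(row["word"], "")
        new_row["per_1000"] = round(1000 * row["frequency"] / total, 1) if total else 0.0
        annotated.append(new_row)
    return annotated


# Hypothetical rows and exclusion list, for illustration only.
rows = [
    {"word": "the", "frequency": 50},
    {"word": "butter", "frequency": 30},
    {"word": "brewis", "frequency": 20},
]
exclusions = {"the": "stop word"}
annotated = annotate_rows(rows, exclusions)
for row in annotated:
    print(row)
```

Keeping excluded words in the table with a written reason means the decision is documented and can be reversed later, and a rate per 1,000 words makes cookbooks of different lengths easier to compare than raw counts.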
Refine the Data
Identify trends and outliers
Apply descriptive and inferential statistics
Document and transform Data
Start coding. Don’t panic when the descriptors need to change, but make sure to make the same changes in each sheet. The dropdown menus are useful for this.
Clear your head as needed.
Research unfamiliar terms such as “brewis,” and don’t assume that a term is nonsense. It may be an important fish dish.
Don’t panic; you will find more questions the more words you process
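When descriptors change partway through coding, it helps to check that every coded cell still uses a value from the controlled vocabulary. The sketch below assumes the sheet has been exported as rows of column-name/value pairs. The column name “category” and the vocabulary values are hypothetical, not my actual descriptors.

```python
def find_uncontrolled_values(rows, vocabulary):
    """Return (row_number, column, value) for every coded value that is not
    in the controlled vocabulary for its column. Blank cells are skipped."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for column, allowed in vocabulary.items():
            value = row.get(column, "")
            if value and value not in allowed:
                problems.append((number, column, value))
    return problems


# Hypothetical vocabulary and coded rows, for illustration only.
vocabulary = {"category": {"ingredient", "technique", "person", "dish"}}
coded = [
    {"word": "butter", "category": "ingredient"},
    {"word": "brewis", "category": "Dish"},      # wrong capitalization
    {"word": "judd", "category": ""},            # not coded yet
    {"word": "simmer", "category": "method"},    # old descriptor
]
print(find_uncontrolled_values(coded, vocabulary))
```

Run against each sheet, a check like this lists the cells that were missed when a descriptor was renamed, which is the same job the dropdown menus do while typing.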
Build a Data Model
Select Appropriate Model
Sign up for a Tableau Public account, download application.
Watch instructional videos for Tableau Public
Explore how others use Tableau Public at Gallery-Viz of the Day.
Import the spreadsheet into Tableau Public and start making visuals to compare the texts. Understand that to save visualizations you need to make them public.
Remember to hydrate
Build Model
Evaluate and Refine model
Find mistakes in your Excel Workbook.
It may be helpful to open the workbooks in OpenRefine to see the data differently
Read training materials on OpenRefine to determine if it will help
Upload new spreadsheets to Tableau Public as needed. My page is Here
Don’t Panic
Revisit your Excel sheet to make changes. This will be a constant back and forth between Excel and Tableau Public while also looking at the PDFs of the text and the AntConc xxx of the text.
Continue cleaning up your data
Update your Tableau dashboards as needed by linking the spreadsheet and updating Tableau Public
Learn how to update the visualizations without needing to rebuild all of them
Eat meals/snacks as needed throughout this process.
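One kind of mistake that is hard to spot by eye in a combined sheet is the same word coded differently in different cookbooks. A rough Python sketch for finding those is below. It assumes the combined sheet has been exported as rows with “cookbook”, “word”, and “category” columns; those column names and the sample rows are hypothetical.

```python
from collections import defaultdict


def find_conflicting_codes(rows, column="category"):
    """Return {word: sorted list of codes} for words coded more than one way.

    Blank codes are ignored, so an uncoded row does not count as a conflict.
    """
    codes_by_word = defaultdict(set)
    for row in rows:
        if row[column]:
            codes_by_word[row["word"]].add(row[column])
    return {
        word: sorted(codes)
        for word, codes in codes_by_word.items()
        if len(codes) > 1
    }


# Hypothetical combined sheet, for illustration only.
combined = [
    {"cookbook": "Book A", "word": "brewis", "category": "dish"},
    {"cookbook": "Book B", "word": "brewis", "category": "ingredient"},
    {"cookbook": "Book A", "word": "butter", "category": "ingredient"},
    {"cookbook": "Book B", "word": "butter", "category": "ingredient"},
    {"cookbook": "Book B", "word": "judd", "category": ""},
]
print(find_conflicting_codes(combined))
```

Not every conflict is an error, since a word can legitimately play different roles in different books, so the output is a list of rows to look at again rather than a list of corrections.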
Present the Results
Summarize findings with narrative, storytelling techniques
Present limitations and assumptions of your analysis
Identify follow up problems and questions for further analysis
Use AntConc to explore other cookbooks.