Evolving Idealized Cookbook Analysis Workflow

My initial workflow was based on this blog post from the Data Sitters Club.

    1. Identify the Problem

      1. Identify Objectives

      2. Identify goals and criteria for success

      3. Create a set of questions for identifying the correct dataset

        1. Use WorldCat to find physical copies of the books and arrange to visit them, observing the rules of each institution.

          1. Explore Cookbooks

          2. Use https://voyant-tools.org/ to find initial word trends

          3. Use AntConc to start analyzing data such as common word clusters and word frequency. Keep notes and start documenting your workflow.

          4. Start comparing cookbooks in AntConc.

    2. Acquiring the Data

      1. Identify the "right" data sets

        1. Visit multiple copies of the book, observing evidence of use. Understand that library copies are generally in nice condition. Note the physical elements of the book and any annotations, and take reference photographs if possible. Understand that you may not be able to revisit the book and this may be your only opportunity to work with the physical object.

      2. Import data and set up local or remote data structure

        1. Carefully name your folders and files.

          1. Develop a folder system for the reference photographs and PDFs.

        2. Determine the most appropriate tools to work with the data

          1. Use Tropy to manage reference photographs and PDFs, taking time to input the necessary metadata.

        3. Look for a .txt version of the book through HathiTrust.

        4. If only a PDF version exists, use Adobe Acrobat or another program to create a .txt version of the book and clean up the resulting file. This takes a long time; don’t panic.

        5. Carefully name your txt files.

        6. Clean up the text

        7. Download AntConc

        8. Import the file into AntConc and start looking for errors in the .txt document.

        9. Correct the errors in the .txt document and reopen it in AntConc.

        10. Carefully copy and paste the word list and frequencies from the AntConc Word List tab into Excel, creating two columns. This cannot be done in one action; each column needs to be copied and pasted separately. This might be a better time to filter out stop words.

        11. At this stage each cookbook had its own sheet in an Excel workbook. It was a challenge to keep the column headings consistent across the worksheets.

        12. Create a controlled vocabulary to populate details about the words in the Excel Sheets and develop drop-down menus for the columns.

        13. Work on the Excel spreadsheet with the AntConc application open and a PDF of the cookbook on separate screens if possible

        14. Start cleaning the data, combining duplicate words such as “apple” and “apples.” Make sure to update the frequency column as words are combined.

        15. Start coming up with descriptors to use to code the data. I found that using multiple columns to define each word was helpful. Use the same column headings on each sheet and develop a drop-down menu for each column.

        16. Develop a system for managing terms that consist of two or more words, such as “corn bread” or “Mrs A.F. Judd.” This may involve using the Clusters tab in AntConc.

        17. At this stage, I combined the sheets into one sheet and developed a metrics sheet to track my progress.

        18. Add an additional cookbook to the analysis to test the workflow and controlled vocabulary.
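
The word-list steps above (building frequencies, filtering stop words, and combining duplicates such as “apple” and “apples”) could also be scripted. What follows is only a minimal Python sketch, not the AntConc and Excel process described in this workflow. The stop-word set, the `MERGE` table, and the function name `word_frequencies` are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; replace it with the list you settle on.
STOP_WORDS = {"the", "and", "of", "a", "in", "to"}

# Hypothetical merge table: variant spelling -> the form you decided to keep.
MERGE = {"apples": "apple", "eggs": "egg"}


def word_frequencies(text, stop_words=STOP_WORDS, merge=MERGE):
    """Count words, skipping stop words and folding variants together."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        if word in stop_words:
            continue
        # Folding a variant into its kept form adds the two frequencies together.
        counts[merge.get(word, word)] += 1
    return counts


sample = "Pare the apples and core each apple. Beat the eggs."
freqs = word_frequencies(sample)
for word, count in freqs.most_common(3):
    print(word, count)
```

Keeping the merge decisions in one table means the same choices can be reapplied when another cookbook is added to test the workflow.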

    3. Parse the data

      1. Decide what text you want to include in your analysis, such as chapter headers or the index. Delete the unwanted text before saving the file.

      2. As you are exploring other cookbooks, download and save the files as .txt files whenever possible.

      3. Allow yourself to feel occasionally frustrated throughout this whole experience

      4. Be careful with version control of the Excel workbook
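
One simple way to be careful with workbook versions is to save a timestamped copy before each editing session. The sketch below is one possible approach, not part of the original workflow. The function name `backup_workbook` and the `backups` folder name are assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_workbook(workbook_path, backup_dir="backups"):
    """Copy the workbook into a backup folder with a timestamp in the file name."""
    source = Path(workbook_path)
    # The backup folder sits next to the workbook (an assumption of this sketch).
    target_dir = source.parent / backup_dir
    target_dir.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

Calling `backup_workbook("cookbooks.xlsx")` (a hypothetical file name) before a session leaves a dated copy to fall back on if a cleanup step goes wrong.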

    4. Mine the Data

      1. Determine sampling methodology and sample data

      2. Format, clean, slice, and combine data in Excel

        1. Learn about and decide how to handle Stop Words

          1. Develop a method to describe words you plan to exclude from the analysis.

        2. Document the additional questions you have of the text as you clean up the workbook

      3. Create Necessary derived columns from the data (new data)

      4. Take a break and exercise daily.
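
As one example of a derived column, a word's frequency can be expressed per 1,000 counted words so that cookbooks of different lengths can be compared. This particular column is my assumption about what a useful derived column might look like; the workflow above does not specify one. The row layout (`word`, `frequency`) and the name `add_relative_frequency` are also hypothetical.

```python
def add_relative_frequency(rows):
    """Return new rows with a derived 'per_1000' column.

    rows: list of dicts, each with a 'word' and a 'frequency' key.
    """
    total = sum(row["frequency"] for row in rows)
    derived = []
    for row in rows:
        # Guard against an empty sheet so we never divide by zero.
        rate = round(row["frequency"] * 1000 / total, 2) if total else 0.0
        derived.append(dict(row, per_1000=rate))
    return derived


sheet = [
    {"word": "apple", "frequency": 3},
    {"word": "egg", "frequency": 1},
]
for row in add_relative_frequency(sheet):
    print(row["word"], row["per_1000"])
```

Note that the total here is the sum of the frequencies in the sheet, so the result depends on whether stop words were removed before this step.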

    5. Refine the Data

      1. Identify trends and outliers

      2. Apply descriptive and inferential statistics

      3. Document and transform Data

        1. Start coding. Don’t panic when the descriptors need to change; make sure to make the same changes in each sheet. The drop-down menus are useful here.

        2. Clear your head as needed.

      4. Research unfamiliar terms such as “brewis”; don’t assume that a term is nonsense. It may be an important fish dish.

      5. Don’t panic; you will find more questions the more words you process.
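
When descriptors change during coding, it helps to check that every coded value still belongs to the controlled vocabulary. The sketch below shows one way to do that outside Excel. The vocabulary terms, the `category` column name, and the function name `find_uncontrolled_terms` are hypothetical, not the descriptors used in this project.

```python
# Hypothetical controlled vocabulary for a "category" column.
CATEGORY_VOCAB = {"ingredient", "dish", "method", "person", "excluded"}


def find_uncontrolled_terms(rows, column="category", vocab=CATEGORY_VOCAB):
    """Return (row_number, value) pairs whose value is not in the vocabulary."""
    problems = []
    for number, row in enumerate(rows, start=1):
        value = row.get(column, "")
        # Exact match only, mirroring a drop-down menu: stray spaces or
        # capital letters count as a different value.
        if value not in vocab:
            problems.append((number, value))
    return problems


coded = [
    {"word": "brewis", "category": "dish"},
    {"word": "apple", "category": "Ingredient "},
    {"word": "judd", "category": "persn"},
]
print(find_uncontrolled_terms(coded))
```

Rows that are flagged are candidates for correction, or a sign that the vocabulary itself needs a new term.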

    6. Build a Data Model

      1. Select Appropriate Model

        1. Sign up for a Tableau Public account and download the application.

          1. Watch instructional videos for Tableau Public

          2. Explore how others use Tableau Public at Gallery-Viz of the Day.

          3. Import the spreadsheet into Tableau Public and start making visuals to compare the texts. Understand that to save visualizations you need to make them public.

        2. Remember to hydrate

      2. Build Model

      3. Evaluate and Refine model

        1. Find mistakes in your Excel Workbook.

        2. It may be helpful to open the workbook in OpenRefine to look at it differently.

        3. Read training materials on OpenRefine to determine if it will help.

        4. Upload new spreadsheets to Tableau Public as needed. My page is Here

        5. Don’t Panic

        6. Revisit your Excel sheet to make changes. This will be a constant back and forth between Excel and Tableau Public while also looking at the PDFs of the text and the AntConc xxx of the text.

        7. Continue cleaning up your data

        8. Update your Tableau dashboards as needed by linking the spreadsheet and updating Tableau Public.

          1. Learn how to update the vizzes without needing to rebuild all of them.

        9. Eat meals/snacks as needed throughout this process.
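
One kind of workbook mistake that can be checked mechanically is inconsistent column headings across worksheets, the challenge noted earlier when each cookbook had its own sheet. The sketch below assumes the headings of each sheet have already been read into plain Python lists; the sheet names, headings, and the function name `inconsistent_headings` are hypothetical.

```python
def inconsistent_headings(sheets):
    """Compare each sheet's headings with those of the first sheet.

    sheets: dict mapping sheet name -> list of column headings.
    Returns a dict of sheet name -> {"missing": [...], "extra": [...]}
    containing only the sheets that differ from the first one.
    """
    names = list(sheets)
    if not names:
        return {}
    reference = sheets[names[0]]
    report = {}
    for name in names[1:]:
        missing = [h for h in reference if h not in sheets[name]]
        extra = [h for h in sheets[name] if h not in reference]
        if missing or extra:
            report[name] = {"missing": missing, "extra": extra}
    return report


workbook = {
    "Cookbook A": ["word", "frequency", "category"],
    "Cookbook B": ["word", "frequency", "category"],
    "Cookbook C": ["word", "freq", "category"],
}
print(inconsistent_headings(workbook))
```

The first sheet is treated as the reference only for convenience; any sheet whose headings you trust could play that role.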

    7. Present the Results

      1. Summarize findings with narrative, storytelling techniques

      2. Present limitations and assumptions of your analysis

      3. Identify follow up problems and questions for further analysis

        1. Use AntConc to explore other cookbooks.