IBM Watson Analytics Academic Program

Bob Hoyt MD
Director, Health Informatics Program
University of West Florida

Dallas Snider PhD
Assistant Professor of Computer Science
University of West Florida

Most people are familiar with the story of IBM Watson's victory on Jeopardy in 2011, but may not be aware that there are several different versions of Watson. Watson Health Cloud is an open source cloud-based cognitive computing platform that is well positioned for “big data” analytics, further enhanced by IBM's acquisition of the analytics vendors Explorys and Phytel in 2015. [1] IBM has also partnered with Apple, Johnson & Johnson and Medtronic to analyze the burgeoning data generated by personal fitness and implantable devices. With this approach, data available in Apple HealthKit and ResearchKit can be mined. [2]

Other versions of IBM Watson include Watson Discovery Advisor (hypotheses explorer), Watson Engagement Advisor (end-user interactions), Watson for Wealth Management (financial advice) and Watson for Oncology (cancer treatment recommendations). [3]

The most recent addition to this suite of Watson tools is IBM Watson Analytics (IWA), which was released in late 2014 as a free personal version. [4] In May 2015 Watson Analytics Professional was launched. [5] This cloud-based program has four basic sections: Explore (descriptive), Predict (analytic), Assemble (data visualization) and Refine (data preparation). We began by evaluating the freemium version, which accepts .csv and Microsoft Excel files from your computer or from common data repositories such as Dropbox. Users can also upload Twitter feeds from their web site for analysis.

We created an Excel spreadsheet based on the 2015 County Health Rankings for the state of Florida.
[6] After pre-processing the data we retained the following standard health measures: % obese, % physically inactive, % smokers, % excessive drinking, teen birth rate, % uninsured, graduation rate, % some college, % unemployed, % single parent households, % days feeling fair-poor, physically unhealthy days and mentally unhealthy days. The rows consisted of data from the 67 Florida counties.

The first step in the Watson Analytics process is to upload and save the file for analysis. The user receives a data quality score for the dataset. Our data score was 91, which was excellent, but the software pointed out several potential outliers and one skewed distribution.

When the data was first analyzed using “Explore”, Watson Analytics automatically generated 10 questions based on the data, such as “What is the breakdown of % obese by county”. In seconds, a map of all Florida counties was generated with % obese noted for each county (figure 1).

Figure 1. % Obese by County

Using natural language processing, a user can enter other questions in a search window. We entered “what is the relationship between % physically inactive and % obese” and a graph was automatically generated (figure 2). Mousing over any data point identifies the county and shows the raw data. The user can filter the data and augment it with calculations, data groups and hierarchies. In addition to the map view, data can be represented in tree, grid, area, bar, bubble, line, pie and categorical charts. With the free account you can email the results, and with the paid account you can save and share data via a link, image, PowerPoint or PDF. Results can be saved and pinned for future use.

Figure 2. The relationship between % physically inactive and % obese by County

The next step was to upload the same dataset into “Predict”. We used % obese as the target and predictions automatically appeared.
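
For readers who want to cross-check a relationship like the one between % physically inactive and % obese outside Watson Analytics, the following is a minimal Python sketch of a Pearson correlation. It is not how Watson Analytics computes its own scores. The function name `pearson_r` is ours, and the five county values are made-up placeholders, not figures from the County Health Rankings.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values for five imaginary counties (NOT real data)
pct_inactive = [20.0, 24.0, 27.0, 30.0, 33.0]
pct_obese = [22.0, 27.0, 28.0, 33.0, 35.0]

r = pearson_r(pct_inactive, pct_obese)
print(f"Pearson r = {r:.3f}, r squared = {r * r:.3f}")
```

To try this on real data, the two lists would be replaced by the corresponding columns from the county spreadsheet, read for example with Python's built-in `csv` module.
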
The top predictor was % physically inactive at 72%, but the program noted that adding “% uninsured” increased the predictive ability to 83%. At the top of the Predict window were 14 associations, automatically calculated by Watson Analytics. For example, there was a chart labeled “% obese and % fair-poor health are positively correlated”. Double-clicking the chart lets the user select the statistical details hyperlink, which shows the results and the statistical methodology, e.g. Pearson correlation with its p value.

Figure 3. Predict for factors related to % Obese

The third step is known as “Assemble”, which gives the user the opportunity to create dashboards and infographics by simply dragging and dropping data into the active panes. Again, there are multiple choices for how the data can be displayed. The display is interactive: when Counties is added as a scroll-down list, changing the county changes all of the data displayed (figure 4).

Figure 4. Dashboard

The fourth step is "Refine", where users can view and manipulate the raw data, as you would in any spreadsheet.

IBM Watson Analytics has a community page where problems can be shared and discussed and useful hints are posted. YouTube videos are available, as are webinars.

The free version of Watson Analytics allows 1 user, 100,000 rows, 50 columns and 500 MB of storage. The personal version is $30/month and permits 1 user, 1 million rows of data, 256 columns, 2 GB of storage and access to additional data. The professional version is $80/month and permits multiple users, 10 million rows, 500 columns, 100 GB of data storage, access to additional data (like Twitter feeds) and enhanced ability to share data sets.

IBM offers the Professional version to universities free of charge for 12 months if they plan to use Watson Analytics in their teaching programs. All faculty can access the program and up to 100 students can access the software.
We are now in the process of evaluating Watson Analytics and comparing it with other data mining packages. The formal Watson academic program (WAP) launched in mid-July 2015.

Where does IBM Watson Analytics fit in the overall scheme of data analytics/mining? It is probably too early to say for sure, but clearly IBM is attempting to make analytical tools available to a far wider audience than just data scientists. Traditionally, data scientists had to be mathematicians, statisticians, database experts and domain experts. According to Jeffrey Stanton there are 4 A’s of data: data architecture, data acquisition, data analysis and data archiving. [7] Watson Analytics provides a solution for the last three A’s. This platform was initially limited to Excel or .csv input, and it does not perform clustering or unsupervised learning.

In August 2015 IBM added more features: both the Personal and Professional versions support input from relational databases such as MS SQL Server and IBM DB2, as well as IBM dashDB, IBM SQLDB for Bluemix, MySQL, IBM Cognos BI, .tsv and .sav files, and cloud storage such as MS OneDrive, Box and Dropbox. You will see these options when you select "Add" and then "upload data". You can upload sample data sets from the same Add area.

Watson Analytics' approach is clearly streamlined, but its results must be validated by other approaches. It does provide a very rapid way for a variety of knowledge workers to look at preliminary data. If trends are detected, further exploration can be achieved with other statistical packages and consultation with data scientists.

Dr. Hoyt plans to use Watson Analytics to teach his graduate students in Health Informatics more about data analytics and population health. He is unaware of any similar program that will easily upload a data file, explore relationships, make predictions and create customizable data visualizations.
It is his goal to create a healthcare data analytics course in the next year and use Watson Analytics as one of the analytical tools.

Dr. Snider plans to have his computer science students manipulate visualizations of the associations and patterns in the data sets used in class projects. Furthermore, Watson Analytics' natural language processing will help the students comprehend the machine learning concepts discussed in the classroom.

References:
1. Mearian L. IBM launches Watson Health global analytics cloud. ComputerWorld. April 14, 2015. Accessed May 24, 2015.
2. Campbell M. Apple partners with IBM Watson Health Cloud to bring secure cloud, data analytics to HealthKit and Research Kit. Apple Insider. April 13, 2015. Accessed May 23, 2015.
3. Clements D, Myers M. Get the facts on IBM Watson Analytics. Ibmbigdatahub.com. Accessed May 25, 2015.
4. IBM Watson Analytics. http://www.ibm.com/analytics/watson-a... Accessed May 20, 2015.
5. Taft DK. IBM launches Watson Analytics Pro. eWeek. May 11, 2015. Accessed May 25, 2015.
6. County Health Rankings. http://www.countyhealthrankings.org/r... Accessed May 20, 2015.
7. Stanton J. Introduction to Data Science. Syracuse University. 2012. https://ischool.syr.edu/media/documen... Accessed May 20, 2015.
Published on September 02, 2016 10:56