This paper describes the design of IPFViewer, a visual analysis tool for our Inclusion Processing Framework. The tool was designed in cooperation with a large German steel production facility in order to acquire knowledge from data collected about nonmetallic inclusions and other defects in steel samples. We have highlighted parts of the framework in previous publications in interdisciplinary journals. Here, we describe our contribution in the area of grouped or clustered data items. These data groups are visualized with techniques known from uncertainty visualization to make fluctuations and the corresponding variations in steel samples visible. Our results are also transferable to other ensemble data. To determine how best to design the algorithms and visualization methods that process the huge data set, we discuss the project-specific requirements regarding memory usage, execution behavior, and precision. Approximate, incremental analysis techniques are used when needed to keep the application responsive, while high precision is guaranteed for queries with fast response times. The design allows workers at the steel production facility to analyze correlations and trends in billions of data rows very quickly and to detect outliers in routine quality control.