Stability of results

It is important to obtain some indication of how generalizable the results are.[39] While this is often difficult to check, one can look at the stability of the results: are they reliable and reproducible? There are two main ways of doing that:

  • Cross-validation. By splitting the data into multiple parts, we can check whether an analysis (such as a fitted model) based on one part of the data also generalizes to another part. Cross-validation is generally inappropriate, though, if there are correlations within the data, e.g. with panel data, so other methods of validation are sometimes needed. For more on this topic, see statistical model validation.
  • Sensitivity analysis. A procedure for studying how a system or model behaves when global parameters are (systematically) varied. One way to do this is via bootstrapping.
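The two checks above can be illustrated with a minimal sketch in Python using only the standard library. It assumes a straight-line model fitted by ordinary least squares and synthetic data; the helper names (`fit_line`, `mse`, `k_fold_cv`, `bootstrap_slopes`) are hypothetical and chosen for illustration. `k_fold_cv` estimates the prediction error on each held-out part of the data, and `bootstrap_slopes` refits the model on resamples drawn with replacement to show how much the slope estimate moves. The random shuffle treats observations as independent, which is exactly the assumption that fails for correlated data such as panel data.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    a, b = model
    return statistics.fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def k_fold_cv(xs, ys, k=5, seed=0):
    """Fit on k-1 folds and score on the held-out fold; return one error per fold."""
    idx = list(range(len(xs)))
    # A random shuffle assumes the observations are independent of each other.
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return errors


def bootstrap_slopes(xs, ys, n_boot=200, seed=0):
    """Refit the line on resamples drawn with replacement; return the slopes."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in sample]
        if len(set(bx)) < 2:  # skip degenerate resamples with no spread in x
            continue
        _, b = fit_line(bx, [ys[i] for i in sample])
        slopes.append(b)
    return slopes


if __name__ == "__main__":
    # Hypothetical synthetic data: y = 2 + 0.5x plus Gaussian noise.
    rng = random.Random(42)
    xs = [i / 10 for i in range(50)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]
    print("per-fold MSE:", k_fold_cv(xs, ys))
    slopes = bootstrap_slopes(xs, ys)
    print("bootstrap slope range:", min(slopes), max(slopes))
```

Similar errors across folds and a narrow spread of bootstrap slopes both suggest that the result does not hinge on which observations happen to be included; large differences suggest it does.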

Free software for data analysis

Notable free software for data analysis includes:

  • DevInfo – a database system endorsed by the United Nations Development Group for monitoring and analyzing human development.
  • ELKI – a data mining framework in Java with data-mining-oriented visualization functions.
  • KNIME – the Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
  • Orange – a visual programming tool featuring interactive data visualization and methods for statistical data analysis, data mining, and machine learning.
  • Pandas – a Python library for data analysis.
  • PAW – a FORTRAN/C data analysis framework developed at CERN.
  • R – a programming language and software environment for statistical computing and graphics.
  • ROOT – a C++ data analysis framework developed at CERN.
  • SciPy – a Python library for data analysis.