
the Computing Machinery

Any data that conveys a meaningful message becomes information. At a high level, unprocessed data passes through the following forms on its way to an exact message: noisy data (relevant and irrelevant data together), filtered data (only relevant data), information (data that conveys a vague message), knowledge (data that conveys a precise message), and wisdom (data that conveys an exact message and the reason behind it). To derive wisdom from unprocessed data, we need to start processing it, refine the dataset by including only the data we want to focus on, and organize the data to identify information. In the context of social media analytics, data identification means deciding what content we are interested in. In addition to the text of the content, we want to know: Who wrote the text? Where was it found, or on which social media venue did it appear? Are we interested in information from a specific locale? When did someone say something in social media?[5]
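The step from noisy data to filtered data can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the source: the `Post` record, its fields (one for each of the what, who, where, locale, and when questions), the `filter_relevant` helper, and the sample posts. Simple keyword matching stands in for whatever relevance test a real analysis would use.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    """Hypothetical record holding the identification attributes of one post."""
    text: str            # what was said
    author: str          # who wrote it
    venue: str           # where it appeared
    locale: str          # which locale it came from
    posted_at: datetime  # when it was said


def filter_relevant(posts, keywords, locale=None, since=None):
    """Keep posts that mention any keyword and match the optional locale and date."""
    wanted = [k.lower() for k in keywords]
    kept = []
    for post in posts:
        text = post.text.lower()
        if not any(k in text for k in wanted):
            continue  # irrelevant content: drop it
        if locale is not None and post.locale != locale:
            continue  # outside the locale of interest
        if since is not None and post.posted_at < since:
            continue  # older than the period of interest
        kept.append(post)
    return kept


# Made-up sample data: relevant and irrelevant posts mixed together.
noisy = [
    Post("Clean water shortage in our village again", "user_a", "forum", "IN", datetime(2023, 5, 1)),
    Post("Great cricket match today", "user_b", "microblog", "IN", datetime(2023, 5, 2)),
    Post("Water quality report released", "user_c", "blog", "US", datetime(2023, 5, 3)),
]

filtered = filter_relevant(noisy, ["water"], locale="IN")
```

In this sketch `filtered` holds only the first post, since it is the only one that both mentions the keyword and comes from the requested locale. Turning filtered data into information, knowledge, or wisdom requires further analysis that this example does not attempt.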

Attributes of data that need to be considered are as follows:

  • Structure: Structured data is data that has been organized into a formatted repository, typically a database, so that its elements can be made addressable for more effective processing and analysis. Unstructured data, by contrast, is the least formatted data.[6]
  • Language: Language becomes significant if we want to know the sentiment of a post rather than the number of mentions.
  • Region: It is important to ensure that the data included in the analysis comes only from the region of the world on which the analysis is focused. For example, if the goal is to identify clean water problems in India, we would want to make sure that the data collected is from India only.