
Internet Social Networking & Virtual Communities

Data identification is the process of identifying the subsets of available data to focus on for analysis. Data by itself is useless until it is interpreted; once we start analyzing it, it becomes useful because it conveys a message. Any data that conveys a meaningful message becomes information. At a high level, unprocessed data passes through the following forms on its way to an exact message:

  • Noisy data: relevant and irrelevant data.
  • Filtered data: only relevant data.
  • Information: data that conveys a vague message.
  • Knowledge: data that conveys a precise message.
  • Wisdom: data that conveys an exact message and the reason behind it.

To derive wisdom from unprocessed data, we need to start processing it, refine the dataset by including the data we want to focus on, and organize the data to identify information. In the context of social media analytics, data identification means deciding “what” content we are interested in. In addition to the text of the content, we want to know: Who wrote the text? Where was it found, or on which social media venue did it appear? Are we interested in information from a specific locale? When did someone say something in social media?

Attributes of data that need to be considered are as follows:

  • Structure: Structured data is data that has been organized into a formatted repository, typically a database, so that its elements can be made addressable for more effective processing and analysis. Unstructured data, by contrast, is the least formatted data.[6]
  • Language: Language becomes significant if we want to know the sentiment of a post rather than the number of mentions.
  • Region: It is important to ensure that the data included in the analysis comes only from the region of the world on which the analysis is focused. For example, if the goal is to identify clean water problems in India, we would want to make sure that the data collected is from India only.
  • Type of Content: The content of the data could be text (written text that is easy to read and understand if you know the language), photos (drawings, simple sketches, or photographs), audio (recordings of books, articles, talks, or discussions), or video (recordings or live streams).
  • Venue: Social media content is generated in a variety of venues, such as news sites and social networking sites (e.g. Facebook, Twitter). Depending on the type of project the data is collected for, the venue becomes very significant.
  • Time: It is important to collect data that was posted within the time frame being analyzed.
  • Ownership of Data: Is the data private or publicly available? Is the data under copyright? These are important questions to address before collecting data.
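
The attributes above can be read as filter criteria applied to a noisy collection of posts. The following is a minimal Python sketch of that idea, not a definitive implementation: the `Post` record, its field names, the `identify` function, and the sample posts are all hypothetical and were made up for illustration. It assumes each post already carries language, region, venue, timestamp, and visibility metadata, which real platforms may not provide so directly.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Post:
    """One collected item. Every field name here is a hypothetical choice."""
    author: str          # who wrote the text
    text: str            # what was said
    venue: str           # where it appeared, e.g. "twitter"
    language: str        # language code, e.g. "en"
    region: str          # country code, e.g. "IN"
    posted_at: datetime  # when it was posted
    is_public: bool      # ownership: publicly available or private


def identify(posts, *, region, languages, venues, start, end):
    """Keep only public posts matching region, language, venue and time window."""
    return [
        p for p in posts
        if p.is_public
        and p.region == region
        and p.language in languages
        and p.venue in venues
        and start <= p.posted_at < end
    ]


def utc(year, month, day):
    """Build a timezone-aware UTC timestamp for the sample data."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Invented sample of "noisy data": relevant and irrelevant posts mixed together.
noisy = [
    Post("a", "No clean water in our village again", "twitter", "en", "IN", utc(2023, 3, 4), True),
    Post("b", "Water outage downtown", "twitter", "en", "US", utc(2023, 3, 5), True),        # wrong region
    Post("c", "Tap water is brown today", "facebook", "en", "IN", utc(2023, 3, 6), False),   # not public
    Post("d", "Well water tested unsafe", "news", "en", "IN", utc(2022, 12, 30), True),      # outside time frame
    Post("e", "Paani saaf nahi hai", "facebook", "hi", "IN", utc(2023, 3, 7), True),
]

# "Filtered data": only the posts relevant to a study of India during 2023.
filtered = identify(
    noisy,
    region="IN",
    languages={"en", "hi"},
    venues={"twitter", "facebook", "news"},
    start=utc(2023, 1, 1),
    end=utc(2024, 1, 1),
)
print([p.author for p in filtered])  # ['a', 'e']
```

Keeping each attribute as a separate, explicit condition makes it easy to see which criterion excluded a given post. Content type and structure are left out of the sketch because every sample item is plain text.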