Data-mining analysis of media frame effects on social perception of schizophrenia renaming in Korea | BMC Psychiatry
Online news articles
Online news articles were collected in advance to explore media reports and frames that reflect the social perception of schizophrenia patients. We specifically targeted the text of news articles reported by about 800 media companies, which are comprehensively provided by Korea’s leading online search engine, ‘Naver.com’, over a study period from January 1, 2005 to December 31, 2018. Naver.com is the Korean media brand with the highest domestic utilization rate, above 65%, and its users span progressive, moderate, and conservative political inclinations. As such, Naver.com’s comprehensive domestic usage and broad political spectrum together support the validity and reliability of its data for generalizing the study results. The search terms were ‘Jungshinbunyeolbyung’ for mind split disorder and ‘Johyeonbyung’ for attunement disorder, and matching articles were collected automatically using Python. To investigate differences in social perception between the disease names before and after the revision, the dataset was divided into news articles dealing with ‘Jungshinbunyeolbyung’ (mind split disorder) before and after the disease renaming (Jan 1, 2005 ~ Dec 31, 2010 and Jan 1, 2012 ~ Dec 31, 2018) and news articles dealing with ‘Johyeonbyung’ (attunement disorder) after the disease renaming (Jan 1, 2012 ~ Dec 31, 2018). Duplicate articles appearing across different media outlets were excluded from the analysis. As a result, the total numbers of articles used in the analysis were 2,743 (mind split disorder before the renaming), 3,114 (mind split disorder after the renaming), and 3,068 (attunement disorder).
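The grouping and duplicate-removal steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the dictionary layout of an article, and the duplicate rule (identical text after whitespace and case normalization) are hypothetical and are not taken from the study's collection scripts.

```python
from datetime import date

# Study windows as stated in the text; 2011 falls between them and is excluded.
BEFORE = (date(2005, 1, 1), date(2010, 12, 31))
AFTER = (date(2012, 1, 1), date(2018, 12, 31))


def assign_group(term, published):
    """Return the dataset label for an article, or None if it is outside the study design."""
    in_before = BEFORE[0] <= published <= BEFORE[1]
    in_after = AFTER[0] <= published <= AFTER[1]
    if term == "Jungshinbunyeolbyung" and in_before:
        return "mind_split_before"
    if term == "Jungshinbunyeolbyung" and in_after:
        return "mind_split_after"
    if term == "Johyeonbyung" and in_after:
        return "attunement_after"
    return None


def deduplicate(articles):
    """Keep the first copy of each article.

    Assumed rule: two articles are duplicates when their text is identical
    after collapsing whitespace and ignoring case.
    """
    seen = set()
    unique = []
    for article in articles:
        key = " ".join(article["text"].split()).casefold()
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```

A stricter or looser duplicate rule (for example, matching on title only) would change the final article counts, so the rule should be reported alongside them.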
Number of patients admitted to psychiatric hospitals with a diagnosis code of schizophrenia
Monthly data from January 2010 to July 2018 on the number of people admitted to psychiatric wards with a schizophrenia diagnosis (ICD-10 code F20) were collected from Korea’s healthcare big data system (http://opendata.hira.or.kr).
Media coverage of general crimes committed by schizophrenia patients
For media coverage, we considered only the three terrestrial TV networks in Korea (i.e., KBS, MBC, and SBS), mainly because those networks exert greater influence on audiences than other types of media, as suggested by prior studies. The number of news articles published by each TV network about general crimes committed by patients with schizophrenia was counted for each month between January 2010 and July 2018 using Naver.com, the most popular news aggregator in Korea.
Latent Dirichlet allocation topic modeling
We analyzed the social perception of mind split disorder and attunement disorder before and after the schizophrenia renaming by collecting online news articles related to schizophrenia and applying several text mining techniques. First, the overall characteristics of the online news articles were examined using Latent Dirichlet Allocation (LDA) topic modeling for a macroscopic language analysis. LDA is the most widely used topic modeling technique; it estimates the probability distribution of terms within each of the topic groups extracted from the article collection [18,19,20]. In this study, LDA topic modeling was performed to investigate the difference in the media topics related to the disease names before and after the renaming.
LDA topic modeling was performed on a dataset divided into news articles by period (before/after the revision of the disease name) and by disease name (‘Jungshinbunyeolbyung’ for mind split disorder and ‘Johyeonbyung’ for attunement disorder). We tried to use perplexity values to determine the appropriate number of topics, but the values decreased monotonically in all sections. Thus, after performing topic modeling with the number of topics set to 10, 20, 30, and 40, we set the number of topics to 30, which was found suitable for interpretation because of the high similarity between major keywords within the same topic and the low similarity between topics. The authors extracted 20 keywords per topic and annotated each topic based on the association between the keywords within it. To evaluate the classifications, two independent psychologists were invited, and they qualitatively confirmed the reliability of the suggested terms representing the different media frames. The inter-investigator agreement scores for both loose and strict matches were calculated by dividing the number of consistently agreed topics by the total number of topics. To further validate the inter-investigator agreement scores, we additionally adopted Krippendorff’s alpha to correct for potential biases arising from the redundancy of media frames and the number of participating investigators. The Gensim module in Python was used for the analysis, and the LDA topic modeling results were visualized with pyLDAvis.
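The two agreement measures can be illustrated with a short Python sketch. It assumes each rater assigns one nominal frame label per topic, treats a match as exact label equality (the text does not define how loose and strict matches differ), and uses the standard nominal-data form of Krippendorff's alpha with no missing ratings. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def agreement_score(labels_a, labels_b):
    """Share of topics on which two raters gave the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one inner list per topic, holding the label given by each rater.
    Topics rated by fewer than two raters are ignored.
    """
    # Coincidence matrix: every ordered pair of labels from different raters
    # within a topic contributes 1 / (m - 1), where m is the number of raters.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        # Alpha is undefined when only one label is ever used;
        # returning 1.0 here is a convention of this sketch.
        return 1.0
    return 1 - observed / expected
```

Unlike the raw agreement score, alpha discounts agreement that would be expected by chance given how often each label is used, which is why it can be much lower than the raw score when a few frames dominate.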
Term frequency-inverse document frequency weight analysis
For a microscopic language analysis, the relationship and contextual features between articles were examined using the Term Frequency-Inverse Document Frequency (TF-IDF) weight model. TF-IDF is a text mining measure of how important a word is to an article within a collection. The larger the TF-IDF value, the more likely the word is to determine the topic of the article to which it belongs, which makes the value a useful criterion for extracting keywords. In the TF-IDF analysis, TF-IDF values of the top five words per article were calculated for each dataset. The words were then arranged in descending order of TF-IDF value, and the top 20 words and the bottom 3 words were compared.
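As an illustration of the weighting, the following Python sketch computes TF-IDF for tokenized articles and picks the highest-weighted words per article. The specific variant (term frequency normalized by article length, unsmoothed natural-log inverse document frequency) and the function names are assumptions; the text does not state which variant or library was used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights, k=5):
    """Return the k terms with the largest TF-IDF weight in one document."""
    return sorted(doc_weights, key=doc_weights.get, reverse=True)[:k]
```

Under this variant, a word that occurs in every article receives a weight of zero, so it can never rank among the top words however often it is repeated.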
Quantitative epidemiologic analysis
We investigated the effects of media coverage of crimes committed by schizophrenia patients on the nationwide medical use patterns of patients with the disease by analyzing epidemiological data with a linear regression model. To examine the relationship between the number of news articles and the changes in the number of patients admitted to psychiatric wards, we used the following regression model:
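One way to write the model that is consistent with the variable definitions in this section is given below. The coefficient symbols, the error term, and the name News for the media coverage variable are assumed notation introduced for illustration only.

```latex
\[
\text{Daily\_patients\_change\_rate}_t
  = \beta_0
  + \beta_1\,\text{News}_t
  + \beta_2\,\text{News}_{t-1}
  + \text{month\_effects}
  + \text{year\_effects}
  + \varepsilon_t
\]
```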
Here, Daily_patients_change_rate_t reflects the rate of change in the average daily number of people admitted to psychiatric wards with a diagnosis of schizophrenia in month t relative to the preceding month. For this, the average daily number of patients was first calculated by dividing the monthly number of patients by the number of days in the month. Then, Daily_patients_change_rate_t was calculated as follows:
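A minimal Python sketch of this calculation is shown here. It assumes the rate is expressed as a proportion, (daily average in month t minus daily average in month t-1) divided by the daily average in month t-1, rather than as a percentage; the function names and the input layout are hypothetical.

```python
import calendar


def daily_average(monthly_count, year, month):
    """Average number of patients per day in the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_count / days_in_month


def daily_patients_change_rate(monthly_counts):
    """monthly_counts: chronological list of (year, month, count) tuples.

    Returns one rate per month starting from the second month, because the
    first month has no preceding month to compare against.
    """
    daily = [daily_average(count, y, m) for y, m, count in monthly_counts]
    return [(cur - prev) / prev for prev, cur in zip(daily, daily[1:])]
```

Dividing by the number of days first keeps a 28-day February from appearing as a drop in admissions simply because the month is shorter.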
The media coverage variable for month t refers to the number of news articles published by the three TV networks about general crimes committed by schizophrenia patients in month t. We also included a variable representing this media coverage in the preceding month (i.e., month t-1) to account for possible lagged effects of the media coverage. Regarding month_effects, since the number of people admitted to psychiatric wards is likely to vary by month, month effects were controlled for by incorporating month dummy variables into the regression model. Regarding year_effects, since the value of the dependent variable may have increased over time, year effects were controlled for by including year dummy variables. To further confirm the significance of the findings from the regression model, we controlled for the month and year effects simultaneously using both sets of dummy variables.
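The regressors described above (current and lagged media coverage plus month and year dummy variables) could be assembled as in the following Python sketch. The column names, the record layout, and the choice of January and the earliest year as reference categories are assumptions made for illustration; the sketch builds the design matrix only and does not fit the model.

```python
def build_design_rows(records):
    """records: chronological list of dicts with 'year', 'month', 'news' keys,
    one per consecutive month.

    Returns (column_names, rows). The first record is dropped because it has
    no preceding month from which to take the lagged coverage.
    """
    years = sorted({r["year"] for r in records})
    # One category per dummy set is left out as the reference level so the
    # dummies are not collinear with the intercept.
    month_levels = list(range(2, 13))
    year_levels = years[1:]

    columns = ["intercept", "news_t", "news_t_minus_1"]
    columns += [f"month_{m}" for m in month_levels]
    columns += [f"year_{y}" for y in year_levels]

    rows = []
    for prev, cur in zip(records, records[1:]):
        row = [1.0, float(cur["news"]), float(prev["news"])]
        row += [1.0 if cur["month"] == m else 0.0 for m in month_levels]
        row += [1.0 if cur["year"] == y else 0.0 for y in year_levels]
        rows.append(row)
    return columns, rows
```

The resulting rows can be passed, together with the matching change-rate values, to any ordinary least squares routine.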