Negative-Worded Items Functioning as Method Artifacts in the Chemistry Identity Scale: Evidence from Exploratory, Confirmatory, and Bifactor Analyses

Authors

Yuleks Juru Mudi, Aeda Kasrianti, Sefthy P B Syahailatua, Nurul Isnaini, Balthasar Eba

DOI:

https://doi.org/10.37251/isej.v7i3.2960

Keywords:

Chemistry Identity, Method Bias, Negative-Worded Items, Psychometric Modeling, Self-Report Measurement

Abstract

Purpose of the study: Chemistry identity is an important affective construct in science education because it is associated with learning engagement, academic persistence, and STEM career aspirations. This study evaluates whether the negatively worded items of the Chemistry Identity Scale reflect substantive dimensions of the construct or are merely methodological artifacts.

Methodology: The participants were 300 senior high school students in Indonesia who completed the Chemistry Identity Scale, a 27-item instrument that includes five negatively worded items. The data were analyzed with a comprehensive psychometric approach that combined exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and bifactor modeling to separate substantive construct variance from method variance attributable to item wording.

Main Findings: In the exploratory stage, the negatively worded items tended to form a distinct cluster, indicating shared method variance. Among the competing CFA models, the best fit was obtained by a four-factor model with an additional method factor for negative wording. The bifactor analysis showed that a general chemistry identity factor dominated; however, the negatively worded items contributed little to this general factor, suggesting that they function more as sources of method variance than as substantive indicators.
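Claims such as "dominance of a general factor" and "items contributed minimally to the general construct" are usually quantified with bifactor indices such as explained common variance (ECV), omega total, omega hierarchical, and item-level ECV (I-ECV), as described by Rodriguez, Reise, and Haviland (2016). The abstract does not state which indices were computed, so the sketch below is a generic illustration under assumptions: standardized items, orthogonal factors, each item loading on the general factor and on exactly one specific factor, and made-up loadings that are not the study's estimates.

```python
from collections import defaultdict


def bifactor_indices(general, specific, groups):
    """ECV, omega total, omega hierarchical, and I-ECV for a standard bifactor model.

    general:  standardized loadings on the general factor, one per item
    specific: standardized loadings on each item's single specific factor
    groups:   label of the specific factor each item loads on
    """
    h2 = [g**2 + s**2 for g, s in zip(general, specific)]  # item communalities
    unique = sum(1.0 - h for h in h2)                      # summed residual variances
    spec_sums = defaultdict(float)
    for s, grp in zip(specific, groups):
        spec_sums[grp] += s                                # sum of loadings per specific factor
    spec_sq = sum(total**2 for total in spec_sums.values())
    gen_sq = sum(general) ** 2                             # (sum of general loadings)^2
    total_var = gen_sq + spec_sq + unique                  # model-implied variance of the total score
    return {
        "ECV": sum(g**2 for g in general) / sum(h2),
        "omega_total": (gen_sq + spec_sq) / total_var,
        "omega_h": gen_sq / total_var,
        "I_ECV": [g**2 / h for g, h in zip(general, h2)],
    }


# Hypothetical loadings: three positively worded items on a substantive specific
# factor "S1" and three negatively worded items on a wording factor "NEG".
general = [0.7, 0.7, 0.7, 0.2, 0.2, 0.2]
specific = [0.3, 0.3, 0.3, 0.6, 0.6, 0.6]
groups = ["S1", "S1", "S1", "NEG", "NEG", "NEG"]

res = bifactor_indices(general, specific, groups)
print(f"ECV = {res['ECV']:.2f}, omega_t = {res['omega_total']:.2f}, omega_h = {res['omega_h']:.2f}")
print("I-ECV:", [round(x, 2) for x in res["I_ECV"]])
```

With these made-up loadings, ECV is about 0.54, omega total about 0.79, and omega hierarchical about 0.51, while I-ECV is about 0.84 for each positively worded item but 0.10 for each negatively worded item. A low I-ECV means that most of an item's common variance comes from its specific (here, wording) factor rather than from the general factor.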

Novelty/Originality of this study: The novelty of this study lies in its comprehensive evaluation of wording effects in chemistry identity measurement, integrating EFA, competing CFA models, and bifactor modeling. The findings have practical implications for developers of educational instruments: negatively worded items should be used with caution, because they may affect score interpretation and lead to less accurate evaluative decisions.

Author Biographies

  • Yuleks Juru Mudi, Yogyakarta State University

    Study Program of Educational Research and Evaluation, Yogyakarta State University, Yogyakarta, Indonesia

  • Aeda Kasrianti, Yogyakarta State University

    Study Program of Educational Research and Evaluation, Yogyakarta State University, Yogyakarta, Indonesia

  • Sefthy P B Syahailatua, Yogyakarta State University

    Study Program of Educational Research and Evaluation, Yogyakarta State University, Yogyakarta, Indonesia

  • Nurul Isnaini, Yogyakarta State University

    Study Program of Educational Research and Evaluation, Yogyakarta State University, Yogyakarta, Indonesia

  • Balthasar Eba, Yogyakarta State University

    Study Program of Educational Research and Evaluation, Yogyakarta State University, Yogyakarta, Indonesia



Published

2026-05-30

Issue

Vol. 7 No. 3 (2026)

Section

Articles

How to Cite

[1] Y. J. Mudi, A. Kasrianti, S. P. B. Syahailatua, N. Isnaini, and B. Eba, “Negative-Worded Items Functioning as Method Artifacts in the Chemistry Identity Scale: Evidence from Exploratory, Confirmatory, and Bifactor Analyses,” In. Sci. Ed. J, vol. 7, no. 3, pp. 459–469, May 2026, doi: 10.37251/isej.v7i3.2960.
