Students’ ability in science: Results from a test development study

Students' ability to use and manipulate scientific concepts has been widely explored; however, there is still a need to define the characteristics and nature of science ability. Moreover, tests and performance scales that measure this ability while requiring minimal conceptual knowledge are relatively uncommon. The aim of this study was to develop an objective measure of the science ability of gifted middle school students. To assess this ability, the researchers developed the Science Ability Test Battery. The battery comprises two subscales: a multiple-choice achievement test (Science Ability Test) and a performance assessment (Science Performance Test). The initial Science Ability Test consisted of 23 multiple-choice items, each with one correct answer, that required students to use science process skills and reasoning. Stratified sampling was used in the study. The test was administered to 280 middle school students in Turkey, and 26 students with missing data were excluded. To obtain evidence of content validity, the researchers elicited feedback from five experts in science education and gifted education, and the necessary corrections were made in accordance with their views and suggestions. This study will be followed by further research analysing the validity and reliability of the test.
Keywords: science ability, talent, gifted education


Introduction
Society values the quality of science, technology, engineering and mathematics (STEM) education as a predictor of economic competitiveness and growth; hence, many countries emphasize science education as a source of intellectual capital in order to keep pace in global markets. Gifted education is another area that receives special attention, because its main purpose is to foster intellectual development (Renzulli, 2012; Subotnik & Rickoff, 2010). Identifying children who have the potential to be creative and talented in science, and then serving them with appropriate provision, is therefore a necessity. Not many gifted students excel in all areas of human intellect. Clearly, some students display different talent levels at different times and in different areas (Renzulli, Siegle, Reis, Gavin & Sytsma, 2009). As a result, if the method of instruction does not match a student's particular needs and interests, the student may not be able to reach her or his potential. Whether giftedness is domain specific has long been debated. While some researchers (Jensen, 1998) have suggested that giftedness is a single trait, modern perspectives hold that giftedness is domain specific. Van Tassel-Baska (2005) defines giftedness as "the manifestation of general intelligence in a specific domain of human functioning at a level significantly beyond the norm such as to show promise for original contributions to a field of endeavor". Therefore, it is important to recognize individuals with domain-specific abilities as gifted rather than identifying them by a single criterion such as their intelligence quotient. Heller (1993) defined scientific giftedness as a potential for scientific thinking or a special talent for excellence in the natural sciences. Innamorato (1998) defined giftedness in science in terms of abilities in creativity, problem solving and manipulating data.
Hoover and Feldhusen (1990) argued that formulating reasonable hypotheses is an important trait for giftedness in science. Gifted students in science are expected to show higher scientific reasoning ability and interest in science (Shim & Kim, 2003; Van Tassel-Baska, 2005). Taber (2007) suggested that gifted students in science are able to reach high attainment levels in all or some aspects of the normal school science curriculum and, if given appropriate support, may be able to reach levels above the school's requirements in science-related tasks; thus, gifted science learners demonstrate curiosity, leadership, high-level cognitive ability and metacognitive maturity. Moreover, scientifically gifted students are able to transfer knowledge to new situations, and intuition plays a significant role in their learning of science concepts (Gilbert & Newberry, 2007; Ngoi & Vondracek, 2004).
According to Taber (2007), if we aim to identify the gifted in science, we should determine their aptitude for learning from challenging science instruction, not just their high scores on existing tests. Gifted students in science are not necessarily high performers on formal tests or those who excel at recalling information; thus, they may not be high achievers in science (Watters & Diezmann, 2003), because they have special needs that require special provision. In the case of underachievers, talent is obscured by inappropriate classroom behavior and underperformance on standardized tests; therefore, most of them do not have a chance to be accepted into gifted programs (Cooper, Baum & Neu, 2004). Even if they are accepted into a gifted program, they may not receive educational attainments in line with their cognitive and social-emotional development. Taber (2007a) suggested that the desirable aims of educational activities for the gifted are higher-level thinking, creativity, independence in learning, group work and inquiry skills. However, existing studies imply that gifted students are not satisfied with their schooling in science. A study conducted by Cross and Coleman (1992) with gifted high school students revealed that their major problem with science instruction was the slow pace of instruction and course content. One of the issues contributing to these problems could be the assessment and screening methods used in gifted programs. Gifted students' learning processes should be examined with appropriate assessment approaches (Van Tassel-Baska & Stambaugh, 2006). Indeed, Tal and Miedijensky (2005) found that embedded assessment contributes to gifted students' learning. In further research on their model of assessment in a science course, gifted students' views indicated that the model had a positive influence on their learning (Miedijensky & Tal, 2009).
Thus, the type of assessment used during the identification and screening of gifted students is highly important, because it could cause problems in learning processes and a lack of interest.
Because of their curiosity and imagination, gifted children engage with science in the form of nature study from the early stages of development (Smutny & Von Fremd, 2004). However, gifted students' attitudes toward and career interest in science are unexpectedly low (Andersen & Cross, 2014; Lubinski & Benbow, 1992). Therefore, more effort should be made to identify and serve gifted students in science. Although there is a large body of research on identifying general academic ability, research on domain-specific identification is less common, and practices vary. For example, in the USA, states apply different criteria for identification; however, many of them require IQ testing (McClain & Pfeiffer, 2012). Many European countries use domain-specific identification and motivation (D'Alessio, 2009; Monks & Pfluger, 2005). The Republic of Korea is among the Asia-Pacific countries that effectively use identification in STEM. Its government-supported action plans for gifted education suggest teacher recommendations, IQ and creativity tests, special academic talent tests and personal interviews in the identification protocol (KEDI, 2011; Cho, 2016). Despite the presence of university-based projects such as the Education Programs for Talented Students (UYEP; Sak, 2011), Science and Art Centers (BILSEM) and special schools, science education for gifted students is still lacking in Turkey. Currently, there are problems with policies, identification, educational materials and teachers, among others. In addition, the term 'scientifically gifted' remains underexplored. Consequently, there is a need to define the characteristics and nature of science ability with tests that require minimal conceptual knowledge, to help identify and screen gifted students who attend gifted programs. Therefore, the aim of this study is to develop an objective measure of the science ability of gifted middle school students.
To assess this ability, the researchers developed the Science Ability Test Battery. The battery comprises two subscales: a multiple-choice achievement test (Science Ability Test) and a performance assessment (Science Performance Test). The initial Science Ability Test consisted of 23 multiple-choice items, each with one correct answer, that required students to use science process skills and reasoning. In this study, we examine only the results from the pilot administration of the Science Ability Test.
This study sought to answer the following questions:
1. Is the Science Ability Test a valid and reliable instrument for measuring middle school students' ability in science-related content?
2. Is the test effective in distinguishing non-gifted from gifted students in science?
3. Is the Science Ability Test an objective measure of ability that does not produce gender differences?
4. Do test scores differ with grade level?

Method
To provide evidence of the validity and reliability of the newly developed test, this study was carried out in Science and Art Centers and middle schools.

Sample
A convenience sampling method was used to choose the students for the study. First, the regular schools were grouped by socio-economic level and achievement. Then, the same number of schools was chosen from poor, middle-class and upper-level districts. The researchers conducted the study only in the selected schools and the Science and Art Centers of the city. The study group was composed of 280 students enrolled in 6 schools in Amasya and Tokat, chosen with a stratified sampling method. Twenty-six cases with missing data were excluded from the data set. In total, data were gathered from 254 students (58% female, 42% male): 16 in 5th grade, 52 in 6th grade, 149 in 7th grade and 46 in 8th grade. Information about the study sample is presented in Table 1.

Data Analysis
Data were analysed using SPSS 17.0 and MS Excel. Descriptive statistics, item analysis and t-tests were conducted to establish the validity of the scale. Mann-Whitney U tests were used to support the validity findings. To test reliability, the KR-20 internal consistency coefficient was calculated. Analyses are based on a 95% confidence interval and a 5% significance level.
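The item analysis and KR-20 computations named above can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure the authors ran: the function names, the toy 0/1 response matrix, the upper-lower 27% grouping rule and the use of population variance for the total scores are all assumptions.

```python
from statistics import pvariance

def item_difficulty(responses):
    """Proportion of students answering each item correctly (rows = students, 0/1 scores)."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students for j in range(n_items)]

def discrimination_index(responses, fraction=0.27):
    """Upper-lower index per item: p(correct | upper group) - p(correct | lower group)."""
    ranked = sorted(responses, key=sum, reverse=True)   # highest total score first
    size = max(1, round(len(ranked) * fraction))        # assumed 27% grouping rule
    upper, lower = ranked[:size], ranked[-size:]
    n_items = len(responses[0])
    return [(sum(r[j] for r in upper) - sum(r[j] for r in lower)) / size
            for j in range(n_items)]

def kr20(responses):
    """Kuder-Richardson 20: (k / (k - 1)) * (1 - sum(p * q) / variance of total scores)."""
    k = len(responses[0])
    pq_sum = sum(p * (1 - p) for p in item_difficulty(responses))
    # Population variance of total scores (an assumption; conventions differ).
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_variance)

# Hypothetical toy data: 5 students x 4 items, not the study's responses.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulty = item_difficulty(toy)
discrimination = discrimination_index(toy)
reliability = kr20(toy)
```

With 254 students, `round(254 * 0.27)` gives 69, which coincides with the n1 = n2 = 69 reported with the item statistics in the results, although the paper does not state how the comparison groups were formed.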

Science Ability Test Development Procedure
The Science Ability Test was developed by the researchers to identify and screen gifted students. The test development stages were reviewing the literature, defining the content area, creating an item pool, obtaining expert reviews, administering the test and conducting item analysis. First, the researchers reviewed the literature and defined the content limitations and learning objectives in line with the research purposes. Tests related to scientific ability and used for identification purposes, namely the Lawson Classroom Test of Scientific Reasoning (1978) and the Iowa Assessment Science subtests, were reviewed for their style and grading criteria. The researchers then generated as many items as possible based on learning areas for the gifted. To validate the test, the item pool was reviewed by five gifted education and science education experts according to specific criteria. Thirty multiple-choice items were developed for the test; seven of them were excluded on the basis of the expert reviews, and necessary corrections were made in accordance with the experts' views and suggestions. Two of the items were based on the formal operations items of Lawson's Classroom Test of Scientific Reasoning, which call for abstract thinking, because this kind of thinking is highly valued in scientific ability. Three cognitive domains of scientific thinking (Piekny & Maehler, 2013; Siegler & Liebert, 1975) were used to assess scientific ability: making scientific explanations, evaluating and designing experiments, and scientifically interpreting data. All of the items required scientific thinking. The distribution of items across attainment areas is presented in Table 2.

Reliability and Validity Analysis
First, the data gathered from the pilot administration of the Science Ability Test were used to determine item difficulty and item discrimination and to carry out distractor analysis. The item difficulty indices, item discrimination indices and distractor analysis are shown in Table 3. According to these results, it was decided that two items (9 and 19) should be excluded from the test because their item discrimination indices were lower than 0.20. The average item difficulty for the test was 0.55. Item-total correlations were higher than 0.36 for all of the items (see Table 3). The KR-20 internal consistency coefficient of the test was 0.85. The mean and standard deviation for all students on the 21 items were 11.74 and 5.14, respectively.

Item-total correlations and t values by item (N = 254; n1 = n2 = 69; ***p < .001; the item number of the first row is missing in the source)
Item    Item-total correlation    t
?       .368***                   5.210***
3       .568***                   11.052***
4       .436***                   6.439***
5       .605***                   12.923***
6       .573***                   9.459***
7       .421***                   6.585***
8       .557***                   13.782***
9       .442***                   7.409***
10      .627***                   14.168***
11      .363***                   5.721***
12      .632***                   15.686***
13      .494***                   9.397***
14      .396***                   6.605***
15      .496***                   9.656***
16      .529***                   10.449***
17      .510***                   9.210***
18      .567***                   10.509***
19      .556***                   10.040***
20      .516***                   9.397***
21      .546***                   10.884***

Whether the test was effective in distinguishing non-gifted from gifted students was examined with the Mann-Whitney U test (see Table 5). The scientific ability scores of 190 non-gifted and 64 gifted middle school students were ranked and then compared. The results showed that ability scores differed significantly between non-gifted and gifted students (U = 4794.5; p < 0.05). The mean ranks were 120.74 and 147.57 for non-gifted and gifted students, respectively. Therefore, the Mann-Whitney U test results can be accepted as evidence supporting the validity of the Science Ability Test. Whether there was any gender difference in Science Ability Test scores was also analyzed with the Mann-Whitney U test.
The scientific ability scores of 105 male and 149 female middle school students were compared (see Table 6). The results indicated no significant difference between the two gender groups' science ability scores (U = 7751; p > 0.05). The mean ranks were 128.18 and 127.02 for males and females, respectively. Because ability has a developmental nature, performance on the Science Ability Test is expected to be related to age or grade level. This hypothesis was investigated with multiple comparisons using the Mann-Whitney U test and was verified in our study. The test results are presented in Table 7. The contents of Table 7 demonstrate that there is a significant difference in the scientific ability of middle school students across grades (χ²(3) = 14.648, p < 0.05). Multiple comparisons show that these differences are between 5th and 6th, 5th and 7th, 5th and 8th, and 7th and 8th grade students. The effect size (eta-squared) was calculated as η² = 0.06. It could be said that the effect of grade level (age) on scientific ability is moderate (Cohen, 1988). Because a relationship with grade level is a generally acknowledged trait of ability, these data support the validity of the Science Ability Test.
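As a rough arithmetic check on the nonparametric results, the sketch below recomputes U from the reported rank values and an effect size from the reported chi-square. It assumes that 120.74/147.57 and 128.18/127.02 are mean ranks (weighted by group size, each pair sums to about N(N + 1)/2 = 32385 for N = 254) and that eta-squared was obtained as χ²/(N − 1). Neither formula is stated in the paper, and the function names are illustrative.

```python
def u_from_mean_rank(mean_rank, n):
    """Mann-Whitney U of one group from its mean rank: U = R - n(n + 1)/2, with R = n * mean_rank."""
    return n * mean_rank - n * (n + 1) / 2

def smaller_u(mean_rank_a, n_a, mean_rank_b, n_b):
    """Return the smaller of the two group U values, the one usually reported."""
    return min(u_from_mean_rank(mean_rank_a, n_a), u_from_mean_rank(mean_rank_b, n_b))

def eta_squared_from_chi2(chi2, n_total):
    """Effect size for a rank-based chi-square statistic, assuming eta^2 = chi2 / (N - 1)."""
    return chi2 / (n_total - 1)

# Values taken from the text; small deviations come from rounded mean ranks.
u_gifted_comparison = smaller_u(120.74, 190, 147.57, 64)   # reported U = 4794.5
u_gender_comparison = smaller_u(128.18, 105, 127.02, 149)  # reported U = 7751
eta_squared = eta_squared_from_chi2(14.648, 254)           # reported eta-squared = 0.06

print(round(u_gifted_comparison, 1), round(u_gender_comparison, 1), round(eta_squared, 3))
```

Under these assumptions the recomputed values agree with the reported U statistics to within rounding of the mean ranks, and the effect size rounds to the reported 0.06.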

Discussion and Conclusions
The aim of this study was to develop a test to measure middle school students' ability in science. In light of the item analysis, two items were excluded from the test. The remaining items have item difficulty and item discrimination indices within the acceptable range. In addition, the KR-20 internal consistency coefficient and the item-total correlations met the statistical criteria.
Assessment practices can be effective in closing the gap between boys' and girls' science experiences (Jovanovich & King, 1998). Sadker (1999) stated that although gifted girls are identified in equal or greater numbers, they tend to drop out of gifted programs at higher rates than boys. Because girls and boys learn in different ways, it is important to offer appropriate opportunities and gender-equitable learning environments within the coeducational setting (Kommer, 2006). In our study, the Science Ability Test did not indicate any gender differences, which is an important issue in ability testing, and it was effective in distinguishing non-gifted students from students previously identified as gifted.
There is a significant difference in science ability scores across grades. Since ability is developmentally related to age, this finding is consistent with the literature (Allaire & Marsiske, 1999; Riley, Greeno & Heller, 1984). However, it should be noted that science ability scores did not differ between 6th and 7th grade, while there was a difference in all of the other comparisons. A reason for this finding could be that not all students are able to use formal operations by the end of 6th grade. Similar findings have been reported in past research. Piaget (1972) suggested that the acquisition of formal operations depends in part on educational and cultural factors that foster a particular aptitude for such thinking (as cited in Douglas and Wong, 1977). Kıncal and Yazgan (2010) studied the formal operational thinking of 7th and 8th grade students and found that 60.9% of the students were at the concrete level, although most of them were 11 years or older. Cepni, Ozsevgec and Cerrah (2004) determined middle school students' cognitive development levels and found that the majority of the students were at the concrete level. Bursal (2013) found that the national science assessment grades of 7th grade students were lower than those of earlier grades. Other national studies report similar findings (EARGED, 2009). Results of comparative studies show that operational stages vary with individual differences across grades (Valanides & Markoulis, 2000). Additionally, Piekny and Maehler (2013) found that hypothesis generation, experimentation and evidence evaluation, the cognitive components of domain-general scientific reasoning, emerge asynchronously, which is consistent with the characteristics of gifted students.
Further studies could examine the factors affecting the use of scientific reasoning by 6th and 7th grade students. In conclusion, the findings from the analysis showed that the Science Ability Test seems to be an objective measure of middle school students' ability in science. This test will be used together with the Science Performance Test in the test battery, so the next step will be to investigate the Science Ability Test Battery as a whole.