COMPARISON OF POISSON, ZIP, ZINB, HURDLE AND ZIGP REGRESSION ANALYSIS METHODS IN SCHOOL-AGED SMOKING CASE MODELING IN KUDUS DISTRICT, CENTRAL JAVA

ABSTRACT


INTRODUCTION
Smoking behavior has spread among teenagers and even children, especially boys. Adolescents under 18 years of age are generally students at senior high school level or below. Almost all schools apply disciplinary punishment to students who are caught smoking. However, this punishment does not make them stop trying to smoke; they do it in secret. The results of the Global Youth Tobacco Survey (GYTS) in Indonesia in 2019 stated that 19.2 percent of students aged 13-15 years and 22.6 percent of students aged 16-17 years were found to be smoking (Global Youth Tobacco Survey, 2020). According to the 2018 Riskesdas results, the prevalence of smoking among those aged 10-18 years was 9.1 percent, an increase compared to previous years (Health Research and Development Agency, 2018). BPS data show that in 2018 there was an increase in the percentage of smoking among adolescents aged 10-18 years, followed by a decrease after that period. The development of the percentage of adolescent smoking by age group can be seen in Figure 1.

Figure 1. Percentage of Smoking in Population Age ≤ 18 Years
Source: Susenas KOR BPS (BPS, 2020) (processed)
Cigarettes contain harmful substances that, in addition to damaging health, also damage the morale of teenagers. Smoking behavior among adolescents is a threat to the future of the nation. The government has responded seriously to these problems through regulation. The use of cigarettes is regulated in RI Government Regulation Number 109 of 2012. Cigarette product packaging must include the statement "prohibited from selling or giving to children under 18 years of age and pregnant women". A decline in the prevalence of smoking among children and adolescents is also one of the targets of the RPJMN: by 2024 the prevalence of smoking at the age of 10-18 years is targeted to be 8.7 percent, down from a baseline of 9.1 percent. Several jurisdictions, such as the United States, California and Singapore, have raised the minimum age for buying cigarettes to 21 years. Raising the minimum age is expected to reduce the number of young smokers.
One of the causes of smoking in adolescence is the desire to experiment. Adolescence is a period of searching for identity and of psychological instability. At this time the influence of peers is very dominant. For male adolescents, smoking is a symbol of maturity, strength, leadership and attractiveness to the opposite sex (Bringham, 1991 in Komasari and Helmi, 2000). Peers and family are the parties that lead adolescents to smoke. Peers who smoke attract others to join in smoking for reasons of group cohesiveness. The family is an example or role model for children. If someone in the family smokes, the child will follow the example. McGee et al. (2015) found that parents, siblings and peers who smoke are significant risk factors for adolescent smoking (McGee et al., 2015). Basically, smoking behavior arises from factors within oneself (internal) and from environmental factors (external). Internal factors can take the form of the physical and mental condition of the adolescent.
This study models the number of cigarettes consumed by male adolescents. Factors that influence adolescent smoking are grouped into internal and external factors. In this study the internal factors examined are age, highest level of education completed, and activity during the previous week. The external factors are the area of residence and the presence of parents who smoke. During adolescence, the prevalence of smoking increases with age. BPS data for 2020 show that the percentage of the population who smoke is 0.13 percent at ages 10-12 years, 1.64 percent at ages 13-15 years, and 10.07 percent at ages 16-18 years. Previous research by Utami (2020) also showed that increasing age among teenagers has a significant effect on smoking behavior. There is a tendency for school level to influence smoking behavior. The percentage of adolescents who smoke is higher at high school level than at junior high school or elementary school level. The previous week's activities were grouped into work, school, and other activities. This grouping aims to capture working youth. Adolescents who work tend to smoke more than adolescents who are still at school. Area of residence is considered an environmental factor that influences smoking behavior, with the urban environment considered worse than the rural environment. Meanwhile, family members who smoke become role models for adolescent smoking behavior. Smoking within the family has a significant effect on smoking behavior in adolescents. Kudus Regency is the largest cigarette-producing district in Indonesia, so it was chosen as the research locus.
The incidence of smoking at school age is a rare event, and the number of cigarettes consumed (in sticks) is count data, so it can be modeled by Poisson regression. An important characteristic of the Poisson distribution is that its mean and variance are equal (equidispersion). In practice this condition is rarely met; the variance is often greater than the mean, which is called overdispersion. The implication of violating equidispersion is that Poisson regression will produce biased parameter estimates. One way to overcome this is with a Generalized Poisson model. Zeileis et al. (2008) stated that another way to overcome overdispersion is with a negative binomial model, which can arise as a Poisson-gamma mixture distribution. Under the negative binomial model, the variance need not equal the mean. Earlier, Consul and Shoukri proposed the Generalized Poisson model to overcome the problem of overdispersion (Shoukri, 1984).
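The equidispersion property can be checked informally by comparing the sample variance with the sample mean. The following is a minimal sketch of that check; the counts and the helper name `dispersion_index` are illustrative assumptions and are not the study data.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean.

    Values near 1 are consistent with equidispersion; values well
    above 1 suggest overdispersion.
    """
    avg = mean(counts)
    if avg == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / avg


# Hypothetical cigarette counts (sticks), invented for illustration only.
hypothetical_counts = [0] * 12 + [3, 6, 12, 24]
ratio = dispersion_index(hypothetical_counts)
print(f"variance / mean = {ratio:.2f}")
```

A ratio far above 1, as in this made-up sample, is the kind of pattern that motivates the Generalized Poisson and negative binomial alternatives.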
Because the proportion of adolescents who do not smoke is larger, the number of cigarettes consumed is mostly 0, a condition known as excess zeros. Data with excess zeros can be handled by the zero-inflated model (Zeileis et al., 2008). The zero-inflated model can be applied to count data models such as the Poisson, Negative Binomial and Geometric models. Zero-inflated models were introduced earlier by Mullahy (1986) and Lambert (1992). A Poisson model with both overdispersion and excess zeros can be handled with Zero-Inflated Generalized Poisson (ZIGP) and Zero-Inflated Negative Binomial (ZINB) models. Another alternative for handling these two conditions is the hurdle model. The hurdle model was first introduced by Cragg (1971). Hurdle models differ from zero-inflated models, which contain a mixture of zero and non-zero components. The hurdle model combines a left-truncated count data model with a zero hurdle model (Zeileis et al., 2008). This study will look at comparisons between
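One informal way to see excess zeros is to compare the observed share of zeros with the share a Poisson distribution with the same mean would imply, which is exp(-mean). The sketch below does this for a made-up sample; the data and function names are hypothetical and only illustrate the idea.

```python
import math


def observed_zero_share(counts):
    """Fraction of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


def poisson_zero_probability(mu):
    """P(Y = 0) for a Poisson variable with mean mu."""
    return math.exp(-mu)


# Hypothetical cigarette counts (sticks), invented for illustration only.
hypothetical_counts = [0] * 12 + [3, 6, 12, 24]
sample_mean = sum(hypothetical_counts) / len(hypothetical_counts)

observed = observed_zero_share(hypothetical_counts)
expected = poisson_zero_probability(sample_mean)
print(f"observed zero share: {observed:.3f}")
print(f"Poisson-implied zero share: {expected:.3f}")
```

When the observed share is much larger than the Poisson-implied share, a zero-inflated or hurdle specification is worth considering.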

METHODS

Poisson Regression Model
The Poisson regression model is a regression model in which the response variable follows a Poisson distribution and takes the form of discrete data (Cahyandari, 2014). Let $Y \sim \text{Poisson}(\mu)$; the probability function of $Y$ is:

$$P(Y = y) = \frac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, 2, \dots$$
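As a rough illustration of the Poisson probability function and its equal mean and variance, the sketch below evaluates the probabilities on a truncated support and recovers both moments numerically. The value of `mu` and the truncation point are arbitrary assumptions chosen for the example.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu), computed on the log scale for stability."""
    if y < 0:
        return 0.0
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


mu = 2.5  # illustrative mean, not estimated from the study data
support = range(0, 101)  # the tail beyond 100 is negligible for this mu
probs = [poisson_pmf(y, mu) for y in support]

mean_y = sum(y * p for y, p in zip(support, probs))
var_y = sum((y - mean_y) ** 2 * p for y, p in zip(support, probs))
print(f"mean = {mean_y:.4f}, variance = {var_y:.4f}")
```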

Generalized Poisson Regression (GPR) Model
Generalized Poisson Regression is a Poisson regression model in which the random component is assumed to follow a Generalized Poisson distribution; in other words, GPR is a Poisson regression model that does not assume equidispersion (Ismail & Jemain, 2007). Let $Y_i \sim GP(\mu_i, \varphi)$, $i = 1, 2, 3, \dots, n$, with $\mu_i = \exp(x_i^{T}\beta)$, where $x_i$ is the predictor variable vector and $\beta$ is the regression parameter vector. So the model for GPR is:

$$\ln \mu_i = x_i^{T}\beta$$
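The sketch below shows one way a Generalized Poisson probability and a log-link mean could be computed. The parameterization is an assumption (mean `mu`, dispersion `phi >= 1`, variance `phi**2 * mu`, with `phi = 1` giving the ordinary Poisson); the form used by Ismail & Jemain (2007) may be written differently. The names `gp_pmf` and `gpr_mean` are hypothetical.

```python
import math


def gp_pmf(y, mu, phi):
    """Generalized Poisson probability with mean mu and dispersion phi >= 1.

    Assumed parameterization: E[Y] = mu and Var[Y] = phi**2 * mu, so that
    phi = 1 recovers the ordinary Poisson distribution.
    """
    if y < 0:
        return 0.0
    shifted = mu + (phi - 1.0) * y
    log_p = (math.log(mu) + (y - 1) * math.log(shifted)
             - math.lgamma(y + 1) - y * math.log(phi) - shifted / phi)
    return math.exp(log_p)


def gpr_mean(x, beta):
    """Log-link mean exp(x' beta) for one observation (assumed link)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))


mu, phi = 2.0, 1.5  # illustrative values only
support = range(0, 301)  # the tail beyond 300 is negligible here
probs = [gp_pmf(y, mu, phi) for y in support]
mean_y = sum(y * p for y, p in zip(support, probs))
var_y = sum((y - mean_y) ** 2 * p for y, p in zip(support, probs))
print(f"mean = {mean_y:.4f}, variance = {var_y:.4f}")
```

Under this parameterization the variance exceeds the mean whenever `phi > 1`, which is how the model accommodates overdispersion.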

Zero-Inflated Poisson (ZIP) Regression Model
The Zero-Inflated Poisson (ZIP) regression model is used to analyze observations that are mostly zero. Let $Y_i$ be independent random variables following a ZIP distribution; the zero values are thought to arise from the zero state.
The model introduced by Famoye and Singh (2003) was later developed by Czado et al. (2007) into the ZIGP regression model, ZIGP($\mu_i, \varphi_i, \omega_i$), which has three parameters: the mean parameter ($\mu_i$), the overdispersion parameter ($\varphi_i$) and the zero-inflation parameter ($\omega_i$) (Czado et al., 2007). Each parameter of the ZIGP distribution is then related to the ZIGP regression parameters through a linear predictor.
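To make the three-parameter structure concrete, the sketch below builds a zero-inflated probability on top of a Generalized Poisson base and treats ZIP as the special case with no extra dispersion. The base parameterization and the three link functions (log for the mean, shifted log for the dispersion, logit for the zero-inflation) are assumptions for illustration; all function names are hypothetical.

```python
import math


def gp_pmf(y, mu, phi):
    """Generalized Poisson probability, assumed form with Var = phi**2 * mu."""
    shifted = mu + (phi - 1.0) * y
    log_p = (math.log(mu) + (y - 1) * math.log(shifted)
             - math.lgamma(y + 1) - y * math.log(phi) - shifted / phi)
    return math.exp(log_p)


def zigp_pmf(y, mu, phi, omega):
    """Mix a point mass at zero (weight omega) with a GP count (weight 1 - omega)."""
    base = (1.0 - omega) * gp_pmf(y, mu, phi)
    return omega + base if y == 0 else base


def zip_pmf(y, mu, omega):
    """ZIP is the ZIGP special case phi = 1."""
    return zigp_pmf(y, mu, 1.0, omega)


def zigp_parameters(eta_mu, eta_phi, eta_omega):
    """Map three linear predictors to (mu, phi, omega) via the assumed links."""
    mu = math.exp(eta_mu)                        # mu > 0
    phi = 1.0 + math.exp(eta_phi)                # phi > 1
    omega = 1.0 / (1.0 + math.exp(-eta_omega))   # 0 < omega < 1
    return mu, phi, omega


mu, phi, omega = 2.0, 1.5, 0.3  # illustrative values only
support = range(0, 301)
probs = [zigp_pmf(y, mu, phi, omega) for y in support]
mean_y = sum(y * p for y, p in zip(support, probs))
print(f"P(Y = 0) = {probs[0]:.4f}, mean = {mean_y:.4f}")
```

In this construction a zero can come either from the point mass or from the count distribution itself, which is the mixture idea behind zero-inflated models.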

Hurdle Model
The hurdle model is a model introduced by John G. Cragg in which a random variable is modeled in two parts, one for the probability of a zero value and one for the non-zero values, which can be written as: $P(Y = 0) = p_0$ and $P(Y \neq 0) = p_{\neq 0}(y)$, where $p_{\neq 0}(y)$ is a probability distribution function truncated at 0. In the original formulation, non-zero values are modeled using the Normal model and zero values are modeled with the Probit model (Cragg, 1971). Hurdle models were subsequently developed for count data, with Poisson, Geometric, and Negative Binomial models for the non-zero counts (Mullahy, 1986).
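A count-data hurdle can be sketched with a zero probability and a zero-truncated Poisson for the positive counts. This is a minimal example under that assumption; the names `hurdle_poisson_pmf` and `p_zero` and the numeric values are illustrative and are not taken from the fitted models.

```python
import math


def poisson_pmf(y, mu):
    """Ordinary Poisson probability, computed on the log scale."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def hurdle_poisson_pmf(y, p_zero, mu):
    """Hurdle probability: p_zero at y = 0, truncated Poisson for y >= 1."""
    if y < 0:
        return 0.0
    if y == 0:
        return p_zero
    # Rescale the Poisson so that it carries no mass at zero.
    truncated = poisson_pmf(y, mu) / (1.0 - math.exp(-mu))
    return (1.0 - p_zero) * truncated


p_zero, mu = 0.86, 3.0  # illustrative values only
support = range(0, 101)
probs = [hurdle_poisson_pmf(y, p_zero, mu) for y in support]
mean_y = sum(y * p for y, p in zip(support, probs))
print(f"P(Y = 0) = {probs[0]:.2f}, mean = {mean_y:.4f}")
```

Unlike a zero-inflated mixture, every zero here comes from the hurdle part alone; the count part only generates positive values.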

Overdispersion Test
The Poisson regression model requires equidispersion, namely the condition in which the mean and variance of the response variable are equal. However, overdispersion sometimes occurs in data modeled with the Poisson distribution. Overdispersion is a condition in which the variance is greater than the mean, which indicates that the model is not suitable for the data. The test procedure is as follows.

1. Formulation of the hypothesis
H0 : There is no overdispersion
H1 : There is overdispersion
2. Test statistic
3. Decision rule
The significance level is α; reject H0 if $|Z| > Z_{\alpha/2}$.
4. Conclusion
If H0 is rejected, then there is overdispersion in the model, so it can be said that the model is not appropriate (Cahyandari, 2014).
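The procedure above can be sketched with a score-type Z statistic computed from the observed counts and the fitted Poisson means. This particular statistic is an assumption, one common choice for testing against overdispersion, and is not necessarily the one used in this study; the function name and the example data are hypothetical.

```python
import math
from statistics import NormalDist


def overdispersion_score_test(y, mu, alpha=0.05):
    """Score-type Z statistic for H0: no overdispersion.

    y  : observed counts
    mu : fitted Poisson means for the same observations
    Returns (z, reject), rejecting when |z| exceeds the two-sided
    standard normal critical value at level alpha.
    """
    numerator = sum((yi - mi) ** 2 - yi for yi, mi in zip(y, mu))
    denominator = math.sqrt(2.0 * sum(mi ** 2 for mi in mu))
    z = numerator / denominator
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z, abs(z) > critical


# Made-up counts with a constant fitted mean of 2, for illustration only.
z_spread, reject_spread = overdispersion_score_test([0, 0, 0, 0, 10], [2.0] * 5)
z_flat, reject_flat = overdispersion_score_test([2, 2, 2, 2, 2], [2.0] * 5)
print(f"spread-out counts: z = {z_spread:.2f}, reject H0 = {reject_spread}")
print(f"constant counts:   z = {z_flat:.2f}, reject H0 = {reject_flat}")
```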

Vuong Test
Vuong's test aims to find out which of two models is better. Let $P(y_i \mid x_i)$ be the predicted probability of the $i$-th observation; the test statistic V is defined following Vuong (1989). Reject H0 if V > Zα, or equivalently if the p-value is less than α, which shows that model 1 is better than model 2.
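A minimal sketch of the Vuong statistic follows, assuming the usual form based on the per-observation log ratio of predicted probabilities, with the sample standard deviation in the denominator (some references divide by n instead of n - 1). The function name and the probabilities are hypothetical.

```python
import math
from statistics import mean, stdev


def vuong_statistic(prob_model1, prob_model2):
    """Return V = sqrt(n) * mean(m) / sd(m), with m_i = ln(P1_i / P2_i).

    prob_model1, prob_model2: predicted probabilities of the observed
    counts under model 1 and model 2, in the same observation order.
    """
    m = [math.log(p1 / p2) for p1, p2 in zip(prob_model1, prob_model2)]
    return math.sqrt(len(m)) * mean(m) / stdev(m)


# Made-up predicted probabilities for four observations, illustration only.
p_model1 = [0.50, 0.40, 0.60, 0.50]
p_model2 = [0.25, 0.20, 0.30, 0.20]
v = vuong_statistic(p_model1, p_model2)
print(f"V = {v:.2f}")
```

A large positive V favors model 1 and a large negative V favors model 2; swapping the two inputs flips the sign.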

Data Sources and Variables
Data were sourced from the 2020 National Socioeconomic Survey (Susenas) KOR of BPS for Kudus Regency. The research sample is a subset of the Susenas KOR sample for Kudus Regency, restricted to males aged 13-18 years. The variables used are summarized in Table 1.

Data Exploration
The number of research respondents was 114, of whom 16 smoked and 98 did not. There were 57 respondents aged 13-15 years and also 57 aged 16-18 years. Of the respondents who smoked, 1 was aged 13-15 years and the rest were aged 16-18 years. Cigarette consumption by respondent age can be seen in Table 2.
Among adolescents, cigarette consumption is suspected to increase with education level. There were 47 respondents whose highest diploma was elementary school, 58 whose highest diploma was junior high school, and 9 whose highest diploma was high school. A respondent whose highest diploma is elementary school is, unless he has dropped out of school, at the junior high school level. A respondent whose highest diploma is junior high school is, unless he has dropped out of school, at the high school level. A respondent whose highest diploma is high school has completed high school education. In the research sample there were 4 smokers whose highest diploma was elementary school, 7 smokers whose highest diploma was junior high school, and 6 smokers whose highest diploma was high school. Cigarette consumption by highest level of education completed can be seen in Table 3.
By activity during the previous week, 15 respondents were working, 89 were attending school, and 10 were engaged in other activities. Of these, 9 respondents worked and smoked, 2 attended school and smoked, and 5 were in other activities (neither working nor attending school) and smoked. Cigarette consumption by activity during the previous week can be seen in Table 4.
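The counts reported above can be turned into simple shares to show how dominant the zero category is and how smoking rates differ by group. The sketch below uses only the figures stated in the text for the overall sample, age, and previous-week activity; the dictionary layout and function name are illustrative choices.

```python
def smoking_rate(smokers, total):
    """Share of a group that smokes."""
    return smokers / total


respondents, smokers = 114, 16
zero_share = (respondents - smokers) / respondents  # non-smokers consume 0 sticks

# (smokers, respondents) per group, as reported in the text above.
# The 16-18 figure is derived: 16 smokers in total minus 1 aged 13-15.
by_age = {"13-15": (1, 57), "16-18": (15, 57)}
by_activity = {"work": (9, 15), "school": (2, 89), "other": (5, 10)}

age_rates = {k: smoking_rate(*v) for k, v in by_age.items()}
activity_rates = {k: smoking_rate(*v) for k, v in by_activity.items()}

print(f"share of zero counts: {zero_share:.3f}")
for group, rate in {**age_rates, **activity_rates}.items():
    print(f"{group}: {rate:.3f}")
```

Roughly 86 percent of the responses are zero, which is the excess-zero pattern that motivates the zero-inflated and hurdle specifications.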
The living environment is thought to influence smoking behavior in adolescents. There were 18 respondents who lived in villages and 96 who lived in cities. In the research sample, 5 respondents lived in villages and smoked, and 11 lived in cities and smoked. Cigarette consumption by residence status can be seen in Table 5.
Having parents who smoke is considered one of the triggers for adolescents to smoke. In the research sample, 63 respondents had parents who did not smoke and 51 had parents who smoked. In the group whose parents did not smoke, 5 adolescents smoked. Meanwhile, in the group whose parents smoked, 11 adolescents also smoked. Cigarette consumption by parental smoking status can be seen in Table 6.
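A crude way to compare two groups from the counts above is the ratio of their smoking rates. The sketch below computes that ratio for residence and for parental smoking; it is a descriptive calculation only, not a model estimate, and the function names are hypothetical.

```python
def smoking_rate(smokers, total):
    """Share of a group that smokes."""
    return smokers / total


def rate_ratio(group_a, group_b):
    """Smoking rate of group_a divided by that of group_b."""
    return smoking_rate(*group_a) / smoking_rate(*group_b)


# (smokers, respondents) as reported in the text above.
village, city = (5, 18), (11, 96)
parents_smoke, parents_do_not = (11, 51), (5, 63)

residence_ratio = rate_ratio(village, city)
parent_ratio = rate_ratio(parents_smoke, parents_do_not)

print(f"village vs city rate ratio: {residence_ratio:.2f}")
print(f"smoking vs non-smoking parents rate ratio: {parent_ratio:.2f}")
```

These unadjusted ratios ignore the other covariates, so they only indicate the direction of the raw differences.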