I am calculating the mean employment rate for different groups from 1995 to 2015, and then the difference in mean employment rates between groups.

The result should be ordered by year.

I have spent most of my time trying to do this with the summarise function in dplyr, without success.

The code below is what I have so far.

    diff_in_diff <- Cps_total %>% 
      filter(age >= 19 & age <= 44) %>% 
      mutate(women_and_black_men = ifelse(female == 1 & marstat != 1 & nfchild == 0, "Single without children",
                                     ifelse(female == 1 & marstat != 1 & nfchild > 0, "Single with children",
                                        ifelse(female == 1 & marstat == 1 & nfchild == 0, "Married without children",
                                           ifelse(female == 1 & marstat == 1 & nfchild > 0, "Married with children",
                                              ifelse(female == 0 & wbhao == 2, "Black Men", "Otherwise Men"))))))

    diff_in_diff_2 <- diff_in_diff %>% 
      filter(!is.na(empl)) %>% 
      group_by(year, women_and_black_men) %>% 
      summarize(mean_empl=mean(empl))

year |  women_and_black_men      |      mean_empl
1995 |  Black Men                |      0.8772406       
1995 |  Married with children    |      0.6810999       
1995 |  Married without children |      0.8227718       
1995 |  Otherwise Men            |      0.9048232       
1995 |  Single with children     |      0.8330486       
1995 |  Single without children  |      0.8927759       
1996 |  Black Men                |      0.8415265       
1996 |  Married with children    |      0.6800505       
1996 |  Married without children |      0.8188101       
1996 |  Otherwise Men            |      0.9035344   

This is what I got.

However, I want the differences between groups: Single with children minus Black Men, Single with children minus Single without children, Single with children minus Married with children, Single with children minus Married without children, and Single with children minus Otherwise Men.

So my expected output is:

year |  Single_with_children_vs      |      diff_in_diff

1995 |  vs_Married with children     |      0.031230201
1995 |  vs Married without children  |     -0.130002012
1995 |  vs Single_without_children   |     -0.190230201
1995 |  vs Black Men                 |      0.002030210
1996 |
.
.
.

Something like that.

Girim Ban Oct 28, 2019 at 16:38

1 answer

Perhaps not the most elegant solution, but here is a quick one:

    library(dplyr)
    library(tidyr)  # for drop_na()

    # I created a basic dataset similar to yours:
    # one row per group per year, with random employment rates
    diff_in_diff <- data.frame(
      year = rep(1995:1996, each = 6),
      women_and_black_men = rep(c("married with children", "married without children",
                                  "otherwise men", "single with children",
                                  "single without children", "black men"), 2),
      empl = abs(rnorm(12, 0, 0.5))
    ) %>% arrange(year)

    # create a dataframe that is just single with children,
    # keeping only the join key and the renamed employment column
    diff_in_diff_single <- diff_in_diff %>% 
      filter(women_and_black_men == "single with children") %>% 
      select(year, single.emp = empl)

    # join with our original dataframe and take the difference
    # (single with children minus each group, as asked in the question)
    diff_in_diff %>% 
      full_join(diff_in_diff_single, by = "year") %>% 
      drop_na() %>% 
      mutate(diff = single.emp - empl)
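
As a possible alternative to the join, the same differences can be sketched with a grouped `mutate()`. This is only an illustration, not the approach above: the data frame holds the six 1995 rows copied from the question's output, the names `baseline` and `result` are made up for the example, and it assumes every year contains exactly one "Single with children" row (a year without one would make `mutate()` fail).

```r
library(dplyr)

# The six 1995 rows copied from the summarised output in the question
diff_in_diff_2 <- data.frame(
  year = rep(1995, 6),
  women_and_black_men = c("Black Men", "Married with children",
                          "Married without children", "Otherwise Men",
                          "Single with children", "Single without children"),
  mean_empl = c(0.8772406, 0.6810999, 0.8227718,
                0.9048232, 0.8330486, 0.8927759)
)

# Hypothetical name for the reference group (assumption for this sketch)
baseline <- "Single with children"

result <- diff_in_diff_2 %>%
  group_by(year) %>%
  # within each year: baseline group's mean minus every group's mean
  mutate(diff_in_diff = mean_empl[women_and_black_men == baseline] - mean_empl) %>%
  ungroup() %>%
  # drop the baseline's comparison with itself (always 0)
  filter(women_and_black_men != baseline) %>%
  mutate(Single_with_children_vs = paste("vs", women_and_black_men)) %>%
  select(year, Single_with_children_vs, diff_in_diff) %>%
  arrange(year)

print(result)
```

With the full 1995 to 2015 data, `group_by(year)` would repeat the subtraction separately for each year, and `arrange(year)` keeps the result ordered by year.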
knawhatimean Oct 28, 2019 at 17:58
Thank you very much, Jacob. This is clear enough to point me in the right direction :)
 – Girim Ban, Oct 28, 2019 at 23:51
Great! Yes, it looks like you have enough coding savvy to build on this. Feel free to mark it as solved.
 – knawhatimean, Oct 30, 2019 at 17:40