Hasty Briefsbeta

Learning from Heuristics

11 days ago
  • #data programming
  • #machine learning
  • #weak supervision
  • Data programming is a weak supervision paradigm that uses maximum likelihood estimation to generate soft labels from heuristics.
  • Labeling functions in data programming can abstain or provide incorrect labels, with rates α (correct) and β (abstain).
  • The method estimates the likelihood function assuming labeling functions are independent and class probabilities are uniform.
  • Soft labels are derived using conditional probability, enabling training of models without true labels.
  • A linear probability model with L2 regularization is suggested to prevent overfitting to noisy soft labels.
  • An example using the BreastCancer dataset demonstrates the method's effectiveness with domain-inspired labeling functions.
  • The approach is useful when true labels are scarce but domain knowledge allows for heuristic labeling functions.
  • Snorkel is a Python package that provides advanced data programming features.