Fix DailyRepo semantic keyword analysis

Problem

The “TOPICS BY PROGRAMMING LANGUAGE” section on the website was empty. The service that computes weekly topics was failing, so no new weekly data was saved for recent weeks.

Diagnosis

How I Tested

Findings

Solution Implemented

  1. Weekly scope: only include repos whose trendingDate is inside the current UTC week.
  2. Per-language clustering:
    • Build language -> topics[] first.
    • If a language has fewer than 20 topics, skip clustering and use raw counts.
    • Otherwise, cluster per language with HF.
  3. Fallbacks:
    • If HF fails for a language, use raw topic counts for that language only.
    • If the current week is empty, fall back to the latest non-empty week.
  4. Operational tuning:
    • Batch size reduced to 4.
    • Concurrency limited to 3 languages at a time.
    • Log a summary of the languages that fell back to raw counts.
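
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual DailyRepo code. The `Repo` shape, the function names, and the `cluster` callback (a stand-in for the HF call) are all hypothetical. It also assumes that weeks start on Monday and that "fewer than 20 topics" counts topic occurrences rather than unique topics. The fallback to the latest non-empty week is omitted because it depends on how weekly results are stored.

```typescript
// Hypothetical shapes; the real DailyRepo models may differ.
interface Repo {
  language: string | null;
  topics: string[];
  trendingDate: string; // ISO date, e.g. "2025-01-15"
}

type TopicCounts = Record<string, number>;
// Stand-in for the HF clustering call (assumed signature).
type ClusterFn = (topics: string[], batchSize: number) => Promise<TopicCounts>;

const MIN_TOPICS_FOR_CLUSTERING = 20;
const BATCH_SIZE = 4;
const MAX_CONCURRENT_LANGUAGES = 3;
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Midnight UTC on the Monday of the week containing `now` (Monday start is an assumption).
function utcWeekStart(now: Date): Date {
  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const daysSinceMonday = (d.getUTCDay() + 6) % 7;
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d;
}

function isInUtcWeek(trendingDate: string, now: Date): boolean {
  const start = utcWeekStart(now).getTime();
  const t = new Date(trendingDate).getTime();
  return t >= start && t < start + WEEK_MS;
}

// Raw counts: how often each topic string occurs.
function countTopics(topics: string[]): TopicCounts {
  const counts: TopicCounts = {};
  for (const topic of topics) counts[topic] = (counts[topic] ?? 0) + 1;
  return counts;
}

// Build language -> topics[] before any clustering happens.
function groupTopicsByLanguage(repos: Repo[]): Map<string, string[]> {
  const byLanguage = new Map<string, string[]>();
  for (const repo of repos) {
    if (!repo.language) continue;
    const list = byLanguage.get(repo.language) ?? [];
    list.push(...repo.topics);
    byLanguage.set(repo.language, list);
  }
  return byLanguage;
}

async function weeklyTopicsByLanguage(
  repos: Repo[],
  now: Date,
  cluster: ClusterFn,
): Promise<{ result: Record<string, TopicCounts>; fellBack: string[] }> {
  const weekly = repos.filter((r) => isInUtcWeek(r.trendingDate, now));
  const entries = Array.from(groupTopicsByLanguage(weekly).entries());
  const result: Record<string, TopicCounts> = {};
  const fellBack: string[] = [];

  // Work through languages in chunks so at most 3 run at the same time.
  for (let i = 0; i < entries.length; i += MAX_CONCURRENT_LANGUAGES) {
    const chunk = entries.slice(i, i + MAX_CONCURRENT_LANGUAGES);
    await Promise.all(
      chunk.map(async ([language, topics]) => {
        if (topics.length < MIN_TOPICS_FOR_CLUSTERING) {
          result[language] = countTopics(topics); // too small to cluster
          return;
        }
        try {
          result[language] = await cluster(topics, BATCH_SIZE);
        } catch {
          // A failure affects this language only; the others keep their clusters.
          result[language] = countTopics(topics);
          fellBack.push(language);
        }
      }),
    );
  }

  if (fellBack.length > 0) {
    console.warn(`Fell back to raw topic counts for: ${fellBack.join(", ")}`);
  }
  return { result, fellBack };
}
```

Chunking with `Promise.all` is the simplest way to cap concurrency; a worker-pool approach would keep all three slots busy when one language takes much longer than the others, at the cost of a little more code.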

Result of the New Approach

Trade-offs

Accepted Solution and Conclusion

This approach is accepted as a pragmatic fix for free hosting.

If the project moves to paid or dedicated inference later, the clustering can be made fully reliable and optionally cross-language comparable again.