Week 53 in December 2015 counted within week 1 of January 2016

Solved!
casper
Level 2

Hi everyone,



I was plotting some data from a webshop sales order dataset. When I plotted the totals by week, one week stood out sharply from all the others across 4 years of data: week 1 of 2016. Its total was a little more than double that of the second-highest week, producing a huge peak in the graph.



I immediately classified this point as an outlier, so I started investigating by narrowing the graph down to this week. Surprisingly, when I explored the week day by day, no single day was an outlier, and the daily totals did not add up to the amount shown in the weekly graph. So there had to be some other error. When I included a few more dates before and after this week, I noticed that the week 1 total kept growing as I added more days before the calendar start of week 1 (which runs from January 4th until January 10th).



It was then that I noticed that week 52 of 2015 only appeared once I included December 27th. When I restricted the range to December 28th until January 10th, all data was classified as week 1. That is when I realized that 2015 had a week 53 at the end of December.





So there we have it. Week 53 of 2015, which runs from December 28th until January 3rd, is grouped with week 1 of 2016, skewing my graph. But I can't throw away the sales of this week because they are relevant and valid data. The high peak is no longer that surprising, because the weeks between December and February are the busiest for this organisation.
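To double-check the calendar reasoning, the dates can be run through Python's standard `datetime.date.isocalendar()`. This is only a small sketch outside the tool itself, and `iso_year_week` is a hypothetical helper name chosen for illustration:

```python
from datetime import date

def iso_year_week(d):
    """Return (ISO year, ISO week number) for a date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

# The boundary dates mentioned above
boundary_dates = [
    date(2015, 12, 27),  # Sunday, last day of week 52 of 2015
    date(2015, 12, 28),  # Monday, first day of week 53 of 2015
    date(2016, 1, 3),    # Sunday, last day of week 53 of 2015
    date(2016, 1, 4),    # Monday, first day of week 1 of 2016
]

for d in boundary_dates:
    print(d.isoformat(), iso_year_week(d))
```

Under the ISO calendar, January 1st to 3rd of 2016 belong to ISO year 2015, so a week number without its ISO year is ambiguous around New Year.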



How am I supposed to deal with this 53rd week so that my graph is correct without throwing away its data? Am I doing something wrong, or is deleting a week's worth of sales rows really the only solution?



Edit: chart grouped by week (image not included).

Thanks in advance,

Casper

Alex_Combessie
Dataiker Alumni

Hello,



Thank you for sending a sample of the data. I was able to reproduce the issue. It is a bug, which I reported to our R&D team.



In the meantime let me suggest the following workaround: add a Python processing step in a Prepare recipe with the following code:




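from datetime import datetime

def process(row):
    # Parse the timestamp string stored in the "created_at" column
    python_date = datetime.strptime(row["created_at"], "%Y-%m-%dT%H:%M:%S.000Z")
    # isocalendar() returns (ISO year, ISO week number, ISO weekday)
    year, weeknumber = python_date.isocalendar()[:2]
    # Build a label such as "2015-W53"
    return "%s-W%s" % (year, weeknumber)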


This will create a new column with the expected year and week number, which you can use for aggregation in the charts.
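To see the effect outside the Prepare recipe, here is a minimal standalone sketch of the same idea. The sample orders and their amounts are made up for illustration, and `iso_week_label` is a hypothetical helper name. It also zero-pads the week number (an assumption on my side, not part of the workaround above) so that the labels sort correctly as text:

```python
from collections import defaultdict
from datetime import datetime

def iso_week_label(timestamp):
    """Return a zero-padded label such as '2016-W01' for a timestamp string."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.000Z")
    iso = parsed.isocalendar()  # (ISO year, ISO week number, ISO weekday)
    return "%d-W%02d" % (iso[0], iso[1])

# Hypothetical sample orders (made-up amounts, for illustration only)
orders = [
    {"created_at": "2015-12-27T10:00:00.000Z", "amount": 10.0},
    {"created_at": "2015-12-28T09:30:00.000Z", "amount": 20.0},
    {"created_at": "2016-01-03T18:45:00.000Z", "amount": 30.0},
    {"created_at": "2016-01-04T08:15:00.000Z", "amount": 40.0},
]

# Sum the amounts per ISO year-week label
totals = defaultdict(float)
for order in orders:
    totals[iso_week_label(order["created_at"])] += order["amount"]

for label in sorted(totals):
    print(label, totals[label])
```

With this grouping, the orders from December 28th and January 3rd land together in 2015-W53 instead of being added to week 1 of 2016.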



Here is a screenshot if that helps:





Best regards,



Alex


casper
Level 2
Author
Great stuff, thanks. 🙂 Hope to see this fixed in an upcoming release.
vho
Level 2

Hello, just wanted to say that this bug still exists when extracting weeks, @Alex_Combessie. (Attached: Screenshot 2020-10-27 at 12.00.49.png)
