First compliments on the three-part Stark explainer, great work.
I finally came around downloading the python code to look at its inner workings and I think I found a weak spot.
At every stage of the recursion you sample the rows that are checked based on the merkle root of the values of the column. Because we sample only 40 rows, we can get away with quite some changes in the column without getting caught. For example, if we change 10% of the numbers in the column and do 40 samples, we have a chance of 1.4% of not getting caught. We can easily try multiple variants of the column to beat the odds.
When we start with a high degree polynomial and have 10 levels of recursion, it is possible to change all points in such a way that they are all on the same low degree polynomial once we reach the final check.
To make this attack harder we could sample the rows of all levels based on the last merkle root. In that case we have a chance in the order of 10^{-19} which is much better. Turning it into an interactive proof via the blockchain is also possible, but might be less practical in some cases.
To avoid this attack altogether and also make the proof smaller, we should follow the sampled points from the base layer all the way to the final polynomial. So instead of sampling just one value on the column per check, we should sample the 3 sibling values as well, to use in the next recursion step. Skipping the second lookup per merkle tree can reduce the total size of the proof significantly.
By following these “point-traces” all the way, we make sure they have not been tampered with. You could still try to tamper with the sibling points, but this is not useful without knowing the column that will be chosen next.