00:05imirkin: Jayhost: indeed it was, thank you!
00:14imirkin: Jayhost: btw, 11.2.0 is out now
00:17karolherbst: friggin nvidia, mupuf there is indeed a third voltage max entry...
00:18imirkin: why only have 2 when you can have 3 :)
00:18karolherbst: ohh wait
00:19karolherbst: I think I overwrote one
00:19karolherbst: k, there is still a third one
00:20karolherbst: now I get a bad feeling
00:24karolherbst: there are more
00:25karolherbst: I think there are actually 6
00:25karolherbst: and there is a byte to tell the driver how many to use
00:28imirkin: or maybe which one?
00:28imirkin: it's pretty common to have a list of values, and then something to tell you which thing in the list to use
00:29karolherbst: I am not quite sure yet
00:29karolherbst: it behaves a little odd
00:58karolherbst: this table is a lot more complicated than I thought
01:08karolherbst: I can change the calculated voltage with a byte from the header
01:15karolherbst: enough for today :/
01:15karolherbst: this is getting even more complicated...
01:16karolherbst: it is much better than stock nouveau already though
07:01martm: skeggsb: prolly my questions are too complex and not specific enough to be answered, but i think you should have understood the basic idea; if you understand that basic idea, then you, knowing the internals, should imo be able to fix the multithreading up
07:06martm: i suspect you do it in the fashion demonstrated there, but from nouveau_mm.c you see that it should be possible to test: when you pull a cache context, you compare whether the address or contents equal the last slot of the cache, and then use the logic
07:06martm: here http://www.fsfla.org/svn/fsfla/software/linux-libre/freed-ora/tags/f15/2.6.38-1.fc15/drm-nouveau-updates.patch
07:07martm: you'd kinda pause the pushbuf/pfifo...and reorder the cache and execute it and signal unpause after
07:12martm: YES, i think it should work according to comments i have seen, the pushbuf will continue again after the signal, from where they left off in the bo's they were fed
07:17martm: if you have trouble detecting the cache's last slot, then test whether it is valid or not instead, when the physical tag cannot be fetched and compared
07:18martm: i.e. it starts as invalid; when it becomes valid, move the last slot into a scratch reg and make it invalid again; when it becomes valid again, then pause the fifo, reorder the cache and execute it
07:22martm: skeggsb: i think it is very doable with the current infrastructure, in contrast to others' opinions, maybe 100 lines in the correct spots in the code
07:25martm: skeggsb: but it would only work for fermi/kepler if you have some sort of gpu pipeline coherency issues
07:30martm: i got sirens here, must be mistaken with the last theory then
07:32martm: so which cache is the l1, the non-coherent one, the one in pfifo? if we reorder that one as an instruction cache..maybe it is impossible to get a deadlock
07:41martm: yeah, it is impossible if that happens to be the l1 cache that pfifo uses, it is roughly 64kb, well that is a bonus
08:21martm: well it should not leak, cause the execution of the commands is probably a lot faster than uploading them, it would continuously catch up
08:26martm: for example, if gallium queues the commands in 1000-unit batches and you pause the fifo for maybe 10 units to execute them, something like that could be worked out on paper
08:37martm: so i think imirkin was right, one would not want to block gallium. just the hypothesis: if there is a faster runner and a slower runner, the faster runner can rest and give a handicap to the slower one, and they'd finish at the same time, although the faster one had a rest in the middle of the race
09:58szt: lissyx: +1 for ipv6
10:01lissyx: szt, ?
10:03szt: you're using ipv6
10:41RSpliet: karolherbst: could the different max voltages be linked to the PCIe power budget?
11:22martm: RSpliet: started with beers, managed to look over the idea with an almost sober head, i dunno..i think you're overstating the complexity, i think this is really easy
11:23martm: but i won't go any further, i am resting with a peaceful soul today, and i have a few other things to do; when i get back at some point, i may try to fix the multithreading
11:23RSpliet: martm: I hope you're right :-)
12:07martm: But arbitrating the pushbuf writes must be handled; it seems to be an old conversation, whether to give each thread a separate channel or do it some other way. anyone who remembers it could kick into this monologue:)!
12:33martm: peeked into trello, there is an mlankhorst patch that needs review and testing. what i think is, it depends what it does when all pushbufs are occupied; if it arbitrates and makes the correct context wait, it would sound as good as multiple contexts per screen using a single pushbuf
12:38martm: i can't make much out of the patch, but let's say there are 127 channels, if all are occupied, what would the code do..this one https://cgit.freedesktop.org/~mlankhorst/mesa/commit/?id=c12b774ca39ac848791ebcecb3ef947293c78411 ?
12:46martm: RSpliet: i think the patch is good, honestly i can't read it entirely, this maarten would need to add a couple of comments, but it seems some sort of fences are used to arbitrate something; if they are, the patch is basically ideal...
12:48martm: i think calim would have rejected it, but if we got some comments on how it works, and it follows my logic standards, i'd accept that solution as the very best imo, it makes sense
12:55martm: from what i gather, it should not matter at all when and how you wait: either you wait for a context on a single pushbuf, or you wait for a context among several of that screen on a single pushbuf, it really should work the same
12:55martm: i mean you still wait
13:09martm: i have to patch that code, it almost looks very good imo..let's say i would make a minor modification on top of that solution for the performance; by the looks of it, i think the problem could be almost solved
13:11martm: nah i am not an expert yet, but i'd prefer mlankhorst's solution to this issue in any case, imo it's better and the other solution is not needed
13:14martm: the perf patch that i think of would not touch any lines in that code; it's somewhere in the kernel where some lines are needed
13:18martm: it's like instead of relocating the buffers into a final playlist, i'd instead play with the cache
13:24martm: Nice work guys, i have the missing pieces almost figured out, and i am sorry for being a "XYY" at times
13:25martm: RSpliet: i will definitely make mistakes when i get back into this, the ideas are all good, there should not be anything complex, i'm just rusty, logic says that
13:33martm: if anyone was interested in the performance, i somewhere have the links that say the open source model is faster, ioctl vs. mmap or something. generally if we fix multithreading, we add pipelining, scheduler, zcull and what not; i also plan a framework that generates a blob for games with a cpu linker. even without that last one it's sure this code will go past the blob if you handle the clocking stuff
13:33martm: because i do not know anything about this land
13:36martm: the cpu linker acts as a precompiler using certain techniques; it would produce code that is card specific, but just enormously faster at runtime than any other api would allow
13:38martm: now that i remember, mlankhorst was the one who taught me how to watch ponies:)
13:40martm: but off to rest now, thanks for listening, cheers for the moment, we'll come back to those things...
16:12karolherbst: okay, there is a third one in a header field which isn't always there...
17:55orbea: cheers to pcsx2, they fixed that blurry hw rendering issue I was talking about the other day :)
21:18hakzsam: 99% of images support on GK104 is now done, I'll submit the series that will enable GL 4.2 in a couple of days :)
21:24karolherbst: and I just sent out my patches :D
21:25karolherbst: mailman screwed up again
21:28karolherbst: funny how small the first version was