00:39imirkin: skeggsb: ping
04:07imirkin: gr. so that grid autosport compute issue is somehow due to the BB splitting that handleSharedATOMNVE4 does... :(
04:07imirkin:hates that function
04:10Tom^: karolherbst: sure
04:10Tom^: imirkin: tell him i can do it. he just needs to tell me what :p
04:11imirkin: Tom^: tell him yourself
04:11Tom^: im working so weird times this weekend i might miss him :(
04:12imirkin: if only there existed some form of asynchronous communication mechanism
04:17imirkin: grrrr. CALIM!!!! gr.
04:17imirkin: with his clever graph impl
04:17imirkin: somehow the cfg iterator manages to get confused
04:45imirkin: well that was fun.
07:21kloofy: can anyone try to make sense of the cuda maximum number of instructions per kernel, it is 512million, it's too big of a number, with what to divide it to get maximum pixel shader instructions?
07:22kloofy: cause 8gb is 8million bytes, instructions can't exceed the memory
07:27kloofy: i can do the 512/2=256 and hence 65532*256=around 1.7million which seems correct this time, and probably is too, but i dunno how those numbers are calculated
07:33karolherbst_work: Tom^: run gputest pixmark_piano once on nouveau full reclocked and once on nvidia and tell me the fps
07:36kloofy: shows that maximum number of instruction slots is just 65536 for sm5.0
07:57kloofy: so anyhow we had the discussion to confirm the theory i'd need to look the schematics of cmos alus, which i looked very long time ago, and i kinda too busy to do that again, but may have to
07:58kloofy: so something that i remember is that digital transistor is a three terminal device in cmos
07:58kloofy: gate source drain
08:00kloofy: i have to relook, but it's something like when g and drain are pulled to 0 and 5v respectively, that opens or closes the channel
08:01kloofy: between probably source and drain...so if the transistors are connected on fpga devices properly, there should be away to open or close any of them
08:03kloofy: so i am not entirely sure, but probably i don't believe that the design is bulkier still having trouble to believe it, but though little bit slower
08:10karolherbst_work: pmoreau: if you have any problems with the script to generate a proper working tree, please let me know :) I still plan to add a travis-ci to at least build test the stuff against a drm tree or so. Anyway, there might be a problem when bens tree gets out of sync with the drm tree :/
08:11kloofy: let's just say i know this is incorrect information, but i can not explain this of course why and how they were mistaken that bulkier design theory
08:16kloofy: and i don't think i will ever be capable of explaining this either the way things have gone, but i will just try to work with those devices
08:18kloofy: it's something about after every while there is storage element like d-type flip-flop which made around 50-100transistor itself too
08:19kloofy: and this is miraculously hard how that all works in the end, but the bitstream is either taken from those regs or memory
08:27kloofy: and just when you send the correct stream i think everything is reprogrammable, depending whether you send the hole or electron through the wire, which was preprogrammed
08:28kloofy: i think this magically complex actually, but i think tools manage this automatically synthesis stuff
08:34kloofy: it's like place and route is crucial there, if one has a good tool, it will place things correctly and wisely
08:38pmoreau: karolherbst_work: I’ll have a try during the week-end, maybe tonight
08:40karolherbst_work: awesome :)
08:41pmoreau: So, in your script, you first create a _tmp branch, then apply all patches on top of it, and then create a new branch ready?
08:41karolherbst_work: I am not quite sure how to handle that in the end
08:41karolherbst_work: I was thinking that I could timestamp the successful runs or something
08:42karolherbst_work: so that you can always switch to a previous generated tree
08:42pmoreau: The problem I see with it, is that if you run the script again, it will most likely fail due to _tmp already existing and having a different HEAD than $NOUVEAU_BASE_COMMIT, won’t it?
08:42karolherbst_work: it doesn't
08:42karolherbst_work: check the trap handler
08:43pmoreau: `trap abort EXIT`? NFC what that does
08:43karolherbst_work: it gets executed on userspace-handled signals
08:43karolherbst_work: like if you send SIGTERM the handler gets executed
08:43karolherbst_work: also on execution end
08:44karolherbst_work: so even if you abort while the script is running, everything gets cleaned up
08:44karolherbst_work: the abort function that is
08:44pmoreau: Oh!! I hadn’t seen the abort function! Gotcha
08:44karolherbst_work: trap is a bash builtin
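
The cleanup behaviour described in this exchange can be sketched roughly as follows. This is a minimal bash sketch, not the actual script: only `trap abort EXIT` and the name `abort` come from the conversation, while `run_job`, the scratch directory (standing in for the `_tmp` branch) and the `fail` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the tree-generation script: a scratch directory
# plays the role of the _tmp branch that must not survive a run.

run_job() (
    # The function body is a subshell, so the EXIT trap set here fires when
    # run_job finishes, whether it succeeds or bails out early.
    scratch_dir=$1
    mode=${2:-ok}

    abort() {
        # Cleanup handler: remove whatever the run left behind.
        rm -rf "$scratch_dir"
    }
    trap abort EXIT

    mkdir -p "$scratch_dir"
    if [ "$mode" = fail ]; then
        exit 1    # simulate a patch that does not apply
    fi
    exit 0
)

scratch="${TMPDIR:-/tmp}/trap_demo_$$"

run_job "$scratch" ok
ok_status=$?
run_job "$scratch" fail
fail_status=$?

echo "ok run: $ok_status, failed run: $fail_status"
if [ -e "$scratch" ]; then echo "leftover state"; else echo "cleaned up"; fi
```

Because the handler runs on every exit path, a second run starts from a clean state even if the first one failed halfway, which is the point being made about `_tmp`.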
08:47karolherbst_work: maybe we need to also add support for patching drm this way
08:47karolherbst_work: for patches which also affect drm like runpm stuff
08:53karolherbst_work: pmoreau: I also gave you push access to that repository and I plan to move it into an organization after we can consider that thing useable/stable enough
08:54pmoreau: Great, thanks!
08:55kloofy: got some tobacco, well technologically it's possible to avoid bulkier design, i can guarantee that, i just do not know how intelligent those tools are
08:56kloofy: when one needs to crawl with instruction register somewhere at reconfiguration or synthesis phase, physically some bottom down lower units, the top ones gates can be opened and routing and later settled in when bottom ones have been settled
08:57karolherbst_work: There was once a paper, about testing the error resistance of filesystem drivers against disc corruptions or unexpected responses/reads. they had some smart code instrumenting going on in the kernel. Maybe I can somehow port that over to nouveau and run it in userspace to fetch issues where nouveau crashes/hangs due to unexpected reg reads from the gpu
09:00kloofy: it's just the fact that from ram one can reconfigure any combination always
09:02pmoreau: imirkin: Which status do you use for closing bugs that haven’t been updated in a while, despite a request for more info?
09:19kloofy: too difficult subjects i believe but really we're fine on this one, i really trust the tools i know they do very well, when i write the logic it will place it exactly so, a tool and a logic code is crucial only
11:12Doctors: imirkin; Why was I pinged about memory reclocking?
11:17karolherbst_work: Doctors: bad ping I guess
12:32kloofy: so...so if NIOS II does it 300DMIPS*1757 it's crazy number of MIPS probably
12:33kloofy: says also 3cycle multiply operation etc.
12:33kloofy: it's not possible that fpga vendors do not know how to route their logic for high-performance
12:52kloofy: http://www.roylongbottom.org.uk/dhrystone%20results.htm so the benches are here, 200mhz 300DMIPS is 3000DMIPS with 2GHZ by clocking the logic
12:52kloofy: that makes sense
13:16kloofy: https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/ds/ds_nios2_perf.pdf so anyways it's hard work, but the chips will work if done right
13:16kloofy: in verilog
13:17kloofy: fast utilizes only 840 ALMs on cyclone V and it has more than 13.000 on for the lowest end
13:19kloofy: from there if the calculations are done a bit, that proves and intel and i was right, and internet information is occasionally wrong there
13:26kloofy: hard yeah, but this sort info that 10billion transistors = 50million asic gates is obviously imo a big crap
13:53kloofy: anyways prolly start to annoy again , off to a shop the thing is when transistor is used as a wire lead of some sort, there should be no expense in doing so, if those terminal capacitors are precharged, it would be a wire until the capacitor is charged empty again
14:01kloofy: i dunno currently i write my thoughts here, because noone has in great detail understood what i am talking about back in my home
14:02kloofy: sophisticated to find partners, but my friend is over average intelligent, so i hope we can hit the target there with a team that is being formulated
14:19imirkin: pmoreau: RESOLVED INVALID
14:20pmoreau: Ok. I would have picked that one as well, but wanted to check for consistency.
14:25kloofy: karolherbst_work: but yeah the reconfiguration from netlists isn't a big deal, but synthesis of those netlists, without accelerators yeah this really painful it can take two weeks or so on bigger designs
14:31kloofy: currently not too heavy resources available should write a virus of some kind:)
14:34kloofy: so sm5.0 instruction slot count is 65536, but does it ever count for texture instructions too?
14:35kloofy: unfortunately it's the thing i probably need to know also, though that amount of cache and asic hw logic to keep the tiling/index/offset stuff is okish
14:41kloofy: i am considering the schedulers idea almost stable , but bit of additional info is needed for the sake of
14:41kloofy: well some clean calculation and perf
14:42Tom^: karolherbst_work: im such a nerd, muh new keyboard =D http://i.imgur.com/FZfmfW9.jpg
14:42Tom^: karolherbst_work: also, onto the pixmark_piano tests!
14:44imirkin: nice ... linux keys
14:45kloofy: let's bit hypothesize on that one, normally the variable for the placeholder to read from a texture is specified from LDS i.e like sampler or uniform variable or interpolant
14:54karolherbst_work: Tom^: these days being a nerd is kind of positive, so I say you are such a non-positive-nerd
15:06Tom^: karolherbst_work: 24fps on blob
15:06Tom^: its really laggy
15:08karolherbst_work: nvidia are such cheaters
15:08karolherbst_work: Tom^: original resolution?
15:08karolherbst_work: or fullscreen
15:09Tom^: hold on il make a nouveau benchmark, put it in a pic and imgur it.
15:09karolherbst_work: try the window thing with 1024x640
15:09karolherbst_work: nah, I am really just interested in an nvidia vs nouveau at 1024x640 fps comparison
15:09karolherbst_work: nothing else matters
15:10karolherbst_work: because I get like 63 fps at 1024x640 with nvidia on a GTX 660
15:10karolherbst_work: headless X
15:10kloofy: but when you read from one texture and put that into reg, and read according to that from another texture, then yeah the instruction slot is still the same, it is based off that particular reg
15:11Tom^: karolherbst_work: what is reclocking_part_1 ?
15:11karolherbst_work: stuff :p
15:12kloofy: so basically maximum number of texture instructions could be 1024, i.e the number of different regs available
15:22Tom^: karolherbst_work: http://i.imgur.com/zWqvOKT.jpg http://i.imgur.com/WJ66pGV.jpg
15:23karolherbst_work: 800x600 fullscreen? seriously?
15:23karolherbst_work: or was it like fullhd?
15:23Tom^: no idea why it assumes that
15:23Tom^: but i ran it with /fullscreen tho
15:24karolherbst_work: which is?
15:24Tom^: perhaps it just upscaled 800x600
15:24karolherbst_work: but I would rather have results for 1024x640 on nvidia
15:25Tom^: x640 ?
15:25Tom^: never heard of that res before
15:25karolherbst_work: window mode
15:27kloofy: which brings us back to operand collector
15:27kloofy: i think at any time in the process operand collector stores the most recent value of any reg probably
15:35kloofy: now i messed up quite probably , i got jammed i think , that'd be pointless
15:35Tom^: karolherbst_work: http://i.imgur.com/GbSi48Y.jpg http://i.imgur.com/ykGRHt8.jpg
15:36Tom^: karolherbst_work: i wonder if im hitting vsync tho heh
15:36karolherbst_work: 3600 frames in 60 seconds
15:38Tom^: 4514 points, 75fps. yea i was vsyncing on blob
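
The vsync conclusion follows from simple division. A small sketch, assuming a GpuTest score counts frames rendered and that both runs lasted 60 seconds (inferred from the numbers quoted here, not verified); the `fps` helper is hypothetical.

```bash
# Assumption: score == frames rendered, run length == 60 seconds.
fps() {
    # fps <frames> <seconds>; integer division is enough for a sanity check
    echo $(( $1 / $2 ))
}

vsynced_fps=$(fps 3600 60)     # the capped run: exactly the refresh rate
unsynced_fps=$(fps 4514 60)    # the rerun without vsync
echo "vsynced: $vsynced_fps fps, unsynced: $unsynced_fps fps"
```

A result that lands exactly on 60 is a strong hint that the run was capped by the display refresh rather than by the GPU.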
15:54kloofy: mgottschlag: i've done some monologue rally here, which you may check out from logs
15:55kloofy: but now i am trapped, with this texture instructions not counting against maximum instruction slots
16:24kloofy: ouh, again lulz in internet, they count still
16:26imirkin_: hakzsam: let me know if you plan on looking at my nv50/ir cfg iterator fix (you don't have to look at it right now, just let me know if you will). otherwise i'll push it tonight.
16:29kloofy: imirkin_: http://forum.doom9.org/showthread.php?t=157634 maybe your comments would be welcome, do the texture instructions count against the max 65536 limit?
16:30kloofy: suppose i should just try to compile, and leave you guys alone with my stuff
16:37kloofy: yeah i figured it all out
16:37kloofy: ready for rambo actions...,:=}9/
16:48hakzsam: imirkin_, yeah, I will
16:48hakzsam: maybe not today though
16:48imirkin_: weekend's fine
16:58hakzsam: just plugged a gm107, thanks mupuf! :)
17:06Yoshimo: less restrictions, always a plus
17:33karolherbst: mupuf: seems like your GTX 660 is just 10% slower than a gtx 780 ti :p
17:33karolherbst: 75 fps on the 780 ti and 63 fps on reator
17:33karolherbst: well, more like 18% or so
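
The two percentages depend on which card is taken as the baseline. A quick check with the 63 fps and 75 fps figures quoted above; the `pct` helper is a hypothetical name, and awk is used only because plain shell arithmetic is integer-only.

```bash
# pct <a> <b> <base>: difference between a and b as a percentage of base
pct() {
    LC_ALL=C awk -v a="$1" -v b="$2" -v base="$3" \
        'BEGIN { printf "%.1f", (a - b) / base * 100 }'
}

slower=$(pct 75 63 75)   # GTX 660 relative to the 780 Ti
faster=$(pct 75 63 63)   # 780 Ti relative to the GTX 660
echo "660 is ${slower}% slower; 780 Ti is ${faster}% faster"
```

So the gap is roughly 16% measured from the 780 Ti and roughly 19% measured from the 660, which brackets the "18% or so" estimate.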
17:34karolherbst: I guess nvidia cheats a bit somewhere
17:44kloofy: when i'm gone, when i'm gooone, you gonna miss me when i'm gone, you gonna miss by hair you gonna miss me everywhere, you gonna miss me when i'm gone:) you gonna miss me by my talks you gonna miss me by my walks..
18:18karolherbst: mupuf: I want to install libxnvctrl on reator on the nvidia side, but it seems like everything is plain outdated there :p
18:19karolherbst: Tom^: do you mind running "for test in fur pixmark_julia_fp32 pixmark_piano pixmark_volplosion plot3d triangle gi; do MESA_GL_VERSION_OVERRIDE=3.3 MESA_GLSL_VERSION_OVERRIDE=330 vblank_mode=0 DISPLAY=:0 ./GpuTest /test=$test /benchmark /no_scorebox /msaa=0 /benchmark_duration_ms=30000 /width=1024 /height=640; done" for me once on nvidia and once on nouveau full reclocked?
18:20karolherbst: tobijk: and then paste me the _geeks3d_gputest_scores.csv file
18:20karolherbst: I meant Tom^
18:23karolherbst: \o/ formal acceptance
18:24karolherbst: but I guess everybody got that :p
19:41mupuf: karolherbst: not everyone
19:41karolherbst: is the 250 word limit a strict one?
19:43karolherbst: anyway, I was thinking about how to add support for libXNVCtrl to env_dump. I think a link time dependency is way too much for this, so I would implement it plugin-like
19:43karolherbst: any preferred way how to do this?
19:43karolherbst: I meant more from an internal API perspective
19:44mupuf: ah, right
19:44mupuf: well, maybe it is time I move to cmake?
19:44mupuf: and then we can use cmake to do the right thing
19:44karolherbst: doesn't matter that much though
19:44karolherbst: just some smart symbol exporting
19:45karolherbst: but I could also just dlopen that library and have a list of fitting providers for stuff
19:45mupuf: or just adding a define to compile or not nvidia's stuff
19:45karolherbst: that will be messy if people want to start distributing it
19:45mupuf: btw, I added support in a branch for cpu monitoring ...
19:46mupuf: then you want to include the headers in env_dump?
19:46karolherbst: not quite sure yet
19:46karolherbst: but we could have several so plugin files which do the actual loading
19:46mupuf: the only problem with the cpu monitoring is ... power management
19:47karolherbst: and env_dump has a map for driver -> plugin or so
19:47mupuf: how would this help?
19:47karolherbst: you don't have to compile it within ezbench
19:47mupuf: ah, I see
19:47mupuf: well, that seems a little bit overkill IMO
19:47karolherbst: it is
19:48karolherbst: that's why I am asking what you would prefer
19:48mupuf:wonders how unigine does it
19:48mupuf: they probably just ship with the headers
19:48karolherbst: static linking
19:48karolherbst: most likely
19:48mupuf:would probably just want the user provide nvidia's library and headers at compile time, if they want support for nvidia
19:49mupuf: it is annoying for distros because of the make dependency
19:49mupuf: but they can also just say: don't care
19:49karolherbst: we could still dlopen it then
19:49mupuf: don't care about nvidia
19:50karolherbst: so we would just have a strict compile time dep, but no runtime dep
19:50mupuf: yes, you can, but what about the headers?
19:50mupuf: yes, exactly
19:50karolherbst: well, if it is there, support will be compiled in
19:50mupuf: and let distros either care or not care about the make dep
19:50karolherbst: but then it isn't your problem anymore
19:52karolherbst: also, I didn't get nvidia to perform on reator like on a "normal" desktop system within gputest
19:52mupuf: what do you mean?
19:52karolherbst: the excessive performance of nvidia
19:52karolherbst: it's way too high
19:52mupuf: right, waaaaayyyy too high
19:52karolherbst: Tom^ got like 63 fps on his 780 ti
19:53karolherbst: and your 660 had like 55 or so
19:53karolherbst: mhh actually it was 63 vs 75
19:55mupuf: not bad, see?
19:55karolherbst: currently reading the anti harassment policies, "following" seriously? I can't even follow anybody? :p
19:58mupuf: karolherbst: you know what it means. It means following as in, creepily following
19:58karolherbst: that would be "stalking", right?
19:58karolherbst: I think it was meant to be "deliberate following"
19:58mupuf: well, stalking is a bit longer, AFAICT
19:59karolherbst: but it looks a bit oddish on the page
19:59karolherbst: right kind of
19:59karolherbst: I am sure there is a better word for this
20:00karolherbst: like pursuing
20:00imirkin_: so it's ok to follow, as long as you're not creepy-lookin' :)
20:00karolherbst: will be fun when the security starts to follow me
20:00karolherbst: "sorry, please don't "follow" me anymore, I feel uneasy"
20:01imirkin_: what is this about btw?
20:01karolherbst: anti-harassment? not quite sure what that is about :p
20:01karolherbst: ahh context, mhh xdc
20:02karolherbst: I think if you write stuff like that, you should risk that somebody makes fun of such general terms, because it might be taken less seriously
20:02karolherbst: or others are making fun of it
20:03karolherbst: anyway,if I could suggest an improvement, I would replace following with pursuing
20:03karolherbst: sounds more serious
20:04mupuf: true. But this is based on what people wrote for big conventions
20:04karolherbst: I know
23:10tobijk: imirkin: do we have something to test on nvdX? (nvd9) i have one around right now :)
23:11imirkin_: not sure what your question is...
23:12tobijk: general tests to run, open bugs, whatever comes to your mind...
23:13imirkin_: not offhand...