01:09abishai: Hello. Can I ask a question about reclocking support for the Kepler family (NVE0)? I used 0a mode before 4.4; now I have upgraded to 4.4 and I can set 0e mode without a crash. However, I receive 2 errors when doing so (failed to raise voltage and error setting pstate 2), yet glxgears shows a 30% performance boost. How should I interpret these errors? Is the card reclocked, but undervolted?
01:32RSpliet: abishai: that appears to be the case yes
01:32RSpliet: you can verify the clocks you're running by "cat"ing your pstate file
01:32RSpliet: the bottom line states all the current clocks
01:37abishai: RSpliet, pstate shows the correct 0e mode clocks. I was just curious about these errors. Should I report them?
01:38RSpliet: abishai: this is a known problem
01:40abishai: RSpliet, OK, thank you
01:46loonycyborg: Is a kernel boot option still needed to actually get that pstate file?
01:53abishai: loonycyborg, yes
01:55abishai: loonycyborg, also, you could pass NvClkMode boot option to set clock automatically
01:56loonycyborg: this article misled me then: http://www.phoronix.com/scan.php?page=article&item=nouveau_try_linux316&num=1 :P
01:56loonycyborg: "no special module parameters"
01:57loonycyborg: maybe it doesn't mean what I think it means..
02:05RSpliet: loonycyborg; Linux 4.5RC1 no longer requires this param
02:06RSpliet: to the best of my knowledge it's removed when moving it to debugfs
02:28RSpliet: imirkin_: you might want to mark GL_ARB_shader_atomic_counters as done in gl3.txt under GLES 3.1 too
03:02wvvu: imirkin: if you have any other ideas let me know.
03:15loonycyborg: hmm it seems it doesn't let me change pstate with my gf440
03:17loonycyborg: it shows some pstates but there's no star to show current one
03:17loonycyborg: and it returns operation not supported when I try to write there
03:18karolherbst: loonycyborg: yeah, fermi reclocking isn't supported yet
03:18karolherbst: you could enable core reclocking, but without memory you won't get more than 25% more performance
03:23abishai: yeah, I have the same thing with old nvidia card on my laptop
03:23abishai: My next gpu's will be amd only :P
06:15martm: well let's say i am an outsider considering the development, but pretty much expected amount of code to do one extension is again looking to be in imirkins mesa, those who are concerned about the perf, i think those will be landed during the year definitely too
06:38martm: so definitely i am preparing to start a war, and i won't work on nouveau, the conflict should not be carried here, i don't really have complaints in programming sense, which is incidentally all i care about here
06:47Tom^: what are you on about o_O
06:58urjaman:is also confused
07:00martm: Tom^: well biggest one is my rights being violated, i am not ok with it, that does not belong here, but the scheduler is a pass against the blob are IR done separately anyways, so there does not seem nothing wrong with the code though
07:01martm: it should be a simpler kind of pass, which i even basically know how to do
07:02urjaman: start from the beginning if you want us randoms to understand
07:07martm: well i don't want get any attention but, as i understand you get the performance when getting rid of pipeline bubbles when shader runs, i think calim just did not deal with tracing interrupts, however that is very easy thing, me being violated here, is life time issue now i have taken away most of the rights, but the trials are not done legitly they did stuff behind my back
07:09martm: urjam: it's a minor pass, i spammed it in pm, soon maybe you would discuss the hack, it should grant all the possible performance
07:17martm: urjam: so nothing more currently, in theory soon there should be a whole kepler driver which is beautifully stable, done good with a final set of code it also starts to perform like a blob, imirkin has done that great job
07:27martm: urjam: if you're anxious enough or impatient, you generate code so there is one recursive interrupt handler that masks slower instructions in correct sequence easiest being in handler ops reordered from fastest to slowest
07:28martm: and generate so that there is in the end one predicated write to the clearance of the handler, based off the final color output, depending on which stage you do the pass against
07:28martm: it then just utilises threads so that they are always doing work instead of waiting sometimes in lock step
07:32martm: urjam: but can also be done with predication which another control flow construct , one of two which also does not block
07:32imirkin: orbea: you play games in dolphin-emu right? do you have paper mario?
07:33martm: on radeon it's done with predication instead, but trap version is easier
07:35orbea: imirkin: not yet, I haven't used dolphin very extensively yet either, lot of the games I wanted to play on it are a real pain to bind to a 360 controller, stupid wiimote :P
07:36imirkin: orbea: ah ok
07:42martm: urjam: currently i know precisely 5-6 instructions that cause heavy bubbles, if i'd research a bit i'd know all of them
07:46MichaelLong: in my head, heavy bubbles are forming the longer I read this gibberish.
07:48martm: Yeah i am heading off, you are anyways as stupid MichaelLong as the arrogant ones who limit my rights so i need someone elses permission to get a passport
07:48martm: MichaelLong: you are a stupid guy
07:57martm: i hope that just imirkin read my stuff, i was suspecting that enormous amount in this channel list..whatever, however some should at least here also understand, possibly imirkin at the moment being active here
07:59martm: so good bye, i am not about to waste my time, in fact ilia mirkin has not talked to me 2years in a row i believe, here i just waste my time
07:59martm: and nerves, so when he read it then he will implement it for you
08:58Jayhost: Can anyone explain the mmiotrace bug shadowramin.c ioread32 page fault. Unexpected secondary hit.
09:51orbea: imirkin: when you have a chance can you look at this apitrace for the reicast libretro core and soul calibur? Notice the text doesn't display correctly in the options menu where it lets you choose arcade and such and also when in a match the timer and other text at the top flashes. http://ks392457.kimsufi.com/orbea/stuff/apitrace/retroarch-reicast.trace.xz
09:52imirkin: orbea: ah cool. i've had a lot of trouble getting reicast working =/
09:53orbea: the libretro core was easy to get working, just lots of missing features like saves...
09:53imirkin: orbea: i assume you've verified that the trace renders correctly with e.g. llvmpipe?
09:53imirkin: (or blob)
09:53orbea: I did apitrace replay, it showed the error, is there anything else I should do?
09:54imirkin: well ideally there's some indication that it's in fact a nouveau issue and not an issue in, say, reicast
09:54orbea: I dont have llvm compiled into mesa though...
09:55imirkin: oh man. soul calibur. it's been such a long time
09:56orbea: yea, its nice how playable it still is :D
09:57imirkin: ok, well i get the same corruption on llvmpipe
09:58imirkin: which of course could just mean that st/mesa is broken
09:58imirkin: perhaps someone with an intel chip can replay that trace?
09:58imirkin: [mine's not handy]
09:58orbea: I'll do it in a moment on my laptop
10:02orbea: yea, I guess its a reicast issue...
10:02imirkin: well, could be a mesa-wide issue too
10:02imirkin: but definitely points a bit more in the direction of reicast
10:03orbea: actually, the timer doesn't flash in intel, only the text part looks broken
10:12imirkin: hrm. this qbo macro turned out a lot shorter than i thought it would. and a LOT shorter than the nvidia one....
10:12imirkin: makes me think i'm missing something
11:23imirkin: mwk: the macro engine's carry stuff is all unsigned right? i.e. 0 - 1 should set the carry bit?
11:44imirkin: mwk: can you think of a difference between "mov $r4 (adc 0x0 0x0)" and "mov $r4 (sbb 0x0 0x0)", if the $r4 gets used as an arg to "braz"? it seems like sbb works while adc doesn't. i think there might be 2 separate carry bits, one for add, one for sub.
12:41mwk: imirkin: carry flag may have opposite sense for subtract... it's actually quite common
12:42mwk: ie sbb(a, b) == a + ~b + CF
12:42mwk: which effectively means CF is used as not-borrow flag
12:42imirkin: makes sense
12:43mwk: iirc the g80 ISA works like that too
12:43mwk: matter of fact, that's next in the queue...
12:44mwk:powers up the G80
12:44imirkin: the macro isa doesn't have a "compare"
12:44imirkin: so i resorted to subtract + check the borrow bit
12:44imirkin: really annoying =/
12:45mwk: you want to compare less-than?
12:45imirkin: or gt... doesn't matter :)
12:45mwk: yeah, hm... the ISA is kind of craptastic
12:46imirkin: anyways, it works
12:47imirkin: oh just realized i left like 20 serialize's in from all my debugging
12:47mwk: I wonder how you'd do a signed less-than compare
12:47imirkin: very painfully. thankfully i just want unsigned.
12:48mwk: perhaps you just do it the VP1 way... "signed overflows are rare, right?"
12:49mwk: hmm, actually a signed compare is just an unsigned compare with MSB flipped on both inputs
12:49mwk: still... unpretty
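mwk's observation that a signed compare is an unsigned compare with the MSB flipped on both inputs can be checked with a small sketch. It assumes 32-bit two's-complement bit patterns; the helper names are illustrative only.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def to_bits(value):
    """Encode a Python int as a 32-bit two's-complement bit pattern."""
    return value & MASK32


def unsigned_less_than(a, b):
    """Plain unsigned comparison of two 32-bit bit patterns."""
    return (a & MASK32) < (b & MASK32)


def signed_less_than(a, b):
    """Signed comparison built from the unsigned one.

    Flipping the sign bit maps the signed range [-2**31, 2**31 - 1] onto the
    unsigned range [0, 2**32 - 1] while preserving order.
    """
    return unsigned_less_than(a ^ SIGN_BIT, b ^ SIGN_BIT)
```

With this, `signed_less_than(to_bits(-1), to_bits(0))` is true even though the raw bit pattern of -1 is the largest unsigned value.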
13:17mwk: imirkin: yep, g80 add works exactly that way
13:18mwk: matter of fact, it doesn't have a sub-with-borrow instruction, you're supposed to use not+addc
13:18mwk: you can only use sub for the first step
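The not+addc scheme mwk describes for multi-word subtraction can be modelled like this. It is a sketch under the assumption of 32-bit words and a not-borrow carry flag, not actual G80 code; `sub64` and the helpers are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry(a, b, carry_in):
    """32-bit add with carry in; returns (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def sub_first_step(a, b):
    """Plain subtract for the lowest word: a + ~b + 1, carry out = not-borrow."""
    return add_with_carry(a, ~b & MASK32, 1)


def sub64(a, b):
    """64-bit subtract built from 32-bit steps.

    The low word uses the plain subtract; the high word has no
    subtract-with-borrow available, so it inverts b and adds with carry.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo, carry = sub_first_step(a_lo, b_lo)
    hi, _ = add_with_carry(a_hi, ~b_hi & MASK32, carry)  # not + addc
    return (hi << 32) | lo
```

The carry from the low word feeds the high word unchanged, which only works because the carry means "no borrow" after a subtract.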
14:10martm: MichaelLong: as i said, it's all based off what i have read, so those instructions are sure http://stackoverflow.com/questions/9886002/how-does-the-cuda-warp-scheduler-issue-2-instructions-at-a-time-for-a-warp
14:11martm: so there is imad fmad loops if else if memory load memory store and derivatives, all that is sure, because i've confirmed that and read about their specs, just the other instructions there could be more
14:11martm: i have not specifically searched them, this is already a big bunch of them
14:18martm: kitten or not this is sure, those instructions need to be thread masked in recursive loop that isn't itself blocking any thread, then some threads will skip over it instead of scratching nuts and waiting
14:18martm: once the 4.3 is ready i read about other instructions too
14:21martm: of course there are other slow instructions i just have not read about them, i dunno what is the reason why are they slow, how many threads they can work with what is their schematic etc.
14:36martm: so anyways who do you think you try to humiliate here, the court denies my facts, well i don't make up the facts i say take it up to cambridge university where they test cadavers nerve locations, if its done against the rules that is not my fault then, cheers
14:41mupuf: imirkin: Ilia, we should stop talking to you for a month and you would get opengl 4.5 ready :D
14:41karolherbst: imirkin: if you are always that productive if we leave you alone...
14:41mupuf: karolherbst: that was my joke!
14:41karolherbst: mupuf: hey, I wanted to write something like that
14:41mupuf: now we are both giggling
14:41imirkin: haha. more like on weekends :p
14:43imirkin: and most of it was sending out work i had previously done but just did a tiny bit of polishing on
14:43karolherbst: by the way I decided somehow to take care of the vulkan stuff, cause I have some games which should get vulkan support really fast
14:47Helios747: imirkin: do you know why some "envytools" bot is sending me notices about repo updates I've never even heard of or contributed to?
14:47imirkin: it's sending them to this channel
14:47imirkin: your irc client is broken i guess
14:47imirkin: should only ever appear in here
14:47karolherbst: imirkin: I actually found a issue with the constant folding post-ra thing
14:48karolherbst: it seems that the emitter can only emit signed mads with either 0 or -1 as the immediate
14:48mwk: exp = s1 > s2 ^ op2 >> 29 & 1 ? s1 : s2
14:48mwk: beautiful, not a parenthesis in sight!
14:48imirkin: karolherbst: IMAD32I should do it all...
14:50karolherbst: imirkin: I hit some assertions today
14:53karolherbst: run: ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp:341: void nv50_ir::CodeEmitterNVC0::setImmediate(const nv50_ir::Instruction*, int): Assertion `(u32 & 0xfff00000) == 0 || (u32 & 0xfff00000) == 0xfff00000' failed.
14:54imirkin: right, that's just because it doesn't support the IMAD32I form
14:54imirkin: where dst==src2
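The condition in the assertion karolherbst hit can be restated as a small predicate. This is only a Python transcription of the expression in the assertion message, not the emitter code; the function name is made up.

```python
def passes_short_immediate_check(u32):
    """Mirror of the asserted condition from the message above.

    The check passes when the top 12 bits of the 32-bit immediate are either
    all zero or all one; anything else would need a long-immediate form.
    """
    high = u32 & 0xFFF00000
    return high == 0 or high == 0xFFF00000
```

Small positive values and small negative values (in two's complement) pass, while something like the float bit pattern 0x3F800000 does not, which matches the "either 0 or -1" description of the upper bits.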
14:54karolherbst: with it you mean the emitter or also the actual hardware?
14:54imirkin: it = emitter
14:55karolherbst: k, so the hardware would support it
14:55imirkin: i think so, yes. check envydis.
14:55imirkin: and/or play with nvdisasm
15:33mwk: ah, I managed to forget how crazy the g80 shift instructions are
15:33mwk: good times...
15:34mariogrip: is 4k/hidpi supported by nouveau? (GeForce GTX 980M)
15:37jeremySal: imirkin: I'm actually confused now about the conservative raster extension: https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_conservative_raster.txt I looked at the opengl spec and it seems this is exactly the same as antialiasing?
15:37karolherbst: mariogrip: do you plan to get such a gpu?
15:38jeremySal: For comparison here is the opengl spec on antialiasing: https://www.opengl.org/documentation/specs/version1.1/glspec1.1/node56.html
15:41karolherbst: mariogrip: otherwise you should just try it out and report back if it isn't
15:41mariogrip: karolherbst: I already own one, i'm using proprietary drivers now since I could not get 4k working (I bought 2 4k screens). but I would like to use opensource drivers so I was wondering if 4k is supported at all or if i did something wrong
15:41karolherbst: mariogrip: but if thats a mobile chip you will use the intel gpu for your main display anyway
15:41karolherbst: mariogrip: it _may_ work, trying it out is the best way to find out
15:42karolherbst: nouveau doesn't handle all corner cases yet and if something is messed up, well
15:42karolherbst: mariogrip: 980m is maxwell 1gen
15:42karolherbst: ohh, it seems to be 2gen already
15:43karolherbst: yeah, that practically means no hw accel because of missing firmware blobs nvidia hasn't released yet
15:43karolherbst: mariogrip: is there actually any intel gpu on your system?
15:44mariogrip: Intel® Core™ i7-4790K CPU @ 4.00GHz × 8
15:44karolherbst: ohh thats a desktop system isn't it?
15:45mariogrip: it's a laptop with desktop cpu
15:45karolherbst: ohh okay
15:45karolherbst: and the intel gpu is kind of usable or is it completely disabled?
15:45karlmag: hmm... 980M isn't listed on http://nouveau.freedesktop.org/wiki/CodeNames/
15:46mariogrip: I haven't tested the intel one, but i'll give it a try.
15:46mariogrip: yeah i saw that too
15:46karolherbst: mariogrip: actually I don't think you would be happy with nouveau with this nvidia gpu, because with 2gen maxwell signed nvidia firmware is _required_
15:47karolherbst: mariogrip: if you use the intel one to drive your displays (it actually depends how the ports are wired in the end)
15:47karolherbst: but then you need to use stuff like bumblebee to be able to use the nvidia gpus, which has advantages and disadvantages
15:48mariogrip: the main problem is that I wanted to use/try wayland/mir
15:48imirkin: jeremySal: goooood question. sorry, not 100% sure
15:48karolherbst: yeah, but no I guess
15:49karolherbst: mariogrip: you won't get any opengl at all through your gpu
15:49imirkin: jeremySal: i suspect the difference is that regular antialiasing looks for covered samples, whereas this is for *any* intersection? dunno.
15:49karolherbst: because of the messed up state 2gen maxwell is in
15:49karolherbst: sadly we can't do anything about it
15:49imirkin: jeremySal: might want to check the GL spec for details, those docs pages are notoriously inaccurate around the edges
15:49jeremySal: imirkin: the spec I linked says explicitly any intersection
15:49imirkin: yeah i saw
15:49imirkin: but that's not authoritative
15:50imirkin: check the actual GL Spec
15:50jeremySal: Where is that?
15:50jeremySal: I thought this was the spec :3
15:50mariogrip: karolherbst: well, ok... damn nvidia...
15:50imirkin: jeremySal: e.g. https://www.opengl.org/registry/doc/glspec45.core.pdf
15:50karolherbst: mariogrip: actually nvidia should support wayland/mir in a few months
15:51karolherbst: mariogrip: they did quite a lot in that direction
15:51karolherbst: maybe it will take two years
15:51imirkin: mariogrip: if it's a laptop, chances are you will drive your screens from the intel chip, so the nvidia one won't matter...
15:51karolherbst: who knows, but they are on the right track
15:51jeremySal: imirkin: one thing I was thinking was that maybe it had to do with multisampling
15:52karolherbst: mariogrip: yeah, try to find out which ports are driven by the intel gpu and which are nvidia only
15:52karolherbst: mariogrip: you still want to use bumblebee with the nvidia driver then ;) but I am also using it and this isn't a problem, except you play a game on both 4K displays :/
15:53karolherbst: the pcie overhead should be _huge_ on this
15:53mariogrip: karolherbst: I don't do much gaming anymore so, but I would like to use my 4k screens
15:54karolherbst: mariogrip: yeah, check if they can be driven by the intel gpu or not
15:54karolherbst: mariogrip: usually enabling the intel gpu (in the bios?) is enough and we can check this out really easy
15:55mariogrip: karolherbst: I'll try, Thanks! :)
15:55karolherbst: mariogrip: maybe it is already enabled, I don't know this :)
15:56karolherbst: mariogrip: at least you shouldn't hear your gpu fans anymore if you use the intel one :D
15:56karolherbst: well if nouveau is loaded and turns off the gpu that is
16:05glennk: jeremySal, conservative raster basically means intersect the geometry with a pixel (or fragment with MSAA) and if any part touches it, generate the fragment
16:06glennk: for normal rasterization its just checking if the point sample is inside the geometry
16:06imirkin: glennk: his point is that that sounds like the same thing as polygon aa
16:06glennk: smooth tris is a very different thing
16:06jeremySal: It seems like it's about multisample rasterization
16:07jeremySal: polygon AA is only for when you don't have multisample aa?
16:08glennk: polygon AA applies only on the edges of a polygon, and its more or less a hack where the alpha value is modulated depending on <magic>
16:08glennk: afaik on current hardware polygon AA is all done with shaders
16:08jeremySal: wouldn't conservative rasterization only affect the edges of polygons?
16:08mwk: jeremySal: indeed it does
16:09glennk: its about the rule for when to light each pixel (or fragment if you use MSAA)
16:09glennk: aka the pixel "fill rule"
16:10jeremySal: glennk: I'm still confused. When I'm looking at the opengl spec, section 14.6.3 says "Polygon antialiasing rasterizes a polygon by producing a fragment wherever the
16:10jeremySal: interior of the polygon intersects that fragment’s square"
16:11glennk: well, the detail here is what they mean by intersection
16:11jeremySal: is it that polygon AA also modifies values based on coverage?
16:13mariogrip: karolherbst: humm there is no option in bios, and the os cannot find the intel one
16:13karolherbst: yeah, I already feared as much
16:13glennk: polygon AA is a separate thing, it could be coverage based, but could also be looking at the edge slopes etc for computing its values
16:14glennk: could be any wild hack that smooths the edges
16:14jeremySal: glennk: the only difference I can seem to find is that conservative raster also supports circular pixels if POINT_SPRITE is disabled
16:14mariogrip: so I guess I'll have to wait for nvidia to support mir then :P hope it doesn't take too long...
16:14jeremySal: glennk: in terms of what intersection means
16:19jeremySal: glennk: what do you mean looking at the edge slopes?
16:19karolherbst: mhh anybody heard of glNamedStringARB?
16:20imirkin: that sounds like the ARB_shading_language_include thing
16:20karolherbst: divinity seems to use it
16:21karolherbst: actually doesn't sound _that_ complicated, just like a bunch of crap nobody wants to deal with
16:22karolherbst: this would basically mean the glsl part of mesa would store the text and include it in the shaders where referenced
16:22imirkin: nobody implements it... except maybe nvidia
16:22karolherbst: yeah well, now we have a game using it
16:22glennk: jeremySal, just using the slope value through <magic> to assign a coverage value
16:23imirkin: karolherbst: even amd blob doesn't implement it
16:23jeremySal: glennk: sorry, I don't know what a slope refers to in this scenario? the slope of the edge of the polygon?
16:23karolherbst: the game isn't supported by amd either
16:23glennk: yeah, the edge slope
16:23karolherbst: it "should come
16:23karolherbst: at some point"
16:23glennk: but again, ignore polygon AA, its completely separate from actual rasterization
16:24imirkin: karolherbst: ok, so i'm not terribly worried about it
16:24karolherbst: imirkin: I mean, the idea doesn't sound terribly simple, maybe a kind of kind of right implementation would be quite okayish
16:24jeremySal: glennk: I'm very confused because the spec seems to explicitly say that it does rasterize every pixel intersecting the polygon
16:24jeremySal: glennk: And then modifies the alpha based on the coverage
16:25karolherbst: imirkin: ohh it seems to add a bunch of C style stuff to glsl :/
16:26glennk: jeremySal, only applies with GL_POLYGON_SMOOTH, you can totally ignore that section for all other rasterization
16:26jeremySal: glennk: I see, it's separate modes
16:26glennk: right, it applies on top of normal rasterization
16:27glennk: some implementations may light up more pixels than others for that case
16:28glennk: normal rasterization basically for each fragment checks if its sample point is inside the polygon
16:28glennk: conservative checks the fragment square against the polygon
16:29glennk: so one consequence is you get overlapping seams if you have two triangles that share an edge with conservative raster
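glennk's distinction can be illustrated with a toy software rasterizer. This is a sketch only: it assumes one sample at each pixel centre, treats points exactly on an edge as inside (real hardware applies a fill rule there), and uses a separating-axis test for the conservative case. None of the names come from Mesa or the extension.

```python
def edge(a, b, p):
    """Signed area term: positive when p lies to the left of the edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def point_in_triangle(tri, p):
    """True when p is inside tri or exactly on one of its edges."""
    signs = [edge(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)


def square_touches_triangle(tri, x, y):
    """Separating-axis test between tri and the unit pixel square at (x, y)."""
    square = [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]
    axes = [(1, 0), (0, 1)]  # normals of the square's sides
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        axes.append((a[1] - b[1], b[0] - a[0]))  # normal of each triangle edge
    for ax, ay in axes:
        tri_proj = [ax * px + ay * py for px, py in tri]
        sq_proj = [ax * px + ay * py for px, py in square]
        if max(tri_proj) < min(sq_proj) or max(sq_proj) < min(tri_proj):
            return False  # found a gap, so the shapes do not touch
    return True


def rasterize(tri, width, height, conservative):
    """Return the set of (x, y) pixels that generate a fragment."""
    covered = set()
    for y in range(height):
        for x in range(width):
            if conservative:
                hit = square_touches_triangle(tri, x, y)
            else:
                hit = point_in_triangle(tri, (x + 0.5, y + 0.5))
            if hit:
                covered.add((x, y))
    return covered


# Two triangles sharing the diagonal of a 4x3 rectangle.
lower = [(0, 0), (4, 0), (4, 3)]
upper = [(0, 0), (4, 3), (0, 3)]
normal_overlap = rasterize(lower, 4, 3, False) & rasterize(upper, 4, 3, False)
conservative_overlap = rasterize(lower, 4, 3, True) & rasterize(upper, 4, 3, True)
```

Here `normal_overlap` is empty because no pixel centre lies on the shared diagonal, while `conservative_overlap` contains the pixels the diagonal passes through, which is the overlapping seam mentioned above.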
16:29jeremySal: glennk: Ok, thank you. I think I got confused because the spec specifies "exact aliasing in the prototypical case", which may not at all match the way it's really implemented.
16:30glennk: yeah, the spec can be overly general sometimes
16:31jeremySal: Does anyone know of an example of the SubpixelPrecisionBiasNV function being used from the conservative rasterization extension? The gameworks example nvidia released doesn't include it.
16:35glennk: not sure there are any, its a pretty fresh off the presses extension
16:35jeremySal: glennk: Ok, thanks
16:36glennk: maybe one of the gameworks binary things uses it?
16:39jeremySal: glennk: do you mean this? https://github.com/NVIDIAGameWorks/OpenGLSamples
16:40glennk: they had some global illumination library that i think used conservative raster
16:45jeremySal: ah but they don't supply source?
16:46glennk: yeah, under nda only i think for that
16:46glennk: but can probably grab an apitrace off a demo
16:50jeremySal: glennk: yeah, I was planning to write a test myself, but I was hoping looking at some examples might help me understand what it's doing
16:52glennk: i think that particular function is probably just whacking the value directly into some specific register
16:52karolherbst: imirkin: could you tell me where you see the problems in https://github.com/karolherbst/mesa/commit/243b515df8942e8c0b7133282b0e8105e11edaf3 because then I would clean this up. Also you might want to look at https://github.com/karolherbst/mesa/commit/e1584c1bf15ddeabcec95131f2129f6853fefbc6
16:52karolherbst: on my way home tomorrow
16:53imirkin: karolherbst: that first one looks good
16:54imirkin: karolherbst: second looks good too
16:54karolherbst: okay, thanks
16:54karolherbst: ohh wait, I also have this if you don't find anything in the above ones: https://github.com/karolherbst/mesa/commit/e84573f705ef36a41166b3670594367c0041aae7
16:54karolherbst: but well
16:55karolherbst: I will then clean up my pow lowering tomorrow I suppose, because that's actually useful
16:56karolherbst: ohh wait I think I will do the def replace thing in the max(abs one too, because it is much cleaner and I don't have to deal with the convert stuff at all
17:02jeremySal: imirkin: how do you compile a new c based piglit test?
17:02imirkin: piglit uses cmake, which is really annoying =/
17:02imirkin: just look at some other directories
17:03imirkin: you have to add a CMakeLists.txt, CMakeLists.gl.txt and add it to the tests/CMakeLists.txt
17:03imirkin: just... copy stuff
17:03imirkin: don't try to get creative
17:05karolherbst: imirkin: when I will make the nouveau vulkan stuff I will use cmake :p just for you. except anybody else volunteers
17:05glennk: cmake, because someone saw msvc 6 project files and thought its syntax was awesome :-p
17:08karolherbst: ohh msvc 6 didn't use xml?
17:09jeremySal: I tried copying the Cmake files from the tests/general folder
17:10jeremySal: "Unknown CMake command "piglit_include_target_api".
17:10jeremySal: The only thing I changed was the piglit_add_executable lines
17:11jeremySal: I also added an entry to include the new directory in the tests/CMake files
17:12imirkin: jeremySal: insufficient copying of cmake files
17:12imirkin: karolherbst: thus ensuring i work on my own project :p
17:13imirkin: jeremySal: have a look at piglit commit aebf599f793bce
17:13karolherbst: imirkin: right
17:14jeremySal: imirkin: Do I need to modify all.py?
17:15jeremySal: imirkin: may have just been cmake cache messing things up.
17:16imirkin: jeremySal: not necessarily
17:16imirkin: jeremySal: but look at all the cmake bs i added in there -- that's all necessary
17:16jeremySal: imirkin: I think I copied exactly that same stuff
17:16jeremySal: I will double check
17:25jeremySal: imirkin: thanks, got it
17:29imirkin: adding to all.py is necessary to make it actually run when you run the full testsuite
17:29imirkin: so i'd advise it, but it's not necessary to get it to build
17:36jeremySal: imirkin: okay, thanks
17:56jeremySal: imirkin: I have the traces for conservative_raster and fill_rectangle. Should I upload them for you?
17:58imirkin: jeremySal: that'd be awesome
18:00jeremySal: imirkin: http://columbia.edu/~jas2312/conservative_rect.tar.xz
18:06jeremySal: I should note that it doesn't test the subpixel precision extension for conservative rasterization
18:07imirkin: jeremySal: so for these RE style tests
18:07imirkin: it's good to have the thing off, draw, then turn the thing on, and draw again
18:07imirkin: that usually narrows down the list of possible junk it might be
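The off/draw/on/draw approach boils down to diffing the method writes seen in each state. A rough sketch, assuming dump lines shaped like the `GM204_3D.0x1148 = 0` text quoted in this log; the parser is hypothetical and is not how the envytools decoders work internally.

```python
import re

# Matches "<class>.<hex method> = <value>" anywhere in a dump line.
WRITE = re.compile(r"(\w+)\.(0x[0-9a-fA-F]+)\s*=\s*(0x[0-9a-fA-F]+|\d+)")


def parse_value(text):
    """Parse a decimal or 0x-prefixed hexadecimal value."""
    return int(text, 16) if text.lower().startswith("0x") else int(text, 10)


def last_writes(dump_text):
    """Map (class, method) to the last value written in a dump."""
    state = {}
    for cls, method, value in WRITE.findall(dump_text):
        state[(cls, int(method, 16))] = parse_value(value)
    return state


def changed_methods(off_dump, on_dump):
    """Return methods whose final value differs between the two dumps."""
    off, on = last_writes(off_dump), last_writes(on_dump)
    return {
        key: (off.get(key), on.get(key))
        for key in off.keys() | on.keys()
        if off.get(key) != on.get(key)
    }
```

Feeding it one dump captured with the feature off and one with it on leaves only the candidate methods, which is why drawing in both states narrows the search so much.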
18:08jeremySal: are you referring to the conservative rasterization test?
18:08imirkin: actually the other one
18:08jeremySal: or the rectangle fill
18:08jeremySal: oh I see
18:09imirkin: i'm guessing 1148 = conservative raster
18:11jeremySal: so this is based on PB: 0x80000452 GM204_3D.0x1148 = 0
18:11imirkin: and it's set to 1 earlier
18:11jeremySal: so it's red because it's not recognized?
18:11jeremySal: what about 0xf14?
18:13imirkin: it's never set to 1
18:13imirkin: only ever set to 0 :)
18:13jeremySal: haha so you just ignore it?
18:13imirkin: f14 is somehow related though
18:13imirkin: i see a macro set it
18:14jeremySal: so, this is a memory address?
18:14imirkin: no, it's a method
18:14jeremySal: as in a packet with this id is sent to the driver
18:14imirkin: no like a gpu method
18:15imirkin: the idea is that you have these classes
18:15imirkin: which present an interface for various functionality on the gpu
18:15imirkin: in this case it's the GM204_3D class
18:15imirkin: which as you might imagine, provides various 3d functionality
18:15imirkin: this class is composed of methods
18:16imirkin: each method is at a word address
18:16imirkin: when invoking a method, it can do whatever
18:16imirkin: 99.9999% of methods just take the value you give it and write it somewhere in the context
18:16imirkin: 0.000001% of methods actually do things, like draw
18:16jeremySal: so how does that match up with the dump format?
18:16jeremySal: it seems like the dump format is A=B
18:17imirkin: that's just how it's printed
18:17imirkin: think of it as A(B)
18:17imirkin: where the majority of the time, the implementation is actually A = B
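imirkin's "think of it as A(B)" picture can be modelled with a toy engine. Everything here is illustrative: the class, the made-up `DRAW` method address and the dictionary context are assumptions for the sketch, not the real GM204_3D layout.

```python
class ToyEngine:
    """Toy model of a GPU class whose methods live at word addresses."""

    DRAW = 0x1000  # made-up address for a method that actually does something

    def __init__(self):
        self.context = {}  # state written by the "A = B" style methods
        self.draws = []    # snapshots of the state used by each draw

    def call(self, method, value):
        """Invoke method A with argument B, i.e. A(B)."""
        if method == self.DRAW:
            # The rare kind of method: it acts on the accumulated state.
            self.draws.append(dict(self.context))
        else:
            # The common kind: just store the value in the context.
            self.context[method] = value


engine = ToyEngine()
engine.call(0x1148, 1)           # looks like "0x1148 = 1" in a dump
engine.call(ToyEngine.DRAW, 0)   # triggers work using the stored state
```

After these two calls the draw snapshot contains the 0x1148 value, which is why a dump that prints everything as `A = B` is still a faithful record of method calls.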
18:17jeremySal: is each line a method call?
18:17jeremySal: or each "size 3"
18:17imirkin: that's related to how commands are communicated over the fifo
18:17imirkin: the basic idea is that there's a fifo
18:18imirkin: which takes in commands and decodes them and sends them to the right engine
18:18imirkin: there's some info on details here: http://envytools.readthedocs.org/en/latest/hw/fifo/dma-pusher.html
18:18imirkin: although it's largely irrelevant -- that stuff works :)
18:18imirkin: so you don't *really* have to understand it
18:19imirkin: at least not at first
18:19jeremySal: so for example GM204_3D.GRAPH.MACRO[0x3d] = 0
18:19jeremySal: GM204_3D.GRAPH is the class?
18:19imirkin: not really... GM204_3D is the class
18:19jeremySal: hmm honestly maybe the macros act differently
18:20imirkin: ok, so with macros
18:20imirkin: the idea is that you can call them
18:20imirkin: and then they can read additional parameters in
18:20imirkin: and then do more complex things
18:20imirkin: based on those parameters
18:20imirkin: now, you could do all this in the driver
18:20imirkin: but that would often include synchronization between the CPU and GPU, which you don't want
18:21imirkin: and on occasion, macros can do things that you can't do from the CPU
18:21imirkin: (at least not easily)
18:21imirkin: but... don't worry about macros :)
18:21imirkin: anyways, pretty sure that 1148 is the conservative raster bit
18:22imirkin: i'm gonna document it ;)
18:22jeremySal: ok nice
18:22imirkin: and once it's documented, it must be correct :)
18:22jeremySal: how would it be implemented?
18:22jeremySal: in nouveau
18:23imirkin: oh, just pipe the state through, stick it in the rasterizer atom, and presto - done.
18:23imirkin: if you know what you're doing, the whole change shouldn't take more than 15 mins
18:23imirkin: if you don't, i suspect it should take 1-2 days.
18:26jeremySal: should i get a trace of the rectangle fill when it is enabled/disabled?
18:26imirkin: yes please
18:28imirkin: of course it's all academic until we have nouveau actually running on GM204
18:29jeremySal: why is it not running on the GM204?
18:30imirkin: it requires signed firmware
18:30jeremySal: Does nouveau use its own firmware?
18:31imirkin: normally yeah, but that's not an option here
18:31imirkin: and due to the loading process for the secure firmware, we haven't been able to trace it
18:31imirkin: so we can't just take what the blob does and reuse it
18:31imirkin: at least not easily
18:31jeremySal: Why would they require signed firmware?
18:31jeremySal: to lock out nouveau?
18:32imirkin: i thoroughly doubt we're a big enough thorn in their side to cause such hardware overengineering
18:32imirkin: flattering though it might be :)
18:32jeremySal: GPU viruses?
18:32jeremySal: I'm confused why they'd put in the effort
18:32imirkin: i think there are official reasons like gpu-resident things
18:32imirkin: but i suspect unofficially it's to lock out the hw fakers
18:34imirkin: (they were having problems with people taking a lower end card and flashing it to make it appear as a high end card)
18:44jeremySal: imirkin: what is different about the loading process?
18:45imirkin: ok, i think the fillmode thing is 113c
18:45imirkin: but the value's all funky
18:46imirkin: er wait, no it's not funky. it's just "2". i was thinking of another theory.
18:47imirkin: does it work if you set it on just front or just back?
18:48jeremySal: I uploaded a version that disables then enables
18:49jeremySal: same file
18:49imirkin: jeremySal: can you do one which sets GL_FRONT to something different than GL_BACK?
18:49imirkin: (and then flips them)
18:50imirkin: yeah, so it's definitely 113c.
18:50imirkin: question is what the values mean...
18:52imirkin: right now it's just setting it to 2
18:54jeremySal: ok... so I'm seeing incorrect behavior I think
18:56jeremySal: I cannot make heads or tails of what's showing up
18:56jeremySal: when I use GL_FRONT and GL_BACK
18:56imirkin: do you basically understand what it's supposed to do?
18:56jeremySal: yeah, absolutely
18:56jeremySal: front facing polygons
18:56imirkin: ok cool
18:57jeremySal: are supposed to be rendered in this mode
18:57imirkin: but then remember, FILL_RECTANGLE_NV is funky :)
18:57imirkin: i didn't check how you had things set up
18:57jeremySal: so....if i do front_and_back
18:57jeremySal: it works
18:57jeremySal: but if I only set the front mode
18:57jeremySal: or only the back mode
18:58jeremySal: it is entirely black
18:58imirkin: well, you're not exactly drawing millions of triangles here...
18:58imirkin: let's see
18:58jeremySal: It does work if I set front and back modes independently
18:58imirkin: but both to fill_rectangle?
18:59imirkin: i _think_ your rectangle is front-facing
18:59imirkin: An INVALID_OPERATION error is generated by Begin or any Draw command if
18:59imirkin: only one of the front and back polygon mode is FILL_RECTANGLE_NV.
18:59imirkin: problem solved.
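The restriction quoted above from the NV_fill_rectangle text can be sketched as a small validity check. This is only an illustration in C, not Mesa or driver code: `polygon_modes_valid` and `polygon_mode_examples` are hypothetical names, and the enum values are copied from the GL headers so the snippet stands alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Enum values as defined in the OpenGL headers / NV_fill_rectangle. */
#define GL_LINE              0x1B01
#define GL_FILL              0x1B02
#define GL_FILL_RECTANGLE_NV 0x933C

/* Hypothetical helper: returns false in the case where a draw would
 * generate INVALID_OPERATION, i.e. when exactly one of the front and
 * back polygon modes is FILL_RECTANGLE_NV. */
static bool polygon_modes_valid(unsigned front, unsigned back)
{
    bool front_rect = (front == GL_FILL_RECTANGLE_NV);
    bool back_rect  = (back == GL_FILL_RECTANGLE_NV);
    return front_rect == back_rect;
}

/* The combinations tried in the conversation. */
void polygon_mode_examples(void)
{
    /* front_and_back set to fill_rectangle: fine */
    assert(polygon_modes_valid(GL_FILL_RECTANGLE_NV, GL_FILL_RECTANGLE_NV));
    /* only front, or only back: draw errors out, nothing is rendered */
    assert(!polygon_modes_valid(GL_FILL_RECTANGLE_NV, GL_FILL));
    assert(!polygon_modes_valid(GL_FILL, GL_FILL_RECTANGLE_NV));
    /* neither face uses fill_rectangle: ordinary mixed modes are fine */
    assert(polygon_modes_valid(GL_FILL, GL_LINE));
}
```

In an application this corresponds to calling glPolygonMode with GL_FRONT_AND_BACK, or setting GL_FRONT and GL_BACK to FILL_RECTANGLE_NV separately before drawing.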
19:00jeremySal: there we go
19:00jeremySal: I was so confused because it doesn't mention that restriction except that sentence
19:01imirkin: is this a gm20x ext, or is it on all nvidia gpus?
19:02jeremySal: not sure
19:16imirkin: jeremySal: thanks for doing the traces :)
19:20jeremySal: imirkin: no problem, I'm gonna try to do the rest of them
19:21imirkin: very cool
19:21imirkin: and feel free to ask any questions about how the hw operates, etc
19:21jeremySal: i mean, is there any way I could contribute to getting nouveau working on the gm204?
19:22imirkin: there's at least *one* piece of code that i *know* needs to be written
19:22imirkin: which is that GM20x has a new texture descriptor format
19:22imirkin: it was graciously documented by nvidia
19:22imirkin: but i never got around to implementing it since i don't actually have any maxwell hw
19:23imirkin: (GM107 supports both it and the old one, while GM204 supports only the new one)
19:23jeremySal: hmm I will take a look at it
19:23imirkin: this is the "TIC" descriptor
19:23imirkin: have a look at nvc0_create_texture_view
19:24imirkin: that will basically need a new variant for GM107+ (we should just always enable it on maxwell)
19:24imirkin: the docs for it live here: https://github.com/envytools/envytools/blob/master/rnndb/graph/gm200_texture.xml
19:25jeremySal: where is the released nvidia documentation?
19:25imirkin: the link above has it
19:26imirkin: it was contributed directly to rnndb
19:26imirkin: by gnurou (who hangs out here)
19:26jeremySal: oh cool
19:26imirkin: i've been leaving implementing it to another day, but i would have no problem if someone else invested the time into it now
19:27imirkin: (that PACK_COMPONENTS thing is trippy btw... i wonder if we can make good use of it...)
19:28imirkin: you can basically ignore all the colorkey stuff
19:28imirkin: that's just crazytalk
19:29imirkin: take a look at gm200_tic_header_version
19:29imirkin: there are 5 variants
19:29imirkin: of which 2 are foo_colorkey
19:29imirkin: you can ignore those
19:30imirkin: we never use any of that colorkey junk
19:30imirkin: i don't even know what it does tbh
19:30jeremySal: So, the texture format
19:31jeremySal: is the format created to send to the hardware?
19:31imirkin: you have memory
19:31imirkin: the texture descriptor explains to the sampler hardware how to interpret that memory
19:31jeremySal: I see
19:31jeremySal: and it needs to be converted from some other format
19:32jeremySal: that I assume opengl uses?
19:32imirkin: don't worry about that :)
19:32jeremySal: I don't know about this
19:32imirkin: by the time it gets to nvc0_create_texture_view all the format stuff has been worked out
19:32imirkin: the data's in a fixed format
19:32imirkin: you just have to create the 8-byte sequence that explains how to read the texture to the hardware
19:32imirkin: er, 8-word
19:33jeremySal: so the hardware is really accepting a whole class of texture formats
19:33imirkin: which includes a number of texturing parameters as well
19:33jeremySal: and it has been converted into one of those classes
19:33imirkin: that's right
19:33jeremySal: and so nouveau is just converting a fixed description of which element of that class is being used
19:33jeremySal: to the TIC2 format?
19:34imirkin: something like that. have a look at the existing function, it should become a bit more apparent what's going on
19:34jeremySal: what's a swizzle?
19:34jeremySal: it sounds like a marketing term for teens
19:34imirkin: ARB_texture_swizzle i assume
19:34imirkin: but swizzling is also sometimes used as the old term for what is now referred to as "tiling"
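In the ARB_texture_swizzle sense, each channel returned by a texture lookup is picked from one of the texel's components, or forced to 0 or 1. A rough sketch in C follows; the `swz` enum, `apply_swizzle` and `swizzle_example` are made-up names for illustration, not the GL or Mesa API.

```c
#include <assert.h>

/* Hypothetical selector: which source each output channel reads. */
enum swz { SWZ_R, SWZ_G, SWZ_B, SWZ_A, SWZ_ZERO, SWZ_ONE };

/* Build the swizzled result from a texel stored as R, G, B, A. */
static void apply_swizzle(const float texel[4], const enum swz sel[4],
                          float out[4])
{
    for (int i = 0; i < 4; i++) {
        switch (sel[i]) {
        case SWZ_ZERO: out[i] = 0.0f; break;
        case SWZ_ONE:  out[i] = 1.0f; break;
        default:       out[i] = texel[sel[i]]; break; /* SWZ_R..SWZ_A index 0..3 */
        }
    }
}

/* Example: read a texel as BGR with alpha forced to 1. */
void swizzle_example(void)
{
    const float texel[4] = { 0.25f, 0.5f, 0.75f, 0.0f };
    const enum swz bgr1[4] = { SWZ_B, SWZ_G, SWZ_R, SWZ_ONE };
    float out[4];

    apply_swizzle(texel, bgr1, out);
    assert(out[0] == 0.75f && out[1] == 0.5f);
    assert(out[2] == 0.25f && out[3] == 1.0f);
}
```

The other, older meaning (memory tiling) is about how texels are laid out in memory and is unrelated to this channel selection.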
19:39imirkin: another thing that needs to be worked out on maxwell is the tessellation control shader output situation
19:39imirkin: a few people looking to help with maxwell stuff have disappeared over that one
19:39imirkin: so i'm hesitant to hand it out as a task :)
19:40imirkin: but the basic problem is that i ran out of steam on dealing with tess stuff when implementing it... and never finished it on maxwell
19:40imirkin: the only thing left i think is figuring out how to read/write tess control shader outputs
19:41imirkin: they totally switched schemes compared to kepler
19:41jeremySal: I see
19:41imirkin: and even kepler switched schemes compared to fermi (which i only realized later on... it was a very minor switch)
19:41jeremySal: I'm having trouble understanding the code for the texture view
19:42imirkin: be specific
19:42jeremySal: on line 101, it reads view->tic
19:42jeremySal: but I cannot see anywhere that view->tic is set
19:42imirkin: that's the buffer for the 8 dwords
19:42imirkin: it's an array embedded in the nv50_tic_entry i think
19:43jeremySal: wait, I see
19:43jeremySal: it's writing not reading
19:43imirkin: and then later on those are written out when that texture needs to be set for a slot...
19:43jeremySal: so it's the pipe_context that it's reading from
19:43imirkin: the task is to create a sampler view
19:44imirkin: it's given the texture resource as well as a sampler view template
19:44imirkin: which contains all the various view parameters
19:44imirkin: so what it needs to do is based on those, compute the 8 dwords that will eventually be uploaded into the TIC table
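The general shape of "take the view parameters, compute 8 dwords" can be sketched as below. This is a toy in C under loud assumptions: the `view_params` struct, the `field` helper, `build_descriptor`, and every bit position are invented for illustration. The real field layout is the one documented in gm200_texture.xml in rnndb, and the real code is nvc0_create_texture_view.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified view parameters. */
struct view_params {
    uint32_t format;      /* hardware format code, assumed to fit in 7 bits */
    uint32_t swizzle[4];  /* per-channel selector, assumed to fit in 3 bits */
    uint64_t address;     /* GPU address of the texture, assumed 48 bits */
    uint32_t width;       /* in texels, at least 1 */
    uint32_t height;      /* in texels, at least 1 */
};

/* Place `value` in a `bits`-wide field starting at bit `shift`. */
static uint32_t field(uint32_t value, unsigned shift, unsigned bits)
{
    assert(bits < 32 && shift + bits <= 32);
    assert(value < (1u << bits)); /* value must fit in the field */
    return value << shift;
}

/* Fill the 8-dword descriptor. The layout here is made up, NOT the hw one. */
void build_descriptor(const struct view_params *p, uint32_t tic[8])
{
    memset(tic, 0, 8 * sizeof(uint32_t));

    tic[0] = field(p->format, 0, 7) |
             field(p->swizzle[0], 7, 3) |
             field(p->swizzle[1], 10, 3) |
             field(p->swizzle[2], 13, 3) |
             field(p->swizzle[3], 16, 3);
    tic[1] = (uint32_t)p->address;                       /* low 32 bits */
    tic[2] = field((uint32_t)(p->address >> 32), 0, 16); /* high 16 bits */
    tic[4] = p->width - 1;
    tic[5] = field(p->height - 1, 0, 16);
}
```

The resulting array plays the role of the view->tic buffer discussed above: it is computed once when the sampler view is created and written out later when the texture is bound to a slot.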
20:06jeremySal: curious why code for nvidia texture formats is in mesa
20:06jeremySal: I thought mesa was the CPU implementation of opengl
20:26urjaman: mesa kinda is the implementation of opengl (regardless of the accel/target)...