09:47starstuff: Hi I just installed Debian 'Stretch' with an nvidia 1070 and it fails to get to the desktop with an error message "nouveau: unknown chipset". Can anyone help please?
09:48starstuff: and I am confused because I just installed the stable version of Debian (Jessie) on my brother's computer and he has an nvidia 1070 also (different manufacturer though).
09:50Yoshimo: which kernel/mesa versions would that map to?
09:50starstuff: Yoshimo: I may need a little guidance but I'll try to get that info. I'm on the console right now on the affected system.
09:50Yoshimo: uname -an should give the kernel
09:51Yoshimo: lspci -v gives the pci id of the card involved
09:51starstuff: Linux system 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
09:54starstuff: VGA Compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1) (prog-if 00 [VGA controller])
09:55starstuff: Yoshimo: ^
09:58Yoshimo: 4.8 was initial pascal support by ben skeggs according to the log, so upgrading the kernel would be my first try
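Yoshimo's check boils down to comparing the running kernel release against 4.8. A minimal sketch, assuming the release string starts with "<major>.<minor>" as in the `uname` output above; the helper name kernel_at_least is hypothetical.

```c
#include <stdio.h>

/* Returns 1 if a kernel release string such as "4.7.0-1-amd64"
 * (the format printed by `uname -r`) is at least want_major.want_minor.
 * Assumes the string begins with "<major>.<minor>". */
int kernel_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 0;                   /* unparseable: treat as too old */
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

In a program, the release string can come from uname(2): `struct utsname u; uname(&u); kernel_at_least(u.release, 4, 8);` (needs `<sys/utsname.h>`). For the 4.7.0 kernel reported above this returns 0.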
10:00starstuff: oh, I guess I didn't realize nouveau only had initial support for the 1070
10:00Yoshimo: acceleration for maxwell&pascal is out of reach right now so using nouveau for anything challenging is a waste. Serious stuff that really makes use of your recent card should be done on the binary driver
10:00starstuff: I'm OK with using the binary but I don't see any instructions for Debian 'Stretch'
10:01starstuff: I'll ask on #debian thx
10:33pmoreau: Yoshimo: Hardware acceleration on Maxwell should work, as NVIDIA released the gr firmwares. Reclocking is not advertised there, so it will still end up being a waste, you are right. :-)
10:34Yoshimo: i am so tired of telling people "nothing to see here, go use the binary if you really want to use your expensive card"
10:36pmoreau: Well… there isn’t much else you could say, besides "maybe at some point in the future, it will work better with Nouveau"
10:37pmoreau: Or do like imirkin and say "Buy AMD next time!" :-D
10:38Yoshimo: it will work better for sure, but will it work "well enough" and "in time before the card becomes obsolete" is not clear yet ;)
10:38Yoshimo: i am not frustrated enough to recommend the competitor
10:43pmoreau: NVIDIA did recently release the PMU firmware for GM20B (IIRC), along with some rework of the firmware handling code. Hopefully we will see the remaining firmwares for Maxwell and Pascal land in 2017.
10:43pmoreau: funfunctor: Hello
10:44funfunctor: mwk: about ?
10:44funfunctor: Need to talk to some RE guns.. I tried ##re and those folks have no idea
10:45CGI453: my intel tells that RSpliet and Lekensteyn and mupuf are the intelligent ones here, where mupuf and RSpliet understand what i have been planning, where Lekensteyn seems to be good at acpi issues, well Lekensteyn, you'd have to look at how shuffle and the amd equivalent ds_swizzle work to understand how to do it on amd hw, amd docs say shared registers reduce register pressure, where shuffle docs say only active lanes participate
10:46mwk: funfunctor: give me 0.5 hour
10:46ph8: Hi all
10:46CGI453: hence there is a hack to point instruction operands with indirect addressing to one that you'll modify with shuffle or ds_swizzle, and free up all the idle alus
10:48ph8: I'm liking nouveau, especially because it just works - the nvidia proprietary drivers cause all kinds of hassle for me
10:48ph8: I am getting some dropped frames when trying to watch a video on a third screen though, can anyone suggest where i should start looking?
10:51CGI453: this is a very big performance boost given in response, and a very thoughtful way by designers, and also utterly easy to do
11:06CGI453: so in the job application that i filled in, because most others did not understand the logic based on the pseudo code i provided, i was very straight in saying that AMD cards easily get around a 4x perf boost in addition, if they take that as me being crazy, that is definitely their own misunderstanding, because that is exactly what they report in the pdf
11:10CGI453: also my method is a bit better than on NVIDIA cause it would account for cache misses
11:11CGI453: it would free up most of the lanes on a memory operation with a cache miss, where NVIDIA has room for another improvement in theory, so the 4x improvement will end up being a minimal boost
11:15CGI453: how SIMD works is that the internal hw scheduler has SIMD lanes/waves as a SIMD vector of free ones, to schedule according to i-buffer fifo contents, every free vector is eligible to be scheduled by it, the vector is composed of per warp backing regs
11:16CGI453: and it is contiguous
11:30CGI453: but in the end it is all ok, it is considered to be solved, though i have not received any collaborating thinkers, it is very easy for me to do still, i just pinged the AMD employment department, since i am low on resources, to see if they'd appreciate my knowledge working on their stack before, if i get negative responses the code will probably arrive regardless but with a delay
11:35CGI453: what shuffle does is intelligent in the sense that i can target those lane backing regs i.e. warp lanes with absolute addressing, in chunks that you need, sort of intercepting the scheduler's work if you'd see it this way
11:43mwk: funfunctor: ok, here I am
11:53funfunctor: mwk: hey
11:54funfunctor: So I am RE a Blackmagic Design PCIe capture card
11:54mwk: black magic, eh
11:54funfunctor: the driver is implemented as a C++ library and the ko is just a bunch of shims and callbacks into said library.
11:54mwk: sounds good already
11:54funfunctor: lol :)
11:55funfunctor: the C++ library has debug symbols which is nice but C++ is obviously a heap not stack language so it makes it basically impossible? to just RE that.. So i've gone the mmiotrace route
11:56mwk: it's never impossible to RE... how big is that thing?
11:57CGI453: sound only or video too i.e v4l or dvb*?
11:59CGI453: heap and stack procedures are nothing but mmap in the kernel, those are just areas of memory connected to cpu and handled by kernel
11:59CGI453: so obviously it would not make sense much that there is difference between stack and heap languages, since they are handled the same
12:03CGI453: funfunctor: so you would want to sort of get the procedures of debug symbols, there are debuggers for that..you need to step through the code with gdb or lldb!
12:04CGI453: also a disassembler would show you the debug info too
12:07funfunctor: mwk: 864K blackmagic.a
12:07funfunctor: CGI453: it does not use v4l2
12:09CGI453: what is the goal of the mission though, getting the code out of dwarf section should be very easy? did you discover some bug that you would want to fix?
12:09funfunctor: CGI453: I want to implement a v4l2 compliant driver
12:10funfunctor: they just ported the windows video api to linux with their one
12:11funfunctor: CGI453: the important thing I would like to be able to extract is register names
12:11CGI453: this is quite tough i tend to believe, but if you are very talented like those reverse engineering here, and have lots of time, i dunno, maybe you'll succeed
12:11funfunctor: or rather an indication of them..
12:12CGI453: i just googled a bit how it was possible to dump the linker sequence and module information
12:12funfunctor: is there a place I can put this .a for you mwk ?
12:12funfunctor: CGI453: show me what you came up with? I myself tried to lift it to LLVM IR
12:12funfunctor: that didn't really work out well
12:13funfunctor: It seemed like far less work to use mmiotrace
12:13funfunctor: So that is why I wanted to open a discussion with mwk for advice
12:14CGI453: funfunctor: i remember you can split the .a, it is an archive, into the original modules
12:14CGI453: then inspect their linkage with something like nm perhaps
12:15CGI453: and the code might be disassembled from all of the separate modules linked together
12:15CGI453: but the llvm IR is very complex for beginners, there are tools that use debug information and do it automatically
12:16funfunctor: CGI453: you're thinking of a userspace archive
12:16funfunctor: $ file blackmagic.a
12:16funfunctor: blackmagic.a: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
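The `file` output is why the .a cannot simply be split: a real ar(1) archive starts with the 8-byte magic "!<arch>\n", while an ELF object starts with 0x7f 'E' 'L' 'F'. A minimal sketch of that check on a buffer holding the first bytes of a file; the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum file_kind { KIND_UNKNOWN, KIND_AR_ARCHIVE, KIND_ELF };

/* Classify a file by its leading bytes. */
enum file_kind classify(const unsigned char *buf, size_t len)
{
    if (len >= 8 && memcmp(buf, "!<arch>\n", 8) == 0)
        return KIND_AR_ARCHIVE;     /* ar(1) archive magic */
    /* "\x7f" "ELF" is split so the hex escape does not swallow the 'E'. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return KIND_ELF;            /* ELF object of any kind */
    return KIND_UNKNOWN;
}
```

A relocatable ELF like blackmagic.a can be inspected directly with nm/objdump (or radare2) without any unpacking step.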
12:17funfunctor: CGI453: I was one of the early people working on llvm ;)
12:17funfunctor: I know its IR
12:17funfunctor: That isn't the problem here and I know binutils stuff
12:17CGI453: what is the problem though then?
12:18funfunctor: What I am interested to hear about are techniques to comprehend registers from mmiotraces
12:18CGI453: aah getting IR, well i know reveng did it with debug information with qemu, and there are bunch of new tools
12:19CGI453: aah, ok, it uses the pagefault handler in the kernel, and dumps the reads and writes into a log file, i.e. configuration space or mmio accesses, i proposed to do it a lot faster, but it is not important
12:20CGI453: just look at the docs, it hooks the ioremap of the area and captures the page faults
12:21mwk: funfunctor: well, you can always mail it
12:25CGI453: page-fault handling requires a page walk, so it is not particularly fast, but with some pointer hacks it can be done a lot faster, using debug registers
12:26CGI453: it will slow down the execution of the program quite immensely, but that is how it is done by nouveau dudes, it generally still works though
12:27CGI453: hate to ask, but when you discover the mmio write and read methods, probably they use the same functions to do it, although it can be done with pointers too
12:27CGI453: then you can already see what they send from the code
12:27funfunctor: it's plenty fast enough
12:28funfunctor: mwk: emailed
12:29funfunctor: One other idea would be to write a wrapper program that exercises certain paths in this library-driver thing
12:29CGI453: i'm heading off, it can be done, but only if you have skills, and it seems you have had some LLVM ones..interesting to hear about that
12:36CGI453: i think if it is written with pointers one would have to use, normal procedure is to cast it the address written to into a pointer, writing to the address owned by mmio backing reg
12:37funfunctor: CGI453: "written with pointers" what are you talking about?
12:37CGI453: this is like a memory address, its physical address is pointed in the kernel against the io
12:38CGI453: funfunctor: exactly yeah, just see some stackoverflow stuff, how to write to the memory address
12:39CGI453: char *a = (char*) 0x1000;
12:40CGI453: do you understand what this line does?
12:40funfunctor: CGI453: wtf, I know how to program in C man..
12:41funfunctor: mwk: did that come through ok?
12:43CGI453: so why the mmiotrace, would be convenient but, you only need to inspect the memory map, if there is no api call methods, and they write mmios with pointers
12:43CGI453: so if that is virtual address space memory from the kernels memory map, you know that those values are used to load into those regs
12:44funfunctor: CGI453: do you know how PCI works?
12:44CGI453: debug symbols is giving you c++ code
12:44CGI453: round about yeah
12:45CGI453: as i told you pci regs are ioremapped
12:45funfunctor: CGI453: 1.) C++ is a heap based language not stack
12:45funfunctor: 2.) you don't just set pointers to addresses and dereference them.. This isn't AVR firmware
12:46funfunctor: CGI453: yes, you have remapped virtual memory
12:46CGI453: you can , since ioremap is handled in kernel, it normally means that there is a handler for getting mmap addresses against to those
12:46funfunctor: the base is read out of PCI config space and the kernel provides a framework to own that region and do read/write ops with it
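For reference, this is roughly how a BAR value read from config space (type 0 header, offsets 0x10 to 0x24) decodes into a base address. A minimal sketch following the standard BAR layout: bit 0 selects I/O vs memory space, bits 2:1 give the memory type (0b10 = 64-bit, upper half in the next BAR), bit 3 marks prefetchable, and the low bits are masked off to get the base. The struct and function names are hypothetical; in a driver the kernel's PCI helpers do this for you.

```c
#include <stdint.h>

struct bar_info {
    int is_io;          /* 1 = I/O space BAR, 0 = memory BAR */
    int is_64bit;       /* memory BAR spans two dwords */
    int prefetchable;
    uint64_t base;
};

/* lo = BAR dword, hi = the following BAR dword (only used for 64-bit). */
struct bar_info decode_bar(uint32_t lo, uint32_t hi)
{
    struct bar_info b = {0};

    if (lo & 0x1) {                          /* I/O BAR */
        b.is_io = 1;
        b.base = lo & ~(uint32_t)0x3;
        return b;
    }
    b.is_64bit = ((lo >> 1) & 0x3) == 0x2;   /* type field, bits 2:1 */
    b.prefetchable = (lo >> 3) & 0x1;
    b.base = lo & ~(uint32_t)0xF;
    if (b.is_64bit)
        b.base |= (uint64_t)hi << 32;
    return b;
}
```

The base obtained this way is also what you subtract from the physical addresses in an mmiotrace to get register offsets.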
12:47CGI453: yes with syscalls, you could map the whole region, and use pointers , no api calls
12:47funfunctor: at the end of the day, you have a C++ library that calls back into some wrapper C functions that call the usual kernel API for PCI read/write access
12:47funfunctor: syscalls? this isn't userland
12:47funfunctor: I think you're very confused
12:48funfunctor: the C++ library *is* the driver, with the hardware specifics *in kernel space*
12:49CGI453: no ioremap is probably a kernel function that gives you kernel virtual address of the memory that goes against io
12:49CGI453: it has a handler for userspace to map this
12:49funfunctor: it's a kernel module written in C++ with a C shim around it to massage it into Linux kernel module compliance
12:49CGI453: ok, pretty much the same, what is shim though?
12:50funfunctor: what do you mean what is a shim?
12:53CGI453: ok, how do you communicate with the kernel, stop arguing i know how it is done, syscall = sw interrupt in idt, that is one of the ways to do it, of course there is the filesystem way too though
12:53CGI453: so for instance what nouveau and other drivers do is get a memory of the ioremap with mmap handler in the kernel
12:58CGI453: i.e when you touch that memory the kernel writes it to the device, you can do read and write operations via pointers , that is what memory maped io is about
13:00CGI453: it would be more readable if you'd use an api call for all the mmio read and writes though, like nouveau does, it is just then easier to read
13:03funfunctor: I really don't understand what you're getting at or how it helps me
13:04funfunctor: mmiotraces already tell me what register offset from the PCI BAR is getting read or written to and with what value in a very clear way
13:04funfunctor: why would I try and re-implement that
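Turning trace lines into (kind, offset, value) tuples is the first step of any analysis. A minimal sketch, assuming the read/write line layout `R|W <width> <timestamp> <map-id> <phys-addr> <value> <pc> <pid>`; verify that against the mmiotrace documentation of your kernel. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

struct mmio_access {
    char kind;          /* 'R' or 'W' */
    int width;          /* access size in bytes */
    uint64_t offset;    /* physical address minus BAR base */
    uint64_t value;
};

/* Returns 1 and fills *out for an R/W line inside the BAR, else 0. */
int parse_mmio_line(const char *line, uint64_t bar_base, struct mmio_access *out)
{
    char kind;
    int width, map_id;
    double ts;
    unsigned long long phys, val;

    /* MAP/UNMAP/MARK lines fail here because "AP"/"NMAP"/"ARK" is not a %d. */
    if (sscanf(line, " %c %d %lf %d %llx %llx",
               &kind, &width, &ts, &map_id, &phys, &val) != 6)
        return 0;
    if ((kind != 'R' && kind != 'W') || phys < bar_base)
        return 0;
    out->kind = kind;
    out->width = width;
    out->offset = phys - bar_base;
    out->value = val;
    return 1;
}
```

Reading the BAR base from the MAP line of the same trace, rather than hardcoding it, keeps offsets comparable across runs.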
13:07CGI453: you do not, but you have plain c or cplusplus in the debug section too, this one is even easier to read
13:08funfunctor: frankly I am not following what your point is? what problem are you trying to address in the whole effort, what is the bottom line
13:08CGI453: maybe the best for you would be to match the mmiotrace stuff to the code though
13:09funfunctor: my original question still stands on how to cross reference that with the binary to infer the meaning of the registers
13:10CGI453: funfunctor: aaaah only this question you had, well you can not do that
13:10CGI453: cause the meaning of the regs is not the same as for nouveau in their envytools database
13:10funfunctor: well I can because the offset will be in the binary if it's a known register
13:11funfunctor: yea and how were they derived originally?
13:12CGI453: yes yes exactly, only if you have comments in the code what that reg would do, then yeah
13:14CGI453: that is why reverse engineering is really hard work, and i am quite amused how mwk does it, the reverse engineer must be so intelligent to match the meaning of those regs himself
13:15CGI453: by evaluating roughly what the hw does according to those writes
13:15funfunctor: CGI453: comments don't survive the compiler
13:16funfunctor: CGI453: it's not magic, it's just finding the right approach for a particular problem
13:16CGI453: funfunctor: you are right, but the variable names would
13:16funfunctor: my problem is different than for mwk but I am interested to hear about his experience with his problem
13:16funfunctor: CGI453: that depends, not always
13:17funfunctor: and registers are not usually in variables with names
13:17funfunctor: There is usually a #define .. header generated from the hw team
13:17funfunctor: or a doc
13:18pmoreau: funfunctor: IIRC, registers in the beginning came from the old nv driver: https://www.x.org/wiki/nv/. Though it never got 3D acceleration, and targeted old hardware so most registers in rnndb come from RE'ing.
13:18funfunctor: CGI453: so in my case I would be using radare2 to look for the offsets and disassemble around them
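Searching for an offset seen in the trace amounts to scanning the object for its little-endian encoding, which is how x86-64 stores immediates and displacements. A minimal sketch; the function name is hypothetical. Hits are only candidates: the same bytes can occur by coincidence, and offsets computed at runtime (base plus index) never appear literally.

```c
#include <stddef.h>
#include <stdint.h>

/* Records up to max_hits byte positions where needle appears as a
 * little-endian 32-bit value; returns the total number of matches. */
size_t find_le32(const uint8_t *buf, size_t len, uint32_t needle,
                 size_t *hits, size_t max_hits)
{
    size_t n = 0;

    for (size_t i = 0; i + 4 <= len; i++) {
        uint32_t v = (uint32_t)buf[i] |
                     ((uint32_t)buf[i + 1] << 8) |
                     ((uint32_t)buf[i + 2] << 16) |
                     ((uint32_t)buf[i + 3] << 24);
        if (v == needle) {
            if (n < max_hits)
                hits[n] = i;
            n++;
        }
    }
    return n;
}
```

Each hit position then gives an address to disassemble around, which is the step done interactively in radare2.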
13:19funfunctor: pmoreau: ok but I am interested to hear about the exact experience of deriving those registers
13:20funfunctor: I can produce mmiotraces at will and exercise the driver to get somewhat directed traces
13:20pmoreau: I would also be interested in hearing mwk's experience :-)
13:20funfunctor: my question is - put most simply - 'then what'
13:21funfunctor: obviously I have my own ideas
13:21funfunctor: but I would like to hear fresh ones ;)
13:23CGI453: and if you say that it is probably an analogue tv signal card, maybe they have some standards where some of the regs are standardised, however i am not an expert in this stuff
13:24mwk: huh, a nice "header"
13:26mwk: and a .a file that's not actually an archive
13:31mwk: funfunctor: well, it looks decompilable in a pinch
13:32mwk: it's C++ with symbol names, which includes type info
13:32mwk: I don't suppose you have access to IDA Pro?
13:33CGI453: yeah but funfunctor: what is the motivation behind getting this driver as a v4l2 one since it already works, i personally wouldn't deal with this sort of issue, maybe i am not as big a role model to the others anyways
13:34mwk: if you do, it might be a good idea to just go through it and decompile things... if not, well, doing this by hand will be painful and I'd try the mmiotrace route
13:35CGI453: there are several LLVM ones too, i'd never use commercial software to do those tasks, i also do not really have enough knowledge or hands-on experience with those tools, i have barely read how they work
13:54CGI453: funfunctor: i wait until i could get a job, and then maybe i
13:55CGI453: would have more interesting things to offer for you to try to work on
13:59funfunctor: mwk: I _had_ access to IDA Pro and tried it but couldn't get a decent decompile.. Can you?
14:04CGI453: ah i am done, well when you have debug symbols that means you already have all possible code in the dwarf section, my friend does not visit me today, so can you send me the binary archive?
14:32mwk: thank you.
14:34imirkin: skeggsb: pq: perhaps you can add some more people to the chanserv list? i suggest mwk and hakzsam.
14:34imirkin: er wait. pq can't do it. but marcheu can.
14:51funfunctor: imirkin was he just trolling me?
14:51imirkin: not just you, but yes.
14:52imirkin: it's joss... he pops up in graphics channels and annoys the bejesus out of everyone in them.
14:52funfunctor: I felt a bit annoyed by the general communication
14:52imirkin: some channels just ban estonia outright. and web proxies. i didn't think that was the right direction.
14:52funfunctor: juxtaposition of random technical terms into plausible sounding things piss me off :p
14:53imirkin: that's his thing.
14:54funfunctor: ok, glad to know it wasn't just me
14:54funfunctor: I got a pm from a nick called jeasnsf_ so I assume that is his alias as well
14:55imirkin: he changes nicks a lot. pretty easy to tell when it's him though...
14:55funfunctor: maybe just some mental health thing.. any ways..
14:56imirkin: perhaps. i'm in no position to evaluate, and unfortunately, in no position to help. have to preserve my own sanity though.
14:56funfunctor: yea ;) lol
14:56funfunctor: alright so back to RE
14:57funfunctor: mwk: did you have access to IDA Pro on your machine?
14:57funfunctor: I assume you do
15:19mwk: funfunctor: not on mine, but we do have a copy at work
15:20funfunctor: mwk: well I am looking with radare2, have you used that before?
15:21funfunctor: I still think the mmiotraces are probably the better way to go about understanding the hardware?
15:21mwk: yeah, I'd check mmiotrace and see if any of it makes sense
15:24imirkin: funfunctor: the basic idea is to look at the written values and try to recognize what they are. if you know what a value is, that gives a strong indication of what the written register means. often registers aren't placed willy-nilly but in some logical arrangement, so you can use proximity in mmio space as an indicator of functional relationship. etc.
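The proximity heuristic can be applied mechanically: sort the register offsets touched in a trace and start a new group wherever the gap to the previous offset exceeds some threshold. A minimal sketch; the function name and the max_gap knob are hypothetical, and a group is only a guess at one functional block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sorts offs in place, writes a group id per entry into group[]
 * (which must hold n entries) and returns the number of groups.
 * Duplicate offsets land in the same group (gap of 0). */
size_t cluster_offsets(uint32_t *offs, size_t n, uint32_t max_gap, size_t *group)
{
    size_t g = 0;

    if (n == 0)
        return 0;
    qsort(offs, n, sizeof offs[0], cmp_u32);
    group[0] = 0;
    for (size_t i = 1; i < n; i++) {
        if (offs[i] - offs[i - 1] > max_gap)
            g++;                    /* big jump: assume a new block */
        group[i] = g;
    }
    return g + 1;
}
```

With max_gap = 0x100, offsets 0x100, 0x104, 0x108, 0x4000 and 0x4004 split into two groups; the threshold is arbitrary and worth varying.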
15:31funfunctor: mwk: sure it does https://paste.fedoraproject.org/514889/83064792/ https://paste.fedoraproject.org/514890/64863148/
15:31funfunctor: imirkin the problem is exactly that; I currently have nothing to start with on where these values come from
15:32funfunctor: from the above trace which was from a modprobe I can see there are many read/tweak/write backs
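The read/tweak/write sequences are themselves a clue: XOR-ing the value read with the value written back shows which bits the driver changed, hinting at bitfield boundaries inside a register. A minimal sketch over a hypothetical pre-parsed trace; each write to the chosen offset is paired with the most recent read of that offset since the previous write, and other registers may be accessed in between.

```c
#include <stddef.h>
#include <stdint.h>

struct access { char kind; uint32_t offset; uint32_t value; };

/* Returns the union of bits changed by read/modify/write pairs on offset. */
uint32_t rmw_changed_bits(const struct access *t, size_t n, uint32_t offset)
{
    uint32_t mask = 0, last_read = 0;
    int have_read = 0;

    for (size_t i = 0; i < n; i++) {
        if (t[i].offset != offset)
            continue;
        if (t[i].kind == 'R') {
            have_read = 1;
            last_read = t[i].value;
        } else if (t[i].kind == 'W' && have_read) {
            mask |= last_read ^ t[i].value;
            have_read = 0;          /* a blind write is not an RMW */
        }
    }
    return mask;
}
```

A mask of a single bit or a small contiguous run suggests an enable flag or a narrow field; writes with no preceding read are ignored.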
15:35funfunctor: imirkin I am looking for something to extract out of the binary that can help me form a mental primer to build an understanding of the values
15:36funfunctor: right now I have none
15:37imirkin: do you have software that operates this thing?
15:37imirkin: is the kernel shim open-source?
15:38imirkin: i.e. could you present a hw facade and call the library yourself and see what it does in response to various things?
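A "hw facade" can be as simple as an in-memory register file that logs every access; the library's register accessors (via the open-source shim) would be redirected to it, and its contents preloaded with values to see how the library reacts. A minimal sketch assuming aligned 32-bit accesses; every name here is hypothetical, and how the real shim's read/write hooks are rerouted is left open.

```c
#include <stddef.h>
#include <stdint.h>

#define FAKE_REGS 0x1000    /* bytes of register space modelled */
#define LOG_MAX   256

struct fake_hw {
    uint32_t regs[FAKE_REGS / 4];
    struct { char kind; uint32_t off; uint32_t val; } log[LOG_MAX];
    size_t nlog;            /* total accesses, may exceed LOG_MAX */
};

static void log_access(struct fake_hw *hw, char kind, uint32_t off, uint32_t val)
{
    if (hw->nlog < LOG_MAX) {
        hw->log[hw->nlog].kind = kind;
        hw->log[hw->nlog].off = off;
        hw->log[hw->nlog].val = val;
    }
    hw->nlog++;
}

uint32_t fake_read32(struct fake_hw *hw, uint32_t off)
{
    /* Out of range reads return all-ones, like a missing PCI device. */
    uint32_t v = off < FAKE_REGS ? hw->regs[off / 4] : 0xffffffffu;
    log_access(hw, 'R', off, v);
    return v;
}

void fake_write32(struct fake_hw *hw, uint32_t off, uint32_t val)
{
    if (off < FAKE_REGS)
        hw->regs[off / 4] = val;
    log_access(hw, 'W', off, val);
}
```

Unlike mmiotrace, this lets you choose the values the "hardware" returns, so you can test hypotheses such as "this bit is a ready flag" by flipping it and watching the log.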
15:48funfunctor: imirkin I have all those things so I am in a reasonably good position
15:48funfunctor: I am more just trying to collect some ideas so as not to waste huge volumes of time
16:02funfunctor: imirkin mwk ok now we are talking... https://paste.fedoraproject.org/515100/31137271/
16:08imirkin_: funfunctor: there's basically no way to do RE and not waste huge volumes of time. it's all based on observation. in order to do that you have to generate data :)
16:21funfunctor: imirkin_: well you know what I mean... waste unnecessary time
16:21funfunctor: imirkin_: what do you think this method looked like? https://paste.fedoraproject.org/515104/11483814/
16:21funfunctor: some case statement?
16:26mwk: sounds like a simple load of assignments
16:28funfunctor: ah yes sorry you're right - I am just trying to picture the actual C++ though
16:29funfunctor: mwk: those values look like many of the offsets from the PCI BAR unless I am mistaken in the trace?
16:31mwk: that's likely, yes
16:31mwk: IMO it's just a long function with lots of assignments
16:32funfunctor: mwk: the question is, what is it assigning and are they interesting
16:32funfunctor: notice the method and class names
17:28austriancoder: my brother donated me his old nvidia based gpu and I am running into trouble with it under linux: https://hastebin.com/ikovusafuw.go
17:30imirkin_: austriancoder: that feels like KDE5, right?
17:31austriancoder: imirkin_: yep.. could that be the problem?
17:31imirkin_: unfortunately nouveau + kde5 plasma shell is a bit of a fail
17:31imirkin_: they use GL concurrently from multiple threads, and nouveau doesn't support that.
17:31imirkin_: i thought that the qt/kde guys put in a workaround of sorts, but i guess i'm not sure
17:32imirkin_: that said, that specific TRAP thing is unfamiliar to me
17:32imirkin_: i don't know what it means without RTFS
17:32austriancoder: bah.. thats quite bad (for a kde guy)
17:33imirkin_: nouveau wasn't really prepared to handle the influx of regular applications all of a sudden starting to do GL
17:34imirkin_: my recommendation would be to run all that stuff with LIBGL_ALWAYS_SOFTWARE=1
17:34imirkin_: and then disabling that for when you *actually* want GL accel
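The workaround is just an environment variable that tells Mesa to use its software rasterizer; from a shell it is `LIBGL_ALWAYS_SOFTWARE=1 program`. For completeness, a minimal launcher sketch in C that sets it and then execs the target; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* enable = 1 forces Mesa software GL; enable = 0 restores hardware GL. */
int set_software_gl(int enable)
{
    return enable ? setenv("LIBGL_ALWAYS_SOFTWARE", "1", 1)
                  : unsetenv("LIBGL_ALWAYS_SOFTWARE");
}

/* Replace this process with argv[0], running under software GL. */
int run_with_software_gl(char *const argv[])
{
    if (set_software_gl(1) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;                      /* only reached if execvp failed */
}
```

Leaving the variable unset for the programs where you want GL acceleration is the "disable it when you actually want GL" half of the advice.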
17:34austriancoder: maybe kwin has some configuration option
17:35imirkin_: well, it's not just kwin
17:35imirkin_: it's qt5
17:35imirkin_: i thought they detected nouveau now
17:35imirkin_: and disable GL accel
17:35imirkin_: but i don't use qt or kde, so i'm not super sure
17:36imirkin_: qt/kde developers have never reached out to us about it, and i don't think any nouveau devs have tried talking to the qt/kde folks. and users are stuck in the middle.
17:36imirkin_: that said, there is work ongoing to allow multiple threads to call into nouveau at the same time, but it's a major effort
17:37imirkin_: i also had some hackpatches that kinda-sorta worked around the issue, but then some distros started adding them to their releases, so i had to take them down
17:37austriancoder: for the moment I can live with the LIBGL_ALWAYS_SOFTWARE=1 workaround and hope the situation will look better with a newer qt5 version
17:38imirkin_: if you don't need GL accel at all, you can just remove nouveau_dri.so. you'll still get the 2D accel via X
17:40imirkin_: (separately, with kernel 4.10 reclocking on your GPU should be mostly functional, so you should be able to get an appreciable fraction of the perf out of it with nouveau)
17:40austriancoder: from time to time I look at piglit results from that system
17:43imirkin_: hmmm... actually it's unclear that you're hitting that issue
17:43imirkin_: we just don't know about trap & 4
17:43imirkin_: let's see if the gk20a headers have anything about it...
17:45imirkin_: hmm. bit 2 is "PD"
17:45imirkin_: which could mean ... page directory? or primitive something
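"trap & 4" and "bit 2" are the same test, since 4 == 1 << 2. A tiny sketch; the macro and function names are hypothetical, and what "PD" stands for is still unknown here.

```c
#include <stdint.h>

#define TRAP_BIT_PD (1u << 2)   /* bit 2, named "PD" in the gk20a headers */

int trap_has_pd(uint32_t trap)
{
    return (trap & TRAP_BIT_PD) != 0;
}
```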
17:46imirkin_: austriancoder: you could try a drm-next kernel, which should be 4.9 + the drm stuff in 4.10
17:47austriancoder: imirkin_: sure
17:47imirkin_: austriancoder: ben had a bunch of fixes around concurrency which could be affecting you as well.
17:47imirkin_: since usually the mesa-side concurrency issues present themselves somewhat differently
17:47imirkin_: read/write errors, etc
17:48imirkin_: i'm esp thinking about https://github.com/skeggsb/nouveau/commit/b3816f34944ad4824d345b98c323a30710f492d4
17:49imirkin_: note that this will also bring you atomic modesetting on nouveau
17:51imirkin_: (as well as fixed voltage setting when reclocking, which should allow you to hit the higher core clock rates)
17:54austriancoder: I hadn't done any research on how well this gpu is supported, as my brother got a new one and I had wanted to replace my current one (a GT 220 based one) for ages. What's the general state of GK104?
17:55imirkin_: other than this concurrency thing, it's pretty good, imho
17:55imirkin_: with 4.10, you get almost-always-working manual reclocking to change between power states
17:55imirkin_: with semi-recent mesa, you get GL 4.3 + all the GL 4.4 and 4.5 exts also available, as well as ES 3.1
17:56imirkin_: (and most of AEP / ES 3.2 exts)
17:56imirkin_: the newest games tend not to work for a variety of reasons
17:57imirkin_: [some due to the multithreading mess, as games seem to have started doing this recently too, others due to unknown issues in nouveau]
17:57imirkin_: it's a bit behind radeonsi/i965 in terms of GL-CTS conformance, but that's largely due to me not having access to the GL-CTS tests.
17:58austriancoder: fine.. looks the new card will be useful (for a non game player)
17:58imirkin_: although that GT 220 should have been decently supported as well
17:58imirkin_: sadly, there are a handful of nv50 bugs i've never been able to identify
18:00austriancoder: the GT 220 has some weird problems under windows (where I do a lot of music work like recording, editing and mixing) and that was the reason to kick it out
18:00imirkin_: ah ok
18:00imirkin_: well the GK104 should be a strict improvement over the GT218 or whatever it was
18:00imirkin_: in terms of speed and in terms of features
18:01imirkin_: you even get semi-working vdpau if you extract the firmware from the blob (there's a script)
18:02imirkin_: [of course you had that on the GT218 as well... see https://nouveau.freedesktop.org/wiki/VideoAcceleration/ ]
18:03austriancoder: will give it a try
18:12imirkin_: mwk: so i need to add support for reading from the framebuffer in nouveau. i'm thinking of attaching it via BIND_TSC2/BIND_TIC2 on fermi. do you think that's a terrible idea?
18:13imirkin_: [and on tesla as well]
18:17imirkin_: such a binding would happen at most once per FB change.
18:17imirkin_: [and ideally never, since use of the feature should be quite rare]
18:28mwk: imirkin_: how do you plan to deal with cache coherency?
18:35imirkin_: mwk: i don't - not needed
18:36imirkin_: mwk: this is for KHR_blend_equation_advanced
18:36imirkin_: you're supposed to call glBlendBarrier() to deal with it
18:36imirkin_: which will just do the regular texture_barrier flush
18:37mwk: I'd throw in a wrcache flush as well
18:37mwk: but sounds good
18:37imirkin_: well, it's entirely analogous to the ARB_texture_barrier situation
18:37imirkin_: which talks about binding the fb as a texture
18:38imirkin_: there's also a _coherent version of that ext, which we will not be implementing.
19:44austriancoder: imirkin_: https://hastebin.com/eviyacihog.hs
19:46imirkin_: austriancoder: huh. good one.
19:46imirkin_: that looks more like the concurrency issues :)
19:47imirkin_: although i thought ben fixed something related... hm
19:47imirkin_: maybe it didn't make it to drm-next
19:47imirkin_: skeggsb: --^
19:47imirkin_: [i assume he's on vacation though]
19:51austriancoder: imirkin_, skeggsb: if there are patches to test just throw them at me :)
19:51imirkin_: austriancoder: i'd just test this tree: https://github.com/skeggsb/nouveau/
19:51imirkin_: it builds against semi-recent upstream kernels
19:52imirkin_: the idea is that you clone it, then "cd drm; make"
19:52imirkin_: which will produce a fresh nouveau.ko for you
19:52imirkin_: but tbh those log messages seem more like the usual "mesa messed shit up" things
19:52imirkin_: than your earlier TRAP 0x4 messages
19:54imirkin_: anyways, my time (and desire?) to work on nouveau has been greatly reduced of late, but hopefully skeggsb will be able to help you
20:00austriancoder: imirkin_: no problem...
20:01austriancoder: imirkin_: did you find more interesting areas to work on in mesa land or something completely new?
20:02imirkin_: eh, i've been hacking a bit on swr, will probably hack some on freedreno later if i ever get the a5xx board...
20:04austriancoder: yeah have seen your swr patches
20:04imirkin_: i need to go back and finish my xfb work on swr...
20:06airlied: imirkin_: vulkan a5xx driver :-)
20:06imirkin_: airlied: more like work on a nouveau vk driver, dunno
20:09imirkin_: airlied: or maybe i'll go back to trying to fix nv30
20:10imirkin_: airlied: a4xx/a5xx is missing too many features before a reasonable vk driver can be made... no ssbo, no images, etc
20:11imirkin_: i'd rather focus on ES 3.1 on there.
20:12imirkin_: i was originally stymied by ttn, but now that rob has made a direct nir passthrough, it may be time to look at it again.
20:17airlied: indeed looking at qualcomm a530 seems like a pretty small vulkan driver would be required :)
20:17airlied: means you'd be finished in a week or two :-P
20:23imirkin_: well still have to figure out how all the memory/image ops work
20:38mooch: mwk: you forgot pgraph_class_kelvin,cc
21:05mwk: mooch: right, thanks
21:06mwk: alright, so... all 2d tests passing on Kelvin
21:06mwk: guess nothing changed...
21:07imirkin_: are you testing blits and so on?
21:08mwk: not yet, non-drawing methods only so far
21:08mwk: except a few tests on NV
21:08imirkin_: ah k
21:08imirkin_: those should be ... fun
21:08mwk: oh hell yes.
21:09mwk: I think I'm going to try drawing something on NV4 soon
21:09imirkin_: even something like SIFC
21:09mwk: are you kidding me? *IFC are probably the most complex objects there are
21:09mwk: well, the most complex 2d objects
21:09mwk: I already attempted NV1 IFC and gave up after a week or so
21:09imirkin_: i guess with the scaling
21:10mwk: and that one doesn't scale
21:10imirkin_: were you at least close?
21:10mwk: I thought so.
21:10imirkin_: i mean ... presumably if you at least use it as it's intended it should be easy to model
21:10mwk: NV1 IFC has two possible rasterization algorithms, chosen depending on the width
21:11mwk: I *thought* I figured out one of them
21:11imirkin_: what's it rasterizing?
21:11mwk: but then I improved my random state generator, and lots of test failures resulted
21:11mwk: pixels, obviously
21:11imirkin_: what geometric shape?
21:11mwk: I don't know about SIFC, but IFC draws stripes
21:12imirkin_: ultimately it's just a rectangle, no?
21:12mwk: suppose you're submitting Y8 data
21:12mwk: uh, no
21:12mwk: that's the problem, it's *not* a rectangle the way it's drawn
21:12mwk: remember that every submitted data word is effectively a single draw call
21:13imirkin_:could never remember what all that Y8/etc stuff was tbh
21:13mwk: so, if you have Y8 format, this means 4 pixels per data word
21:13mwk: oh, Y8 is just 8-bit single-channel format
21:13mwk: I think GL calls it L8
21:13mooch: 8 bits of intensity
21:13imirkin_: or I8, depending on what's in alpha
21:14mwk: anyhow, 3x3 square, Y8 format, you're submitting the second word. what you're drawing is:
21:14imirkin_: L = x,x,x,1, I = x,x,x,x
21:14mwk: that shape is not a rectangle.
21:14mwk: so what IFC really draws is stripes
21:14imirkin_: but it still has to go sequentially inside the overall rectangle
21:15imirkin_: otherwise that makes no sense
21:15imirkin_: so you know your bounds, you know your raster position, you know the input format
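mwk's 3x3 example can be worked through in a short sketch. It assumes, as the exchange implies, that pixels are filled row by row from the top-left corner and that data is packed continuously across rows with no per-row padding, so one 32-bit word of Y8 data covers 4 consecutive pixels. Under those assumptions the second word lands partly on row 1 and partly on row 2. That is why the per-word shape is a stripe rather than a rectangle, even though the upload as a whole still walks the overall rectangle sequentially, as imirkin says. `word_pixels` and `render` are hypothetical helpers, not hwtest code.

```python
def word_pixels(width, height, word_index, pixels_per_word):
    """(x, y) pixels covered by data word `word_index` of a width x height
    upload, assuming row-major order from the top left and no row padding."""
    first = word_index * pixels_per_word
    last = min(first + pixels_per_word, width * height)
    return [(i % width, i // width) for i in range(first, last)]


def render(width, height, covered):
    """ASCII picture of the rectangle: '#' for covered pixels, '.' otherwise."""
    return "\n".join(
        "".join("#" if (x, y) in covered else "." for x in range(width))
        for y in range(height)
    )


# Y8 is 8 bits per pixel, so one 32-bit data word carries 4 pixels.
PIXELS_PER_WORD = 32 // 8

# 3x3 rectangle, second data word (index 1).
print(render(3, 3, set(word_pixels(3, 3, 1, PIXELS_PER_WORD))))
# prints:
# ...
# .##
# ##.
```

The third word would cover only the single remaining pixel (2, 2); how the hardware treats the unused part of that last word is outside what this sketch models.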
21:16mwk: oh yes, that would be easy, but the point of hwtest is to find how the hardware works :p
21:16imirkin_: i just have a hard time imagining that it works any other way
21:16mwk: so I'm launching IFC draws for all kinds of inconsistent inputs
21:16imirkin_:has a limited imagination
21:16mwk: if it worked that way, it wouldn't be lockupable :)
21:17imirkin_: ok, but your lockups are from you specifying all kinds of bs inputs right?
21:17imirkin_: if you use it in "normal" ways, it's fully predictable
21:17mwk: well, the lockups are predictable too :)
21:17mwk: but yes
21:18imirkin_:would just stop doing that in hwtests and move on
21:18imirkin_: i.e. force the hwtests to only do things that make sense
21:18imirkin_: i get what you're doing... and it's cool... but you're gonna go nuts, if you haven't already
21:19mwk: no comment on that last one.
21:19mooch: but i'm using his hwtests for my emulation!
21:19mooch: I WANNA BE ABLE TO EMULATE THE LOCKUPS
21:20mwk: and FWIW I am rejecting a lot of things that are really hopeless
21:21mwk: lots of these in 2d ROPs... try enabling blending with Y8 format, or use an invalid pattern shape
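The testing approach mwk describes (randomize the object state, skip combinations known to be hopeless, compare a software model against the hardware) can be outlined as a differential-testing loop. Everything here is hypothetical: the state fields, the format list, and treating pattern shape 3 as invalid are illustrative assumptions, and `model`/`hardware` stand in for a real simulator and a real hardware run. This is not hwtest's actual structure.

```python
import random

# Illustrative state space only; not taken from hwtest.
FORMATS = ["Y8", "R5G6B5", "A8R8G8B8"]
VALID_PATTERN_SHAPES = {0, 1, 2}  # assumption: 3 stands in for an invalid shape


def random_state(rng):
    """Draw a random object state, including combinations that make no sense."""
    return {
        "format": rng.choice(FORMATS),
        "blend": rng.random() < 0.5,
        "pattern_shape": rng.randrange(4),
    }


def is_hopeless(state):
    """Reject states not worth modelling, mirroring the two examples in the
    log: blending with Y8, and an invalid pattern shape."""
    if state["blend"] and state["format"] == "Y8":
        return True
    return state["pattern_shape"] not in VALID_PATTERN_SHAPES


def run_tests(model, hardware, iterations, seed=0):
    """Compare model and hardware on random, non-hopeless states.
    Returns (states tested, mismatching states)."""
    rng = random.Random(seed)
    tested = mismatches = 0
    for _ in range(iterations):
        state = random_state(rng)
        if is_hopeless(state):
            continue
        tested += 1
        if model(state) != hardware(state):
            mismatches += 1
    return tested, mismatches


# Trivial stand-ins: a "model" that always agrees with the "hardware".
print(run_tests(lambda s: s["format"], lambda s: s["format"], 1000))
```

Making `random_state` reach more corners of the state space is the kind of change that, per mwk, turned an apparently working NV1 IFC model into lots of failures.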
21:21imirkin_: it's like all these people who were reporting that doing X or Y in nouveau causes a lockup, and it's a security issue that userspace can lock up the box
21:21mwk: well tbh.... it is a security issue
21:21imirkin_: uh huh
21:22imirkin_: i agree
21:22mwk: it just happens not to be fixable
21:22imirkin_: but the fact is that nouveau uses the driver in pretty much the only way that doesn't always lock up the box
21:22imirkin_: whereas pretty much everything else does. and sometimes nouveau too.
21:23imirkin_: i.e. locking up the box with nouveau isn't some huge feat. so treating it as a security issue that if you allocate some ginormous texture then things go bad ... i say, wtvr
21:23imirkin_: same deal with the hw tests - sure it'd be nice to properly emulate the nonsensical inputs one can provide
21:23imirkin_: but imho it's worth focusing on the sensical ones.
21:24mwk: sure, and I do that where it makes sense
21:24imirkin_: ok cool :)
21:24mwk: but my default is to test all possible inputs
21:24imirkin_: fair enough
21:25mwk: IFC just happens to be quite damn complex, and treating it as a blackbox would require me to use a completely different testing framework
21:25mwk: and I'd rather avoid *that*
21:25imirkin_: hehe fair enough
21:27mwk: and IFC is the biggest 2d mess there is, together with its variants
21:27mwk: on NV10, it has like 4 different rasterization algorithms
21:28mwk: oh, btw
21:28mwk: hwtests already found a funny feature of the IFC
21:28nyef: So, I *finally* managed to get enough stuff installed into my test environment that I can load the blob, start X, suspend-to-ram, resume, have a working display, and use mmiotrace. Bloody Gentoo, bloody NFS, bloody nVidia. /-:
21:29mwk: with the IFC_NV0 and TFC_NV0 classes, you have to upload things in 8-byte units
21:29mwk: if you submit an unpaired word, it'll go ignored
21:29nyef: Now I just need to get an actual decent trace.
21:29imirkin_: nyef: in case it's not painfully obvious, make a mmiotrace of what post-resume nvidia does
21:29mooch: how the fuck are you supposed to do that on a 32-bit bus?
21:29imirkin_: nyef: i think you can do this - load nvidia; unload nvidia; suspend/resume; start mmiotrace; load nvidia
21:30nyef: I don't know that that works. If I suspend without having X running it doesn't resume properly.
21:30mwk: mooch: in two moves, obviously
21:30imirkin_: (make sure to start X as well, since nvidia doesn't actually do anything on load)
21:30mwk: the thing is, you can't stop the upload after an odd amount of words
21:31nyef: Hrm... So, grab a pre-suspend trace of starting X, then grab a post-suspend trace of starting X, and compare the two?
21:31imirkin_: nyef: more like compare the post-suspend one to what nouveau does.
21:32imirkin_: nyef: to see how it gets the panel going again
21:33mwk: TFC is even more restricted, in that the width of the rectangle and the dst position both have to be multiples of 8 for some reason
21:33mwk: but TFC is a weird class and I still have no idea why it exists
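The IFC_NV0/TFC_NV0 upload rules mwk mentions fit in a small model: data only takes effect in 8-byte units, which on a 32-bit bus means two consecutive 32-bit writes, so a trailing unpaired word is ignored; TFC additionally requires the rectangle width and destination position to be multiples of 8. The function names are hypothetical, the low-word-first split is an assumption, and the log does not say whether the position rule covers x, y, or both, so this sketch checks both as an assumption.

```python
def split_qword(value):
    """Split one 8-byte unit into the two 32-bit bus writes that carry it.
    Low word first is an assumption made for this sketch."""
    return [value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF]


def effective_words(words):
    """Words that take effect when data must arrive in 8-byte units:
    an odd trailing word has no partner and is ignored."""
    return words[: len(words) - len(words) % 2]


def tfc_nv0_rect_ok(dst_x, dst_y, width):
    """TFC_NV0 restriction from the log: width and destination position must
    be multiples of 8 (checking both x and y is an assumption)."""
    return width % 8 == 0 and dst_x % 8 == 0 and dst_y % 8 == 0


print([hex(w) for w in split_qword(0x1122334455667788)])  # ['0x55667788', '0x11223344']
print(effective_words([0xA, 0xB, 0xC]))                    # [10, 11]
print(tfc_nv0_rect_ok(16, 8, 24), tfc_nv0_rect_ok(4, 8, 24))  # True False
```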
21:33nyef: imirkin_: That too, but comparing what it does when the panel is already going to what it does when the panel isn't going would also be informative.
21:34imirkin_: nyef: sure, but might be difficult to actually do that
21:34nyef: The first trick is going to be getting the log buffer large enough that it doesn't drop events, of course, but I have the instructions for that already.
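imirkin's suggested capture sequence, plus the larger trace buffer nyef mentions, can be written out as an ordered list of shell steps. This sketch only builds the list; running it needs root, debugfs mounted at /sys/kernel/debug, and the blob installed. The tracing paths follow the kernel's mmiotrace documentation, while the buffer size, the output file name, and starting X with `startx` after each driver load (one reading of imirkin's "make sure to start X as well") are placeholder assumptions.

```python
TRACING = "/sys/kernel/debug/tracing"  # debugfs tracing dir used by the mmiotrace docs


def mmiotrace_resume_plan(trace_file="post-resume.txt", buffer_kb=128000):
    """Ordered shell steps for tracing what the blob does after resume.
    Only builds the list; nothing is executed."""
    return [
        "modprobe nvidia && startx   # X is what makes the blob touch the GPU",
        "# quit X, then:",
        "rmmod nvidia",
        "echo mem > /sys/power/state   # suspend to RAM; returns after resume",
        f"echo {buffer_kb} > {TRACING}/buffer_size_kb   # large buffer to avoid dropped events",
        f"echo mmiotrace > {TRACING}/current_tracer",
        f"cat {TRACING}/trace_pipe > {trace_file} &",
        "modprobe nvidia && startx",
        "# quit X, then stop tracing:",
        f"echo nop > {TRACING}/current_tracer",
    ]


print("\n".join(mmiotrace_resume_plan()))
```

Tracing is switched on only after the resume, so the log holds just the post-resume initialization that imirkin wants to compare against nouveau.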
21:36mwk: that's one of the great unanswered questions for hwtest, btw
21:36mwk: figure out wtf is TFC and why it exists
21:37mwk: up until recently, it just looked like IFC with all interesting bits cut out
21:37mwk: but then I learned it triggers a different (and new on NV10) rasterization algorithm
22:34pmoreau: Meh… I messed up the passing of arguments to a function call, and the retrieving of its result.
23:45nyef: Lovely. A 346M trace log.
23:45imirkin_: that's about right
23:45imirkin_: xz -9 is nice.
23:45nyef: ... Which I now get to copy over NFS, because it's too slow to look at it remotely.
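mmiotrace logs are repetitive text and tend to compress well, so imirkin's `xz -9` suggestion is worth following before moving a 346M log over NFS. Where the xz tool isn't handy, Python's standard `lzma` module writes the same .xz container; this is a minimal sketch, and the file name in the usage line is a placeholder. Preset 9 needs several hundred MiB of RAM to compress.

```python
import lzma
import shutil


def compress_trace(src, dst=None, preset=9):
    """Write `src` as an .xz file at the given preset, keeping the original,
    roughly what `xz -9 -k src` does."""
    dst = dst or src + ".xz"
    with open(src, "rb") as fin, lzma.open(dst, "wb", preset=preset) as fout:
        shutil.copyfileobj(fin, fout, 1 << 20)  # stream in 1 MiB chunks
    return dst


# Usage (placeholder path): compress_trace("mmiotrace.log")
```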